Monaural Musical Sound Separation

Li, Yipeng

osu1211994188.pdf (1.85 MB)

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1211994188

Year and Degree

2008, Doctor of Philosophy, Ohio State University, Computer and Information Science.

Abstract

Monaural musical sound separation attempts to isolate one or more sound sources from a single-channel polyphonic signal. The main motivation for this study is the ability of the human auditory system to organize an acoustic mixture into perceptual streams that correspond to different sound sources. The underlying perceptual process is called auditory scene analysis (ASA), and it has inspired the development of computational auditory scene analysis (CASA).

A recent development in CASA is the establishment of the ideal binary mask (IBM) as a major computational goal. The IBM has several desirable properties as an objective for CASA systems; one of them is its purported optimality with respect to signal-to-noise ratio (SNR) among all binary masks. However, this optimality has not been rigorously addressed. This dissertation gives a formal treatment of the issue and clarifies the conditions under which the IBM is optimal. It also shows that IBMs are close in performance to ideal ratio masks, which are closely related to the Wiener filter, the theoretically optimal linear filter. As a result, the IBM is adopted as the computational goal for musical sound separation when binary masking is considered.
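The relationship between binary and ratio masks can be illustrated with a small sketch. It is not the dissertation's formal treatment: it assumes that target and interference are uncorrelated within each time-frequency unit, so that their energies add, and it uses a 0 dB local criterion for the binary mask. The function names and the 2x2 energy grid are hypothetical.

```python
import math
from itertools import product

def ideal_binary_mask(target, interference):
    """Return 1 where target energy exceeds interference energy (0 dB criterion)."""
    return [[1 if s > n else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(target, interference)]

def ideal_ratio_mask(target, interference):
    """Return the Wiener-like gain s / (s + n) for each time-frequency unit."""
    return [[s / (s + n) if s + n > 0 else 0.0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(target, interference)]

def masked_snr_db(mask, target, interference):
    """SNR of the masked mixture, ASSUMING uncorrelated target and interference.

    With gain m applied to a unit, the error energy is (1 - m)^2 * s + m^2 * n.
    """
    signal = 0.0
    error = 0.0
    for m_row, s_row, n_row in zip(mask, target, interference):
        for m, s, n in zip(m_row, s_row, n_row):
            signal += s
            error += (1 - m) ** 2 * s + m ** 2 * n
    return 10 * math.log10(signal / error)

# Hypothetical per-unit energies (rows are frames, columns are channels).
target = [[4.0, 1.0], [0.5, 9.0]]
interference = [[1.0, 3.0], [2.0, 1.0]]

ibm = ideal_binary_mask(target, interference)
irm = ideal_ratio_mask(target, interference)
ibm_snr = masked_snr_db(ibm, target, interference)
irm_snr = masked_snr_db(irm, target, interference)

# Brute-force check over all 16 binary masks on this tiny grid.
best_binary_snr = max(
    masked_snr_db([list(bits[:2]), list(bits[2:])], target, interference)
    for bits in product([0, 1], repeat=4)
)
```

Under the stated assumption, each unit contributes the smaller of its two energies to the error when the binary mask follows the 0 dB rule, which is why no other binary mask scores higher on this grid, while the ratio mask scores at least as high as the binary one.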

Pitch is a primary cue in the perceptual organization of sounds. Since the majority of musical sounds are pitched, this dissertation is centered on pitch-based sound separation. The first system aims to separate singing voice from music accompaniment and features an effective algorithm for detecting the pitch contours of singing voice in the presence of other musical sounds. The system consists of three stages. The singing voice detection stage partitions an input and classifies each portion as vocal or non-vocal. For vocal portions, the predominant pitch detection stage detects the pitch contours of the singing voice, and the separation stage then uses the detected pitch contours to group the time-frequency segments of the singing voice. Quantitative results show that the system performs the separation task successfully.
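As a rough illustration of how a detected pitch contour can drive grouping, the sketch below marks the frequency bins that lie near a harmonic of the pitch in each frame. This is a simplified stand-in, not the dissertation's separation stage, which groups time-frequency segments; the function name, the tolerance, and the use of `None` for non-vocal frames are assumptions made for the example.

```python
def harmonic_mask(pitch_hz, bin_freqs_hz, tolerance_hz=20.0):
    """Build a binary mask from a pitch contour.

    pitch_hz: one pitch value per frame, or None for frames judged non-vocal.
    bin_freqs_hz: center frequency of each frequency bin.
    A bin is kept if it lies within tolerance_hz of the nearest harmonic.
    """
    mask = []
    for f0 in pitch_hz:
        if f0 is None or f0 <= 0:
            # Non-vocal frame: assign nothing to the singing voice.
            mask.append([0] * len(bin_freqs_hz))
            continue
        row = []
        for f in bin_freqs_hz:
            k = max(1, round(f / f0))  # index of the nearest harmonic
            row.append(1 if abs(f - k * f0) <= tolerance_hz else 0)
        mask.append(row)
    return mask

# One vocal frame at 200 Hz followed by one non-vocal frame.
example_mask = harmonic_mask([200.0, None], [100.0, 200.0, 300.0, 410.0, 600.0])
```

In this example the first frame keeps the bins at 200, 410, and 600 Hz, and the non-vocal frame keeps none.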

The second system attempts to separate instrument sounds from a polyphonic signal. This system focuses on the problem of overlapping harmonics, a major difficulty in musical sound separation. To make reliable binary decisions about which source has stronger energy in an overlapping region, the system uses contextual information, based on the assumption that sounds from the same source tend to have similar spectral envelopes. Quantitative results show that this strategy can help binary decisions in overlapping regions and consequently improve the SNR performance of separation.
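To see why overlapping harmonics arise so often in music, consider two notes whose pitches stand in a simple ratio. The following sketch lists the harmonic pairs that fall within a given frequency resolution of each other. It only locates the overlaps and does not implement the spectral-envelope strategy described above; the function name and the default resolution are assumptions.

```python
def overlapping_harmonics(f0_a, f0_b, max_freq_hz=4000.0, resolution_hz=25.0):
    """Return (harmonic_of_a, harmonic_of_b) pairs closer than resolution_hz."""
    overlaps = []
    n_a = int(max_freq_hz // f0_a)  # highest harmonic of source A in range
    n_b = int(max_freq_hz // f0_b)  # highest harmonic of source B in range
    for h_a in range(1, n_a + 1):
        for h_b in range(1, n_b + 1):
            if abs(h_a * f0_a - h_b * f0_b) < resolution_hz:
                overlaps.append((h_a, h_b))
    return overlaps

# Two notes a perfect fifth apart (frequency ratio 3:2).
fifth_overlaps = overlapping_harmonics(220.0, 330.0, max_freq_hz=2000.0)
```

For these two notes, every third harmonic of the lower note coincides with every second harmonic of the upper one, so a third of the lower note's harmonics cannot be assigned by frequency alone.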

To achieve higher separation quality, a sinusoidal modeling-based separation system is developed with an emphasis on resolving overlapping harmonics. This system also uses contextual information: harmonics of the same source have correlated amplitude envelopes, a property known as common amplitude modulation in ASA. Another observation is that the phase change of harmonics can be predicted from pitch points. These two observations are incorporated in a least-squares estimation framework for separation. An effective technique is introduced to improve the accuracy of pitch estimation and make the system usable in practical applications. Quantitative evaluation of the proposed system shows that it performs significantly better than existing monaural musical sound separation systems.
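The two observations can be combined in a small least-squares sketch. The abstract does not give the exact formulation, so the model below is an assumption: in an overlapped region, the observed complex value in each frame is modeled as a sum of two terms, each with a known envelope shape (borrowed from a non-overlapped harmonic of the same source) and a phase track predicted from pitch, scaled by an unknown complex coefficient. All names and numbers are hypothetical.

```python
import cmath
import math

def predicted_phases(f0_track_hz, harmonic, hop_s):
    """Accumulate phase frame by frame, ASSUMING an advance of 2*pi*h*f0*hop."""
    phases = [0.0]
    for f0 in f0_track_hz[1:]:
        phases.append(phases[-1] + 2 * math.pi * harmonic * f0 * hop_s)
    return phases

def solve_overlap(y, env_a, phase_a, env_b, phase_b):
    """Least-squares fit of y[t] ~ a * u[t] + b * v[t] for complex a and b."""
    u = [e * cmath.exp(1j * p) for e, p in zip(env_a, phase_a)]
    v = [e * cmath.exp(1j * p) for e, p in zip(env_b, phase_b)]
    uu = sum(abs(x) ** 2 for x in u)
    vv = sum(abs(x) ** 2 for x in v)
    uv = sum(x.conjugate() * z for x, z in zip(u, v))
    uy = sum(x.conjugate() * z for x, z in zip(u, y))
    vy = sum(x.conjugate() * z for x, z in zip(v, y))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = uu * vv - uv * uv.conjugate()
    if abs(det) < 1e-12:
        raise ValueError("basis signals are nearly collinear")
    a = (uy * vv - uv * vy) / det
    b = (uu * vy - uv.conjugate() * uy) / det
    return a, b

# Synthetic overlap: harmonic 3 of a 220 Hz source and harmonic 2 of a 331 Hz source.
hop_s = 0.01
phase_a = predicted_phases([220.0] * 4, harmonic=3, hop_s=hop_s)
phase_b = predicted_phases([331.0] * 4, harmonic=2, hop_s=hop_s)
env_a = [1.0, 0.8, 0.6, 0.4]  # decaying envelope shape for source A
env_b = [0.2, 0.5, 0.9, 1.0]  # rising envelope shape for source B
true_a = 0.5 + 0j
true_b = 0.3 * cmath.exp(0.7j)
mixture = [true_a * ea * cmath.exp(1j * pa) + true_b * eb * cmath.exp(1j * pb)
           for ea, pa, eb, pb in zip(env_a, phase_a, env_b, phase_b)]

a_est, b_est = solve_overlap(mixture, env_a, phase_a, env_b, phase_b)
```

In this noise-free example the fit recovers both coefficients because the two envelope shapes differ across frames; when the envelopes and phase tracks of the two sources are nearly identical, the system becomes ill-conditioned and the sketch raises an error instead of returning an unreliable estimate.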

Committee

DeLiang Wang, Professor (Advisor)
Eric Fosler-Lussier, Professor (Committee Member)
Philip Schniter, Professor (Committee Member)

Pages

155 p.

Subject Headings

Computer Science

Recommended Citations

APA Style (7th edition)
Li, Y. (2008). Monaural Musical Sound Separation [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1211994188

MLA Style (8th edition)
Li, Yipeng. Monaural Musical Sound Separation. 2008. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1211994188.

Chicago Manual of Style (17th edition)
Li, Yipeng. "Monaural Musical Sound Separation." Doctoral dissertation, Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1211994188

Document number:

osu1211994188

Download Count:

1,682

Copyright Info

Monaural Musical Sound Separation by Yipeng Li is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by The Ohio State University and OhioLINK.
