Monaural speech organization and segregation

Hu, Guoning

osu1143212799.pdf (5.13 MB)

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1143212799

Year and Degree

2006, Doctor of Philosophy, Ohio State University, Biophysics.

Abstract

In natural environments, speech often occurs simultaneously with acoustic interference. Many applications, such as automatic speech recognition and telecommunication, require an effective system that segregates speech from interference in the monaural (one-microphone) situation. While monaural speech segregation has proven to be very challenging, human listeners show a remarkable ability to segregate an acoustic mixture and attend to a target sound, even with one ear. This perceptual process is called auditory scene analysis (ASA). Research in ASA has inspired considerable effort to build computational ASA (CASA) systems based on ASA principles. Current CASA systems, however, face a number of challenges in monaural speech segregation.

This dissertation presents a systematic and extensive effort to develop a CASA system for monaural speech segregation that addresses several major challenges. The proposed system consists of four stages: peripheral analysis, feature extraction, segmentation, and grouping. In the first stage, the system decomposes the auditory scene into a time-frequency representation via bandpass filtering and time windowing. The second stage extracts auditory features corresponding to ASA cues, such as periodicity, amplitude modulation, onset, and offset. In the third stage, the system segments the auditory scene based on a multiscale analysis of onset and offset. The last stage includes an iterative algorithm that simultaneously estimates the pitch of a target utterance and segregates the voiced target based on the pitch estimate. Finally, for non-speech interference, the system sequentially groups the voiced and unvoiced portions of the target speech; this grouping is performed using feature-based classification.
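To make the periodicity cue concrete, the following is a minimal sketch of time windowing followed by autocorrelation-based pitch estimation on a single channel. It is not the dissertation's implementation: the bandpass filterbank is omitted, the function names (`frames`, `normalized_autocorrelation`, `estimate_pitch`), the frame sizes, and the pitch search range are all illustrative assumptions, and plain peak picking stands in for the iterative pitch tracking the abstract describes.

```python
import math

def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (time windowing)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def normalized_autocorrelation(frame, lag):
    """Normalized autocorrelation of one frame at a lag given in samples."""
    n = len(frame) - lag
    num = sum(frame[i] * frame[i + lag] for i in range(n))
    e1 = sum(frame[i] ** 2 for i in range(n))
    e2 = sum(frame[i + lag] ** 2 for i in range(n))
    denom = math.sqrt(e1 * e2)
    return num / denom if denom > 0 else 0.0

def estimate_pitch(frame, sample_rate, f_min=80.0, f_max=400.0):
    """Return (pitch_hz, periodicity) from the strongest autocorrelation peak.

    The search range [f_min, f_max] is an assumed plausible pitch range.
    Plain peak picking can lock onto a multiple of the true period
    (an octave error) when the range spans more than one period.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    scores = {lag: normalized_autocorrelation(frame, lag)
              for lag in range(min_lag, max_lag + 1)}
    best_lag = max(scores, key=scores.get)
    return sample_rate / best_lag, scores[best_lag]

# Usage: a synthetic 200 Hz tone stands in for one voiced channel.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
for frame in frames(tone, frame_len=320, hop=80):
    pitch_hz, periodicity = estimate_pitch(frame, sample_rate, f_min=120.0)
    print(f"{pitch_hz:.1f} Hz, periodicity {periodicity:.3f}")
```

In the usage above, `f_min` is raised to 120 Hz so that the search range covers only one period of the tone; with a wider range, a purely periodic signal scores equally well at integer multiples of its period.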

Systematic evaluation shows that the proposed system extracts a majority of the target speech without including much interference. Extensive comparisons demonstrate that the system substantially advances the state of the art in voiced speech segregation. This work also represents the first systematic study of unvoiced speech segregation.

Committee

DeLiang Wang (Advisor)
William Masters (Other)
Eric Fosler-Lussier (Other)

Keywords

unvoiced; speech; pitch; SNR; voiced; onset and offset

Recommended Citations

APA Style (7th edition)
Hu, G. (2006). Monaural speech organization and segregation [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1143212799

MLA Style (8th edition)
Hu, Guoning. Monaural speech organization and segregation. 2006. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1143212799.

Chicago Manual of Style (17th edition)
Hu, Guoning. "Monaural speech organization and segregation." Doctoral dissertation, Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1143212799

Document number:

osu1143212799

Download Count:

1,925
