Speech Segregation in Background Noise and Competing Speech

2012, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

In real-world listening environments, the speech reaching our ears is often accompanied by acoustic interference such as environmental sounds, music, or another voice. Noise distorts speech and poses substantial difficulty for many applications, including hearing aid design and automatic speech recognition. Monaural speech segregation refers to the problem of separating speech using only a single recording, and it is widely regarded as a challenging task. In recent decades, significant progress has been made on this problem, but the challenge remains.

This dissertation addresses monaural speech segregation from different types of interference. First, we study the problem of unvoiced speech segregation, which has received less attention than voiced speech segregation, probably owing to its difficulty. We propose to utilize segregated voiced speech to assist unvoiced speech segregation. Specifically, we remove all periodic signals, including voiced speech, from the noisy input and then estimate the noise energy in unvoiced intervals using noise-dominant time-frequency units in neighboring voiced intervals. The estimated interference is used in a subtraction stage to extract unvoiced segments, which are then grouped by either simple thresholding or classification. We demonstrate that the proposed system performs substantially better than speech enhancement methods.
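The estimate-then-subtract idea above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation: the time-frequency decomposition, pitch-based voiced/unvoiced labeling, and noise-dominance detection are assumed to be given as inputs, and all function and parameter names are hypothetical.

```python
import numpy as np

def segregate_unvoiced(mixture_tf, voiced_frames, noise_dominant, threshold=1.0):
    """Sketch of subtraction-based unvoiced speech segregation.

    mixture_tf: (freq, time) array of T-F unit energies of the noisy input.
    voiced_frames: boolean (time,) array, True in voiced intervals.
    noise_dominant: boolean (freq, time) array marking T-F units judged
        noise-dominant (assumed supplied by an earlier voiced-segregation stage).
    Returns a binary mask selecting likely unvoiced speech-dominant units.
    """
    freq_bins, n_frames = mixture_tf.shape
    # Estimate per-channel noise energy from noise-dominant units
    # in neighboring voiced intervals.
    noise_est = np.zeros(freq_bins)
    for f in range(freq_bins):
        units = mixture_tf[f, voiced_frames & noise_dominant[f]]
        noise_est[f] = units.mean() if units.size else 0.0
    # Subtract the estimated noise in unvoiced frames, then group
    # the residual by simple thresholding.
    mask = np.zeros_like(mixture_tf, dtype=bool)
    unvoiced = ~voiced_frames
    residual = mixture_tf[:, unvoiced] - noise_est[:, None]
    mask[:, unvoiced] = residual > threshold * noise_est[:, None]
    return mask
```

The thresholding step here stands in for the thesis's simpler grouping option; the classification-based alternative would replace the final comparison with a trained classifier over segment features.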

Interference can consist of nonspeech signals or other voices. Cochannel speech refers to a mixture of two speech signals. Cochannel speech separation is often addressed by model-based methods, which assume known speaker identities and pretrained speaker models. To overcome this speaker-dependency limitation, we propose an unsupervised approach to cochannel speech separation. We employ a tandem algorithm to perform simultaneous grouping of speech and develop an unsupervised clustering method to group simultaneous streams across time. The proposed clustering objective measures the speaker difference of each hypothesized grouping and incorporates pitch constraints. For unvoiced speech segregation, we employ an onset/offset-based analysis for segmentation and then divide the segments into unvoiced-voiced and unvoiced-unvoiced portions for separation. We show that this method achieves considerable SNR gains over a range of input SNR conditions and, despite its unsupervised nature, performs competitively with model-based and speaker-independent methods.
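The constrained two-way grouping described above can be sketched as an exhaustive search for small numbers of streams. This is an illustrative toy, not the thesis algorithm: the per-stream features (a single vector per stream here), the speaker-difference measure (Euclidean distance between group means), and the encoding of pitch constraints as must-separate pairs are all simplifying assumptions.

```python
import itertools
import numpy as np

def cluster_streams(features, conflict_pairs):
    """Sketch of unsupervised two-speaker grouping of simultaneous streams.

    features: (n_streams, dim) array, one feature vector per simultaneous
        stream (a stand-in for the thesis's speaker features).
    conflict_pairs: (i, j) pairs of streams with overlapping pitch contours,
        which must go to different speakers (the pitch constraint).
    Returns the 0/1 assignment maximizing the between-group speaker difference.
    """
    n = len(features)
    best_score, best_assign = -np.inf, None
    for bits in itertools.product([0, 1], repeat=n):
        assign = np.array(bits)
        if assign.min() == assign.max():
            continue  # both groups must be non-empty
        if any(assign[i] == assign[j] for i, j in conflict_pairs):
            continue  # violates a pitch constraint
        c0 = features[assign == 0].mean(axis=0)
        c1 = features[assign == 1].mean(axis=0)
        score = np.linalg.norm(c0 - c1)  # hypothesized speaker difference
        if score > best_score:
            best_score, best_assign = score, assign
    return best_assign
```

Exhaustive enumeration is exponential in the number of streams; it only serves to make the objective concrete.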

In cochannel speech separation, speaker identities are sometimes known and clean utterances of each speaker are readily available. We can thus describe the speakers with pretrained models to assist separation. One issue in model-based cochannel speech separation is generalization to different signal levels. We propose an iterative algorithm that jointly separates the speech signals and estimates the input SNR. We employ hidden Markov models to describe speaker acoustic characteristics and temporal dynamics. Initially, we use unadapted speaker models to segregate the two speech signals, and the segregated signals are then used to estimate the input SNR. The estimated SNR is in turn used to adapt the speaker models for re-estimating the speech signals. The two steps iterate until convergence. Systematic evaluations show that our iterative method improves segregation performance significantly and converges relatively fast. Compared with related model-based methods, it is computationally simpler and performs better under a number of input SNR conditions, in terms of both SNR gain and hit minus false-alarm rate.
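The alternation between separation and SNR estimation can be sketched with a deliberately simplified toy. This is not the thesis system: the hidden Markov models are replaced here by fixed template spectra, separation by Wiener-like weighting, and SNR estimation by an energy ratio, purely to show the structure of the iteration.

```python
import numpy as np

def iterative_snr_separation(mixture, model1, model2, n_iter=20, tol=1e-4):
    """Toy sketch of joint separation and input-SNR estimation.

    mixture: (frames, dim) spectral features of the two-talker mixture.
    model1, model2: (dim,) template spectra standing in for trained
        speaker models (the thesis uses HMMs).
    Alternates between (1) separating with gain-adapted models and
    (2) re-estimating the input SNR from the separated signals,
    until the gain converges.
    """
    gain = 1.0  # linear gain on speaker 2's model, i.e. the SNR estimate
    for _ in range(n_iter):
        # Step 1: soft separation using the current gain-adapted models.
        w1 = model1 / (model1 + gain * model2)  # Wiener-like weights
        s1 = w1 * mixture
        s2 = mixture - s1
        # Step 2: re-estimate the input SNR (energy ratio) and adapt.
        new_gain = np.sqrt(s2.sum() / max(s1.sum(), 1e-12))
        if abs(new_gain - gain) < tol:
            break
        gain = new_gain
    snr_db = -20.0 * np.log10(max(gain, 1e-12))
    return s1, s2, snr_db
```

In the actual system each step is more involved (HMM decoding for separation, model adaptation by the estimated gain), but the fixed-point structure of the loop is the same.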

DeLiang Wang (Committee Chair)
Eric Fosler-Lussier (Committee Member)
Mikhail Belkin (Committee Member)

Recommended Citations

  • Hu, K. (2012). Speech Segregation in Background Noise and Competing Speech [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339018952

    APA Style (7th edition)

  • Hu, Ke. Speech Segregation in Background Noise and Competing Speech. 2012. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1339018952.

    MLA Style (8th edition)

  • Hu, Ke. "Speech Segregation in Background Noise and Competing Speech." Doctoral dissertation, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339018952

    Chicago Manual of Style (17th edition)