Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments

Woodruff, John F.

2012, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, its performance degrades substantially in reverberant environments and, more fundamentally, segregation cannot be performed without sufficient spatial separation between sources.

This dissertation addresses the problems of binaural localization and segregation in reverberant environments by integrating monaural and binaural cues. Motivated by research in psychoacoustics and by developments in monaural CASA processing, we first develop a probabilistic framework for joint localization and segregation of voiced speech. Pitch cues are used to group sound components across frequency over continuous time intervals. Time-frequency regions resulting from this partial organization are then localized by integrating binaural cues, which enhances robustness to reverberation, and grouped across time based on the estimated locations. We demonstrate that this approach outperforms voiced segregation based on either monaural or binaural analysis alone. We also demonstrate substantial performance gains in terms of multisource localization, particularly for distant sources in reverberant environments and at low signal-to-noise ratios. We then develop a binaural system for joint localization and segregation of an unknown and time-varying number of sources that is more flexible and requires less prior information than our initial system. This framework incorporates models trained jointly on pitch and azimuth cues, which improves performance and naturally deals with both voiced and unvoiced speech. Experimental results show that the proposed approach outperforms existing two-microphone systems despite requiring less prior information.
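As a rough illustration of what a binaural cue can look like in practice, the sketch below estimates an interaural time difference (ITD) from the peak of the cross-correlation between the two ear signals. This is a generic textbook technique and not necessarily the cue extraction used in the dissertation; the function name `estimate_itd`, the 1 ms search range, and the full-band (rather than per-frequency-channel) processing are all assumptions made for the example.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """Estimate how long `right` trails `left`, in seconds.

    Hypothetical helper: picks the lag with the largest cross-correlation
    within +/- max_itd_s. A positive result means the sound reached the
    left ear first. Signals must be longer than the search range.
    """
    max_lag = int(round(max_itd_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    # np.correlate(a, v, "full")[i] corresponds to lag k = i - (len(v) - 1),
    # with c_k = sum_n a[n + k] * v[n]; the peak sits at the delay of a vs. v.
    xcorr = np.correlate(right, left, mode="full")
    zero_lag = len(left) - 1
    window = xcorr[zero_lag - max_lag: zero_lag + max_lag + 1]
    return lags[np.argmax(window)] / fs

# Toy usage: the right-ear signal is the left-ear signal delayed by 3 samples.
fs = 16000
rng = np.random.default_rng(0)
left = rng.standard_normal(256)
right = np.concatenate([np.zeros(3), left[:-3]])
itd = estimate_itd(left, right, fs)
```

In reverberant rooms a single full-band estimate like this tends to be unreliable, which is consistent with the abstract's motivation for pooling binaural cues over larger time-frequency regions.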

We also consider how the computational goal of CASA-based segregation should be defined in reverberant environments. The ideal binary mask (IBM) has been established as a main goal of CASA. While the IBM is defined unambiguously in anechoic conditions, in reverberant environments there is some flexibility in how one might define the target signal itself, and this introduces ambiguity into the notion of the IBM. Due to the perceptual distinction between early and late reflections, we introduce the reflection boundary as a parameter of the IBM definition to allow target reflections to be divided into desirable and undesirable components. We conduct a series of intelligibility tests with normal-hearing listeners to compare alternative IBM definitions. Results show that it is vital for the IBM definition to account for the energetic effect of early target reflections, and that late target reflections should be characterized as noise.
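The role of the reflection boundary can be sketched in a few lines. The example below is a minimal illustration, not the dissertation's implementation: it splits a room impulse response at a boundary measured from the direct-path peak, and computes a binary mask from per-unit energies. The function names, the 0 dB local criterion, and the choice to add late-reflection energy to the interference energy (which treats the two as uncorrelated) are assumptions made for the example.

```python
import numpy as np

def split_impulse_response(h, fs, boundary_ms):
    """Split impulse response `h` into early and late parts.

    Hypothetical helper: the boundary is counted from the direct-path
    peak, so `early` holds the direct sound plus early reflections.
    """
    direct = int(np.argmax(np.abs(h)))
    cut = direct + int(round(boundary_ms * 1e-3 * fs))
    early = np.zeros_like(h)
    late = np.zeros_like(h)
    early[:cut] = h[:cut]
    late[cut:] = h[cut:]
    return early, late

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Label a time-frequency unit 1 where the local SNR exceeds lc_db."""
    eps = np.finfo(float).tiny  # avoids log of zero
    snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (snr_db > lc_db).astype(np.uint8)

# Toy usage with made-up numbers: a 5-tap response sampled at 1 kHz
# and a 2 ms reflection boundary.
h = np.array([0.0, 1.0, 0.5, 0.25, 0.125])
early, late = split_impulse_response(h, fs=1000, boundary_ms=2.0)

# Per-unit energies (rows: frequency channels, columns: time frames).
early_target = np.array([[4.0, 1.0], [1.0, 1.0]])
late_target = np.array([[0.5, 2.0], [0.5, 0.25]])
interference = np.array([[0.5, 2.0], [0.5, 0.25]])
# Late target reflections are counted as noise.
mask = ideal_binary_mask(early_target, interference + late_target)
```

Moving `boundary_ms` changes which target reflections count toward the signal and which toward the noise, which is the ambiguity the paragraph above describes.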

DeLiang Wang, PhD (Advisor)
Mikhail Belkin, PhD (Committee Member)
Eric Fosler-Lussier, PhD (Committee Member)
Nicoleta Roman, PhD (Committee Member)
204 p.

Recommended Citations

  • Woodruff, J. F. (2012). Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332425718

    APA Style (7th edition)

  • Woodruff, John. Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments. 2012. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1332425718.

    MLA Style (8th edition)

  • Woodruff, John. "Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments." Doctoral dissertation, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332425718

    Chicago Manual of Style (17th edition)