Sequential organization in computational auditory scene analysis

2007, Doctor of Philosophy, Ohio State University, Computer and Information Science.

A human listener's ability to organize the time-frequency (T-F) energy of the same sound source into a single stream is termed auditory scene analysis (ASA). Computational auditory scene analysis (CASA) seeks to organize sound based on ASA principles. This dissertation presents a systematic effort on sequential organization in CASA. The organization goal is to group T-F segments from the same speaker that are separated in time into a single stream.

This dissertation proposes a speaker-model-based sequential organization framework that achieves better grouping performance than feature-based methods. Specifically, a computational objective is derived for sequential grouping in the context of speaker recognition for multi-talker mixtures. This formulation leads to a grouping system that searches for the optimal grouping of separated speech segments. A hypothesis pruning method is then proposed that significantly reduces the search space and search time while achieving performance close to that of exhaustive search. Evaluations show that the proposed system improves both grouping performance and speech recognition accuracy. The proposed system is then extended to handle multi-talker as well as non-speech intrusions using generic models. The system is further extended to deal with noisy inputs from unknown speakers. It employs a speaker quantization method that extracts generic models from a large speaker space. The resulting grouping performance is only moderately lower than that with known speaker models.
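As a rough illustration of the contrast between exhaustive search and hypothesis pruning described above, the following toy sketch may help. It is not the dissertation's algorithm: the score (a plain sum of per-segment log-likelihoods), the function names `exhaustive_grouping` and `pruned_grouping`, the `beam_width` parameter, and the sample numbers are all assumptions made for this example. A hypothesis here is a speaker pair plus a 0/1 label for each segment, and pruning simply keeps the highest-scoring partial hypotheses as segments are processed in time order.

```python
from itertools import combinations, product


def exhaustive_grouping(loglik):
    """Score every (speaker pair, labelling) hypothesis and return the best.

    loglik[t][s] is an assumed log-likelihood of segment t under a
    hypothetical speaker model s. The score of a hypothesis is the sum of
    the log-likelihoods of each segment under its assigned speaker.
    """
    n_seg, n_spk = len(loglik), len(loglik[0])
    best = (float("-inf"), None, None)
    for pair in combinations(range(n_spk), 2):
        for labels in product((0, 1), repeat=n_seg):
            score = sum(loglik[t][pair[k]] for t, k in enumerate(labels))
            if score > best[0]:
                best = (score, pair, labels)
    return best


def pruned_grouping(loglik, beam_width=3):
    """Process segments in time order, keeping only the top partial hypotheses."""
    n_spk = len(loglik[0])
    # Start with one empty hypothesis per candidate speaker pair.
    beam = [(0.0, pair, ()) for pair in combinations(range(n_spk), 2)]
    for row in loglik:
        # Extend each surviving hypothesis with both possible labels.
        expanded = [
            (score + row[pair[k]], pair, labels + (k,))
            for score, pair, labels in beam
            for k in (0, 1)
        ]
        # Prune: keep only the beam_width highest-scoring partial hypotheses.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Made-up log-likelihoods: 4 segments, 3 candidate speaker models.
loglik = [
    [-1.0, -5.0, -4.0],
    [-6.0, -1.5, -5.0],
    [-1.2, -4.0, -6.0],
    [-5.0, -2.0, -4.5],
]
print(exhaustive_grouping(loglik))
print(pruned_grouping(loglik, beam_width=3))
```

The exhaustive version evaluates every speaker pair against all 2^N labellings of N segments, while the pruned version evaluates at most twice `beam_width` extensions per segment after the first step. Because pruning discards hypotheses early, it can miss the best grouping when the beam is too narrow.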

In addition, this dissertation presents a systematic effort in robust speaker recognition. A novel usable speech extraction method is proposed that significantly improves recognition performance. A general solution is proposed for speaker recognition under additive-noise conditions. Novel speaker features are derived from auditory filtering, and are used in conjunction with an uncertainty decoder that accounts for mismatch introduced in CASA front-end processing. Evaluations show that the proposed system achieves significant performance improvement over the use of typical speaker features and a state-of-the-art robust front-end processor for noisy speech.

DeLiang Wang (Advisor)
188 p.

Recommended Citations

  • Shao, Y. (2007). Sequential organization in computational auditory scene analysis [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190127412

    APA Style (7th edition)

  • Shao, Yang. Sequential organization in computational auditory scene analysis. 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1190127412.

    MLA Style (8th edition)

  • Shao, Yang. "Sequential organization in computational auditory scene analysis." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190127412

    Chicago Manual of Style (17th edition)