CASA-BASED ROBUST SPEAKER IDENTIFICATION

Zhao, Xiaojia

Dissertation_XiaojiaZhao.pdf (1.26 MB)

Author Info

Zhao, Xiaojia

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1402620178

Year and Degree

2014, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Abstract

As a primary topic in speaker recognition, speaker identification (SID) aims to identify the underlying speaker(s) given a speech utterance. SID systems perform well under matched training and test conditions. In real-world environments, however, mismatch caused by background noise, room reverberation, or a competing voice significantly degrades their performance, so making SID systems robust has become an important research problem. Existing approaches address this problem from different perspectives, such as proposing robust speaker features, introducing noise to clean speaker models, and using speech enhancement methods to restore clean speech characteristics. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech from interference by producing a time-frequency mask. This dissertation aims to address the SID robustness problem in the CASA framework.

We first deal with the noise robustness of SID systems. We employ an auditory feature, the gammatone frequency cepstral coefficient (GFCC), and show that this feature captures speaker characteristics and performs substantially better than conventional speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize the corrupted components indicated by a CASA mask. We find that both reconstruction and marginalization are effective. We further combine these two methods into a single system based on their complementary advantages, and this system achieves significant performance improvements over related systems under a wide range of signal-to-noise ratios (SNRs). In addition, we conduct a systematic investigation into why GFCC shows superior noise robustness and conclude that nonlinear log rectification is likely the reason.

Speech is often corrupted by both noise and reverberation. Each has been studied separately, but their combined effects have rarely been studied. We address this issue in two phases. We first remove background noise through binary masking using a deep neural network (DNN) classifier. Then we perform robust SID with speaker models trained in selected reverberant conditions, on the basis of bounded marginalization and direct masking. Evaluation results show that the proposed method substantially improves SID performance compared to related systems over a wide range of reverberation times and SNRs.

The aforementioned studies handle mixtures of target speech and non-speech intrusions by taking advantage of their different characteristics. Such methods may not apply if the intrusion is a competing voice, which has characteristics similar to those of the target. SID in cochannel speech, where two speakers talk simultaneously over a single recording channel, is a well-known challenge. Previous studies address this problem in anechoic environments under the Gaussian mixture model (GMM) framework, whereas cochannel SID in reverberant conditions has not been addressed. This dissertation studies cochannel SID in both anechoic and reverberant conditions. We first investigate GMM-based approaches and propose a combined system that integrates two cochannel SID methods. We then explore DNNs for cochannel SID and propose a DNN-based recognition system. Evaluation results demonstrate that our proposed systems significantly improve SID performance over recent approaches in both anechoic and reverberant conditions and at various target-to-interferer ratios.

Committee

DeLiang Wang, Professor (Advisor)
Eric Fosler-Lussier, Professor (Committee Member)
Mikhail Belkin, Professor (Committee Member)

Pages

155 p.

Subject Headings

Computer Engineering; Computer Science

APA Style (7th edition)
Zhao, X. (2014). CASA-BASED ROBUST SPEAKER IDENTIFICATION [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1402620178

MLA Style (8th edition)
Zhao, Xiaojia. CASA-BASED ROBUST SPEAKER IDENTIFICATION. 2014. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1402620178.

Chicago Manual of Style (17th edition)
Zhao, Xiaojia. "CASA-BASED ROBUST SPEAKER IDENTIFICATION." Doctoral dissertation, Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1402620178

Document number:

osu1402620178

Download Count:

2,586
