Deep learning methods for reverberant and noisy speech enhancement

2020, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
In daily listening environments, the speech reaching our ears is commonly corrupted by both room reverberation and background noise. These distortions can be detrimental to speech intelligibility and quality, and also pose a serious problem for many speech-related applications, including automatic speech and speaker recognition. The objective of this dissertation is to enhance speech signals distorted by reverberation and noise, to benefit both human communication and human-machine interaction. Unlike traditional signal processing approaches, we use deep learning to perform reverberant-noisy speech enhancement.
Our study starts with speech dereverberation without background noise. Reverberation consists of sound wave reflections from various surfaces in an enclosed space. This means the reverberant signal at any time step includes damped and delayed copies of past signals. To capture these relationships across time steps, we use a self-attention mechanism as a pre-processing module to produce dynamic representations. With these enhanced representations, we propose a temporal convolutional network (TCN) based speech dereverberation algorithm. Systematic evaluations demonstrate the effectiveness of the proposed algorithm in a wide range of reverberant conditions.
Then we propose a deep learning based time-frequency (T-F) masking algorithm to address both reverberation and noise. Specifically, a deep neural network (DNN) is trained to estimate the ideal ratio mask (IRM), in which the anechoic-clean speech is considered as the desired signal. The enhanced speech is obtained by applying the estimated mask to the reverberant-noisy speech. Listening tests show that the proposed algorithm can substantially improve speech intelligibility for hearing-impaired (HI) listeners, and also benefits normal-hearing (NH) listeners.
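As a rough illustration of the T-F masking step described above, the following sketch computes an ideal ratio mask and applies it to a mixture magnitude spectrogram. It is not the dissertation's implementation: the square-root power-ratio form of the IRM is one commonly used definition and is assumed here, the function names `ideal_ratio_mask` and `apply_mask` are hypothetical, and the spectrograms are random placeholders rather than real speech. In the setting of the abstract, the "residual" would be everything in the reverberant-noisy signal other than the anechoic-clean speech.

```python
import numpy as np

def ideal_ratio_mask(target_mag, residual_mag, eps=1e-8):
    """Assumed IRM form: sqrt(target power / (target power + residual power))."""
    target_pow = target_mag ** 2
    residual_pow = residual_mag ** 2
    return np.sqrt(target_pow / (target_pow + residual_pow + eps))

def apply_mask(mask, mixture_mag):
    """Element-wise masking of the mixture magnitude spectrogram."""
    return mask * mixture_mag

# Placeholder magnitude spectrograms (frequency bins x time frames).
rng = np.random.default_rng(0)
target_mag = rng.random((4, 5))      # stands in for anechoic-clean speech
residual_mag = rng.random((4, 5))    # stands in for reverberation plus noise

# Simplifying assumption: target and residual powers add in each T-F unit.
mixture_mag = np.sqrt(target_mag ** 2 + residual_mag ** 2)

irm = ideal_ratio_mask(target_mag, residual_mag)
enhanced_mag = apply_mask(irm, mixture_mag)
```

Under the additive-power assumption, the oracle mask recovers the target magnitude almost exactly. In a trained system the mask would instead be predicted by a DNN from features of the reverberant-noisy signal.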
Considering the different natures of reverberation and noise, we propose to perform speech enhancement using a two-stage strategy, where denoising and dereverberation are conducted sequentially using DNNs. Moreover, we design a new objective function to better estimate the magnitude spectrum of anechoic-clean speech. After pre-training the denoising stage and dereverberation stage separately, the two-stage model is jointly trained with the proposed objective function. Experiments show that two-stage processing significantly outperforms previous one-stage enhancement systems.
We also investigate reverberant-noisy speech enhancement in the complex domain. Instead of predicting the complex ideal ratio mask (cIRM) explicitly, our proposed algorithm estimates a complex ratio mask implicitly, and optimizes a loss function defined in terms of the complex spectrum of anechoic-clean speech. Furthermore, to integrate the contextual information among different T-F units more efficiently, we propose a new T-F attention mechanism. Together with an improved DenseUNet architecture, the proposed model substantially improves objective metrics of speech intelligibility and quality.
Most existing supervised speech enhancement algorithms, including spectral mapping and T-F masking, assign the same importance to all T-F units, without considering their different contributions to speech intelligibility or quality. To leverage insights from speech perception, we propose a new DNN-based speech enhancement method that incorporates the widely used short-time objective intelligibility (STOI) measure as part of the loss function. Experimental results show that the proposed perceptually guided loss function further improves the STOI metric while maintaining objective speech quality.
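The complex-domain approach mentioned above, in which a complex ratio mask is estimated implicitly and the loss is defined on the complex spectrum of the target, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact loss, so a mean squared error over real and imaginary parts is assumed, the names `apply_complex_mask` and `complex_spectrum_loss` are hypothetical, and the spectra are random placeholders. The oracle mask here is computed by division purely to check the arithmetic; in the described method a network would output the mask and be trained through the loss.

```python
import numpy as np

def apply_complex_mask(mask, mixture_spec):
    """Complex multiplication of mask and mixture in each T-F unit."""
    return mask * mixture_spec

def complex_spectrum_loss(estimate, target):
    """Assumed loss: mean squared error over real and imaginary parts."""
    diff = estimate - target
    return float(np.mean(diff.real ** 2 + diff.imag ** 2))

# Placeholder complex spectrograms (frequency bins x time frames).
rng = np.random.default_rng(0)
shape = (4, 5)
mixture = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Oracle complex ratio mask, used here only as a sanity check.
oracle_mask = target / mixture

loss_oracle = complex_spectrum_loss(apply_complex_mask(oracle_mask, mixture), target)
loss_identity = complex_spectrum_loss(apply_complex_mask(np.ones(shape), mixture), target)
```

Because the loss is computed on the masked spectrum rather than on the mask itself, the mask never needs an explicit training target, which is the sense in which it is estimated implicitly.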
DeLiang Wang (Advisor)
Eric Fosler-Lussier (Committee Member)
Eric Healy (Committee Member)
168 p.

Recommended Citations

  • Zhao, Y. (2020). Deep learning methods for reverberant and noisy speech enhancement [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1593462119759348

    APA Style (7th edition)

  • Zhao, Yan. Deep learning methods for reverberant and noisy speech enhancement. 2020. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1593462119759348.

    MLA Style (8th edition)

  • Zhao, Yan. "Deep learning methods for reverberant and noisy speech enhancement." Doctoral dissertation, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1593462119759348

    Chicago Manual of Style (17th edition)