Deep Learning for Acoustic Echo Cancellation and Active Noise Control

Zhang, Hao

Dissertation_HaoZhang.pdf (5.91 MB)

Author Info

Zhang, Hao

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1650477901420567

Year and Degree

2022, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Abstract

Acoustic echo cancellation (AEC) and active noise control (ANC) have attracted increasing attention in research and industrial applications over the past few decades. Conventionally, AEC and ANC are addressed using methods that are based on adaptive signal processing with the least mean square algorithm as the foundation. They are linear systems and do not perform satisfactorily in the presence of nonlinear distortions. However, nonlinear distortions are inevitable in applications of AEC and ANC due to the limited quality of electronic devices such as amplifiers and loudspeakers. Considering the capacity of deep learning in modeling complex nonlinear relationships, we propose deep learning approaches to address AEC and ANC problems in this dissertation.

Unlike traditional signal processing methods, we formulate AEC as deep learning based speech separation. The proposed approach, called deep AEC, suppresses echo and noise by separating the near-end speech from a microphone signal with the accessible far-end signal as additional information. Our study of deep AEC starts with magnitude-domain estimation, and a recurrent neural network with bidirectional long short-term memory (BLSTM) is trained to estimate a spectral magnitude mask (SMM) from the microphone and far-end signals. Later, a convolutional recurrent network (CRN) is utilized for complex spectral mapping and results in better speech quality. In addition, we explore combining deep learning based and traditional AEC algorithms to further improve AEC performance.

Although deep AEC produces significant improvements over traditional AEC methods, there exists a tradeoff between echo suppression and near-end speech quality. To address this, we propose a neural cascade architecture to leverage the advantages of magnitude-domain and complex-domain estimation. The proposed cascade architecture consists of two modules. A CRN is employed in the first module for complex spectral mapping. The output is then fed as an additional input to the second module, where a long short-term memory network (LSTM) is utilized for magnitude mask estimation. The entire architecture is trained in an end-to-end manner with the two modules optimized jointly using a single loss function. This cascade architecture enables deep AEC to obtain robust magnitude estimation as well as phase enhancement.

Modern communication devices are usually equipped with multiple microphones and loudspeakers. Building on deep learning based AEC in the single-channel setup, we then investigate multi-channel AEC (MCAEC) and propose a deep learning based approach named deep MCAEC. We find that the deep MCAEC approach avoids the intrinsic non-uniqueness problem in traditional MCAEC algorithms. For MCAEC setups with multiple microphones, combining deep MCAEC with supervised beamforming further improves AEC performance.

For ANC, we formulate it as a supervised learning problem for the first time and propose a deep learning approach, called deep ANC, to address the nonlinear ANC problem. The main idea is to employ deep learning to encode the optimal control parameters corresponding to different noises and environments. We start with a frequency-domain method and train a CRN to estimate the real and imaginary spectrograms of the canceling signal from the reference signal so that the corresponding anti-noise can eliminate or attenuate the primary noise in the ANC system. Deep ANC is a fixed-parameter ANC approach, and large-scale multi-condition training is key to achieving good generalization and robustness against a variety of noises. The proposed approach outperforms traditional ANC methods, exhibits unique advantages, and can be trained to achieve active noise cancellation regardless of whether the reference signal is noise or noisy speech. The latter property could dramatically expand the scope of ANC applicability.

Processing latency is a critical issue for ANC due to the causality constraint of ANC systems. Deep ANC is a frequency-domain block-based method, which incurs an algorithmic delay determined by the frame size. This delay may violate the causality constraint of ANC systems and is considered a shortcoming of frequency-domain ANC algorithms. To address this, a time-domain method using a self-attending recurrent neural network is proposed, which allows for implementing deep ANC with smaller frame sizes. Augmented with a delay-compensated training strategy and a revised overlap-add method, the algorithmic latency of deep ANC is reduced substantially with little effect on ANC performance.

Finally, we expand the single-channel deep ANC to the multi-channel setup. The resulting approach, called deep MCANC, is developed for active noise control at multiple spatial points (multi-point ANC) and within a spatial zone (generating a quiet zone). In addition, we evaluate the performance of deep MCANC under different setups and examine the impact of factors such as the number of loudspeakers and microphones, and the position of a secondary source, on MCANC performance.
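
For orientation, the conventional linear baseline that the abstract contrasts with (adaptive filtering built on the least mean square algorithm) can be sketched as below. This is a minimal illustration, not code from the dissertation: it uses the normalized LMS variant, and the function names, tap count, step size, synthetic echo path, and the ERLE (echo return loss enhancement) check are all assumptions chosen for the example. It covers only the purely linear, noise-free case, which is exactly where such filters work well.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=64, step=0.5, eps=1e-8):
    """Normalized LMS: adapt an FIR estimate of the echo path and
    subtract the predicted echo from the microphone signal."""
    w = np.zeros(num_taps)        # current echo-path estimate
    buf = np.zeros(num_taps)      # recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_hat = w @ buf                        # predicted echo sample
        e = mic[n] - echo_hat                     # residual after cancellation
        w += step * e * buf / (buf @ buf + eps)   # normalized LMS update
        out[n] = e
    return out, w

def demo(seed=0, n=8000, num_taps=64):
    """Synthetic check with a linear echo path and no near-end speech (assumed setup)."""
    rng = np.random.default_rng(seed)
    far_end = rng.standard_normal(n)
    # Hypothetical decaying echo path, used only for this illustration.
    echo_path = rng.standard_normal(num_taps) * np.exp(-0.1 * np.arange(num_taps))
    echo = np.convolve(far_end, echo_path)[:n]
    residual, w = nlms_echo_canceller(far_end, echo, num_taps)
    tail = slice(n - 1000, n)     # measure after the filter has converged
    erle_db = 10 * np.log10(np.sum(echo[tail] ** 2) / (np.sum(residual[tail] ** 2) + 1e-30))
    return erle_db, np.max(np.abs(w - echo_path))
```

Calling `demo()` returns the ERLE in dB over the last 1000 samples and the largest error in the estimated echo-path coefficients. Because the filter is linear in the far-end signal, it cannot model a nonlinear loudspeaker or amplifier, which is the limitation the abstract gives as motivation for the deep learning formulations.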

Committee

DeLiang Wang (Advisor)
Wei-Lun Chao (Committee Member)
Eric Fosler-Lussier (Committee Member)

Subject Headings

Acoustics; Computer Engineering; Computer Science

Keywords

Acoustic echo cancellation (AEC), active noise control (ANC), multi-channel AEC, multi-channel ANC, deep learning

APA Style (7th edition)

Zhang, H. (2022). Deep Learning for Acoustic Echo Cancellation and Active Noise Control [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1650477901420567

MLA Style (8th edition)

Zhang, Hao. Deep Learning for Acoustic Echo Cancellation and Active Noise Control. 2022. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1650477901420567.

Chicago Manual of Style (17th edition)

Zhang, Hao. "Deep Learning for Acoustic Echo Cancellation and Active Noise Control." Doctoral dissertation, Ohio State University, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=osu1650477901420567

Document number:

osu1650477901420567

Download Count:

1,128

Copyright Info

Deep Learning for Acoustic Echo Cancellation and Active Noise Control by Hao Zhang is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by The Ohio State University and OhioLINK.
