Towards a unified optimal spectral amplitude estimator for speech enhancement in various low-SNR environments

Author(s):  
H. Tolba ◽  
Zili Li ◽  
D. O'Shaughnessy
2021 ◽  
Author(s):  
Chaofeng Lan ◽  
Chundong Liu ◽  
Lei Zhang

Abstract Deep learning based methods have recently become the benchmark for speech enhancement. However, these approaches are limited in low signal-to-noise ratio (SNR) conditions, where they suffer from speech loss and low intelligibility. To address this problem, we improve the Multi-Resolution Cochleagram (MRCG): a gammachirp filter bank is used to decompose the speech signal in time and frequency, and the low-resolution signal is denoised by the minimum mean-square error short-time spectral amplitude estimator (MMSE-STSA). The resulting Improved Multi-Resolution Cochleagram (I-MRCG) is adopted as the input feature of a DNN with skip connections (Skip-DNN). The source-to-distortion ratio (SDR) is used as the loss in the training process, and a logarithm is introduced so that the iterative process can be observed more clearly. Experiments were performed on the TIMIT database with four noise types at four SNR levels. With I-MRCG as the input feature of the Skip-DNN model, the average PESQ is 2.6783 and the average STOI is 0.8752. Compared with MRCG, I-MRCG increases PESQ and STOI by 1.4% and 1.5%, respectively. This shows that with I-MRCG as the input feature of the Skip-DNN model, the speech enhancement obtained after training is better than with other features. The approach not only addresses speech loss in low-SNR environments but also yields more robust speech enhancement. The loss function experiment shows that, compared with MSE and plain SDR, the improved SDR gives the best enhancement when used as the loss function of the speech enhancement model.
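
The abstract does not give the exact form of its improved SDR loss, so the following is only a minimal sketch of the general idea, assuming the common definition SDR = 10 log10(||s||^2 / ||s - s_hat||^2) and taking its negative as the quantity to minimise. The function names `sdr_db` and `neg_log_sdr_loss` and the `eps` stabiliser are illustrative assumptions, not taken from the paper.

```python
import math


def sdr_db(clean, estimate, eps=1e-12):
    """SDR in dB, assuming SDR = 10*log10(||s||^2 / ||s - s_hat||^2).

    `clean` and `estimate` are equal-length sequences of samples.
    `eps` (an assumption of this sketch) avoids division by zero and log(0).
    """
    if len(clean) != len(estimate):
        raise ValueError("clean and estimate must have the same length")
    signal_energy = sum(s * s for s in clean)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def neg_log_sdr_loss(clean, estimate, eps=1e-12):
    """Negative log-domain SDR: lower values mean a better estimate."""
    return -sdr_db(clean, estimate, eps)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    rough = [0.5 * s for s in clean]   # poor estimate
    close = [0.9 * s for s in clean]   # better estimate
    print(neg_log_sdr_loss(clean, rough))
    print(neg_log_sdr_loss(clean, close))
```

Working in the log domain means each tenfold reduction in distortion energy lowers the loss by the same 10 dB step, which is one plausible reading of why a logarithm makes training progress easier to follow.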


2018 ◽  
Vol 141 ◽  
pp. 333-347 ◽  
Author(s):  
Nasir Saleem ◽  
Muhammad Irfan Khattak ◽  
Muhammad Shafi

2019 ◽  
Vol 22 (1) ◽  
pp. 283-292 ◽  
Author(s):  
Samba Raju Chiluveru ◽  
Manoj Tripathy
