Abstract
Deep-learning-based methods have recently become the benchmark for speech enhancement. However, these approaches perform poorly under low signal-to-noise ratio (SNR) conditions, where they suffer from speech loss and low intelligibility. To address this problem, we improve the Multi-Resolution Cochleagram (MRCG): a gammachirp filter bank is used to decompose the speech signal in time and frequency, and the low-resolution signal is denoised by the minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator. The resulting Improved Multi-Resolution Cochleagram (I-MRCG) is adopted as the input feature of a DNN with skip connections (Skip-DNN). The source-to-distortion ratio (SDR) is used as the loss function in the training process, and a logarithm is introduced so that the iterative process can be observed more clearly. Experiments were performed on the TIMIT database with four noise types at four SNR levels. With I-MRCG as the input feature of the Skip-DNN model, the average PESQ is 2.6783 and the average STOI is 0.8752. Compared with MRCG, I-MRCG increases PESQ and STOI by 1.4% and 1.5%, respectively. This shows that, with I-MRCG as the input feature of the Skip-DNN model, the speech enhancement effect after training is better than with the other features. The method not only addresses the problem of speech loss in low-SNR environments but also yields more robust speech enhancement. The loss function experiment shows that, compared with MSE and SDR, the improved SDR gives the best enhancement effect as the loss function of the speech enhancement model.
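For readers unfamiliar with an SDR-based training objective, the following is a minimal sketch of the commonly used energy-ratio definition, SDR = 10 log10(||s||^2 / ||s - s_hat||^2), turned into a loss by negation. It is an illustration only: the exact form of the improved SDR loss is not given in this abstract, and the function names `sdr_db` and `neg_sdr_loss` and the stabilizer `eps` are assumptions introduced here.

```python
import math


def sdr_db(clean, enhanced, eps=1e-12):
    """Source-to-distortion ratio in dB (common energy-ratio definition).

    clean, enhanced: equal-length sequences of float samples.
    eps: small constant (an assumption) that avoids log(0) and division by zero.
    """
    if len(clean) != len(enhanced):
        raise ValueError("clean and enhanced must have the same length")
    # Energy of the reference (clean) signal.
    signal_energy = sum(s * s for s in clean)
    # Energy of the residual between the reference and the estimate.
    distortion_energy = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def neg_sdr_loss(clean, enhanced):
    """Loss to minimize: a higher SDR means a lower loss."""
    return -sdr_db(clean, enhanced)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    enhanced = [0.9, -0.9, 0.9, -0.9]
    print(sdr_db(clean, enhanced))
    print(neg_sdr_loss(clean, enhanced))
```

Because the ratio is expressed on a logarithmic (dB) scale, changes in the loss remain visible across iterations even when the residual energy spans several orders of magnitude.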