Speech enhancement based on audible noise spectrum and short-time spectral amplitude estimator

This paper describes a new speech enhancement approach that employs a minimum mean square error (MMSE) estimator based on the generalized gamma distribution of the short-time spectral amplitude (STSA) of a speech signal. The proposed approach incorporates the human auditory masking effect into the speech enhancement system. The algorithm is based on a criterion by which audible noise is masked rather than attenuated, thereby reducing the chance of speech distortion. A performance assessment shows that the proposal achieves greater noise reduction than both the perceptual modification of Wiener filtering and the gamma-based MMSE estimator.
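The masking criterion in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: it only shows the general idea that noise needs to be attenuated just far enough to fall below a masking threshold, and no further. The function name `masking_limited_gain`, its parameters, and the per-bin power values are all hypothetical, and the masking threshold is assumed to be supplied by some psychoacoustic model.

```python
import numpy as np

def masking_limited_gain(noise_psd, masking_threshold, eps=1e-12):
    """Per-bin amplitude gain that pushes noise just below the masking threshold.

    Assumption (not from the paper): residual noise power is gain**2 * noise_psd,
    so the largest gain that keeps it inaudible is sqrt(threshold / noise_psd).
    Bins whose noise is already below the threshold are left untouched (gain 1).
    """
    noise_psd = np.asarray(noise_psd, dtype=float)
    masking_threshold = np.asarray(masking_threshold, dtype=float)
    gain = np.sqrt(masking_threshold / (noise_psd + eps))
    return np.minimum(gain, 1.0)

# Hypothetical values: bin 0 has audible noise, bin 1 is already masked.
gains = masking_limited_gain([4.0, 1.0], [1.0, 2.0])
```

Limiting the attenuation in this way is what reduces the risk of speech distortion: bins where the noise is already masked pass through unchanged.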

Speech Enhancement Using a MMSE Short Time Spectral Amplitude Estimator with Laplacian Speech Modeling

Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

10.1109/icassp.2005.1415309

2006

Cited By ~ 1

Author(s): Bin Chen, P.C. Loizou

Keyword(s): Speech Enhancement, Spectral Amplitude, Speech Modeling, Short Time

Multiresolution Cochleagram Speech Enhancement Algorithm Using Improved Deep Neural Networks with Skip Connections

10.21203/rs.3.rs-229829/v1

2021

Author(s): Chaofeng Lan, Chundong Liu, Lei Zhang

Keyword(s): Speech Enhancement, Loss Function, Minimum Mean Square Error, Spectral Amplitude, Enhancement Effect, Low Snr, Input Feature, Four Levels, Short Time, Better Than

Abstract: Deep learning based methods have become a recent benchmark for speech enhancement. However, these approaches are limited under low signal-to-noise ratio (SNR) conditions, where they suffer from speech loss and low intelligibility. To address this problem, we improve the Multi-Resolution Cochleagram (MRCG): a gammachirp filter bank is used to decompose the speech signal in time and frequency, and the low-resolution signal is denoised by the minimum mean-square error short-time spectral amplitude estimator (MMSE-STSA). The resulting Improved Multi-Resolution Cochleagram (I-MRCG) is adopted as the input feature of a DNN with skip connections (Skip-DNN). The source-to-distortion ratio (SDR) is used in the training process, and a logarithm is introduced so that the iterative process can be observed more clearly. Experiments were performed on the TIMIT database with four noise types at four SNR levels. With I-MRCG as the input feature of the Skip-DNN model, the average PESQ is 2.6783 and the average STOI is 0.8752. Compared with MRCG, the PESQ and STOI obtained with I-MRCG increase by 1.4% and 1.5%, respectively. This shows that, with I-MRCG as the input feature of the Skip-DNN model, the speech enhancement effect after training is better than with the other features. The approach not only mitigates speech loss in low-SNR environments but also yields more robust speech enhancement. The loss function experiment shows that, compared with MSE and SDR, the improved SDR loss gives the best enhancement effect.
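To make the role of the logarithm in an SDR-based training objective concrete, here is a minimal sketch. It assumes the simple energy-ratio definition of SDR (reference energy over error energy) expressed in decibels. It is not the improved SDR loss proposed in the abstract, whose exact form is not given here, and the names `sdr_db` and `neg_sdr_loss` are hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Source-to-distortion ratio in dB, using a plain energy-ratio definition.

    Assumption: distortion is the sample-wise error between the clean reference
    and the enhanced estimate. eps guards against division by zero and log(0).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(reference ** 2)
    distortion_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (distortion_power + eps))

def neg_sdr_loss(reference, estimate):
    """Loss to minimize: a higher SDR is better, so the sign is flipped."""
    return -sdr_db(reference, estimate)

# Hypothetical signals: the estimate is the reference scaled by 0.9.
clean = np.array([1.0, 0.0, -1.0, 0.0])
enhanced = 0.9 * clean
loss = neg_sdr_loss(clean, enhanced)
```

Working on a logarithmic scale compresses the large dynamic range of the raw energy ratio, which is one plausible reason the training curve becomes easier to read.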