Speech enhancement by LSTM-based noise suppression followed by CNN-based speech restoration

Author(s):  
Maximilian Strake ◽  
Bruno Defraene ◽  
Kristoff Fluyt ◽  
Wouter Tirry ◽  
Tim Fingscheidt

Abstract: Single-channel speech enhancement in highly non-stationary noise conditions is a very challenging task, especially when interfering speech is included in the noise. Deep learning-based approaches have notably improved the performance of speech enhancement algorithms under such conditions, but still introduce speech distortions when strong noise suppression is to be achieved. We propose to address this problem with a two-stage approach, first performing noise suppression and subsequently restoring natural-sounding speech, using specifically chosen neural network topologies and loss functions for each task. A mask-based long short-term memory (LSTM) network is employed for noise suppression, and speech restoration is performed via spectral mapping with a convolutional encoder-decoder network (CED). The proposed method improves speech quality (PESQ) over state-of-the-art single-stage methods by about 0.1 points for unseen highly non-stationary noise types including interfering speech. Furthermore, it increases intelligibility in low-SNR conditions and consistently outperforms all reference methods.
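The two-stage idea can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes stage 1 has already produced a per-bin gain mask in [0, 1] (as the LSTM would), and stands in for the CED's spectral mapping with a trivial placeholder; all names and values are hypothetical.

```python
def apply_mask(noisy_mag, mask):
    """Stage 1: element-wise masking of one magnitude-spectrogram frame,
    as a mask-based LSTM noise suppressor would apply its predicted gains."""
    assert len(noisy_mag) == len(mask)
    return [m * g for m, g in zip(noisy_mag, mask)]

def restore(masked_mag, gain_floor=0.05):
    """Stage 2 placeholder: the paper uses a CED for spectral mapping; here we
    only floor very small bins, mimicking restoration of over-suppressed detail."""
    return [max(v, gain_floor) for v in masked_mag]

noisy = [1.0, 0.8, 0.02, 0.5]   # hypothetical magnitude bins of one frame
mask = [0.9, 0.5, 0.1, 1.0]     # hypothetical stage-1 gains in [0, 1]
stage2_out = restore(apply_mask(noisy, mask))
```

The split mirrors the paper's design choice: aggressive suppression first, then a separate network dedicated to undoing the distortions suppression causes.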

2021 ◽  
Vol 263 (1) ◽  
pp. 5902-5909
Author(s):  
Yiya Hao ◽  
Shuai Cheng ◽  
Gong Chen ◽  
Yaobin Chen ◽  
Liang Ruan

Over the decades, noise-suppression (NS) methods for speech enhancement (SE) have been widely used, including both conventional signal-processing methods and deep neural network (DNN) methods. Although stationary noise can be suppressed successfully with either approach, suppressing non-stationary noise, especially transient noise, remains significantly challenging. Compared to conventional NS methods, DNN-based methods can work more effectively under non-stationary noise by learning the noise's time-frequency characteristics. However, most DNN methods are difficult to implement on mobile devices due to their heavy computational complexity, and even for the few low-complexity DNN methods proposed for real-time use, robustness and generalization degrade across different noise types. This paper proposes a single-channel DNN-based NS method for transient noise with low computational complexity. The proposed method enhances the signal-to-noise ratio (SNR) while minimizing speech distortion, resulting in superior speech-quality improvement over different noise types, including transient noise.
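The SNR figure the method aims to improve is the usual power ratio in decibels; a minimal sketch (an illustrative helper, not the paper's evaluation code):

```python
import math

def snr_db(clean, noise):
    """SNR in dB between a clean signal and the additive noise component:
    10 * log10(signal power / noise power). Illustrative only."""
    p_sig = sum(s * s for s in clean)
    p_noise = sum(n * n for n in noise)
    return 10.0 * math.log10(p_sig / p_noise)

# Equal-power signal and noise give 0 dB.
print(snr_db([1.0, -1.0], [1.0, -1.0]))  # 0.0
```

Doubling the signal amplitude quadruples its power, raising the SNR by about 6 dB; an NS method is judged by how much it raises this ratio without distorting the speech term in the numerator.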


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Wanting Yu ◽  
Hongyi Yu ◽  
Ding Wang ◽  
Jianping Du ◽  
Mengli Zhang

Deep learning technology provides novel solutions for localization in complex scenarios. Conventional methods generally suffer performance loss in long-distance over-the-horizon (OTH) scenarios due to uncertain ionospheric conditions. To overcome the adverse effects of the unknown and complex ionosphere on positioning, we propose a deep learning positioning method based on multistation received signals and a bidirectional long short-term memory (BiLSTM) network framework (SL-BiLSTM), which refines position information from signal data. Specifically, we first derive the form of the network input by constructing the received-signal model. Second, target positions are predicted with the SL-BiLSTM network, which consists of three BiLSTM layers, a maxout layer, a fully connected layer, and a regression layer. We then discuss two regularization techniques, dropout and randomization, adopted mainly to prevent network overfitting. Simulations of OTH localization are conducted to examine performance, with the network parameters trained to suit the scenario. The experimental results show that the proposed method significantly improves the accuracy of OTH positioning at low SNR, and when the number of training locations increases to 200, the SL-BiLSTM result is closest to the Cramér-Rao lower bound (CRLB) at high SNR.
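The dropout regularization mentioned above can be sketched in a few lines. The inverted-dropout form below is a generic illustration of the technique, not the SL-BiLSTM implementation; the layer sizes and drop probability are arbitrary.

```python
import random

def inverted_dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop and scale
    survivors by 1 / (1 - p_drop), so the expected activation is unchanged
    and no rescaling is needed at inference time. Illustrative only."""
    keep = 1.0 - p_drop
    return [0.0 if rng.random() < p_drop else a / keep for a in activations]

rng = random.Random(0)                       # fixed seed for reproducibility
out = inverted_dropout([1.0] * 8, 0.5, rng)  # each unit is now 0.0 or 2.0
```

At training time a fresh random mask is drawn per batch, which is what discourages the network from relying on any single unit and thus curbs overfitting.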


2011 ◽  
Vol 36 (3) ◽  
pp. 519-532 ◽  
Author(s):  
Zhi Tao ◽  
He-Ming Zhao ◽  
Xiao-Jun Zhang ◽  
Di Wu

Abstract: This paper proposes a speech enhancement method using the multi-scale, multi-threshold auditory perception wavelet transform, which is suitable for low-SNR (signal-to-noise ratio) environments. The method achieves noise reduction by threshold processing, based on the human ear's auditory masking effect, of the auditory perception wavelet transform parameters of a speech signal. At the same time, to prevent high-frequency loss during noise suppression, we first make a voicing decision on the speech signal and then process the unvoiced and voiced segments with different thresholds and different decision rules. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared to other spectral subtraction methods, our method keeps the unvoiced components intact while suppressing both residual noise and background noise, so the enhanced speech has better clarity and intelligibility.
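The threshold processing of wavelet coefficients can be illustrated with the generic soft-thresholding rule below. This is a standard wavelet-denoising sketch, not the paper's method: the actual thresholds there are derived from the auditory masking effect and differ between voiced and unvoiced segments, which this sketch does not model.

```python
def soft_threshold(coeff, thr):
    """Soft-threshold one wavelet coefficient: shrink it toward zero by thr,
    and zero it entirely if its magnitude is below thr. Small (noise-like)
    coefficients vanish; large (speech-like) ones survive, slightly shrunk."""
    if coeff > thr:
        return coeff - thr
    if coeff < -thr:
        return coeff + thr
    return 0.0

coeffs = [0.5, -0.5, 0.1, -0.15]          # hypothetical wavelet coefficients
denoised = [soft_threshold(c, 0.2) for c in coeffs]
# → [0.3, -0.3, 0.0, 0.0]
```

Using a higher threshold in unvoiced segments would wipe out the weak high-frequency consonant energy, which is exactly why the paper applies different thresholds after its voicing decision.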

