Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms

Author(s):  
Yuxuan Ke ◽  
Andong Li ◽  
Chengshi Zheng ◽  
Renhua Peng ◽  
Xiaodong Li

Abstract Deep learning-based speech enhancement algorithms have shown a powerful ability to remove both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target does not contain phase information, e.g., the ideal ratio mask, or the clean speech magnitude and its variations. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, the perceptual speech quality may degrade. One intuitive remedy is to further suppress the residual noise components with a postprocessing scheme. However, the highly non-stationary nature of this kind of residual noise makes noise power spectral density (PSD) estimation a challenging problem. To solve this problem, the paper proposes three strategies to estimate the noise PSD frame by frame, so that the residual noise can then be removed effectively by applying a gain function based on the decision-directed approach. The objective measurement results show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, the AB subjective listening test shows that the preference percentages of the proposed strategies are over 60%.
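The abstract above does not give the gain function itself, so the following is only a minimal sketch of the general decision-directed idea for a single frequency bin, assuming a Wiener-type gain. The function name `decision_directed_gains`, the smoothing factor `alpha`, and the `gain_floor` value are illustrative assumptions, not the authors' settings, and the noise PSD is taken as already estimated.

```python
def decision_directed_gains(noisy_power, noise_psd, alpha=0.98, gain_floor=0.1):
    """Per-frame gains for one frequency bin (illustrative sketch).

    noisy_power: |Y|^2 for each frame.
    noise_psd:   estimated noise PSD for each frame (assumed given).
    alpha, gain_floor: assumed example values, not taken from the paper.
    """
    gains = []
    prev_ratio = 0.0  # previous clean-speech power estimate over previous noise PSD
    for y2, n2 in zip(noisy_power, noise_psd):
        n2 = max(n2, 1e-12)  # guard against division by zero
        post_snr = y2 / n2   # a posteriori SNR
        # Decision-directed a priori SNR: blend the previous frame's estimate
        # with the instantaneous (half-wave rectified) estimate.
        prio_snr = alpha * prev_ratio + (1.0 - alpha) * max(post_snr - 1.0, 0.0)
        # Wiener-type gain with a floor to limit musical noise (assumption).
        gain = max(prio_snr / (1.0 + prio_snr), gain_floor)
        gains.append(gain)
        prev_ratio = (gain ** 2) * y2 / n2
    return gains
```

With a constant high-SNR input the gain rises toward 1 over a few frames, while a noise-only input stays at the floor; the smoothing in `alpha` is what keeps the gain from fluctuating frame to frame.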

2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Soojeong Lee ◽  
Gangseong Lee

This paper proposes a noise-biased compensation of minimum statistics (MS) method using a nonlinear function and a priori speech absence probability (SAP) for speech enhancement in highly nonstationary noisy environments. The MS method is a well-known technique for noise power estimation in nonstationary noisy environments; however, its noise estimate tends to be biased below the true noise level. The proposed method is combined with an adaptive parameter based on a sigmoid function and a priori SAP for residual noise reduction. Additionally, our method uses an autoparameter to control the trade-off between speech distortion and residual noise. We evaluate the estimation of noise power in highly nonstationary and varying noise environments. The improvement can be confirmed in terms of signal-to-noise ratio (SNR) and the Itakura-Saito Distortion Measure (ISDM).


2011 ◽  
Vol 36 (3) ◽  
pp. 519-532 ◽  
Author(s):  
Zhi Tao ◽  
He-Ming Zhao ◽  
Xiao-Jun Zhang ◽  
Di Wu

Abstract This paper proposes a speech enhancement method using the multiple scales and multiple thresholds of the auditory perception wavelet transform, which is suitable for low signal-to-noise ratio (SNR) environments. The method reduces noise by thresholding the auditory perception wavelet transform parameters of a speech signal according to the auditory masking effect of the human ear. At the same time, in order to prevent high-frequency loss during noise suppression, we first make a voicing decision on the speech signal. Afterwards, we process the unvoiced and voiced segments with different thresholds and different decision rules. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared with spectral subtraction methods, our method keeps the components of unvoiced sound intact while suppressing the residual noise and the background noise. Thus, the enhanced speech has better clarity and intelligibility.


Author(s):  
Feng Bao ◽  
Waleed H. Abdulla

In computational auditory scene analysis, accurate estimation of the binary mask or ratio mask plays a key role in noise masking. An inaccurate estimate often leads to artifacts and temporal discontinuity in the synthesized speech. To overcome this problem, we propose a new ratio mask estimation method based on Wiener filtering in each Gammatone channel. In reconstructing the Wiener filter, we use the relationship between the speech and noise power spectra in each Gammatone channel to build the objective function for the convex optimization of speech power. To improve the accuracy of estimation, the estimated ratio mask is further modified based on its adjacent time–frequency units, and then smoothed by interpolating with the estimated binary masks. Objective tests, including signal-to-noise ratio improvement, spectral distortion and intelligibility, and a subjective listening test demonstrate the superiority of the proposed method over the reference methods.


2017 ◽  
Vol 2 (4) ◽  
pp. 15
Author(s):  
Mamun Ahmed ◽  
Nasimul Hyder Maruf Bhuyan

In this paper, we present the design, implementation and comparison of the Least Mean Square (LMS) algorithm and the Normalized LMS (NLMS) algorithm using a 4-channel microphone array for noise reduction and speech enhancement. An adaptive subband Generalized Sidelobe Canceller (GSC) beamformer was used for the experiments and analysis. Tests were carried out using one speech signal and a small number of noise sources. The sidelobe canceller was evaluated with both LMS and NLMS adaptation. The overall improvement in signal-to-noise ratio (SNR) was determined from the input and output powers of the signal and the noise, obtained by feeding the GSC with the signal alone and with the noise alone. The NLMS algorithm considerably improves speech quality, with noise suppression levels of up to 13 dB, while the LMS algorithm gives up to 10 dB. The SNR was measured under various types of blocking matrix, step sizes and noise locations. The approach is intended for hands-free telephony, video conferencing, etc. in noisy environments.
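The difference between the two adaptation rules compared above can be illustrated with a small single-channel sketch. This is not the authors' subband GSC: the function `adaptive_filter`, the tap count, the step sizes and the synthetic test signal are all assumptions chosen for illustration. The only difference between the two modes is that NLMS divides the step size by the instantaneous power of the reference vector.

```python
import random


def adaptive_filter(reference, desired, taps=4, mu=0.05, normalized=False, eps=1e-8):
    """Time-domain LMS (default) or NLMS (normalized=True) adaptive filter.

    Returns the final weights and the error signal e[n] = desired[n] - y[n].
    All parameter values are illustrative assumptions.
    """
    weights = [0.0] * taps
    errors = []
    for n in range(len(reference)):
        # Most recent `taps` reference samples, newest first, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(w * xi for w, xi in zip(weights, x))
        e = desired[n] - y
        # NLMS scales the step by the input power; LMS uses a fixed step.
        step = mu / (eps + sum(xi * xi for xi in x)) if normalized else mu
        weights = [w + step * e * xi for w, xi in zip(weights, x)]
        errors.append(e)
    return weights, errors


# Hypothetical test signal: identify the 2-tap system d[n] = 0.5 x[n] - 0.3 x[n-1].
rng = random.Random(0)
x_sig = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d_sig = [0.5 * x_sig[n] - 0.3 * (x_sig[n - 1] if n > 0 else 0.0) for n in range(len(x_sig))]
w_lms, e_lms = adaptive_filter(x_sig, d_sig, mu=0.05)
w_nlms, e_nlms = adaptive_filter(x_sig, d_sig, mu=0.5, normalized=True)
```

Because the NLMS step adapts to the input level, its convergence is less sensitive to the scale of the reference signal, which is one common reason it is preferred in microphone-array noise cancellers.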


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Chunlei Liu ◽  
Longbiao Wang ◽  
Jianwu Dang

Mapping and masking are two important deep learning-based speech enhancement methods that aim to recover the original clean speech from corrupted speech. In practice, excessively large recovery errors severely restrict the improvement in speech quality. In our preliminary experiment, we demonstrated that the mapping and masking methods have different conversion mechanisms, and therefore assumed that their recovery errors are highly likely to be complementary; this complementarity was then validated. Based on the principle of error minimization, we propose a fusion of mapping and masking for speech dereverberation. Specifically, we take the weighted mean of the amplitudes recovered by the two methods as the estimated amplitude of the fusion method. Experiments verify that the recovery error of the fusion method is further reduced. Compared with the existing geometric mean method, the proposed weighted mean method achieves better results. Speech dereverberation experiments show that the weighted mean method improves PESQ and SNR by 5.8% and 25.0%, respectively, compared with the traditional masking method.
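The fusion step described above can be sketched in a few lines. The abstract does not say how the weight is chosen, so the fixed `weight` below, the function names, and the toy amplitudes are assumptions for illustration only; the geometric mean is included as the baseline the abstract compares against.

```python
import math


def fuse_weighted(mapped_amp, masked_amp, weight=0.5):
    """Weighted mean of the amplitudes recovered by mapping and by masking.

    `weight` is the share given to the mapping estimate (assumed value).
    """
    return [weight * a + (1.0 - weight) * b for a, b in zip(mapped_amp, masked_amp)]


def fuse_geometric(mapped_amp, masked_amp):
    """Geometric mean of the two amplitude estimates (baseline)."""
    return [math.sqrt(a * b) for a, b in zip(mapped_amp, masked_amp)]


def mean_abs_error(estimate, target):
    """Average absolute amplitude error against the clean reference."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


# Toy amplitudes with opposite-signed (complementary) errors, purely illustrative.
clean = [1.0, 2.0, 0.5]
mapped = [1.2, 1.8, 0.6]
masked = [0.8, 2.2, 0.4]
fused = fuse_weighted(mapped, masked, weight=0.5)
```

When the two estimates err in opposite directions, as in this toy case, averaging cancels much of the error, which is the intuition behind the complementarity argument in the abstract.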


2021 ◽  
Vol 263 (1) ◽  
pp. 5902-5909
Author(s):  
Yiya Hao ◽  
Shuai Cheng ◽  
Gong Chen ◽  
Yaobin Chen ◽  
Liang Ruan

Over the decades, noise-suppression (NS) methods for speech enhancement (SE) have been widely used, including conventional signal processing methods and deep neural network (DNN) methods. Although stationary noise can be suppressed successfully using conventional or DNN methods, suppressing non-stationary noise, especially transient noise, remains significantly challenging. Compared to conventional NS methods, DNN NS methods may work more effectively under non-stationary noise by learning the noise's time-frequency characteristics. However, most DNN methods are difficult to implement on mobile devices due to their heavy computational complexity. Indeed, even though a few low-complexity DNN methods have been proposed for real-time purposes, their robustness and generalization degrade across different types of noise. This paper proposes a single-channel DNN-based NS method for transient noise with low computational complexity. The proposed method enhances the signal-to-noise ratio (SNR) while minimizing speech distortion, resulting in a superior improvement in speech quality over different noise types, including transient noise.


2010 ◽  
Vol 8 ◽  
pp. 95-99
Author(s):  
F. X. Nsabimana ◽  
V. Subbaraman ◽  
U. Zölzer

Abstract. To enhance extremely corrupted speech signals, an Improved Psychoacoustically Motivated Spectral Weighting Rule (IPMSWR) is proposed that controls the predefined residual noise level by a time-frequency dependent parameter. Unlike conventional Psychoacoustically Motivated Spectral Weighting Rules (PMSWR), the level of the residual noise is here varied throughout the enhanced speech based on the discrimination between regions of speech presence and speech absence by means of the segmental SNR within critical bands. Controlling the level of the residual noise in the noise-only regions in this way avoids the unpleasant residual noise perceived at very low SNRs. To derive the gain coefficients, the computation of the masking curve and the estimation of the corrupting noise power are required. Since the clean speech is generally not available for a single-channel speech enhancement technique, the rough clean speech components needed to compute the masking curve are here obtained using advanced spectral subtraction techniques. To estimate the corrupting noise, a new technique is employed that relies on noise power estimation using rapid adaptation and recursive smoothing principles. The performance of the proposed approach is compared objectively and subjectively with that of conventional approaches to highlight the aforementioned improvement.


2022 ◽  
Vol 14 (2) ◽  
pp. 263
Author(s):  
Haixia Zhao ◽  
Tingting Bai ◽  
Zhiqiang Wang

Seismic field data are usually contaminated by random or complex noise, which seriously affects the quality of the seismic data and degrades seismic imaging and seismic interpretation. Improving the signal-to-noise ratio (SNR) of seismic data has always been a key step in seismic data processing. Deep learning approaches have been successfully applied to suppress seismic random noise. Training examples are essential in deep learning methods, especially for geophysical problems, where complete training data are not easy to acquire due to the high cost of acquisition. In this work, we propose a deep learning method pre-trained on natural images to suppress seismic random noise, drawing on transfer learning. Our network contains pre-trained and post-trained networks: the former is trained on natural images to obtain preliminary denoising results, while the latter is trained on a small number of seismic images to fine-tune the denoising by semi-supervised learning and enhance the continuity of geological structures. The results on four types of synthetic seismic data and six field data sets demonstrate that our network performs well in seismic random noise suppression in terms of both quantitative metrics and visual quality.


Author(s):  
Alejandro Jose Uriz ◽  
Jorge Castineira ◽  
Pablo Aguero ◽  
Juan Tulli ◽  
Roberto Hidalgo ◽  
...  

2020 ◽  
Vol 10 (8) ◽  
pp. 2894 ◽  
Author(s):  
Andong Li ◽  
Renhua Peng ◽  
Chengshi Zheng ◽  
Xiaodong Li

For voice communication, it is important to extract the speech from its noisy version without introducing unnaturally artificial noise. By studying the subband mean-squared error (MSE) of the speech for unsupervised speech enhancement approaches and revealing its relationship with the existing loss function for supervised approaches, this paper derives a generalized loss function that takes residual noise control into account with a supervised approach. Our generalized loss function contains the well-known MSE loss function and many other often-used loss functions as special cases. Compared with traditional loss functions, our generalized loss function is more flexible to make a good trade-off between speech distortion and noise reduction. This is because a group of well-studied noise shaping schemes can be introduced to control residual noise for practical applications. Objective and subjective test results verify the importance of residual noise control for the supervised speech enhancement approach.

