Noise Estimation and Suppression Using Nonlinear Function with A Priori Speech Absence Probability in Speech Enhancement

2016
Vol 2016
pp. 1-7
Author(s):  
Soojeong Lee
Gangseong Lee

This paper proposes a noise-bias compensation of the minimum statistics (MS) method using a nonlinear function and a priori speech absence probability (SAP) for speech enhancement in highly nonstationary noisy environments. The MS method is a well-known technique for noise power estimation in nonstationary noisy environments; however, it tends to bias the noise estimate below the true noise level. The proposed method is combined with an adaptive parameter based on a sigmoid function and the a priori SAP for residual noise reduction. Additionally, our method uses an automatic parameter to control the trade-off between speech distortion and residual noise. We evaluate the estimation of noise power in highly nonstationary and varying noise environments. The improvement is confirmed in terms of signal-to-noise ratio (SNR) and the Itakura-Saito distortion measure (ISDM).
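
As a rough illustration of this kind of update, the sketch below adapts the smoothing factor of a recursive noise-PSD tracker using a sigmoid of the a posteriori SNR combined with the a priori SAP. The function name update_noise_psd, the constants, and the exact bias-compensation rule are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def update_noise_psd(noise_psd, noisy_power, sap, slope=1.0, center=3.0):
    """One frame of recursive noise-PSD tracking (illustrative).

    noise_psd   : previous noise power estimate per frequency bin
    noisy_power : |Y(k)|^2 of the current frame
    sap         : a priori speech absence probability per bin, in [0, 1]
    """
    # A posteriori SNR drives a sigmoid: close to 1 when speech is likely.
    post_snr = noisy_power / np.maximum(noise_psd, 1e-12)
    presence = 1.0 / (1.0 + np.exp(-slope * (post_snr - center)))
    # Combine with the a priori SAP into a speech-presence weight.
    presence *= (1.0 - sap)
    # Track slowly where speech is present, quickly in speech absence.
    smooth = 0.98 * presence + 0.7 * (1.0 - presence)
    return smooth * noise_psd + (1.0 - smooth) * noisy_power
```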

Author(s):  
Yuxuan Ke
Andong Li
Chengshi Zheng
Renhua Peng
Xiaodong Li

Abstract. Deep learning-based speech enhancement algorithms have shown a powerful ability to remove both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target does not contain phase information, e.g., the ideal ratio mask or the clean speech magnitude and its variations. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, perceptual speech quality may degrade. One intuitive remedy is to further suppress the residual noise components with a postprocessing scheme. However, the highly non-stationary nature of this residual noise makes noise power spectral density (PSD) estimation a challenging problem. To solve this problem, the paper proposes three strategies to estimate the noise PSD frame by frame; the residual noise can then be removed effectively by applying a gain function based on the decision-directed approach. Objective measurements show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, an AB subjective listening test shows that the preference percentages of the proposed strategies are over 60%.
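
For context, here is a minimal sketch of the decision-directed gain computation such a postfilter can apply per frame, assuming a noise PSD estimate is already available. The helper dd_gain, the smoothing factor, and the gain floor are illustrative; the paper's three PSD-estimation strategies are not reproduced.

```python
import numpy as np

def dd_gain(noisy_mag, noise_psd, prev_clean_mag, alpha=0.98, gain_floor=0.1):
    """Wiener-type gain for one frame from a decision-directed a priori SNR."""
    post_snr = (noisy_mag ** 2) / np.maximum(noise_psd, 1e-12)
    # Decision-directed a priori SNR (Ephraim-Malah style recursion).
    prio_snr = (alpha * (prev_clean_mag ** 2) / np.maximum(noise_psd, 1e-12)
                + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))
    gain = prio_snr / (1.0 + prio_snr)        # Wiener gain
    return np.maximum(gain, gain_floor)       # floor limits musical noise
```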


Sensors
2020
Vol 20 (20)
pp. 5751
Author(s):  
Seon Man Kim

This paper proposes a novel technique to improve a spectral statistical filter for speech enhancement, to be applied in wearable hearing devices such as hearing aids. The proposed method is implemented on a 32-channel uniform polyphase discrete Fourier transform filter bank, for which the overall algorithm processing delay is 8 ms, in accordance with hearing-device requirements. The proposed speech enhancement technique, which exploits the concepts of both non-negative sparse coding (NNSC) and spectral statistical filtering, provides an online unified framework to overcome the problem of residual noise in spectral statistical filters under noisy environments. First, the spectral gain attenuator of the statistical Wiener filter is obtained using the a priori signal-to-noise ratio (SNR) estimated through a decision-directed approach. Next, the spectrum estimated using the Wiener spectral gain attenuator is decomposed, by applying the NNSC technique, into target speech and residual noise components. These components are used to develop an NNSC-based Wiener spectral gain attenuator to achieve enhanced speech. The performance of the proposed NNSC-Wiener filter was evaluated through perceptual evaluation of speech quality scores under various noise conditions with SNRs ranging from -5 to 20 dB. The results indicate that the proposed NNSC-Wiener filter can outperform the conventional Wiener filter and NNSC-based speech enhancement methods at all SNRs.
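
The sketch below illustrates the sparse non-negative decomposition step in isolation: a magnitude spectrum is projected onto pre-trained speech and noise bases with multiplicative KL-NMF updates and an L1 sparsity penalty. The function nnsc_decompose, the dictionary shapes, the sparsity weight, and the iteration count are assumptions, not the paper's exact NNSC-Wiener attenuator.

```python
import numpy as np

def nnsc_decompose(spectrum, speech_basis, noise_basis, lam=0.1, n_iter=50):
    """Split a magnitude spectrum into speech/noise parts on fixed bases.

    spectrum     : (F,) non-negative magnitude spectrum of one frame
    speech_basis : (F, Ks) pre-trained speech dictionary
    noise_basis  : (F, Kn) pre-trained noise dictionary
    """
    W = np.hstack([speech_basis, noise_basis])        # joint dictionary
    H = np.full(W.shape[1], 1e-2)                     # activations
    for _ in range(n_iter):
        WH = np.maximum(W @ H, 1e-12)
        # Multiplicative KL-NMF update with an L1 sparsity penalty on H.
        H *= (W.T @ (spectrum / WH)) / (W.sum(axis=0) + lam)
    ks = speech_basis.shape[1]
    return speech_basis @ H[:ks], noise_basis @ H[ks:]
```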


2010
Vol 8
pp. 95-99
Author(s):  
F. X. Nsabimana
V. Subbaraman
U. Zölzer

Abstract. To enhance severely corrupted speech signals, an Improved Psychoacoustically Motivated Spectral Weighting Rule (IPMSWR) is proposed that controls the predefined residual noise level by a time-frequency dependent parameter. Unlike conventional Psychoacoustically Motivated Spectral Weighting Rules (PMSWR), the level of the residual noise is varied throughout the enhanced speech by discriminating between regions of speech presence and speech absence using the segmental SNR within critical bands. Controlling the residual noise level in noise-only regions in this way avoids the unpleasant residual noise perceived at very low SNRs. Deriving the gain coefficients requires computing the masking curve and estimating the corrupting noise power. Since clean speech is generally not available to a single-channel speech enhancement technique, the rough clean speech components needed to compute the masking curve are obtained using advanced spectral subtraction techniques. To estimate the corrupting noise, a new technique is employed that relies on noise power estimation using rapid adaptation and recursive smoothing principles. The performance of the proposed approach is compared objectively and subjectively with conventional approaches to highlight the aforementioned improvement.
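
As an illustration of the band-wise control described above, the following sketch computes a segmental SNR per critical band and assigns a permitted residual-noise level accordingly. The helper band_residual_levels, the band edges, the thresholds, and the levels are illustrative and do not reproduce the IPMSWR itself.

```python
import numpy as np

def band_residual_levels(speech_power, noise_power, band_edges,
                         level_speech=0.05, level_noise=0.01,
                         snr_thresh_db=0.0):
    """Permitted residual-noise level per critical band (illustrative).

    speech_power, noise_power : per-bin power estimates for one frame
    band_edges                : bin indices delimiting the critical bands
    """
    levels = np.empty(len(band_edges) - 1)
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        seg_snr_db = 10.0 * np.log10(
            max(speech_power[lo:hi].sum(), 1e-12)
            / max(noise_power[lo:hi].sum(), 1e-12))
        # More residual noise is tolerable where speech masks it; keep it
        # low in noise-only bands to avoid unpleasant artifacts.
        levels[b] = level_speech if seg_snr_db > snr_thresh_db else level_noise
    return levels
```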


2021
pp. 1-12
Author(s):  
Jie Wang
Linhuang Yan
Qiaohe Yang
Minmin Yuan

In this paper, a single-channel speech enhancement algorithm is proposed that applies guided filtering to the speech spectrogram, treated as an image, based on the masking properties of the human auditory system. Guided filtering is capable of sharpening details and estimating unwanted textures or background noise in the noisy speech spectrogram. If the noisy spectrogram is regarded as a degraded image, the spectrogram of the clean speech signal can be estimated with guided filtering after subtracting the noise components. Combined with the masking properties of the human auditory system, the proposed algorithm adaptively adjusts and reduces the residual noise of the enhanced speech spectrogram according to the corresponding masking threshold. Because the filtering output is a local linear transform of the guidance spectrogram, the sliding local window can be implemented efficiently via a box filter with O(N) computational complexity. Experimental results show that the proposed algorithm effectively suppresses noise in different noisy environments and thus greatly improves speech quality and intelligibility.
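
A minimal sketch of the underlying guided filter, with the noisy spectrogram acting as its own guide and a box filter giving O(N) cost per pass, is shown below. The helper guided_filter and its radius and regularization parameters are assumptions; the masking-threshold adaptation described in the paper is not included.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, target, radius=4, eps=1e-2):
    """Standard guided filter over a 2-D (log-magnitude) spectrogram."""
    size = 2 * radius + 1

    def box(x):
        # Box (mean) filter over a local time-frequency window.
        return uniform_filter(x, size=size, mode='nearest')

    mean_i, mean_p = box(guide), box(target)
    cov_ip = box(guide * target) - mean_i * mean_p
    var_i = box(guide * guide) - mean_i * mean_i
    a = cov_ip / (var_i + eps)      # local linear model q = a * I + b
    b = mean_p - a * mean_i
    return box(a) * guide + box(b)  # average the coefficients, then apply
```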


Author(s):  
Shifeng Ou
Peng Song
Ying Gao

The a priori signal-to-noise ratio (SNR) plays an essential role in many speech enhancement systems. Most existing approaches to estimating the a priori SNR exploit only the amplitude spectra and neglect the phase. Given that incorporating phase information into a speech processing system can significantly improve speech quality, this paper proposes a phase-sensitive decision-directed (DD) approach for a priori SNR estimation. By representing the short-time discrete Fourier transform (STFT) signal spectra geometrically in the complex plane, the proposed approach estimates the a priori SNR using both magnitude and phase information while making no assumptions about the phase difference between the clean speech and noise spectra. Objective evaluations in terms of spectrograms, segmental SNR, log-spectral distance (LSD), and short-time objective intelligibility (STOI) demonstrate the superiority of the proposed approach over several competitive methods under different noise conditions and input SNR levels.
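
For reference, phase-aware estimators of this kind build on the complex-plane identity |Y|^2 = |S|^2 + |N|^2 + 2|S||N|cos(theta_S - theta_N). The sketch below merely solves this identity for the speech magnitude, from which an a priori SNR could be formed as (|S|/|N|)^2; the helper speech_mag_from_geometry is hypothetical and is not the paper's phase-sensitive DD estimator.

```python
import numpy as np

def speech_mag_from_geometry(noisy_mag, noise_mag, cos_dphi):
    """Solve |Y|^2 = |S|^2 + |N|^2 + 2|S||N|cos(dphi) for |S| >= 0."""
    # Quadratic in |S|: |S|^2 + 2*b*|S| + (|N|^2 - |Y|^2) = 0, b = |N|cos(dphi)
    b = noise_mag * cos_dphi
    disc = np.maximum(b ** 2 - (noise_mag ** 2 - noisy_mag ** 2), 0.0)
    return np.maximum(-b + np.sqrt(disc), 0.0)
```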


2011
Vol 36 (3)
pp. 519-532
Author(s):  
Zhi Tao
He-Ming Zhao
Xiao-Jun Zhang
Di Wu

Abstract. This paper proposes a speech enhancement method based on the multiple scales and multiple thresholds of the auditory perception wavelet transform, which is suitable for low-SNR (signal-to-noise ratio) environments. The method achieves noise reduction by threshold processing of the auditory perception wavelet transform coefficients of the speech signal, guided by the human ear's auditory masking effect. To prevent high-frequency loss during noise suppression, we first make a voicing decision on the speech signal and then process the unvoiced and voiced segments with different thresholds and decision rules. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared with other spectral subtraction methods, our method keeps the unvoiced components intact while suppressing both residual noise and background noise, so the enhanced speech has better clarity and intelligibility.
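
A generic sketch of segment-wise wavelet soft-thresholding with voicing-dependent thresholds is given below; a standard discrete wavelet transform (PyWavelets) stands in for the auditory perception wavelet transform, and the helper denoise_segment and its threshold constants are illustrative.

```python
import numpy as np
import pywt

def denoise_segment(segment, is_voiced, wavelet='db8', level=4):
    """Soft-threshold wavelet denoising with a voicing-dependent threshold."""
    coeffs = pywt.wavedec(segment, wavelet, level=level)
    # Robust noise-scale estimate from the finest detail coefficients (MAD).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    # Gentler threshold for unvoiced segments to keep high-frequency content.
    k = 1.0 if is_voiced else 0.5
    thr = k * sigma * np.sqrt(2.0 * np.log(len(segment)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(segment)]
```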

