A single channel speech enhancement technique exploiting human auditory masking properties

2010 ◽  
Vol 8 ◽  
pp. 95-99
Author(s):  
F. X. Nsabimana ◽  
V. Subbaraman ◽  
U. Zölzer

Abstract. To enhance extremely corrupted speech signals, an Improved Psychoacoustically Motivated Spectral Weighting Rule (IPMSWR) is proposed that controls the predefined residual noise level by a time-frequency dependent parameter. Unlike conventional Psychoacoustically Motivated Spectral Weighting Rules (PMSWR), the level of the residual noise is varied throughout the enhanced speech, based on discriminating between regions of speech presence and speech absence by means of the segmental SNR within critical bands. Controlling the level of the residual noise in the noise-only regions in this way avoids the unpleasant residual noise perceived at very low SNRs. To derive the gain coefficients, the masking curve must be computed and the corrupting noise power estimated. Since the clean speech is generally not available to a single channel speech enhancement technique, the rough clean speech components needed to compute the masking curve are obtained using advanced spectral subtraction techniques. To estimate the corrupting noise, a new technique is employed that relies on noise power estimation using rapid adaptation and recursive smoothing principles. The performance of the proposed approach is compared objectively and subjectively with conventional approaches to highlight the aforementioned improvement.
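The interplay between recursive smoothing, a segmental-SNR decision, and a residual noise floor can be illustrated with a minimal sketch. This is not the IPMSWR rule itself: the function names, the smoothing factor `alpha`, the SNR threshold, the two floor values, and the choice of a lower floor in noise-only regions are all assumptions made for illustration, and the masking curve and rapid-adaptation logic are omitted.

```python
import numpy as np

def recursive_smooth(power, alpha=0.9):
    """First-order recursive smoothing over frames: S[t] = alpha*S[t-1] + (1-alpha)*P[t].

    `power` has shape (frames, bins). A real noise estimator would also adapt
    `alpha` to speech presence; that part is left out of this sketch.
    """
    power = np.asarray(power, dtype=float)
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, len(power)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * power[t]
    return smoothed

def floor_from_snr(seg_snr_db, threshold_db=0.0, floor_speech=0.3, floor_noise=0.05):
    """Pick a residual-noise floor per unit from its segmental SNR (hypothetical values)."""
    return np.where(np.asarray(seg_snr_db) > threshold_db, floor_speech, floor_noise)

def floored_gain(noisy_power, noise_power, floor):
    """Power-subtraction style gain, limited from below by the chosen floor."""
    raw = 1.0 - noise_power / np.maximum(noisy_power, 1e-12)
    return np.maximum(raw, floor)
```

The gain would be multiplied with the noisy spectrum bin by bin; varying the floor over time and frequency is what lets the residual noise level differ between speech and noise-only regions.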

Author(s):  
Judith Justin ◽  
Vanithamani R.

In this chapter, a speech enhancement technique is implemented using a neuro-fuzzy classifier. Noisy speech sentences from the NOIZEUS and AURORA databases are taken for the study. Feature extraction is implemented through modifications in amplitude magnitude spectrograms. A four-class neuro-fuzzy classifier splits the time-frequency units of the noisy speech samples into a noise-only part, a signal-only part, a part with more noise and less signal, and a part with more signal and less noise. Appropriate weights are applied in the enhancement phase. The enhanced speech sentence is evaluated using objective measures. The performance of the Neuro-Fuzzy 4 (NF4) classifier is analyzed and compared with that of other conventional techniques for various noises at different noise levels. The numerical values of the measures obtained are better than those of the other techniques, and it is inferred that NF4 outperforms them in speech enhancement.


2021 ◽  
pp. 1-12
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Qiaohe Yang ◽  
Minmin Yuan

In this paper, a single-channel speech enhancement algorithm is proposed that applies guided spectrogram filtering, based on the masking properties of the human auditory system, by treating the speech spectrogram as an image. Guided filtering is capable of sharpening details and estimating unwanted textures or background noise from the noisy speech spectrogram. If we consider the noisy spectrogram as a degraded image, we can estimate the spectrogram of the clean speech signal using guided filtering after subtracting noise components. Combined with the masking properties of the human auditory system, the proposed algorithm adaptively adjusts and reduces the residual noise of the enhanced speech spectrogram according to the corresponding masking threshold. Because the filtering output is a local linear transform of the guidance spectrogram, the sliding of the local mask window can be implemented efficiently with a box filter at O(N) computational complexity. Experimental results show that the proposed algorithm can effectively suppress noise in different noisy environments and can thus greatly improve speech quality and speech intelligibility.
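The local linear transform and its box-filter implementation can be sketched as follows. This is a generic guided filter in its standard form, applied to any 2-D array such as a magnitude spectrogram; it is not the authors' code, and the names `box_mean` and `guided_filter`, the radius `r`, and the regularizer `eps` are assumptions. The masking-threshold step of the paper is not modeled.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window using an integral image, O(N) in the array size."""
    k = 2 * r + 1
    padded = np.pad(np.asarray(x, dtype=float), r, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    total = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    return total / (k * k)

def guided_filter(guide, src, r=2, eps=1e-3):
    """Filter `src` so that the output is locally a linear transform a*guide + b."""
    guide = np.asarray(guide, dtype=float)
    src = np.asarray(src, dtype=float)
    mean_g = box_mean(guide, r)
    mean_s = box_mean(src, r)
    var_g = box_mean(guide * guide, r) - mean_g ** 2
    cov_gs = box_mean(guide * src, r) - mean_g * mean_s
    a = cov_gs / (var_g + eps)   # local slope
    b = mean_s - a * mean_g      # local offset
    # Average the coefficients of all windows covering each point.
    return box_mean(a, r) * guide + box_mean(b, r)
```

Every step is a pointwise operation or a box mean, so the cost does not grow with the window radius, which is the property the abstract refers to.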


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Soojeong Lee ◽  
Gangseong Lee

This paper proposes a noise-biased compensation of minimum statistics (MS) method using a nonlinear function and a priori speech absence probability (SAP) for speech enhancement in highly nonstationary noisy environments. The MS method is a well-known technique for noise power estimation in nonstationary noisy environments; however, it tends to bias the noise estimate below the true noise level. The proposed method is combined with an adaptive parameter based on a sigmoid function and a priori SAP for residual noise reduction. Additionally, our method uses an autoparameter to control the trade-off between speech distortion and residual noise. We evaluate the estimation of noise power in highly nonstationary and varying noise environments. The improvement can be confirmed in terms of signal-to-noise ratio (SNR) and the Itakura-Saito Distortion Measure (ISDM).
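The downward bias of minimum tracking can be seen in a small sketch. This is a heavily simplified minimum-statistics style tracker (recursive smoothing followed by a sliding-window minimum), without any bias compensation and without the sigmoid or SAP terms proposed in the paper; the function name `min_stat_noise` and the parameters `alpha` and `window` are assumptions.

```python
import numpy as np

def min_stat_noise(periodogram, alpha=0.85, window=20):
    """Track the noise power as the minimum of the smoothed periodogram.

    `periodogram` has shape (frames, bins). The minimum is taken over the
    last `window` frames of the recursively smoothed power in each bin.
    """
    periodogram = np.asarray(periodogram, dtype=float)
    smoothed = np.empty_like(periodogram)
    smoothed[0] = periodogram[0]
    for t in range(1, len(periodogram)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * periodogram[t]
    noise = np.empty_like(smoothed)
    for t in range(len(smoothed)):
        start = max(0, t - window + 1)
        noise[t] = smoothed[start:t + 1].min(axis=0)
    return noise

# Stationary noise with a true mean power of 1.0 in every bin.
rng = np.random.default_rng(0)
periodogram = rng.exponential(1.0, size=(500, 8))
estimate = min_stat_noise(periodogram)
print("mean tracked noise power:", estimate[20:].mean())
```

Because the tracker takes a minimum of a fluctuating quantity, the printed mean falls below the true power of 1.0, which is the kind of underestimation that a compensation factor is meant to correct.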


2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It can significantly improve the performance of OMLSA, especially in highly non-stationary noise environments, by using bilateral filtering (BF), a widely used technology in image and visual processing, to preprocess the spectrogram of the noisy speech. When a speech spectrogram is considered as an image, BSF is capable not only of sharpening details and removing unwanted textures or background noise from the noisy speech spectrogram, but also of preserving edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. In addition, to reduce computing costs, a fast and accurate BF is adopted that reduces the algorithm complexity to O(1) for each time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods using various types of noise at different signal-to-noise ratios, in terms of objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
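The edge-preserving behavior of bilateral filtering can be sketched on a 2-D array standing in for a spectrogram. This is a plain brute-force bilateral filter whose cost grows with the window size, not the fast O(1) variant the abstract mentions and not the authors' implementation; the function name and the parameters `radius`, `sigma_s`, and `sigma_r` are assumptions.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Brute-force bilateral filter: weights combine spatial and value closeness."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_range = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            num += w_space * w_range * shifted
            den += w_space * w_range
    return num / den
```

Neighbors whose values differ strongly from the center receive almost no weight, so a sharp transition such as a speech onset is kept while small fluctuations on either side of it are averaged out.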


2011 ◽  
Vol 36 (3) ◽  
pp. 519-532 ◽  
Author(s):  
Zhi Tao ◽  
He-Ming Zhao ◽  
Xiao-Jun Zhang ◽  
Di Wu

Abstract. This paper proposes a speech enhancement method using the multiple scales and multiple thresholds of the auditory perception wavelet transform, which is suitable for a low SNR (signal-to-noise ratio) environment. The method achieves noise reduction by thresholding the auditory perception wavelet transform parameters of a speech signal according to the auditory masking effect of the human ear. At the same time, in order to prevent high frequency loss during noise suppression, we first make a voicing decision based on the speech signals. Afterwards, we process the unvoiced and voiced sound segments with different thresholds and different judgments. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared to other spectral subtraction methods, our method keeps the components of unvoiced sound intact while suppressing the residual noise and the background noise. Thus, the enhanced speech has better clarity and intelligibility.

