Noise Estimation and Suppression Using Nonlinear Function with A Priori Speech Absence Probability in Speech Enhancement

2016
Vol 2016
pp. 1-7
Author(s):  
Soojeong Lee
Gangseong Lee

This paper proposes a noise-bias compensation of the minimum statistics (MS) method using a nonlinear function and a priori speech absence probability (SAP) for speech enhancement in highly nonstationary noisy environments. The MS method is a well-known technique for noise power estimation in nonstationary noisy environments; however, it tends to bias the noise estimate below the true noise level. The proposed method is combined with an adaptive parameter based on a sigmoid function and the a priori SAP for residual noise reduction. Additionally, our method uses an automatic parameter to control the trade-off between speech distortion and residual noise. We evaluate the estimation of noise power in highly nonstationary and varying noise environments. The improvement is confirmed in terms of signal-to-noise ratio (SNR) and the Itakura-Saito distortion measure (ISDM).
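
As a rough illustration of this kind of update, the sketch below adapts the smoothing factor of a recursive noise-PSD tracker using a sigmoid of the a posteriori SNR combined with the a priori SAP. The function name update_noise_psd, the constants, and the exact bias-compensation rule are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def update_noise_psd(noise_psd, noisy_power, sap, slope=1.0, center=3.0):
    """One frame of recursive noise-PSD tracking (illustrative).

    noise_psd   : previous noise power estimate per frequency bin
    noisy_power : |Y(k)|^2 of the current frame
    sap         : a priori speech absence probability per bin, in [0, 1]
    """
    # A posteriori SNR drives a sigmoid: close to 1 when speech is likely.
    post_snr = noisy_power / np.maximum(noise_psd, 1e-12)
    presence = 1.0 / (1.0 + np.exp(-slope * (post_snr - center)))
    # Combine with the a priori SAP into a speech-presence weight.
    presence *= (1.0 - sap)
    # Track slowly where speech is present, quickly in speech absence.
    smooth = 0.98 * presence + 0.7 * (1.0 - presence)
    return smooth * noise_psd + (1.0 - smooth) * noisy_power
```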

Author(s):  
Yuxuan Ke
Andong Li
Chengshi Zheng
Renhua Peng
Xiaodong Li

Abstract. Deep learning-based speech enhancement algorithms have shown a powerful ability to remove both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target does not contain phase information, e.g., the ideal ratio mask or the clean speech magnitude and its variations. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, perceptual speech quality may degrade. One intuitive remedy is to further suppress the residual noise components with a postprocessing scheme. However, the highly non-stationary nature of this residual noise makes noise power spectral density (PSD) estimation a challenging problem. To solve this problem, the paper proposes three strategies to estimate the noise PSD frame by frame; the residual noise can then be removed effectively by applying a gain function based on the decision-directed approach. Objective measurements show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, an AB subjective listening test shows that the preference percentages of the proposed strategies are over 60%.
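
For context, here is a minimal sketch of the decision-directed gain computation such a postfilter can apply per frame, assuming a noise PSD estimate is already available. The helper dd_gain, the smoothing factor, and the gain floor are illustrative; the paper's three PSD-estimation strategies are not reproduced.

```python
import numpy as np

def dd_gain(noisy_mag, noise_psd, prev_clean_mag, alpha=0.98, gain_floor=0.1):
    """Wiener-type gain for one frame from a decision-directed a priori SNR."""
    post_snr = (noisy_mag ** 2) / np.maximum(noise_psd, 1e-12)
    # Decision-directed a priori SNR (Ephraim-Malah style recursion).
    prio_snr = (alpha * (prev_clean_mag ** 2) / np.maximum(noise_psd, 1e-12)
                + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))
    gain = prio_snr / (1.0 + prio_snr)        # Wiener gain
    return np.maximum(gain, gain_floor)       # floor limits musical noise
```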


Sensors
2020
Vol 20 (20)
pp. 5751
Author(s):  
Seon Man Kim

This paper proposes a novel technique to improve a spectral statistical filter for speech enhancement, to be applied in wearable hearing devices such as hearing aids. The proposed method is implemented on a 32-channel uniform polyphase discrete Fourier transform filter bank, for which the overall algorithm processing delay is 8 ms, in accordance with hearing-device requirements. The proposed speech enhancement technique, which exploits the concepts of both non-negative sparse coding (NNSC) and spectral statistical filtering, provides an online unified framework to overcome the problem of residual noise in spectral statistical filters under noisy environments. First, the spectral gain attenuator of the statistical Wiener filter is obtained using the a priori signal-to-noise ratio (SNR) estimated through a decision-directed approach. Next, the spectrum estimated using the Wiener spectral gain attenuator is decomposed, by applying the NNSC technique, into target speech and residual noise components. These components are used to develop an NNSC-based Wiener spectral gain attenuator to achieve enhanced speech. The performance of the proposed NNSC-Wiener filter was evaluated through perceptual evaluation of speech quality scores under various noise conditions with SNRs ranging from -5 to 20 dB. The results indicate that the proposed NNSC-Wiener filter can outperform the conventional Wiener filter and NNSC-based speech enhancement methods at all SNRs.
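
The sketch below illustrates the sparse non-negative decomposition step in isolation: a magnitude spectrum is projected onto pre-trained speech and noise bases with multiplicative KL-NMF updates and an L1 sparsity penalty. The function nnsc_decompose, the dictionary shapes, the sparsity weight, and the iteration count are assumptions, not the paper's exact NNSC-Wiener attenuator.

```python
import numpy as np

def nnsc_decompose(spectrum, speech_basis, noise_basis, lam=0.1, n_iter=50):
    """Split a magnitude spectrum into speech/noise parts on fixed bases.

    spectrum     : (F,) non-negative magnitude spectrum of one frame
    speech_basis : (F, Ks) pre-trained speech dictionary
    noise_basis  : (F, Kn) pre-trained noise dictionary
    """
    W = np.hstack([speech_basis, noise_basis])        # joint dictionary
    H = np.full(W.shape[1], 1e-2)                     # activations
    for _ in range(n_iter):
        WH = np.maximum(W @ H, 1e-12)
        # Multiplicative KL-NMF update with an L1 sparsity penalty on H.
        H *= (W.T @ (spectrum / WH)) / (W.sum(axis=0) + lam)
    ks = speech_basis.shape[1]
    return speech_basis @ H[:ks], noise_basis @ H[ks:]
```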


2010
Vol 8
pp. 95-99
Author(s):  
F. X. Nsabimana
V. Subbaraman
U. Zölzer

Abstract. To enhance severely corrupted speech signals, an Improved Psychoacoustically Motivated Spectral Weighting Rule (IPMSWR) is proposed that controls the predefined residual noise level by a time-frequency dependent parameter. Unlike conventional Psychoacoustically Motivated Spectral Weighting Rules (PMSWR), the level of the residual noise is varied throughout the enhanced speech by discriminating between regions of speech presence and speech absence using the segmental SNR within critical bands. Controlling the residual noise level in noise-only regions in this way avoids the unpleasant residual noise perceived at very low SNRs. Deriving the gain coefficients requires computing the masking curve and estimating the corrupting noise power. Since clean speech is generally not available to a single-channel speech enhancement technique, the rough clean speech components needed to compute the masking curve are obtained using advanced spectral subtraction techniques. To estimate the corrupting noise, a new technique is employed that relies on noise power estimation using rapid adaptation and recursive smoothing principles. The performance of the proposed approach is compared objectively and subjectively with conventional approaches to highlight the aforementioned improvement.
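
As an illustration of the band-wise control described above, the following sketch computes a segmental SNR per critical band and assigns a permitted residual-noise level accordingly. The helper band_residual_levels, the band edges, the thresholds, and the levels are illustrative and do not reproduce the IPMSWR itself.

```python
import numpy as np

def band_residual_levels(speech_power, noise_power, band_edges,
                         level_speech=0.05, level_noise=0.01,
                         snr_thresh_db=0.0):
    """Permitted residual-noise level per critical band (illustrative).

    speech_power, noise_power : per-bin power estimates for one frame
    band_edges                : bin indices delimiting the critical bands
    """
    levels = np.empty(len(band_edges) - 1)
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        seg_snr_db = 10.0 * np.log10(
            max(speech_power[lo:hi].sum(), 1e-12)
            / max(noise_power[lo:hi].sum(), 1e-12))
        # More residual noise is tolerable where speech masks it; keep it
        # low in noise-only bands to avoid unpleasant artifacts.
        levels[b] = level_speech if seg_snr_db > snr_thresh_db else level_noise
    return levels
```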


2021
pp. 1-12
Author(s):  
Jie Wang
Linhuang Yan
Qiaohe Yang
Minmin Yuan

In this paper, a single-channel speech enhancement algorithm is proposed that applies guided filtering to the speech spectrogram, treated as an image, based on the masking properties of the human auditory system. Guided filtering is capable of sharpening details and estimating unwanted textures or background noise in the noisy speech spectrogram. If the noisy spectrogram is regarded as a degraded image, the spectrogram of the clean speech signal can be estimated with guided filtering after subtracting the noise components. Combined with the masking properties of the human auditory system, the proposed algorithm adaptively adjusts and reduces the residual noise of the enhanced speech spectrogram according to the corresponding masking threshold. Because the filtering output is a local linear transform of the guidance spectrogram, the sliding local window can be implemented efficiently via a box filter with O(N) computational complexity. Experimental results show that the proposed algorithm effectively suppresses noise in different noisy environments and thus greatly improves speech quality and intelligibility.
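
A minimal sketch of the underlying guided filter, with the noisy spectrogram acting as its own guide and a box filter giving O(N) cost per pass, is shown below. The helper guided_filter and its radius and regularization parameters are assumptions; the masking-threshold adaptation described in the paper is not included.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, target, radius=4, eps=1e-2):
    """Standard guided filter over a 2-D (log-magnitude) spectrogram."""
    size = 2 * radius + 1

    def box(x):
        # Box (mean) filter over a local time-frequency window.
        return uniform_filter(x, size=size, mode='nearest')

    mean_i, mean_p = box(guide), box(target)
    cov_ip = box(guide * target) - mean_i * mean_p
    var_i = box(guide * guide) - mean_i * mean_i
    a = cov_ip / (var_i + eps)      # local linear model q = a * I + b
    b = mean_p - a * mean_i
    return box(a) * guide + box(b)  # average the coefficients, then apply
```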


Author(s):  
Shifeng Ou
Peng Song
Ying Gao

The a priori signal-to-noise ratio (SNR) plays an essential role in many speech enhancement systems. Most existing approaches to estimating the a priori SNR exploit only the amplitude spectra and neglect the phase. Given that incorporating phase information into a speech processing system can significantly improve speech quality, this paper proposes a phase-sensitive decision-directed (DD) approach for a priori SNR estimation. By representing the short-time discrete Fourier transform (STFT) signal spectra geometrically in the complex plane, the proposed approach estimates the a priori SNR using both magnitude and phase information while making no assumptions about the phase difference between the clean speech and noise spectra. Objective evaluations in terms of spectrograms, segmental SNR, log-spectral distance (LSD), and short-time objective intelligibility (STOI) demonstrate the superiority of the proposed approach over several competitive methods under different noise conditions and input SNR levels.
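
For reference, phase-aware estimators of this kind build on the complex-plane identity |Y|^2 = |S|^2 + |N|^2 + 2|S||N|cos(theta_S - theta_N). The sketch below merely solves this identity for the speech magnitude, from which an a priori SNR could be formed as (|S|/|N|)^2; the helper speech_mag_from_geometry is hypothetical and is not the paper's phase-sensitive DD estimator.

```python
import numpy as np

def speech_mag_from_geometry(noisy_mag, noise_mag, cos_dphi):
    """Solve |Y|^2 = |S|^2 + |N|^2 + 2|S||N|cos(dphi) for |S| >= 0."""
    # Quadratic in |S|: |S|^2 + 2*b*|S| + (|N|^2 - |Y|^2) = 0, b = |N|cos(dphi)
    b = noise_mag * cos_dphi
    disc = np.maximum(b ** 2 - (noise_mag ** 2 - noisy_mag ** 2), 0.0)
    return np.maximum(-b + np.sqrt(disc), 0.0)
```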


2011
Vol 36 (3)
pp. 519-532
Author(s):  
Zhi Tao
He-Ming Zhao
Xiao-Jun Zhang
Di Wu

Abstract. This paper proposes a speech enhancement method based on the multiple scales and multiple thresholds of the auditory perception wavelet transform, which is suitable for low-SNR (signal-to-noise ratio) environments. The method achieves noise reduction by threshold processing of the auditory perception wavelet transform coefficients of the speech signal, guided by the human ear's auditory masking effect. To prevent high-frequency loss during noise suppression, we first make a voicing decision on the speech signal and then process the unvoiced and voiced segments with different thresholds and decision rules. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared with other spectral subtraction methods, our method keeps the unvoiced components intact while suppressing both residual noise and background noise, so the enhanced speech has better clarity and intelligibility.
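
A generic sketch of segment-wise wavelet soft-thresholding with voicing-dependent thresholds is given below; a standard discrete wavelet transform (PyWavelets) stands in for the auditory perception wavelet transform, and the helper denoise_segment and its threshold constants are illustrative.

```python
import numpy as np
import pywt

def denoise_segment(segment, is_voiced, wavelet='db8', level=4):
    """Soft-threshold wavelet denoising with a voicing-dependent threshold."""
    coeffs = pywt.wavedec(segment, wavelet, level=level)
    # Robust noise-scale estimate from the finest detail coefficients (MAD).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    # Gentler threshold for unvoiced segments to keep high-frequency content.
    k = 1.0 if is_voiced else 0.5
    thr = k * sigma * np.sqrt(2.0 * np.log(len(segment)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(segment)]
```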

