Speech Enhancement Based on the Multi-Scales and Multi-Thresholds of the Auditory Perception Wavelet Transform

2011 ◽  
Vol 36 (3) ◽  
pp. 519-532 ◽  
Author(s):  
Zhi Tao ◽  
He-Ming Zhao ◽  
Xiao-Jun Zhang ◽  
Di Wu

Abstract This paper proposes a speech enhancement method using the multi-scales and multi-thresholds of the auditory perception wavelet transform, which is suitable for a low SNR (signal-to-noise ratio) environment. The method reduces noise by thresholding the auditory perception wavelet transform parameters of the speech signal according to the auditory masking effect of the human ear. To prevent high-frequency loss during noise suppression, we first make a voicing decision on the speech signal. We then process the unvoiced and voiced segments with different thresholds and different decision rules. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared with spectral subtraction methods, our method keeps the unvoiced components intact while suppressing both the residual noise and the background noise. Thus, the enhanced speech has better clarity and intelligibility.
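
The idea of applying a different threshold at each wavelet scale can be sketched as follows. This is only an illustration under loud assumptions: it uses a plain Haar wavelet and fixed, hand-picked thresholds, whereas the abstract describes an auditory perception wavelet transform with thresholds tied to auditory masking and to a voicing decision. The names `haar_step`, `soft_threshold`, and `denoise` are hypothetical.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t; zero it if |c| <= t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def denoise(x, thresholds):
    """Decompose len(thresholds) levels, threshold each scale, then rebuild.

    thresholds[0] applies to the finest scale. len(x) must be divisible by
    2 ** len(thresholds). The thresholds here are assumed inputs, not the
    masking-derived values of the paper.
    """
    approx = list(x)
    details = []
    for t in thresholds:
        approx, d = haar_step(approx)
        details.append([soft_threshold(c, t) for c in d])
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx
```

With all thresholds set to zero the signal is reconstructed unchanged; raising the threshold at a given scale removes more of the detail at that scale, which is why unvoiced (high-frequency) segments would need gentler fine-scale thresholds than voiced ones.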

Author(s):  
Yuxuan Ke ◽  
Andong Li ◽  
Chengshi Zheng ◽  
Renhua Peng ◽  
Xiaodong Li

Abstract Deep learning-based speech enhancement algorithms have shown their powerful ability in removing both stationary and non-stationary noise components from noisy speech observations. But they often introduce artificial residual noise, especially when the training target does not contain the phase information, e.g., ideal ratio mask, or the clean speech magnitude and its variations. It is well-known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, the perceptual speech quality may degrade. One intuitive way is to further suppress the residual noise components by a postprocessing scheme. However, the highly non-stationary nature of this kind of residual noise makes the noise power spectral density (PSD) estimation a challenging problem. To solve this problem, the paper proposes three strategies to estimate the noise PSD frame by frame, and then the residual noise can be removed effectively by applying a gain function based on the decision-directed approach. The objective measurement results show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, the AB subjective listening test shows that the preference percentages of the proposed strategies are over 60%.
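
A gain function based on the decision-directed approach can be sketched per frequency bin as below. The abstract does not say which gain rule is used, so the Wiener gain here is an assumption, as are the function name `decision_directed_gain`, the smoothing factor `alpha`, and the premise that a noise PSD estimate is already available for the frame.

```python
def decision_directed_gain(noisy_power, noise_psd, prev_clean_power, alpha=0.98):
    """Per-bin Wiener gain from a decision-directed a priori SNR estimate.

    noisy_power      : |Y(k)|^2 for the current frame
    noise_psd        : estimated noise PSD for the current frame (assumed given)
    prev_clean_power : |S_hat(k)|^2 from the previous enhanced frame
    alpha            : smoothing factor, an assumed typical value
    """
    gains = []
    for y2, n2, s2_prev in zip(noisy_power, noise_psd, prev_clean_power):
        gamma = y2 / n2  # a posteriori SNR
        # Blend the previous frame's estimate with the current ML estimate.
        xi = alpha * s2_prev / n2 + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        gains.append(xi / (1.0 + xi))  # Wiener gain (assumed gain rule)
    return gains
```

In use, each gain multiplies the corresponding spectral bin of the frame, and the resulting enhanced power becomes `prev_clean_power` for the next frame.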


2010 ◽  
Vol 139-141 ◽  
pp. 2154-2157
Author(s):  
Ji Xiang Lu ◽  
Ping Wang ◽  
Long Yi

Voice interaction in the cockpit mainly comprises speech recognition, enhancement, and synthesis. It converts speech into the corresponding commands so that cockpit equipment operates without error, and it feeds the execution results back to the user through speech output devices or other means. This paper studies speech enhancement technology for such voice interaction. We propose an improved spectral subtraction (SS) algorithm based on the auditory masking effect that applies SS in two steps. Simulation results in terms of segmental SNR show that the improved algorithm is effective and outperforms traditional SS.
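
For reference, the traditional spectral subtraction baseline can be sketched as a per-bin power subtraction with over-subtraction and a spectral floor. This does not reproduce the paper's two-step, masking-based variant; the function name `spectral_subtract` and the values of `alpha` and `beta` are assumptions chosen for illustration.

```python
def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Basic power spectral subtraction for one frame.

    alpha : over-subtraction factor (assumed value)
    beta  : spectral floor as a fraction of the noise power (assumed value)
    Returns the estimated clean-speech power per frequency bin.
    """
    out = []
    for y2, n2 in zip(noisy_power, noise_power):
        # Subtract the scaled noise estimate, but never go below the floor.
        out.append(max(y2 - alpha * n2, beta * n2))
    return out
```

The floor keeps bins from going negative and limits the isolated spectral peaks that are heard as musical noise, which is the artifact that masking-based refinements aim to reduce further.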


2017 ◽  
Vol 2 (4) ◽  
pp. 15
Author(s):  
Mamun Ahmed ◽  
Nasimul Hyder Maruf Bhuyan

This paper presents the design, implementation, and comparison of the Least Mean Square (LMS) and Normalized LMS (NLMS) algorithms using a 4-channel microphone array for noise reduction and speech enhancement. An adaptive sub-band Generalized Sidelobe Canceller (GSC) beamformer was used for the experiments and analysis. Tests were carried out with one speech signal and a small number of noise sources. The sidelobe canceller was evaluated with both LMS and NLMS adaptation. The overall improvement in Signal to Noise Ratio (SNR) was determined from the input and output powers of signal and noise, with the signal alone and the noise alone applied as input to the GSC. The NLMS algorithm considerably improves speech quality, with noise suppression of up to 13 dB, while the LMS algorithm achieves up to 10 dB. SNR was measured under various types of blocking matrix, step sizes, and noise locations. The approach is intended for hands-free telephony, video conferencing, and similar applications in noisy environments.
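
The difference between the LMS and NLMS updates can be sketched on a single-channel system identification task. This is not the paper's sub-band GSC beamformer; the function name `adapt`, the 2-tap test system, and the step sizes are all assumptions made for illustration.

```python
import random

def adapt(x, d, taps=2, mu=0.05, normalized=False, eps=1e-8):
    """Adapt an FIR filter so that its output tracks d; returns (weights, errors).

    With normalized=False this is the LMS update w += mu * e * u.
    With normalized=True the step is divided by the input energy (NLMS).
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent input sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        step = mu / (eps + sum(b * b for b in buf)) if normalized else mu
        w = [wi + step * e * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors

# Demo: identify a hypothetical 2-tap system d[n] = 0.5*x[n] - 0.3*x[n-1].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w_lms, e_lms = adapt(x, d, mu=0.05)
w_nlms, e_nlms = adapt(x, d, mu=0.5, normalized=True)
```

Because NLMS scales the step by the instantaneous input energy, its convergence is much less sensitive to the input level than that of LMS, which is one common explanation for the kind of advantage reported above.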


2020 ◽  
Vol 10 (7) ◽  
pp. 2218
Author(s):  
Tao Zhang ◽  
Yanzhang Geng ◽  
Jianhong Sun ◽  
Chen Jiao ◽  
Biyun Ding

This paper presents a unified speech enhancement system that removes both background noise and interfering speech in severely noisy environments by jointly utilizing a parabolic reflector model and a neural beamformer. First, the amplification property of the paraboloid is discussed, which significantly improves the Signal-to-Noise Ratio (SNR) of a desired signal. An appropriate paraboloid channel is then analyzed and designed through the boundary element method. In addition, a time-frequency masking approach and a mask-based beamforming approach are discussed and incorporated in the enhancement system. It is worth noting that the signals provided by the paraboloid and the beamformer are exactly complementary. Finally, these signals are employed in a learning-based fusion framework to further improve the system performance in low SNR environments. Experiments demonstrate that our system is effective and robust in five different noisy conditions (speech interfered with factory, pink, destroyer engine, volvo, and babble noise), as well as at different noise levels. Compared with the original noisy speech, the average improvements in objective metrics are about Δ STOI = 0.28, Δ PESQ = 1.31, Δ fwSegSNR = 11.9.


2013 ◽  
Vol 2013 ◽  
pp. 1-7
Author(s):  
Chabane Boubakir ◽  
Daoud Berkani

This paper describes a new speech enhancement approach which employs the minimum mean square error (MMSE) estimator based on the generalized gamma distribution of the short-time spectral amplitude (STSA) of a speech signal. In the proposed approach, the human perceptual auditory masking effect is incorporated into the speech enhancement system. The algorithm is based on a criterion by which the audible noise may be masked rather than being attenuated, thereby reducing the chance of speech distortion. Performance assessment is given to show that our proposal can achieve a more significant noise reduction as compared to the perceptual modification of Wiener filtering and the gamma based MMSE estimator.


2010 ◽  
Vol 8 ◽  
pp. 95-99
Author(s):  
F. X. Nsabimana ◽  
V. Subbaraman ◽  
U. Zölzer

Abstract. To enhance extremely corrupted speech signals, an Improved Psychoacoustically Motivated Spectral Weighting Rule (IPMSWR) is proposed that controls the predefined residual noise level by a time-frequency dependent parameter. Unlike conventional Psychoacoustically Motivated Spectral Weighting Rules (PMSWR), the level of the residual noise is here varied throughout the enhanced speech, based on discrimination between regions of speech presence and speech absence by means of the segmental SNR within critical bands. Controlling the level of the residual noise in noise-only regions in this way avoids the unpleasant residual noise perceived at very low SNRs. To derive the gain coefficients, the computation of the masking curve and the estimation of the corrupting noise power are required. Since the clean speech is generally not available for a single channel speech enhancement technique, the rough clean speech components needed to compute the masking curve are here obtained using advanced spectral subtraction techniques. To estimate the corrupting noise, a new technique is employed that relies on noise power estimation using rapid adaptation and recursive smoothing principles. The performance of the proposed approach is objectively and subjectively compared to that of the conventional approaches to highlight the aforementioned improvement.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Dan Wang ◽  
Zhiqiang Mei ◽  
Jiamin Liang ◽  
Jinzhi Liu

Channel estimation is a key technology for ensuring reliable transmission in orthogonal frequency division multiplexing (OFDM) systems. To improve the accuracy of channel estimation in a low signal-to-noise ratio (SNR) channel environment, this paper proposes an improved channel estimation algorithm based on the transform domain. The improved algorithm applies wavelet denoising (WD) and distance decision analysis (DDA) as secondary denoising on top of transform-domain channel estimation. First, after the least-squares (LS) algorithm, WD performs a first denoising pass; then DDA further suppresses the residual noise in the transform domain, and the important channel taps are screened out. Simulation results show that the proposed algorithm can improve the detection performance of existing transform-domain channel estimation algorithms at low SNR.


Author(s):  
Abhishek Kesharwani ◽  
Vaibhav Aggarwal ◽  
Shubham Singh ◽  
Rahul B R ◽  
Arvind Kumar

In marine seismic acquisitions, signal interference remains a major menace. In this paper, a denoising approach based on Variational Mode Decomposition (VMD) combined with the Hausdorff distance (HD) and the Wavelet transform (WT) is proposed. There has been substantial research in this field over the years. However, traditional denoising methods fall short of achieving satisfactory results in an extremely low signal-to-noise ratio (SNR) environment. The feasibility and stability of the proposed method were validated by performing simulations in MATLAB on both a synthetic signal and a seismic signal generated using a real dataset. It was found that the proposed method does well in preserving marine signals in low SNR environments and has a superior output SNR.


2013 ◽  
Vol 645 ◽  
pp. 179-183
Author(s):  
Yao Qi Wang ◽  
Xiao Peng Wang ◽  
Raji Rafiu King

A new speech enhancement method based on a morphological filter and the wavelet transform is proposed. The system first applies morphological filtering, then distinguishes unvoiced speech, voiced speech, and noise using TEO in the wavelet domain. It then applies wavelet thresholding with a different threshold at each scale, together with an improved threshold function. Experimental results show that the method not only suppresses noise effectively but also reduces the loss of unvoiced speech. It improves SNR as well as voice clarity and comfort, making it an effective speech enhancement algorithm.
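
Assuming TEO here denotes the Teager energy operator (the abstract does not expand the acronym), its standard discrete form can be sketched as below. The function name `teager_energy` and the test tone are illustrative assumptions; the paper applies the operator to wavelet-domain coefficients rather than to a raw tone.

```python
import math

def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(w*n), the operator returns the constant A^2 * sin(w)^2,
# so it responds to both amplitude and frequency.
tone = [2.0 * math.cos(0.3 * n) for n in range(50)]
psi = teager_energy(tone)
```

Because the output grows with both amplitude and frequency, segments dominated by structured speech energy tend to produce a different Teager energy profile than broadband noise, which is what makes it a candidate feature for separating voiced, unvoiced, and noise segments.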

