Speech Intelligibility Enhancement Using Distortion Control

2014 ◽  
Vol 912-914 ◽  
pp. 1391-1394
Author(s):  
Yu Xiang Yang ◽  
Jian Fen Ma

To improve the intelligibility of noisy speech, a novel speech enhancement algorithm using distortion control is proposed. Current speech enhancement algorithms fail to improve intelligibility because they aim to minimize the overall distortion of the enhanced speech, whereas different types of speech distortion contribute differently to intelligibility: amplification distortion in excess of 6.02 dB has the most detrimental effect. In the noise reduction process, the type of speech distortion can be determined from the signal distortion ratio, and distortion in excess of 6.02 dB can be controlled by tuning the gain function of the enhancement algorithm. Experimental results show that the proposed algorithm improves the intelligibility of noisy speech considerably.
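The 6.02 dB limit corresponds to a factor of 2 in amplitude (20·log10(2) ≈ 6.02 dB). A minimal sketch of such a gain constraint, assuming per-bin magnitude spectra and a clean-speech magnitude estimate (the paper's exact gain function and signal-distortion-ratio test are not reproduced here):

```python
import numpy as np

def constrain_amplification(gain, noisy_mag, clean_mag_est, limit_db=6.02):
    """Clamp the spectral gain so the enhanced magnitude never exceeds
    the clean-speech estimate by more than limit_db (6.02 dB = a factor
    of ~2 in amplitude), since amplification distortion beyond this
    level is reported to be the most damaging to intelligibility."""
    limit = 10.0 ** (limit_db / 20.0)           # ~2.0 in linear amplitude
    enhanced = gain * noisy_mag                 # per-bin enhanced magnitude
    over = enhanced > limit * clean_mag_est     # bins with excess amplification
    gain = gain.copy()
    gain[over] = limit * clean_mag_est[over] / noisy_mag[over]
    return gain
```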

2013 ◽  
Vol 321-324 ◽  
pp. 1075-1079
Author(s):  
Peng Liu ◽  
Jian Fen Ma

A subspace-based speech enhancement algorithm with higher intelligibility is proposed. The majority of existing speech enhancement algorithms cannot effectively improve the intelligibility of the enhanced speech. One important reason is that they only use the Minimum Mean Square Error (MMSE) criterion to constrain speech distortion, ignoring that different distortion regions affect intelligibility very differently. The a priori Signal-to-Noise Ratio (SNR) and the gain matrix are used to determine the distortion region; the gain matrix is then modified to constrain the magnitude spectrum of amplification distortion in excess of 6.02 dB, which damages intelligibility the most. Both objective evaluation and subjective listening tests show that the proposed algorithm does improve the intelligibility of the enhanced speech.
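One way the distortion-region decision could look, as a hedged sketch: compare the applied gain against the Wiener gain implied by the a priori SNR and flag bins whose amplification exceeds 6.02 dB (factor 2). The paper's actual subspace gain matrix and decision rule are not reproduced here:

```python
import numpy as np

def amplification_region(gain, a_priori_snr):
    """Flag per-bin amplification distortion > 6.02 dB (hypothetical
    rule for illustration): the Wiener gain snr/(1+snr) serves as the
    distortionless reference, and bins where the applied gain is more
    than twice that reference are treated as harmful amplification."""
    wiener = a_priori_snr / (1.0 + a_priori_snr)
    return gain > 2.0 * wiener      # True where amplification exceeds 6.02 dB
```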


2014 ◽  
Vol 989-994 ◽  
pp. 2565-2568
Author(s):  
Yu Hong Liu ◽  
Dong Mei Zhou ◽  
Jing Di

This paper proposes an improved speech enhancement algorithm based on Wiener filtering that addresses the problems of speech distortion and musical noise. The algorithm incorporates the masking properties of the human auditory system when calculating the gain at each spectral point, so that components of the enhanced speech whose energy falls below the masking threshold are not attenuated further, and the trade-off between noise elimination and speech distortion introduces less distortion into the enhanced speech. Furthermore, to eliminate musical noise, a spectrum-shaping technique that averages across adjacent frames is adopted, and a two-stage moving-average strategy guarantees real-time operation. Computer simulation results show that the proposed algorithm is superior to the traditional Wiener method in CPU cost, real-time performance, and the reduction of speech distortion and residual musical noise.
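The inter-frame averaging idea can be sketched as a two-stage smoothing of the per-frame gain spectrum. This is an illustrative guess at the structure (the smoothing factor `beta` and the 3-frame window are assumptions, not the paper's parameters):

```python
import numpy as np

def smooth_gain(gains, beta=0.6):
    """Two-stage moving average over adjacent frames to suppress the
    frame-to-frame gain fluctuations that produce musical noise.
    gains: array of shape (frames, bins)."""
    out = np.empty_like(gains)
    out[0] = gains[0]
    for t in range(1, len(gains)):              # stage 1: recursive average
        out[t] = beta * out[t - 1] + (1 - beta) * gains[t]
    # stage 2: centred 3-frame average to flatten remaining outliers
    padded = np.pad(out, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
```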


2013 ◽  
Vol 760-762 ◽  
pp. 536-541
Author(s):  
Yu Hong Liu ◽  
Dong Mei Zhou ◽  
Zhan Jun Jiang

The paper addresses the problems of speech distortion and residual musical noise introduced by the conventional spectral subtraction (SS) method for speech enhancement. We propose a modified SS algorithm based on the masking properties of the human auditory system. The algorithm computes the parameters α and β dynamically according to the masking thresholds of the critical frequency bands of each speech frame. Simulation results show that the proposed algorithm is superior to the conventional SS method, not only in the improvement of output SNR but also in the reduction of speech distortion and residual musical noise.
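For context, the conventional over-subtraction rule that α and β parameterize can be sketched as follows (the masking-threshold-driven adaptation of α and β per critical band is the paper's contribution and is not reproduced; here they are taken as given):

```python
import numpy as np

def spectral_subtract(noisy_power, noise_power, alpha, beta):
    """Over-subtraction with a spectral floor: alpha scales the noise
    estimate removed from each bin, and beta sets the residual floor
    that prevents negative power and limits musical noise."""
    clean = noisy_power - alpha * noise_power   # subtract scaled noise estimate
    floor = beta * noise_power                  # spectral floor
    return np.maximum(clean, floor)
```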


This paper introduces technology to improve sound quality for media and entertainment applications. A major challenge in speech processing applications such as mobile phones, hands-free telephony, car communication, teleconference systems, hearing aids, voice coders, automatic speech recognition, and forensics is eliminating background noise. Speech enhancement algorithms are widely used in these applications to remove noise from speech degraded by noisy environments. However, conventional noise reduction methods introduce residual noise and speech distortion: the noise reduction process improves speech quality but degrades the intelligibility of the clean speech signal. In this paper, we introduce a new coherence-based noise reduction model for complex noise environments in which a target speech signal coexists with coherent surrounding noise. Speech presence probability information is added to the coherence model to track noise variation more accurately, and the coherence-based method is adapted separately during speech-present and speech-absent periods. The performance of the proposed method is evaluated under diffuse and real street noise, where it improves speech quality with less speech distortion and residual noise.
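The core observation behind coherence-based methods is that diffuse noise is weakly coherent between two microphones while a single target source is strongly coherent. A minimal sketch of the magnitude-squared coherence used as a crude gain (not the paper's adaptive, speech-presence-weighted estimator):

```python
import numpy as np

def coherence_gain(X1, X2, eps=1e-12):
    """Magnitude-squared coherence between two microphone spectra
    (arrays of shape frames x bins), averaged over frames. Values near
    1 indicate a coherent (speech-like) component; values near 0
    indicate diffuse noise, so the result can serve as a spectral gain."""
    S11 = np.mean(np.abs(X1) ** 2, axis=0)      # auto-spectrum, mic 1
    S22 = np.mean(np.abs(X2) ** 2, axis=0)      # auto-spectrum, mic 2
    S12 = np.mean(X1 * np.conj(X2), axis=0)     # cross-spectrum
    return np.abs(S12) ** 2 / (S11 * S22 + eps)  # in [0, 1]
```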


2019 ◽  
Vol 9 (12) ◽  
pp. 2520
Author(s):  
Juan M. Martín-Doñas ◽  
Antonio M. Peinado ◽  
Iván López-Espejo ◽  
Angel Gomez

This paper deals with speech enhancement in dual-microphone smartphones using beamforming along with postfiltering techniques. The performance of these algorithms relies on a good estimation of the acoustic channel and speech and noise statistics. In this work we present a speech enhancement system that combines the estimation of the relative transfer function (RTF) between microphones using an extended Kalman filter framework with a novel speech presence probability estimator intended to track the noise statistics’ variability. The available dual-channel information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction is further improved by means of postfiltering techniques that take advantage of the speech presence estimation. Our proposal is evaluated in different reverberant and noisy environments when the smartphone is used in both close-talk and far-talk positions. The experimental results show that our system achieves improvements in terms of noise reduction, low speech distortion and better speech intelligibility compared to other state-of-the-art approaches.
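The way a speech presence probability (SPP) estimate can steer a postfilter is illustrated below as a hedged sketch: speech-likely bins keep the beamformer's gain, speech-absent bins fall back to an aggressive floor. The blending rule and the floor `g_min` are assumptions for illustration, not the paper's exact postfilter:

```python
def postfilter_gain(beamformed_gain, spp, g_min=0.1):
    """Blend the beamformer's spectral gain with a floor according to
    the speech presence probability: spp=1 preserves speech, spp=0
    suppresses the bin down to g_min."""
    return spp * beamformed_gain + (1.0 - spp) * g_min
```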


2019 ◽  
Vol 9 (16) ◽  
pp. 3396 ◽  
Author(s):  
Jianfeng Wu ◽  
Yongzhu Hua ◽  
Shengying Yang ◽  
Hongshuai Qin ◽  
Huibin Qin

This paper presents a new deep neural network (DNN)-based speech enhancement algorithm that integrates knowledge distilled from a traditional statistical method. Unlike other DNN-based methods, which typically train many different models on the same data and average their predictions, or use a large number of noise types to enlarge the simulated noisy speech, the proposed method trains neither a whole ensemble of models nor requires a mass of simulated noisy speech. It first trains a discriminator network and a generator network simultaneously using adversarial learning. Then, the discriminator and generator networks are re-trained by distilling knowledge from the statistical method, inspired by knowledge distillation in neural networks. Finally, the generator network is fine-tuned on real noisy speech. Experiments on the CHiME-4 data sets demonstrate that the proposed method achieves more robust performance than the compared DNN-based method in terms of perceptual speech quality.
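Knowledge distillation here means the network is trained against both the ground truth and the output of a statistical enhancer acting as a "teacher". A minimal sketch of such a combined objective (the mixing weight `lam` and the MSE form are assumptions; the paper's adversarial losses are not reproduced):

```python
import numpy as np

def distillation_loss(student_out, teacher_out, target, lam=0.5):
    """Combined loss: MSE to the ground-truth target plus MSE to the
    teacher's output (the distilled knowledge from the statistical
    method), mixed by lam."""
    hard = np.mean((student_out - target) ** 2)       # supervised term
    soft = np.mean((student_out - teacher_out) ** 2)  # distillation term
    return (1.0 - lam) * hard + lam * soft
```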


2021 ◽  
Vol 40 (1) ◽  
pp. 849-864
Author(s):  
Nasir Saleem ◽  
Muhammad Irfan Khattak ◽  
Mu’ath Al-Hasan ◽  
Atif Jan

Speech enhancement is an important problem in various speech processing applications. Recently, supervised speech enhancement using deep learning approaches to estimate a time-frequency mask has yielded remarkable performance gains. In this paper, we propose a time-frequency masking-based supervised speech enhancement method for improving the intelligibility and quality of noisy speech. We believe that a large performance gain can be achieved if deep neural networks (DNNs) are pre-trained layer-wise by stacking Gaussian-Bernoulli Restricted Boltzmann Machines (GB-RBMs). The proposed DNN is called a Gaussian-Bernoulli Deep Belief Network (GB-DBN) and is optimized by minimizing the error between the estimated and pre-defined masks, using a non-linear Mel-scale weighted mean square error (LMW-MSE) loss function as the training criterion. We examined the performance of the proposed pre-training scheme using DNNs trained on three time-frequency masks: the ideal amplitude mask (IAM), the ideal ratio mask (IRM), and the phase-sensitive mask (PSM). Results under different noisy conditions demonstrated that DNNs pre-trained by the proposed scheme provide a consistent gain in perceived speech intelligibility and quality. The proposed pre-training scheme is also effective and robust to noisy training data.
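Of the three training targets mentioned, the ideal ratio mask is the simplest to write down. A sketch, assuming per-bin speech and noise power spectra and the common compression exponent of 0.5:

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Ideal ratio mask (IRM): the ratio of speech energy to total
    energy in each time-frequency bin, compressed by exponent beta."""
    return (speech_power / (speech_power + noise_power)) ** beta
```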


10.14311/1111 ◽  
2009 ◽  
Vol 49 (2) ◽  
Author(s):  
V. Bolom

This paper presents the properties of selected multichannel algorithms for speech enhancement in noisy environments. These methods are suitable for hands-free communication in a car cabin. Criteria for evaluating such systems are also presented, considering both the level of noise suppression and the level of speech distortion. The performance of the multichannel algorithms is investigated on a mixed model of speech signals and car noise and on real signals recorded in a car.
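The two evaluation axes named above (noise suppression vs. speech distortion) can be sketched as simple energy-ratio criteria, assuming time-aligned clean, noisy, and enhanced signals (these are illustrative metrics, not the paper's exact criteria):

```python
import numpy as np

def suppression_and_distortion(clean, noisy, enhanced, eps=1e-12):
    """Noise suppression in dB (input noise energy vs. remaining error
    energy, with residual noise and distortion lumped together) and
    speech distortion in dB (error energy relative to clean energy)."""
    noise_in = noisy - clean
    error = enhanced - clean
    suppression_db = 10 * np.log10(np.sum(noise_in ** 2) / (np.sum(error ** 2) + eps))
    distortion_db = 10 * np.log10(np.sum(error ** 2) / (np.sum(clean ** 2) + eps))
    return suppression_db, distortion_db
```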

