Speech enhancement using a modified Kalman filter based on complex linear prediction and supergaussian priors

Author(s):  
Thomas Esch ◽  
Peter Vary
Signals ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 434-455
Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficient (LPC) and noise variance introduce bias in Kalman filter (KF) gain and degrade speech enhancement performance. The existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in KF gain during speech absence of noisy speech to that of the sensitivity metric during speech presence to achieve better noise reduction. The noise variance and the speech model parameters are adopted as a speech activity detector. The reduced-biased Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.
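As a rough illustration of the LPC-estimation step mentioned in this abstract, the sketch below computes LPCs with the textbook autocorrelation method and the Levinson-Durbin recursion. It is not the authors' implementation: the function names `autocorrelation` and `levinson_durbin`, the sign convention x[n] = sum_k a[k] x[n-k] + w[n], and the toy input are all assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for x[n] = sum_k a[k] * x[n-k] + w[n].

    Returns (a, err): a holds a_1..a_order, err is the excitation variance.
    Assumes r[0] > 0 (a silent frame would need special handling).
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Correlation not yet explained by the order-(m-1) predictor.
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = acc / err  # reflection coefficient
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= 1.0 - k_m * k_m
    return a, err


# Toy input: the exact autocorrelation of an AR(1) process with coefficient 0.9.
lpc, excitation_var = levinson_durbin([1.0, 0.9, 0.81], order=2)
```

In a frame-based enhancer, `autocorrelation` would be applied to each (pre-whitened) frame and the resulting coefficients and excitation variance passed on to the Kalman filter.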


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Current augmented Kalman filter (AKF)-based speech enhancement algorithms (SEAs) utilise a temporal convolutional network (TCN) to estimate the clean speech and noise linear prediction coefficients (LPCs). However, the multi-head attention network (MHANet) has demonstrated the ability to more efficiently model the long-term dependencies of noisy speech than TCNs. Motivated by this, we investigate the MHANet for LPC estimation. We aim to produce clean speech and noise LPC parameters with the least bias to date. With this, we also aim to produce higher quality and more intelligible enhanced speech than any current KF or AKF-based SEA. Here, we investigate MHANet within the DeepLPC framework. DeepLPC is a deep learning framework for jointly estimating the clean speech and noise LPC power spectra. DeepLPC is selected as it exhibits significantly less bias than other frameworks, by avoiding the use of whitening filters and post-processing. DeepLPC-MHANet is evaluated on the NOIZEUS corpus using subjective AB listening tests, as well as seven different objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC-MHANet is compared to five existing deep learning-based methods. Compared to other deep learning approaches, DeepLPC-MHANet produced clean speech LPC estimates with the least amount of bias. DeepLPC-MHANet-AKF also produced higher objective scores than any of the competing methods (with an improvement of 0.17 for CSIG, 0.15 for CBAK, 0.19 for COVL, 0.24 for PESQ, 3.70% for STOI, 1.03 dB for SegSNR, and 1.04 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC-MHANet-AKF was also the most preferred amongst ten listeners. By producing LPC estimates with the least amount of bias to date, DeepLPC-MHANet enables the AKF to produce enhanced speech at a higher quality and intelligibility than any previous method.
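Of the objective measures listed in this abstract, SI-SDR has a compact closed form. The sketch below follows the commonly used definition (project the estimate onto the reference, then compare the projected energy with the residual energy). Whether the paper used exactly this variant is an assumption, and the function name `si_sdr` is hypothetical.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB.

    Assumes equal-length sequences, a non-silent reference, and an estimate
    that is not an exact scaled copy of the reference (zero residual).
    """
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy  # least-squares scaling of the reference
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10(target_energy / residual_energy)
```

Because the reference is rescaled before the comparison, multiplying the enhanced signal by a constant gain leaves the score unchanged, which is the property that distinguishes SI-SDR from a plain SNR.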


Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias in the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods tune the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. We then construct a whitening filter (with its coefficients computed from the estimated noise) and apply it to the noisy speech, yielding pre-whitened speech from which the speech LPC parameters are computed. Next, we construct the KF with the estimated parameters, where the robustness metric offsets the bias in the Kalman gain during speech absence to that of the sensitivity metric during speech presence to achieve better noise reduction. The noise variance and the speech model parameters are adopted as a speech activity detector. The reduced-biased Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.
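The pre-whitening step described in this abstract can be pictured as a prediction-error filter driven by the noise LPCs. The sketch below is a generic FIR form of that idea, not the authors' code: the name `prewhiten`, the coefficient convention v[n] = sum_k b[k] v[n-k] + w[n] for the noise model, and the zero initial conditions are assumptions.

```python
def prewhiten(signal, noise_lpc):
    """Apply the FIR prediction-error filter e[n] = y[n] - sum_k b[k] * y[n-k].

    noise_lpc holds b_1..b_q; samples before the start of the signal are
    treated as zero.
    """
    out = []
    for n, sample in enumerate(signal):
        predicted = sum(
            b * signal[n - k]
            for k, b in enumerate(noise_lpc, start=1)
            if n - k >= 0
        )
        out.append(sample - predicted)
    return out
```

If the input were pure noise that exactly followed the assumed autoregressive model, the output would be the white excitation that generated it. Applied to noisy speech, the same filter flattens the noise spectrum before the speech LPCs are estimated.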


Speech enhancement has long been a major challenge in the field of signal processing. The process of filtering the noise component out of a speech signal has reached many milestones since the early 20th century. Among the many available theories, linear predictive coding is one of the best methods for speech and audio signal processing; it predicts the current value from the past states of a linear time-invariant (LTI) system. Linear prediction is commonly used in speech recognition and speech enhancement. One related technique, the Kalman filter, was introduced and described in 1960 by Rudolf Kalman and is based on linear quadratic estimation. Kalman filtering is used effectively in practical applications such as ship and aircraft navigation, motion planning, and communications. For speech, the recursive equations of the Kalman filter are built on an autoregressive model of speech, written in state-space form for state estimation. In this paper, we use a Kalman filter to remove pink noise from a corrupted speech signal. Pink noise is very common in electronic devices. The speech corrupted with pink noise was obtained from the SpEAR database. We used MATLAB for the simulations. Finally, spectrograms of the signals are plotted to give a clearer visual picture of the filtered results.
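To make the autoregressive state-space formulation concrete, here is a minimal Python sketch of a Kalman filter for speech modelled as an AR(p) process (the paper itself used MATLAB). It is an illustration under simplifying assumptions, not the authors' method: the LPCs and variances are taken as known, the additive noise is assumed white (pink noise would call for a coloured-noise model), and the name `kalman_enhance` is hypothetical.

```python
import random


def kalman_enhance(noisy, lpc, excitation_var, noise_var):
    """Estimate clean samples from noisy ones with an AR(p) state-space model.

    Model: s[n] = sum_k lpc[k-1] * s[n-k] + w[n],  y[n] = s[n] + v[n],
    with white w (variance excitation_var) and white v (variance noise_var).
    """
    p = len(lpc)
    # Companion-form transition matrix; the state is [s[n-p+1], ..., s[n]].
    A = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p)]
    A[p - 1] = [lpc[p - 1 - j] for j in range(p)]
    x = [0.0] * p
    P = [[excitation_var if i == j else 0.0 for j in range(p)] for i in range(p)]
    enhanced = []
    for y in noisy:
        # Predict: x <- A x, P <- A P A^T + Q (Q is nonzero only at [p-1][p-1]).
        x = [sum(A[i][k] * x[k] for k in range(p)) for i in range(p)]
        AP = [[sum(A[i][k] * P[k][j] for k in range(p)) for j in range(p)]
              for i in range(p)]
        P = [[sum(AP[i][k] * A[j][k] for k in range(p)) for j in range(p)]
             for i in range(p)]
        P[p - 1][p - 1] += excitation_var
        # Update: the observation picks out the last state element.
        innovation_var = P[p - 1][p - 1] + noise_var
        gain = [P[i][p - 1] / innovation_var for i in range(p)]
        innovation = y - x[p - 1]
        x = [x[i] + gain[i] * innovation for i in range(p)]
        last_row = P[p - 1][:]
        P = [[P[i][j] - gain[i] * last_row[j] for j in range(p)]
             for i in range(p)]
        enhanced.append(x[p - 1])
    return enhanced


# Toy usage on a synthetic AR(2) signal (coefficients chosen for illustration).
rng = random.Random(0)
coeffs = [1.5, -0.7]
clean = [0.0, 0.0]
for _ in range(500):
    clean.append(coeffs[0] * clean[-1] + coeffs[1] * clean[-2] + rng.gauss(0.0, 1.0))
clean = clean[2:]
noisy = [s + rng.gauss(0.0, 2.0) for s in clean]
enhanced = kalman_enhance(noisy, coeffs, excitation_var=1.0, noise_var=4.0)
```

In practice the coefficients and variances are not known and have to be estimated frame by frame from the noisy signal, which is where most of the difficulty lies.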


Author(s):  
Michael D. Paskett ◽  
Mark R. Brinton ◽  
Taylor C. Hansen ◽  
Jacob A. George ◽  
Tyler S. Davis ◽  
...  

Background: Advanced prostheses can restore function and improve quality of life for individuals with amputations. Unfortunately, most commercial control strategies do not fully utilize the rich control information from residual nerves and musculature. Continuous decoders can provide more intuitive prosthesis control using multi-channel neural or electromyographic recordings. Three components influence continuous decoder performance: the data used to train the algorithm, the algorithm, and smoothing filters on the algorithm’s output. Individual groups often focus on a single decoder, so very few studies compare different decoders using otherwise similar experimental conditions. Methods: We completed a two-phase, head-to-head comparison of 12 continuous decoders using activities of daily living. In phase one, we compared two training types and a smoothing filter with three algorithms (modified Kalman filter (mKF), multi-layer perceptron, and convolutional neural network) in a clothespin relocation task. We compared training types that included only individual digit and wrist movements vs. combination movements (e.g., simultaneous grasp and wrist flexion). We also compared raw vs. nonlinearly smoothed algorithm outputs. In phase two, we compared the three algorithms in fragile egg, zipping, pouring, and folding tasks using the combination training and smoothing found beneficial in phase one. In both phases, we collected objective, performance-based (e.g., success rate), and subjective, user-focused (e.g., preference) measures. Results: Phase one showed that combination training improved prosthesis control accuracy and speed, and that the nonlinear smoothing improved accuracy but generally reduced speed. Phase one importantly showed simultaneous movements were used in the task, and that the modified Kalman filter and multi-layer perceptron predicted more simultaneous movements than the convolutional neural network. In phase two, user-focused metrics favored the convolutional neural network and modified Kalman filter, whereas performance-based metrics were generally similar among all algorithms. Conclusions: These results confirm that state-of-the-art algorithms, whether linear or nonlinear in nature, functionally benefit from training on more complex data and from output smoothing. These studies will be used to select a decoder for a long-term take-home trial with implanted neuromyoelectric devices. Overall, clinical considerations may favor the mKF as it is similar in performance, faster to train, and computationally less expensive than neural networks.


1991 ◽  
Vol 18 (2) ◽  
pp. 320-327 ◽  
Author(s):  
Murray A. Fitch ◽  
Edward A. McBean

A model is developed for the prediction of river flows resulting from combined snowmelt and precipitation. The model employs a Kalman filter to reflect uncertainty both in the measured data and in the system model parameters. The forecasting algorithm is used to develop multi-day forecasts for the Sturgeon River, Ontario. The algorithm is shown to develop good 1-day and 2-day ahead forecasts, but the linear prediction model is found inadequate for longer-term forecasts. Good initial parameter estimates are shown to be essential for optimal forecasting performance. Key words: Kalman filter, streamflow forecast, multi-day, streamflow, Sturgeon River, MISP algorithm.


2016 ◽  
Vol 173 ◽  
pp. 1625-1629 ◽  
Author(s):  
Jian Pan ◽  
Xinhua Yang ◽  
Huafeng Cai ◽  
Bingxian Mu
