scholarly journals Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Signals ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 434-455
Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficient (LPC) and noise variance introduce bias in Kalman filter (KF) gain and degrade speech enhancement performance. The existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in KF gain during speech absence of noisy speech to that of the sensitivity metric during speech presence to achieve better noise reduction. The noise variance and the speech model parameters are adopted as a speech activity detector. The reduced-biased Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.

Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

The inaccurate estimates of linear prediction coefficient (LPC) and noise variance introduce bias in Kalman filter (KF) gain and degrades speech enhancement performance. The existing methods proposed a tuning of the biased Kalman gain particularly in stationary noise condition. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then construct a whitening filter (with its coefficients computed from the estimated noise) and employed to the noisy speech, yielding a pre-whitened speech, from where the speech LPC parameters are computed. Then construct KF with the estimated parameters, where the robustness metric offsets the bias in Kalman gain during speech absence to that of the sensitivity metric during speech presence to achieve better noise reduction. Where the noise variance and the speech model parameters are adopted as a speech activity detector. The reduced-biased Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on NOIZEUS corpus demonstrates that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce bias estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as well as seven different objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC is compared to six existing deep learning-based methods. Compared to other deep learning approaches to clean speech LPC estimation, DeepLPC produces a lower spectral distortion (SD) level than existing methods, confirming that it exhibits less bias. DeepLPC also produced higher objective scores than any of the competing methods (with an improvement of 0.11 for CSIG, 0.15 for CBAK, 0.14 for COVL, 0.13 for PESQ, 2.66\% for STOI, 1.11 dB for SegSNR, and 1.05 dB for SI-SDR, over the next best method). The enhanced speech produced by DeepLPC was also the most preferred by listeners. By producing less biased clean speech and noise LPC estimates, DeepLPC enables the AKF to produce enhanced speech at a higher quality and intelligibility.


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Current augmented Kalman filter (AKF)-based speech enhancement algorithms utilise a temporal convolutional network (TCN) to estimate the clean speech and noise linear prediction coefficient (LPC). However, the multi-head attention network (MHANet) has demonstrated the ability to more efficiently model the long-term dependencies of noisy speech than TCNs. Motivated by this, we investigate the MHANet for LPC estimation. We aim to produce clean speech and noise LPC parameters with the least bias to date. With this, we also aim to produce higher quality and more intelligible enhanced speech than any current KF or AKF-based SEA. Here, we investigate MHANet within the DeepLPC framework. DeepLPC is a deep learning framework for jointly estimating the clean speech and noise LPC power spectra. DeepLPC is selected as it exhibits significantly less bias than other frameworks, by avoiding the use of whitening filters and post-processing. DeepLPC-MHANet is evaluated on the NOIZEUS corpus using subjective AB listening tests, as well as seven different objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC-MHANet is compared to five existing deep learning-based methods. Compared to other deep learning approaches, DeepLPC-MHANet produced clean speech LPC estimates with the least amount of bias. DeepLPC-MHANet-AKF also produced higher objective scores than any of the competing methods (with an improvement of 0.17 for CSIG, 0.15 for CBAK, 0.19 for COVL, 0.24 for PESQ, 3.70\% for STOI, 1.03 dB for SegSNR, and 1.04 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC-MHANet-AKF was also the most preferred amongst ten listeners. By producing LPC estimates with the least amount of bias to date, DeepLPC-MHANet enables the AKF to produce enhanced speech at a higher quality and intelligibility than any previous method.


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

The performance of speech coding, speech recognition, and speech enhancement largely depends upon the accuracy of the linear prediction coefficient (LPC) of clean speech and noise in practice. Formulation of speech and noise LPC estimation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised technique, typically a deep neural network (DNN) is trained to learn a mapping from noisy speech features to clean speech and noise LPCs. Training targets for DNN to clean speech and noise LPC estimation fall into four categories: line spectrum frequency (LSF), LPC power spectrum (LPC-PS), power spectrum (PS), and magnitude spectrum (MS). The choice of appropriate training target as well as the DNN method can have a significant impact on LPC estimation in practice. Motivated by this, we perform a comprehensive study on the training targets using two state-of-the-art DNN methods--- residual network and temporal convolutional network (ResNet-TCN) and multi-head attention network (MHANet). This study aims to determine which training target as well as DNN method produces more accurate LPCs in practice. We train the ResNet-TCN and MHANet for each training target with a large data set. Experiments on the NOIZEUS corpus demonstrate that the LPC-PS training target with MHANet produces a lower spectral distortion (SD) level in the estimated speech LPCs in real-life noise conditions. We also construct the AKF with the estimated speech and noise LPC parameters from each training target using ResNet-TCN and MHANet. Subjective AB listening tests and seven different objective quality and intelligibility evaluation measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR) on the NOIZEUS corpus demonstrate that the AKF constructed with MHANet-LPC-PS driven speech and noise LPC parameters produced enhanced speech with higher quality and intelligibility than competing methods.


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

The performance of speech coding, speech recognition, and speech enhancement largely depends upon the accuracy of the linear prediction coefficient (LPC) of clean speech and noise in practice. Formulation of speech and noise LPC estimation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised technique, typically a deep neural network (DNN) is trained to learn a mapping from noisy speech features to clean speech and noise LPCs. Training targets for DNN to clean speech and noise LPC estimation fall into four categories: line spectrum frequency (LSF), LPC power spectrum (LPC-PS), power spectrum (PS), and magnitude spectrum (MS). The choice of appropriate training target as well as the DNN method can have a significant impact on LPC estimation in practice. Motivated by this, we perform a comprehensive study on the training targets using two state-of-the-art DNN methods--- residual network and temporal convolutional network (ResNet-TCN) and multi-head attention network (MHANet). This study aims to determine which training target as well as DNN method produces more accurate LPCs in practice. We train the ResNet-TCN and MHANet for each training target with a large data set. Experiments on the NOIZEUS corpus demonstrate that the LPC-PS training target with MHANet produces a lower spectral distortion (SD) level in the estimated speech LPCs in real-life noise conditions. We also construct the AKF with the estimated speech and noise LPC parameters from each training target using ResNet-TCN and MHANet. Subjective AB listening tests and seven different objective quality and intelligibility evaluation measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR) on the NOIZEUS corpus demonstrate that the AKF constructed with MHANet-LPC-PS driven speech and noise LPC parameters produced enhanced speech with higher quality and intelligibility than competing methods.


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Current augmented Kalman filter (AKF)-based speech enhancement algorithms utilise a temporal convolutional network (TCN) to estimate the clean speech and noise linear prediction coefficient (LPC). However, the multi-head attention network (MHANet) has demonstrated the ability to more efficiently model the long-term dependencies of noisy speech than TCNs. Motivated by this, we investigate the MHANet for LPC estimation. We aim to produce clean speech and noise LPC parameters with the least bias to date. With this, we also aim to produce higher quality and more intelligible enhanced speech than any current KF or AKF-based SEA. Here, we investigate MHANet within the DeepLPC framework. DeepLPC is a deep learning framework for jointly estimating the clean speech and noise LPC power spectra. DeepLPC is selected as it exhibits significantly less bias than other frameworks, by avoiding the use of whitening filters and post-processing. DeepLPC-MHANet is evaluated on the NOIZEUS corpus using subjective AB listening tests, as well as seven different objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC-MHANet is compared to five existing deep learning-based methods. Compared to other deep learning approaches, DeepLPC-MHANet produced clean speech LPC estimates with the least amount of bias. DeepLPC-MHANet-AKF also produced higher objective scores than any of the competing methods (with an improvement of 0.17 for CSIG, 0.15 for CBAK, 0.19 for COVL, 0.24 for PESQ, 3.70\% for STOI, 1.03 dB for SegSNR, and 1.04 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC-MHANet-AKF was also the most preferred amongst ten listeners. By producing LPC estimates with the least amount of bias to date, DeepLPC-MHANet enables the AKF to produce enhanced speech at a higher quality and intelligibility than any previous method.


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce bias estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as well as seven different objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC is compared to six existing deep learning-based methods. Compared to other deep learning approaches to clean speech LPC estimation, DeepLPC produces a lower spectral distortion (SD) level than existing methods, confirming that it exhibits less bias. DeepLPC also produced higher objective scores than any of the competing methods (with an improvement of 0.11 for CSIG, 0.15 for CBAK, 0.14 for COVL, 0.13 for PESQ, 2.66\% for STOI, 1.11 dB for SegSNR, and 1.05 dB for SI-SDR, over the next best method). The enhanced speech produced by DeepLPC was also the most preferred by listeners. By producing less biased clean speech and noise LPC estimates, DeepLPC enables the AKF to produce enhanced speech at a higher quality and intelligibility.


1991 ◽  
Vol 18 (2) ◽  
pp. 320-327 ◽  
Author(s):  
Murray A. Fitch ◽  
Edward A. McBean

A model is developed for the prediction of river flows resulting from combined snowmelt and precipitation. The model employs a Kalman filter to reflect uncertainty both in the measured data and in the system model parameters. The forecasting algorithm is used to develop multi-day forecasts for the Sturgeon River, Ontario. The algorithm is shown to develop good 1-day and 2-day ahead forecasts, but the linear prediction model is found inadequate for longer-term forecasts. Good initial parameter estimates are shown to be essential for optimal forecasting performance. Key words: Kalman filter, streamflow forecast, multi-day, streamflow, Sturgeon River, MISP algorithm.


Speech enhancement has been a major challenge in the field of Signal processing. The process of filtering the noise component from the speech signal has achieved many milestones since the early 20th century. Beside many theories Linear prediction coding is one of the best methods for speech, audio signal processing which uses the algorithm of predicting the current estimates based on the past states of an LTI system. Linear prediction is usually used in Speech recognition, Speech enhancement. One of such Kalman filter was introduced and described in 1960 by Rudolf Kalman, which uses the concept of linear quadratic estimation. Kalman filtering is effectively being used in the practical applications like navigation of ships or aircraft, designing motion planning algorithms, in communication area. Kalman filters use the autoregression model of speech for the recursive equations of Kalman filter used in state space model of filter for state estimation. In this paper, we have used Kalman filter to eliminate the pink noise from the corrupted speech signal. Pink noise is very common in electronic devices and occurs in almost all devices. The Speech corrupted with pink noise has been obtained from SpEAR database. We have used MATLAB software for the simulation purpose. Finally, Spectrograms of signals are plotted for a better visual understanding of filtered results.


Sign in / Sign up

Export Citation Format

Share Document