A Deep Learning-Based Kalman Filter for Speech Enhancement

DeepLPC-MHANet: Multi-Head Self-Attention for Augmented Kalman Filter-based Speech Enhancement

10.36227/techrxiv.14384909

2021

Author(s):

Sujan Kumar Roy

Aaron Nicolson

Kuldip K. Paliwal

Keyword(s):

Deep Learning

Kalman Filter

Speech Enhancement

Linear Prediction

Power Spectra

Previous Method

Learning Approaches

Convolutional Network

Listening Tests

Prediction Coefficient

Current augmented Kalman filter (AKF)-based speech enhancement algorithms (SEAs) use a temporal convolutional network (TCN) to estimate the clean speech and noise linear prediction coefficients (LPCs). However, the multi-head attention network (MHANet) has been shown to model the long-term dependencies of noisy speech more efficiently than TCNs. Motivated by this, we investigate the MHANet for LPC estimation. We aim to produce clean speech and noise LPC parameters with the least bias to date and, in doing so, to produce higher quality and more intelligible enhanced speech than any current KF or AKF-based SEA. Here, we investigate MHANet within the DeepLPC framework, a deep learning framework for jointly estimating the clean speech and noise LPC power spectra. DeepLPC is selected because, by avoiding whitening filters and post-processing, it exhibits significantly less bias than other frameworks. DeepLPC-MHANet is evaluated on the NOIZEUS corpus using subjective AB listening tests and seven objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR), and is compared to five existing deep learning-based methods. Compared to these deep learning approaches, DeepLPC-MHANet produced the clean speech LPC estimates with the least bias. DeepLPC-MHANet-AKF also produced higher objective scores than any of the competing methods, improving on the next best method by 0.17 for CSIG, 0.15 for CBAK, 0.19 for COVL, 0.24 for PESQ, 3.70% for STOI, 1.03 dB for SegSNR, and 1.04 dB for SI-SDR. The enhanced speech produced by DeepLPC-MHANet-AKF was also the most preferred among ten listeners. By producing LPC estimates with the least bias to date, DeepLPC-MHANet enables the AKF to produce enhanced speech of higher quality and intelligibility than any previous method.
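
As a rough illustration of what an AKF does with LPC estimates, the sketch below runs a textbook augmented Kalman filter whose state stacks the most recent p speech samples and q noise samples, with the noisy observation being their sum. It assumes the speech LPCs `a`, the noise LPCs `b` and their excitation variances are already known (in DeepLPC-MHANet they would be derived from the network's estimates; here they are simply passed in), uses the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and filters one stationary segment rather than frame by frame. The helper names `companion` and `akf_enhance` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def companion(coeffs):
    """Transition matrix for an AR model with A(z) = 1 + c1 z^-1 + ... + cp z^-p,
    i.e. x(n) = -c1 x(n-1) - ... - cp x(n-p) + excitation(n).
    The state vector is [x(n), x(n-1), ..., x(n-p+1)]."""
    p = len(coeffs)
    F = np.zeros((p, p))
    F[0, :] = -np.asarray(coeffs, dtype=float)
    F[1:, :-1] = np.eye(p - 1)  # shift older samples down by one
    return F


def akf_enhance(y, a, sigma_w2, b, sigma_v2):
    """Augmented Kalman filter (hypothetical helper): the state stacks p speech
    samples and q noise samples, and the observation is y(n) = s(n) + v(n)."""
    p, q = len(a), len(b)
    m = p + q
    # Block-diagonal transition: speech AR(p) block, then noise AR(q) block.
    Phi = np.zeros((m, m))
    Phi[:p, :p] = companion(a)
    Phi[p:, p:] = companion(b)
    # Excitation enters only the newest speech and noise samples.
    Q = np.zeros((m, m))
    Q[0, 0] = sigma_w2
    Q[p, p] = sigma_v2
    # Observation vector picks out s(n) + v(n); no extra measurement noise.
    c = np.zeros(m)
    c[0] = 1.0
    c[p] = 1.0

    x = np.zeros(m)       # zero history before the first sample
    P = np.zeros((m, m))
    s_hat = np.empty(len(y))
    for n, y_n in enumerate(y):
        # Prediction step.
        x = Phi @ x
        P = Phi @ P @ Phi.T + Q
        # Kalman gain; c^T P c >= sigma_w2 + sigma_v2 > 0 after prediction.
        K = P @ c / (c @ P @ c)
        # Correction step with the innovation y(n) - c^T x.
        x = x + K * (y_n - c @ x)
        P = P - np.outer(K, c @ P)
        P = 0.5 * (P + P.T)  # keep the covariance numerically symmetric
        s_hat[n] = x[0]       # filtered clean-speech estimate
    return s_hat
```

Because the observation carries no separate measurement-noise term, the gain's denominator is c^T P c, which stays positive here because the predicted covariance always includes both excitation variances. Any bias in the LPC or variance estimates propagates directly into this gain, which is why the abstract focuses on low-bias LPC estimation.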

DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement

IEEE Access

10.1109/access.2021.3075209

2021

Vol 9

pp. 64524-64538

Author(s):

Sujan Kumar Roy

Aaron Nicolson

Kuldip K. Paliwal

Keyword(s):

Deep Learning

Kalman Filter

Speech Enhancement

Single Channel

Learning Approach

Deep Learning with Augmented Kalman Filter for Single-Channel Speech Enhancement

2020 IEEE International Symposium on Circuits and Systems (ISCAS)

10.1109/iscas45731.2020.9180820

2020

Author(s):

Sujan Kumar Roy

Aaron Nicolson

Kuldip K. Paliwal

Keyword(s):

Deep Learning

Kalman Filter

Speech Enhancement

Single Channel

Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement

10.21437/interspeech.2018-1020

2018

Author(s):

Shuai Nie

Shan Liang

Bin Liu

Yaping Zhang

Wenju Liu

...

Keyword(s):

Signal Processing

Deep Learning

Speech Enhancement

Learning Approach

Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones

IEEE/ACM Transactions on Audio, Speech, and Language Processing

10.1109/taslp.2021.3082318

2021

pp. 1-1

Author(s):

Ke Tan

Xueliang Zhang

Deliang Wang

Keyword(s):

Deep Learning

Real Time

Mobile Phones

Speech Enhancement

Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Signals

10.3390/signals2030027

2021

Vol 2 (3)

pp. 434-455

Author(s):

Sujan Kumar Roy

Kuldip K. Paliwal

Keyword(s):

Kalman Filter

Speech Enhancement

Linear Prediction

Real Life

Model Parameters

Noise Variance

Noisy Speech

Kalman Gain

Whitening Filter

Prediction Coefficient

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias in the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods tune the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise in each noisy speech frame using a speech presence probability (SPP) method and compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, in which a robustness metric offsets the bias in the KF gain during speech absence and a sensitivity metric does so during speech presence, achieving better noise reduction. The noise variance and the speech model parameters are also used as a speech activity detector. The reduced-bias Kalman gain enables the KF to suppress the noise significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.
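
The pre-whitening step described in this abstract can be sketched as follows, assuming the noise estimate for the frame is already available (the SPP-based noise estimator itself is not shown). Noise LPCs are fitted with the Levinson-Durbin recursion, the noisy frame is passed through the FIR whitening filter B(z) = 1 + b1 z^-1 + ... + bq z^-q built from them, and the speech LPCs are then computed from the whitened frame. The model orders (10 and 4), the biased autocorrelation estimate, and the helper names `autocorr`, `levinson_durbin` and `whiten_then_lpc` are illustrative assumptions, not values or code from the paper; the robustness and sensitivity tuning of the gain is not reproduced.

```python
import numpy as np


def autocorr(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of a frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(max_lag + 1)]) / len(x)


def levinson_durbin(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Returns the coefficients [a1..ap] and the prediction error variance."""
    a = np.zeros(order)
    err = float(r[0])
    for i in range(order):
        acc = r[i + 1] + np.dot(a[:i], r[i:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a[:i].copy()
        a[:i] = a_prev + k * a_prev[::-1]   # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k                  # shrink the prediction error
    return a, err


def whiten_then_lpc(noisy_frame, noise_frame, p_speech=10, q_noise=4):
    """Hypothetical helper: pre-whiten a noisy frame with noise LPCs, then fit speech LPCs."""
    # Noise LPCs and noise excitation variance from the estimated noise frame.
    b, sigma_v2 = levinson_durbin(autocorr(noise_frame, q_noise), q_noise)
    # FIR whitening filter B(z) = 1 + b1 z^-1 + ... + bq z^-q (zero initial state).
    whitening = np.concatenate(([1.0], b))
    whitened = np.convolve(noisy_frame, whitening)[: len(noisy_frame)]
    # Speech LPCs from the pre-whitened noisy frame.
    a, sigma_w2 = levinson_durbin(autocorr(whitened, p_speech), p_speech)
    return a, sigma_w2, b, sigma_v2
```

Applying B(z) to the noisy frame leaves its noise component approximately white, with variance close to `sigma_v2`, which is the motivation for pre-whitening before the speech LPCs are fitted.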

A dual Kalman filter-based smoother for speech enhancement

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

10.1109/icassp.2003.1198930

2003

Author(s):

Hong Cai

E. Grivel

M. Najim

Keyword(s):

Kalman Filter

Speech Enhancement

Speech Enhancement Using Deep Learning Methods: A Review

Jurnal Elektronika dan Telekomunikasi

10.14203/jet.v21.19-26

2021

Vol 21 (1)

pp. 19

Author(s):

Asri Rizki Yuliani

M. Faizal Amri

Endang Suryawati

Ade Ramdan

Hilman Ferdinandus Pardede

Keyword(s):

Neural Network

Deep Learning

Speech Enhancement

Speech Signal

Research Field

Learning Technologies

Learning Approaches

Speech Signal Processing

Generative Adversarial Network

Advantages And Disadvantages

Speech enhancement, which aims to recover clean speech from a corrupted signal, plays an important role in digital speech signal processing. Approaches to speech enhancement vary according to the type of degradation and noise in the speech signal, and the topic remains challenging in practice, especially when dealing with highly non-stationary noise and reverberation. Recent advances in deep learning have substantially supported progress in speech enhancement research, and deep learning has been shown to outperform the statistical models used in conventional speech enhancement. It therefore deserves a dedicated survey. In this review, we describe the advantages and disadvantages of recent deep learning approaches, and we discuss the challenges and trends of the field. From the reviewed works, we conclude that the trend in deep learning architectures has shifted from standard deep neural networks (DNNs) to convolutional neural networks (CNNs), which can efficiently learn the temporal information of speech signals, and to generative adversarial networks (GANs), which train two networks against each other.