noisy speech Latest Research Papers

The Use of Speech Recognition Systems to Select a Useful Signal in Noisy Speech at a Low Signal-To-Noise Ratio

10.1109/dynamics52735.2021.9653711 ◽

2021 ◽

Author(s):

Sh. R. Salimov ◽

N. A. Volkov ◽

A. V. Ivanov

Keyword(s):

Speech Recognition ◽

Signal To Noise Ratio ◽

Signal To Noise ◽

Noisy Speech ◽

Useful Signal ◽

Recognition Systems ◽

Noise Ratio

Speech enhancement from fused features based on deep neural network and gated recurrent unit network

EURASIP Journal on Advances in Signal Processing ◽

10.1186/s13634-021-00813-8 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Youming Wang ◽

Jiali Han ◽

Tianqi Zhang ◽

Didi Qing

Keyword(s):

Neural Network ◽

Deep Learning ◽

Power Spectrum ◽

Speech Enhancement ◽

Deep Neural Network ◽

Series Data ◽

Context Information ◽

Noisy Speech ◽

Enhancement Method ◽

Gated Recurrent Unit

AbstractSpeech is easily interfered by external environment in reality, which results in the loss of important features. Deep learning has become a popular speech enhancement method because of its superior potential in solving nonlinear mapping problems for complex features. However, the deficiency of traditional deep learning methods is the weak learning capability of important information from previous time steps and long-term event dependencies between the time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of deep neural networks (DNNs) and gated recurrent unit (GRU). The proposed method uses GRU to reduce the number of parameters of DNNs and acquire the context information of the speech, which improves the enhanced speech quality and intelligibility. Firstly, DNN with multiple hidden layers is used to learn the mapping relationship between the logarithmic power spectrum (LPS) features of noisy speech and clean speech. Secondly, the LPS feature of the deep neural network is fused with the noisy speech as the input of GRU network to compensate the missing context information. Finally, GRU network is performed to learn the mapping relationship between LPS features and log power spectrum features of clean speech spectrum. The proposed model is experimentally compared with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, compared with the noise signal under the condition of matched noise. Under the condition of unmatched noise, the PESQ and STOI of the algorithm are improved by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information of features to suppress noise in both matched and unmatched noise cases and the proposed method outperforms other common methods in speech enhancement.

Entropy-Based Extraction of Useful Content from Spectrograms of Noisy Speech Signals

10.1109/elmar52657.2021.9550891 ◽

2021 ◽

Author(s):

Ana Vrankovic ◽

Ivo Ipsic ◽

Jonatan Lerga

Keyword(s):

Speech Signals ◽

Noisy Speech

Direction of Arrival Estimation of Noisy Speech using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

10.23919/eusipco54536.2021.9616204 ◽

2021 ◽

Author(s):

Nils Poschadel ◽

Robert Hupke ◽

Stephan Preihs ◽

Jurgen Peissig

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Direction Of Arrival ◽

Higher Order ◽

Direction Of Arrival Estimation ◽

Noisy Speech

Multi-Source Direction of Arrival Estimation of Noisy Speech using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

10.23919/eusipco54536.2021.9616002 ◽

2021 ◽

Author(s):

Nils Poschadel ◽

Stephan Preihs ◽

Jurgen Peissig

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Direction Of Arrival ◽

Higher Order ◽

Direction Of Arrival Estimation ◽

Noisy Speech

A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00220-9 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Sujan Kumar Roy ◽

Kuldip K. Paliwal

Keyword(s):

Speech Enhancement ◽

Minimum Mean Square Error ◽

Estimation Algorithm ◽

Pass Filter ◽

Noisy Speech ◽

Stationary Noise ◽

Speech Activity ◽

Wide Range ◽

Psd Estimation ◽

High Pass

AbstractThe minimum mean-square error (MMSE)-based noise PSD estimators have been used widely for speech enhancement. However, the MMSE noise PSD estimators assume that the noise signal changes at a slower rate than the speech signal— which lacks the ability to track the highly non-stationary noise sources. Moreover, the performance of the MMSE-based noise PSD estimator largely depends upon the accuracy of the a priori SNR estimation in practice. In this paper, we introduce a noise PSD estimation algorithm using a derivative-based high-pass filter in non-stationary noise conditions. The proposed method processes the silent and speech frames of the noisy speech differently to estimate the noise PSD. It is due to the non-stationary noise that can be mixed with silent and speech-dominated frames non-uniformly. We first introduce a spectral-flatness-based adaptive thresholding technique to detect the speech activity of the noisy speech frames. Since the silent frame of the noisy speech is completely filled with noise, the noise periodogram is directly computed from it without applying any filtering. Conversely, a 4th order derivative-based high-pass filter is applied during speech activity of the noisy speech frame to filter out the clean speech components while leaving behind mostly the noise. The noise periodogram is computed from the filtered signal—which counteracts the leaking of clean speech power. The noise PSD estimate is obtained by recursively averaging the previously estimated noise PSD and the current estimate of the noise periodogram. The proposed method is found to be effective in tracking the rapidly changing as well as the slowly varying noise PSD than the competing methods in non-stationary noise conditions for a wide range of signal-to-noise ratio (SNR) levels. Extensive objective and subjective scores on the NOIZEUS corpus demonstrate that the application of the proposed noise PSD with MMSE-based speech enhancement methods produce higher quality and intelligible enhanced speech than the competing methods.

Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Signals ◽

10.3390/signals2030027 ◽

2021 ◽

Vol 2 (3) ◽

pp. 434-455

Author(s):

Sujan Kumar Roy ◽

Kuldip K. Paliwal

Keyword(s):

Kalman Filter ◽

Speech Enhancement ◽

Linear Prediction ◽

Real Life ◽

Model Parameters ◽

Noise Variance ◽

Noisy Speech ◽

Kalman Gain ◽

Whitening Filter ◽

Prediction Coefficient

Inaccurate estimates of the linear prediction coefficient (LPC) and noise variance introduce bias in Kalman filter (KF) gain and degrade speech enhancement performance. The existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in KF gain during speech absence of noisy speech to that of the sensitivity metric during speech presence to achieve better noise reduction. The noise variance and the speech model parameters are adopted as a speech activity detector. The reduced-biased Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.

Speech Enhancement from Fused Features Based on Deep Neural Network and Gated Recurrent Unit Network

10.21203/rs.3.rs-554205/v1 ◽

2021 ◽

Author(s):

Youming Wang ◽

Jiali Han ◽

Tianqi Zhang ◽

Didi Qing

Keyword(s):

Neural Network ◽

Power Spectrum ◽

Speech Enhancement ◽

Deep Neural Network ◽

Series Data ◽

Noisy Speech ◽

Deep Model ◽

Gated Recurrent Unit ◽

Unit Network

Abstract Speech is easily interfered by the external environment in reality, which will lose the important features. Deep learning method has become the mainstream method of speech enhancement because of its superior potential in complex nonlinear mapping problems. However, there are some problems are exist such as the deficiency for the learning the important information from previous time steps and long-term event dependencies. Due to the lack of the correlation in the same layer of Deep Neural Networks (DNNs), which is an existing typical intelligent deep model of speech signal, it is difficult to capture the long-term dependence between the time-series data. To overcome this problem, we propose a novel speech enhancement method from fused features based on deep neural network and gated recurrent unit network. The method takes advantage of both deep neural network and recurrent neural network to reduce the number of parameters and simultaneously improve speech quality and intelligibility. Firstly, DNN with multiple hidden layers is used to learn the mapping relationship between the logarithmic power spectrum (LPS) features of noisy speech and clean speech. Secondly, the LPS feature of the deep neural network is fused with the noisy speech as the input of gated recurrent unit (GRU) network to compensate the missing context information. Finally, GRU network is performed to learn the mapping relationship between LPS features and log power spectrum features of clean speech spectrum. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53% respectively compared with the noise signal under the condition of matched noise. Under the condition of unmatched noise, the PESQ and STOI of the algorithm are improved by 23.8% and 37.36% respectively. The advantage of the proposed method is that it uses of the key information of features to suppress noise in both matched and unmatched noise cases and the proposed method outperforms other common methods in speech enhancement.

A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413395 ◽

2021 ◽

Author(s):

Zhaoxu Nian ◽

Yan-Hui Tu ◽

Jun Du ◽

Chin-Hui Lee

Keyword(s):

Speech Recognition ◽

Speech Enhancement ◽

Learning Approach ◽

Noisy Speech ◽

Noisy Speech Recognition ◽

Adaptive Noise ◽

Progressive Learning

Source-Aware Neural Speech Coding for Noisy Speech Compression

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413678 ◽

2021 ◽

Author(s):

Haici Yang ◽

Kai Zhen ◽

Seungkwon Beack ◽

Minje Kim

Keyword(s):

Speech Coding ◽

Speech Compression ◽

Noisy Speech

noisy speech
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

The Use of Speech Recognition Systems to Select a Useful Signal in Noisy Speech at a Low Signal-To-Noise Ratio

Speech enhancement from fused features based on deep neural network and gated recurrent unit network

Entropy-Based Extraction of Useful Content from Spectrograms of Noisy Speech Signals

Direction of Arrival Estimation of Noisy Speech using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

Multi-Source Direction of Arrival Estimation of Noisy Speech using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions

Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Speech Enhancement from Fused Features Based on Deep Neural Network and Gated Recurrent Unit Network

A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition

Source-Aware Neural Speech Coding for Noisy Speech Compression

Export Citation Format

noisy speechRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

The Use of Speech Recognition Systems to Select a Useful Signal in Noisy Speech at a Low Signal-To-Noise Ratio

Speech enhancement from fused features based on deep neural network and gated recurrent unit network

Entropy-Based Extraction of Useful Content from Spectrograms of Noisy Speech Signals

Direction of Arrival Estimation of Noisy Speech using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

Multi-Source Direction of Arrival Estimation of Noisy Speech using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions

Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Speech Enhancement from Fused Features Based on Deep Neural Network and Gated Recurrent Unit Network

A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition

Source-Aware Neural Speech Coding for Noisy Speech Compression

noisy speech
Recently Published Documents