Speech enhancement methods based on binaural cue coding

Author(s):  
Xianyun Wang ◽  
Changchun Bao

Abstract According to the encoding and decoding mechanism of binaural cue coding (BCC), this paper treats speech and noise as the left-channel and right-channel signals of the BCC framework, respectively. The speech signal is then estimated from noisy speech when the inter-channel level difference (ICLD) and inter-channel correlation (ICC) between speech and noise are given. Both exact inter-channel cues and pre-enhanced inter-channel cues are used for speech restoration. The exact cues are extracted from clean speech and noise, and the pre-enhanced cues are extracted from the pre-enhanced speech and estimated noise; the two sets are paired one by one to form a codebook. Once the pre-enhanced cues are extracted from noisy speech, the exact cues are estimated by a mapping between the pre-enhanced cues and the prior codebook. The estimated exact cues are then used to obtain a time-frequency (T-F) mask for enhancing noisy speech based on BCC decoding. In addition, to further improve the accuracy of the T-F mask based on the inter-channel cues, a deep neural network (DNN)-based method is proposed to learn the mapping between input features of noisy speech and the T-F masks. Experimental results show that the codebook-driven method achieves better performance than conventional methods, and the DNN-based method performs better than the codebook-driven method.
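The link between an ICLD cue and a T-F gain can be made concrete. The sketch below is a minimal numpy illustration, not the paper's exact decoder: it assumes a Wiener-style mapping from the speech-to-noise level difference (in dB) to a per-bin gain, and the function name is our own.

```python
import numpy as np

def icld_to_mask(icld_db):
    """Map an inter-channel level difference (speech vs. noise, in dB)
    to a Wiener-like T-F gain: P_s / (P_s + P_n)."""
    ratio = 10.0 ** (icld_db / 10.0)  # power ratio P_s / P_n
    return ratio / (1.0 + ratio)

# A strongly speech-dominant bin (+20 dB) gets a gain near 1,
# a noise-dominant bin (-20 dB) a gain near 0.
mask = icld_to_mask(np.array([20.0, 0.0, -20.0]))
```

Applying such a mask bin by bin to the noisy magnitude spectrum and resynthesizing yields the enhanced speech.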

2020 ◽  
Vol 10 (3) ◽  
pp. 1167 ◽  
Author(s):  
Lu Zhang ◽  
Mingjiang Wang ◽  
Qiquan Zhang ◽  
Ming Liu

The performance of speech enhancement algorithms can be further improved by considering the application scenarios of speech products. In this paper, we propose an attention-based branchy neural network framework that incorporates prior environmental information for noise reduction. In the overall denoising framework, an environment classification network is first trained to distinguish the noise type of each noisy speech frame. Guided by this classification network, the denoising network gradually learns a separate noise-reduction ability in each branch. Unlike most deep neural network (DNN)-based methods, which learn speech reconstruction with a single common structure trained on all noises, the proposed branchy model obtains greater performance benefits from branches specially trained on known noise interference types. Experimental results show that the proposed branchy DNN model not only preserves better enhanced speech quality and intelligibility in seen noisy environments, but also generalizes well to unseen noisy environments.
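The routing idea described above can be sketched in a few lines of numpy. This is an illustrative assumption of how a classifier's soft noise-type probabilities could blend branch outputs; the tiny linear "branches" and the toy classifier stand in for the trained networks and are not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 noise-type branches, each a tiny linear "denoiser"
# over a 4-dim feature frame; random weights stand in for learned parameters.
n_branches, dim = 3, 4
branch_weights = [rng.standard_normal((dim, dim)) for _ in range(n_branches)]

def classify_environment(frame):
    """Stand-in for the environment classification network:
    returns a softmax distribution over noise types."""
    logits = np.array([frame.sum(), frame.mean(), frame.max()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def branchy_denoise(frame):
    """Run the frame through all branches and blend the outputs
    with the classifier's soft noise-type probabilities."""
    probs = classify_environment(frame)                      # (n_branches,)
    outputs = np.stack([W @ frame for W in branch_weights])  # (n_branches, dim)
    return probs @ outputs                                   # soft selection

frame = rng.standard_normal(dim)
enhanced = branchy_denoise(frame)
```

Soft blending keeps the whole model differentiable end to end, whereas hard branch selection would require a discrete routing decision per frame.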


2021 ◽  
Author(s):  
Youming Wang ◽  
Jiali Han ◽  
Tianqi Zhang ◽  
Didi Qing

Abstract In reality, speech is easily corrupted by the external environment, which causes important features to be lost. Deep learning has become the mainstream approach to speech enhancement because of its strength in complex nonlinear mapping problems. However, existing methods struggle to learn important information from previous time steps and long-term event dependencies. Because deep neural networks (DNNs), a typical deep model for speech signals, lack correlation within the same layer, they have difficulty capturing long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of a deep neural network and a gated recurrent unit (GRU) network. The method takes advantage of both networks to reduce the number of parameters while improving speech quality and intelligibility. Firstly, a DNN with multiple hidden layers is used to learn the mapping between the logarithmic power spectrum (LPS) features of noisy speech and those of clean speech. Secondly, the LPS output of the DNN is fused with the noisy speech as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping between the fused LPS features and the LPS features of clean speech. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, compared with the noisy signal under matched-noise conditions. Under unmatched-noise conditions, the PESQ and STOI are improved by 23.8% and 37.36%, respectively.
The advantage of the proposed method is that it uses the key information of features to suppress noise in both matched and unmatched noise cases, and it outperforms other common methods in speech enhancement.
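The fusion step described in the abstract, where the DNN's pre-enhanced LPS is combined with the noisy input before the GRU, amounts to frame-wise feature concatenation. The sketch below is a shape-level illustration only; the `dnn_stage` placeholder and the feature dimensions are assumptions, not the trained model.

```python
import numpy as np

# Hypothetical shapes: T frames of 257-bin log-power spectra (LPS).
T, F = 100, 257
noisy_lps = np.random.default_rng(2).standard_normal((T, F))

def dnn_stage(lps):
    """Stand-in for the DNN mapping noisy LPS -> pre-enhanced LPS."""
    return lps * 0.9  # placeholder for the learned mapping

# Fusion: the DNN's pre-enhanced LPS is concatenated with the noisy LPS
# frame by frame, forming the GRU's input sequence of width 2F.
dnn_lps = dnn_stage(noisy_lps)
gru_input = np.concatenate([dnn_lps, noisy_lps], axis=-1)  # (T, 2F)
```

Keeping the noisy frames in the fused input lets the recurrent stage recover context that the frame-wise DNN mapping discards.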


Author(s):  
Youming Wang ◽  
Jiali Han ◽  
Tianqi Zhang ◽  
Didi Qing

Abstract Speech is easily corrupted by the external environment in reality, which results in the loss of important features. Deep learning has become a popular speech enhancement method because of its superior potential in solving nonlinear mapping problems for complex features. However, the deficiency of traditional deep learning methods is their weak ability to learn important information from previous time steps and long-term event dependencies between time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of deep neural networks (DNNs) and a gated recurrent unit (GRU). The proposed method uses the GRU to reduce the number of parameters of the DNNs and to acquire the context information of the speech, which improves the enhanced speech quality and intelligibility. Firstly, a DNN with multiple hidden layers is used to learn the mapping between the logarithmic power spectrum (LPS) features of noisy speech and those of clean speech. Secondly, the LPS output of the DNN is fused with the noisy speech as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping between the fused LPS features and the LPS features of clean speech. The proposed model is experimentally compared with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, compared with the noisy signal under matched-noise conditions. Under unmatched-noise conditions, the PESQ and STOI are improved by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information of features to suppress noise in both matched and unmatched noise cases, and it outperforms other common methods in speech enhancement.


Author(s):  
Wenlong Li ◽  
Kaoru Hirota ◽  
Yaping Dai ◽  
Zhiyang Jia

An improved fully convolutional network for speech enhancement, based on post-processing with global variance (GV) equalization and noise-aware training (PN-FCN), is proposed. It aims to reduce the complexity of the speech enhancement system and to address the over-smoothed speech spectrogram problem and poor generalization capability. The PN-FCN is fed with the noisy speech samples augmented with an estimate of the noise, so the network can exploit additional online noise information to better predict the clean speech. In addition, the PN-FCN uses global variance information, which improves the subjective score in a voice conversion task. Finally, because the proposed framework adopts an FCN, its number of parameters is one-seventh that of a deep neural network (DNN). Results of experiments on the Valentini-Botinhaos dataset demonstrate that the proposed framework achieves improvements in both denoising effect and model training speed.
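The two post-processing and input-augmentation ideas above have simple closed forms. The sketch below is a minimal numpy illustration under our own assumptions: GV equalization rescales the enhanced log-spectrogram so its global variance matches a target (e.g. one measured on clean training speech), and the noise-aware input is the noisy LPS concatenated with an online noise estimate. Function names are ours.

```python
import numpy as np

def gv_equalize(log_spec, target_gv):
    """Rescale deviations from the mean so the global variance of the
    enhanced log-spectrogram matches target_gv, counteracting the
    over-smoothing typical of regression-based enhancement."""
    mu = log_spec.mean()
    gv = log_spec.var()
    return mu + np.sqrt(target_gv / gv) * (log_spec - mu)

def noise_aware_input(noisy_lps, noise_estimate_lps):
    """Noise-aware training input: noisy LPS frames augmented
    (concatenated) with an estimate of the noise spectrum."""
    return np.concatenate([noisy_lps, noise_estimate_lps], axis=-1)

rng = np.random.default_rng(1)
spec = 0.3 * rng.standard_normal((50, 129))   # stand-in over-smoothed output
equalized = gv_equalize(spec, target_gv=1.0)
```

Because the mapping is affine in the log-spectrum, the equalized output has exactly the target variance while preserving the spectral shape.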


2021 ◽  
Vol 21 (1) ◽  
pp. 19
Author(s):  
Asri Rizki Yuliani ◽  
M. Faizal Amri ◽  
Endang Suryawati ◽  
Ade Ramdan ◽  
Hilman Ferdinandus Pardede

Speech enhancement, which aims to recover clean speech from a corrupted signal, plays an important role in digital speech signal processing. Approaches to speech enhancement vary with the type of degradation and noise in the speech signal, so the research topic remains challenging in practice, especially when dealing with highly non-stationary noise and reverberation. Recent advances in deep learning technologies have greatly supported progress in the speech enhancement research field, and deep learning is known to outperform the statistical models used in conventional speech enhancement; hence, it deserves a dedicated survey. In this review, we describe the advantages and disadvantages of recent deep learning approaches and discuss the challenges and trends of the field. From the reviewed works, we conclude that the trend in deep learning architectures has shifted from the standard deep neural network (DNN) to the convolutional neural network (CNN), which can efficiently learn temporal information of the speech signal, and to the generative adversarial network (GAN), which trains two networks adversarially.

