Time-Frequency Masking-Based Speech Enhancement Using Generative Adversarial Network

Author(s):  
Meet H. Soni ◽  
Neil Shah ◽  
Hemant A. Patil
2021 ◽  
Vol 14 (1) ◽  
pp. 123
Author(s):  
Xin Yao ◽  
Xiaoran Shi ◽  
Yaxin Li ◽  
Li Wang ◽  
Han Wang ◽  
...  

In the field of target classification, detecting a ground moving target that is easily obscured by clutter remains a challenge. In addition, traditional feature extraction techniques and classification methods usually rely on strong subjective factors and prior knowledge, which limits their generalization capacity. Most existing deep-learning-based methods suffer from insufficient feature learning due to the lack of data samples, which makes it difficult for the training process to converge to a steady state. To overcome these limitations, this paper proposes a Wasserstein generative adversarial network (WGAN) sample enhancement method for ground moving target classification (GMT-WGAN). First, the micro-Doppler characteristics of ground moving targets are analyzed. Next, a WGAN is constructed to generate effective time–frequency images of ground moving targets and thereby enrich the sample database used to train the classification network. Then, image quality evaluation indexes are introduced to evaluate the generated spectrogram samples, with the aim of verifying the distribution similarity of generated and real samples. Afterward, the augmented samples are fed to deep convolutional neural networks with good generalization capacity, which improves the classification performance of the GMT-WGAN. Finally, experiments conducted on different datasets validate the effectiveness and robustness of the proposed method.
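As a rough illustration of the adversarial objective a WGAN optimizes, the sketch below computes the critic and generator losses from arrays of critic scores and applies the weight clipping used in the original WGAN formulation. The function names, the clipping bound `c`, and the use of plain NumPy arrays are assumptions for illustration; the abstract does not specify the network architecture or which WGAN variant GMT-WGAN uses.

```python
import numpy as np

def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss (to be minimized).

    The critic tries to maximize E[D(real)] - E[D(fake)], so the
    quantity minimized is the negative of that difference.
    """
    return float(np.mean(fake_scores) - np.mean(real_scores))

def generator_loss(fake_scores):
    """Generator loss: push critic scores of generated samples upward."""
    return float(-np.mean(fake_scores))

def clip_weights(weights, c=0.01):
    """Clip every critic weight array to [-c, c] (hypothetical bound c).

    Weight clipping is the Lipschitz constraint from the original WGAN;
    other variants replace it with a gradient penalty.
    """
    return [np.clip(w, -c, c) for w in weights]

# Toy critic scores for a batch of real and generated spectrograms.
real = np.array([1.0, 3.0])
fake = np.array([0.0, 1.0])
print(critic_loss(real, fake))   # negative when real samples score higher
print(generator_loss(fake))
```

In a training loop, the critic is typically updated several times per generator update, with `clip_weights` applied after each critic step.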


2021 ◽  
Vol 263 (3) ◽  
pp. 3643-3648
Author(s):  
Gyuwon Kim ◽  
Seungchul Lee

Detecting bearing faults in advance is critical for mechanical and electrical systems to prevent economic loss and safety hazards. As part of the recent interest in artificial intelligence, deep learning (DL)-based approaches have gained much attention in intelligent fault diagnostics and have mainly been developed in a supervised manner. While these works have shown promising results, several technical drawbacks are inherent in a supervised learning setting: data imbalance is a critical problem because faulty data are scarce in many cases, data labeling is tedious, and unseen fault cases cannot be detected in a supervised framework. Herein, a generative adversarial network (GAN) is proposed to achieve unsupervised bearing fault diagnostics using only the normal data. The proposed method first applies the short-time Fourier transform (STFT) to convert the 1-D vibration signals into 2-D time-frequency representations, which serve as the input to the DL framework. Subsequently, a GAN-based latent mapping is constructed using only the normal data, and faulty signals are detected using an anomaly metric composed of a discriminator error and an image reconstruction error. The performance of the method is verified using a classic rotating machinery dataset (Case Western Reserve bearing dataset), and the experimental results demonstrate that the method can not only detect the faults but also cluster them in the latent space with high accuracy.
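The two steps described above, an STFT that turns a 1-D signal into a 2-D time-frequency image and an anomaly metric that combines two error terms, can be sketched as follows. This is a minimal sketch under stated assumptions: the frame length, hop size, Hann window, and the weighting factor `lam` are illustrative choices, and the weighted-sum form of the score is one common way to combine the two errors rather than the formula given in the paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude STFT of a 1-D signal.

    Returns an array of shape (frame_len // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def anomaly_score(image, reconstruction, disc_error, lam=0.1):
    """Weighted sum of reconstruction error and discriminator error.

    `lam` is a hypothetical weighting factor; `disc_error` is assumed
    to be computed elsewhere by the trained discriminator.
    """
    recon_error = np.mean(np.abs(image - reconstruction))
    return float((1 - lam) * recon_error + lam * disc_error)

# Toy signal: a pure tone centered exactly on frequency bin 20.
t = np.arange(2048)
signal = np.sin(2 * np.pi * 20 * t / 256)
spec = stft_magnitude(signal)
print(spec.shape)
```

A sample whose score exceeds a threshold chosen on normal data would be flagged as faulty; how that threshold is set is not described in the abstract.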


2019 ◽  
Vol 9 (16) ◽  
pp. 3396
Author(s):  
Jianfeng Wu ◽  
Yongzhu Hua ◽  
Shengying Yang ◽  
Hongshuai Qin ◽  
Huibin Qin

This paper presents a new deep neural network (DNN)-based speech enhancement algorithm that integrates knowledge distilled from a traditional statistics-based method. Unlike other DNN-based methods, which usually train many different models on the same data and then average their predictions, or use a large number of noise types to enlarge the simulated noisy speech, the proposed method does not train a whole ensemble of models and does not require a large amount of simulated noisy speech. It first trains a discriminator network and a generator network simultaneously using adversarial learning. Then, the discriminator and generator networks are re-trained by distilling knowledge from the statistical method, an approach inspired by knowledge distillation in neural networks. Finally, the generator network is fine-tuned using real noisy speech. Experiments on CHiME4 data sets demonstrate that the proposed method achieves more robust performance than the compared DNN-based method in terms of perceptual speech quality.
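To make the distillation step more concrete, the sketch below shows a generic regression-style distillation loss in which a student output is pulled toward both a clean target and a teacher estimate. Everything here is an assumption for illustration: the abstract does not give the loss function, so the mean-squared-error terms, the blending weight `alpha`, and the idea that the statistical method's enhanced output plays the role of the teacher are hypothetical.

```python
import numpy as np

def distillation_loss(student_out, clean_target, teacher_out, alpha=0.5):
    """Blend a supervised loss with a teacher-matching loss.

    student_out  : output of the network being trained
    clean_target : reference clean features (when available)
    teacher_out  : estimate from the teacher (assumed here to be the
                   statistical enhancement method)
    alpha        : hypothetical weight on the supervised term
    """
    target_term = np.mean((student_out - clean_target) ** 2)
    teacher_term = np.mean((student_out - teacher_out) ** 2)
    return float(alpha * target_term + (1 - alpha) * teacher_term)

# Toy spectral frames: the student matches the teacher but not the target.
student = np.array([1.0, 1.0])
clean = np.array([0.0, 0.0])
teacher = np.array([1.0, 1.0])
print(distillation_loss(student, clean, teacher))
```

Setting `alpha` closer to 0 would rely more on the teacher, which is the regime of interest when clean references are scarce.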

