Noise Modeling to Build Training Sets for Robust Speech Enhancement

Author(s):  
Yahui Wang ◽  
Wenxi Zhang ◽  
Yongbiao Wang ◽  
Xinxin Kong ◽  
Hongxin Zhang

Abstract The performance of Deep Neural Network (DNN)-based speech enhancement models degrades significantly on real recordings because the synthetic training sets are mismatched with real test sets. To solve this problem, we propose a new Generative Adversarial Network framework for Noise Modeling (NM-GAN) that builds training sets by imitating the distribution of real noise. The framework combines a novel U-Net with two bidirectional Long Short-Term Memory (LSTM) layers, acting as the generator, to construct complex noise. The Gaussian distribution is adapted and used as conditional information to direct the noise generation, and a discriminator learns to determine whether a noise sample comes from the model distribution or from the real noise distribution. Through adversarial and alternating training, NM-GAN generates samples with enough recall (diversity) and precision (quality) to resemble real noise, after which realistic-looking paired training sets are composed. Extensive experiments were carried out, and qualitative and quantitative evaluations of the generated noise samples and training sets demonstrate the potential of the framework. A speech enhancement model trained on our synthetic training sets and on real training sets was found to achieve good noise suppression for real speech-related noise.
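The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the kind of generator it describes: a U-Net-style encoder-decoder with a two-layer bidirectional LSTM bottleneck, driven by Gaussian noise as conditional input. The layer sizes, kernel widths and skip-connection layout are assumptions for illustration only, not the paper's architecture.

```python
# Minimal sketch of a U-Net-style noise generator with a bidirectional LSTM
# bottleneck, loosely following the NM-GAN description. All dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        # encoder: 1-D convolutions over the time axis
        self.enc1 = nn.Sequential(nn.Conv1d(feat_dim, hidden, 5, padding=2), nn.PReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(hidden, hidden, 5, stride=2, padding=2), nn.PReLU())
        # two-layer bidirectional LSTM bottleneck
        self.blstm = nn.LSTM(hidden, hidden // 2, num_layers=2,
                             batch_first=True, bidirectional=True)
        # decoder with a skip connection from the first encoder stage
        self.dec1 = nn.Sequential(nn.ConvTranspose1d(hidden, hidden, 4, stride=2, padding=1), nn.PReLU())
        self.out = nn.Conv1d(hidden * 2, feat_dim, 5, padding=2)

    def forward(self, z):
        # z: (batch, feat_dim, time) Gaussian noise used as conditional input
        e1 = self.enc1(z)
        e2 = self.enc2(e1)
        h, _ = self.blstm(e2.transpose(1, 2))        # (batch, time/2, hidden)
        d1 = self.dec1(h.transpose(1, 2))            # upsample back to the original length
        d1 = d1[..., :e1.shape[-1]]                  # trim any off-by-one from striding
        return self.out(torch.cat([d1, e1], dim=1))  # skip connection, then project

# usage: generated_noise = NoiseGenerator()(torch.randn(8, 128, 200))
```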

Author(s):  
Wenchao Du ◽  
Hu Chen ◽  
Hongyu Yang ◽  
Yi Zhang

Abstract Generative adversarial networks (GANs) have been applied to low-dose CT images to predict normal-dose CT images. However, undesired artifacts and detail distortions bring uncertainty to clinical diagnosis. In order to improve visual quality while suppressing noise, in this paper we mainly studied the two key components of deep learning based low-dose CT (LDCT) restoration models, namely the network architecture and the adversarial loss, and proposed a disentangled noise suppression method based on GAN (DNSGAN) for LDCT. Specifically, a generator network containing noise suppression and structure recovery modules is proposed. Furthermore, a multi-scaled relativistic adversarial loss is introduced to preserve the finer structures of generated images. Experiments on simulated and real LDCT datasets show that the proposed method can effectively remove noise while recovering finer details and provides better visual perception than other state-of-the-art methods.
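As a concrete illustration of the adversarial-loss component, below is a sketch of a relativistic average adversarial loss of the kind DNSGAN builds on, written for a single scale; the paper's multi-scaled version would apply such a loss at several discriminator resolutions. This is an assumption-based sketch, not the paper's exact formulation.

```python
# Relativistic average adversarial losses (single scale, for illustration).
import torch
import torch.nn.functional as F

def relativistic_d_loss(d_real, d_fake):
    """Discriminator loss: real samples should score above the average fake."""
    real_vs_fake = d_real - d_fake.mean()
    fake_vs_real = d_fake - d_real.mean()
    return (F.binary_cross_entropy_with_logits(real_vs_fake, torch.ones_like(real_vs_fake)) +
            F.binary_cross_entropy_with_logits(fake_vs_real, torch.zeros_like(fake_vs_real)))

def relativistic_g_loss(d_real, d_fake):
    """Generator loss: mirror image, pushes fakes above the average real score."""
    real_vs_fake = d_real - d_fake.mean()
    fake_vs_real = d_fake - d_real.mean()
    return (F.binary_cross_entropy_with_logits(fake_vs_real, torch.ones_like(fake_vs_real)) +
            F.binary_cross_entropy_with_logits(real_vs_fake, torch.zeros_like(real_vs_fake)))
```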


Artificial Neural Networks (ANNs) have evolved through many stages over the last three decades, with many researchers contributing to this challenging field. With the power of mathematics, complex problems can also be solved by ANNs. Architectures such as the Convolutional Neural Network (CNN), Deep Neural Network, Generative Adversarial Network (GAN), Long Short-Term Memory (LSTM) network, Recurrent Neural Network (RNN) and Ordinary Differential Equation network play promising roles in many MNCs and IT industries because of their predictive power and accuracy. In this paper, a Convolutional Neural Network is used for the prediction of beep sounds at high noise levels. Based on supervised learning, the research develops the best CNN architecture for beep sound recognition in noisy situations. The proposed method gives better results, with an accuracy of 96%. The prototype was tested with a few architectures on the training and test data, of which a two-layer CNN classifier gave the best predictions.
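For concreteness, a hypothetical two-layer CNN classifier of the kind the paper selects might look as follows in PyTorch; the spectrogram input size, channel counts and class count are assumptions, and the 96% accuracy figure is not reproduced by this sketch.

```python
# Hypothetical two-layer CNN for beep-vs-noise classification on spectrogram patches.
import torch
import torch.nn as nn

class BeepCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, spectrogram):  # (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(spectrogram))

# usage: logits = BeepCNN()(torch.randn(4, 1, 64, 128))
```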


2020 ◽  
Author(s):  
Wenjie Liu ◽  
Ying Zhang ◽  
Zhiliang Deng ◽  
Jiaojiao Zhao ◽  
Lian Tong

Abstract As an emerging field that aims to bridge the gap between human activities and computing systems, human-centered computing (HCC) in the cloud, edge and fog has had a huge impact on artificial intelligence algorithms. The quantum generative adversarial network (QGAN) is considered one of the quantum machine learning algorithms with great application prospects, and it too should be improved to conform to the human-centered paradigm. The generation process of QGAN is relatively random and the generated model does not conform to the human-centered concept, so it is not well suited to real scenarios. To solve these problems, a hybrid quantum-classical conditional generative adversarial network (QCGAN) algorithm is proposed as a knowledge-driven human-computer interaction computing mode in the cloud. Stabilizing the generation process and enabling interaction between humans and the computing process are achieved by feeding conditional information into both the generator and the discriminator. The generator uses a parameterized quantum circuit with an all-to-all connected topology, which facilitates the tuning of network parameters during training. The discriminator uses a classical neural network, which effectively avoids the "input bottleneck" of quantum machine learning. Finally, the BAS training set is selected to conduct experiments on a quantum cloud computing platform. The results show that the QCGAN algorithm can effectively converge to the Nash equilibrium point after training and perform human-centered classification generation tasks.
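The conditioning idea can be illustrated with a classical stand-in: in the sketch below, the condition vector is concatenated with the inputs of both the generator and the discriminator. In the paper the generator is a parameterized quantum circuit with all-to-all connectivity; the small MLP here replaces it purely to show where the conditional information enters, and all dimensions are assumptions.

```python
# Classical stand-in for the QCGAN conditioning scheme (illustrative only).
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, noise_dim=8, cond_dim=4, out_dim=16):
        super().__init__()
        # in the paper this role is played by a parameterized quantum circuit
        self.net = nn.Sequential(nn.Linear(noise_dim + cond_dim, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim), nn.Sigmoid())

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1))  # condition concatenated with noise

class CondDiscriminator(nn.Module):
    def __init__(self, in_dim=16, cond_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + cond_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=1))  # same condition guides the critic
```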


2020 ◽  
Vol 28 (5) ◽  
pp. 975-988
Author(s):  
Sivamurugan Vellakani ◽  
Indumathi Pushbam

The human eye is affected by different eye diseases, including choroidal neovascularization (CNV), diabetic macular edema (DME) and age-related macular degeneration (AMD). This work aims to design an artificial intelligence (AI)-based clinical decision support system for eye disease detection and classification that assists ophthalmologists in detecting and classifying CNV, DME and drusen more effectively from Optical Coherence Tomography (OCT) images depicting different tissues. The methodology for designing this system involves different deep learning convolutional neural network (CNN) models and long short-term memory networks (LSTM). The best image captioning model is selected after a performance analysis comparing nine different image captioning systems for assisting ophthalmologists in detecting and classifying eye diseases. The quantitative analysis shows that the image captioning model designed using DenseNet201 with LSTM achieves superior performance, with an overall accuracy of 0.969, a positive predictive value of 0.972 and a true-positive rate of 0.969, using OCT images enhanced by the generative adversarial network (GAN). The corresponding performance values for the Xception-with-LSTM image captioning model are 0.969, 0.969 and 0.938, respectively. Thus, these two models yield superior performance and have the potential to assist ophthalmologists in making optimal diagnostic decisions.
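A rough sketch of the DenseNet201-with-LSTM pipeline described above is given below: a CNN encoder produces a global image feature that seeds an LSTM decoder over caption tokens. The vocabulary size, embedding width and decoding scheme are assumptions, and the GAN-based image enhancement step is omitted.

```python
# CNN encoder + LSTM decoder captioning sketch using torchvision's DenseNet201.
import torch
import torch.nn as nn
from torchvision.models import densenet201

class OCTCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=256, hidden=512):
        super().__init__()
        self.encoder = densenet201(weights=None).features  # CNN feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.project = nn.Linear(1920, embed_dim)           # DenseNet201 feature width
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.vocab_out = nn.Linear(hidden, vocab_size)

    def forward(self, images, captions):
        feats = self.project(self.pool(self.encoder(images)).flatten(1))  # (B, embed)
        tokens = self.embed(captions)                                     # (B, T, embed)
        seq = torch.cat([feats.unsqueeze(1), tokens], dim=1)              # image feature first
        out, _ = self.lstm(seq)
        return self.vocab_out(out)  # per-step vocabulary logits
```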


Mathematics ◽  
2019 ◽  
Vol 7 (10) ◽  
pp. 883 ◽  
Author(s):  
Shuyu Li ◽  
Sejun Jang ◽  
Yunsick Sung

In traditional music composition, the composer has specialized knowledge of music and combines emotion and creative experience to create music. As computer technology has evolved, various music-related technologies have been developed. Because creating new music requires a considerable amount of time, a system is needed that can automatically compose music from input music. This study proposes a novel melody composition method that enhances the original generative adversarial network (GAN) model to operate on individual bars. Two discriminators form the enhanced GAN model: a long short-term memory (LSTM) model that ensures correlation between the bars, and a convolutional neural network (CNN) model that ensures rationality of the bar structure. Experiments were conducted using bar encoding and the enhanced GAN model to compose a new melody and evaluate the quality of the composed melody. In the evaluation, the TF-IDF algorithm was used to calculate the structural differences between four types of musical instrument digital interface (MIDI) files (a randomly composed melody, a melody composed by the original GAN, a melody composed by the proposed method, and the real melody). Using the TF-IDF algorithm, the structure of the melody composed by the proposed method and the structure of the traditional melody were each compared with the structure of the real melody. The experimental results showed that the melody composed by the proposed method was closer to the real melody structure, with a difference of only 8%, than the traditional melody structure.
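The TF-IDF comparison step can be sketched as follows: each melody is rendered as a sequence of bar-level tokens, and the structural difference is taken as one minus the cosine similarity of the TF-IDF vectors. The bar tokenization shown is a placeholder, not the paper's encoding.

```python
# TF-IDF structural difference between two bar-token sequences (illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def structural_difference(melody_a_tokens, melody_b_tokens):
    """melody_*_tokens: lists of bar tokens such as ['C4_E4_G4', 'F4_A4_C5', ...]."""
    docs = [" ".join(melody_a_tokens), " ".join(melody_b_tokens)]
    tfidf = TfidfVectorizer(token_pattern=r"\S+").fit_transform(docs)
    return 1.0 - cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# e.g. structural_difference(real_melody_bars, generated_melody_bars)
# the paper reports a difference of about 8% for its method against the real melody
```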


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Guisheng Hou ◽  
Shuo Xu ◽  
Nan Zhou ◽  
Lei Yang ◽  
Quanhao Fu

Accurate prediction of the remaining useful life (RUL) of important components plays a crucial role in system reliability and is the basis of prognostics and health management (PHM). This paper proposes an integrated deep learning approach for RUL prediction of a turbofan engine by integrating an autoencoder (AE) with a deep convolutional generative adversarial network (DCGAN). In the pretraining stage, the reconstructed data of the AE not only contribute to its reconstruction error but also serve as the generated data for training the DCGAN's parameters. Through this double-error reconstruction, the capability of feature extraction is enhanced and high-level abstract information is obtained. In the fine-tuning stage, a long short-term memory (LSTM) network extracts sequential information from the features to predict the RUL. The effectiveness of the proposed scheme is verified on the NASA commercial modular aero-propulsion system simulation (C-MAPSS) dataset. The superiority of the proposed method is demonstrated via excellent prediction performance and comparisons with other state-of-the-art prognostic methods. The results of this study suggest that the proposed data-driven prognostic method offers a new and promising prediction approach and an efficient feature extraction scheme.
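The two-stage idea can be sketched compactly: an autoencoder learns sensor-level features (in the paper its reconstructions also feed the DCGAN during pretraining), and an LSTM then maps the encoded sequence to a RUL estimate. The dimensions below are placeholders for the C-MAPSS sensor channels, not the paper's settings.

```python
# Autoencoder feature extractor + LSTM RUL regressor (illustrative sketch).
import torch
import torch.nn as nn

class AEFeatureExtractor(nn.Module):
    def __init__(self, n_sensors=14, latent=8):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_sensors, 32), nn.ReLU(), nn.Linear(32, latent))
        self.decode = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, n_sensors))

    def forward(self, x):                 # x: (batch, time, n_sensors)
        z = self.encode(x)
        return z, self.decode(z)          # latent features + reconstruction

class RULPredictor(nn.Module):
    def __init__(self, latent=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(latent, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, z_seq):
        out, _ = self.lstm(z_seq)
        return self.head(out[:, -1])      # RUL estimate from the last time step

# pretraining minimizes the reconstruction error; fine-tuning regresses RUL on z_seq
```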

