Noise Modeling to Build Training Sets for Robust Speech Enhancement

Author(s):  
Yahui Wang ◽  
Wenxi Zhang ◽  
Yongbiao Wang ◽  
Xinxin Kong ◽  
Hongxin Zhang

Abstract The performance of Deep Neural Network (DNN)-based speech enhancement models degrades significantly on real recordings because the synthetic training sets are mismatched with real test sets. To solve this problem, we propose a new Generative Adversarial Network framework for Noise Modeling (NM-GAN) that builds training sets by imitating the distribution of real noise. The framework combines a novel U-Net with two bidirectional Long Short-Term Memory (LSTM) layers, acting as the generator, to construct complex noise. The Gaussian distribution is adapted and used as conditional information to direct the noise generation, and a discriminator learns to determine whether a noise sample comes from the model distribution or from the real noise distribution. Through adversarial and alternating training, NM-GAN generates samples with enough recall (diversity) and precision (quality) to resemble real noise, after which realistic-looking paired training sets are composed. Extensive experiments were carried out, and qualitative and quantitative evaluations of the generated noise samples and training sets demonstrate the potential of the framework. A speech enhancement model trained on our synthetic training sets and on real training sets was found to achieve good noise suppression for real speech-related noise.
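The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the kind of generator it describes: a U-Net-style encoder-decoder with a two-layer bidirectional LSTM bottleneck, driven by Gaussian noise as conditional input. The layer sizes, kernel widths and skip-connection layout are assumptions for illustration only, not the paper's architecture.

```python
# Minimal sketch of a U-Net-style noise generator with a bidirectional LSTM
# bottleneck, loosely following the NM-GAN description. All dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        # encoder: 1-D convolutions over the time axis
        self.enc1 = nn.Sequential(nn.Conv1d(feat_dim, hidden, 5, padding=2), nn.PReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(hidden, hidden, 5, stride=2, padding=2), nn.PReLU())
        # two-layer bidirectional LSTM bottleneck
        self.blstm = nn.LSTM(hidden, hidden // 2, num_layers=2,
                             batch_first=True, bidirectional=True)
        # decoder with a skip connection from the first encoder stage
        self.dec1 = nn.Sequential(nn.ConvTranspose1d(hidden, hidden, 4, stride=2, padding=1), nn.PReLU())
        self.out = nn.Conv1d(hidden * 2, feat_dim, 5, padding=2)

    def forward(self, z):
        # z: (batch, feat_dim, time) Gaussian noise used as conditional input
        e1 = self.enc1(z)
        e2 = self.enc2(e1)
        h, _ = self.blstm(e2.transpose(1, 2))        # (batch, time/2, hidden)
        d1 = self.dec1(h.transpose(1, 2))            # upsample back to the original length
        d1 = d1[..., :e1.shape[-1]]                  # trim any off-by-one from striding
        return self.out(torch.cat([d1, e1], dim=1))  # skip connection, then project

# usage: generated_noise = NoiseGenerator()(torch.randn(8, 128, 200))
```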

Author(s):  
Wenchao Du ◽  
Hu Chen ◽  
Hongyu Yang ◽  
Yi Zhang

Abstract Generative adversarial networks (GANs) have been applied to low-dose CT images to predict normal-dose CT images. However, undesired artifacts and detail distortions bring uncertainty to clinical diagnosis. In order to improve visual quality while suppressing noise, in this paper we mainly studied the two key components of deep learning based low-dose CT (LDCT) restoration models, namely the network architecture and the adversarial loss, and proposed a disentangled noise suppression method based on GAN (DNSGAN) for LDCT. Specifically, a generator network containing noise suppression and structure recovery modules is proposed. Furthermore, a multi-scaled relativistic adversarial loss is introduced to preserve the finer structures of generated images. Experiments on simulated and real LDCT datasets show that the proposed method can effectively remove noise while recovering finer details and provides better visual perception than other state-of-the-art methods.
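As a concrete illustration of the adversarial-loss component, below is a sketch of a relativistic average adversarial loss of the kind DNSGAN builds on, written for a single scale; the paper's multi-scaled version would apply such a loss at several discriminator resolutions. This is an assumption-based sketch, not the paper's exact formulation.

```python
# Relativistic average adversarial losses (single scale, for illustration).
import torch
import torch.nn.functional as F

def relativistic_d_loss(d_real, d_fake):
    """Discriminator loss: real samples should score above the average fake."""
    real_vs_fake = d_real - d_fake.mean()
    fake_vs_real = d_fake - d_real.mean()
    return (F.binary_cross_entropy_with_logits(real_vs_fake, torch.ones_like(real_vs_fake)) +
            F.binary_cross_entropy_with_logits(fake_vs_real, torch.zeros_like(fake_vs_real)))

def relativistic_g_loss(d_real, d_fake):
    """Generator loss: mirror image, pushes fakes above the average real score."""
    real_vs_fake = d_real - d_fake.mean()
    fake_vs_real = d_fake - d_real.mean()
    return (F.binary_cross_entropy_with_logits(fake_vs_real, torch.ones_like(fake_vs_real)) +
            F.binary_cross_entropy_with_logits(real_vs_fake, torch.zeros_like(real_vs_fake)))
```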


Artificial Neural Networks (ANNs) have evolved through many stages over the last three decades, with many researchers contributing to this challenging field. With the power of mathematics, complex problems can also be solved by ANNs. Architectures such as the Convolutional Neural Network (CNN), Deep Neural Network, Generative Adversarial Network (GAN), Long Short-Term Memory (LSTM) network, Recurrent Neural Network (RNN) and Ordinary Differential Equation network play promising roles in many MNCs and IT industries because of their predictive power and accuracy. In this paper, a Convolutional Neural Network is used for the prediction of beep sounds at high noise levels. Based on supervised learning, the research develops the best CNN architecture for beep sound recognition in noisy situations. The proposed method gives better results, with an accuracy of 96%. The prototype was tested with a few architectures on the training and test data, of which a two-layer CNN classifier gave the best predictions.
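For concreteness, a hypothetical two-layer CNN classifier of the kind the paper selects might look as follows in PyTorch; the spectrogram input size, channel counts and class count are assumptions, and the 96% accuracy figure is not reproduced by this sketch.

```python
# Hypothetical two-layer CNN for beep-vs-noise classification on spectrogram patches.
import torch
import torch.nn as nn

class BeepCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, spectrogram):  # (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(spectrogram))

# usage: logits = BeepCNN()(torch.randn(4, 1, 64, 128))
```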


2020 ◽  
Author(s):  
Wenjie Liu ◽  
Ying Zhang ◽  
Zhiliang Deng ◽  
Jiaojiao Zhao ◽  
Lian Tong

Abstract As an emerging field that aims to bridge the gap between human activities and computing systems, human-centered computing (HCC) in the cloud, edge and fog has had a huge impact on artificial intelligence algorithms. The quantum generative adversarial network (QGAN) is considered one of the quantum machine learning algorithms with great application prospects, and it too should be improved to conform to the human-centered paradigm. The generation process of QGAN is relatively random and the generated model does not conform to the human-centered concept, so it is not well suited to real scenarios. To solve these problems, a hybrid quantum-classical conditional generative adversarial network (QCGAN) algorithm is proposed as a knowledge-driven human-computer interaction computing mode in the cloud. Stabilizing the generation process and enabling interaction between humans and the computing process are achieved by feeding conditional information into both the generator and the discriminator. The generator uses a parameterized quantum circuit with an all-to-all connected topology, which facilitates the tuning of network parameters during training. The discriminator uses a classical neural network, which effectively avoids the "input bottleneck" of quantum machine learning. Finally, the BAS training set is selected to conduct experiments on a quantum cloud computing platform. The results show that the QCGAN algorithm can effectively converge to the Nash equilibrium point after training and perform human-centered classification generation tasks.
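The conditioning idea can be illustrated with a classical stand-in: in the sketch below, the condition vector is concatenated with the inputs of both the generator and the discriminator. In the paper the generator is a parameterized quantum circuit with all-to-all connectivity; the small MLP here replaces it purely to show where the conditional information enters, and all dimensions are assumptions.

```python
# Classical stand-in for the QCGAN conditioning scheme (illustrative only).
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, noise_dim=8, cond_dim=4, out_dim=16):
        super().__init__()
        # in the paper this role is played by a parameterized quantum circuit
        self.net = nn.Sequential(nn.Linear(noise_dim + cond_dim, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim), nn.Sigmoid())

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1))  # condition concatenated with noise

class CondDiscriminator(nn.Module):
    def __init__(self, in_dim=16, cond_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + cond_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=1))  # same condition guides the critic
```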


2020 ◽  
Vol 28 (5) ◽  
pp. 975-988
Author(s):  
Sivamurugan Vellakani ◽  
Indumathi Pushbam

The human eye is affected by different eye diseases, including choroidal neovascularization (CNV), diabetic macular edema (DME) and age-related macular degeneration (AMD). This work aims to design an artificial intelligence (AI)-based clinical decision support system for eye disease detection and classification that assists ophthalmologists in detecting and classifying CNV, DME and drusen more effectively from Optical Coherence Tomography (OCT) images depicting different tissues. The methodology for designing this system involves different deep learning convolutional neural network (CNN) models and long short-term memory networks (LSTM). The best image captioning model is selected after a performance analysis comparing nine different image captioning systems for assisting ophthalmologists in detecting and classifying eye diseases. The quantitative analysis shows that the image captioning model designed using DenseNet201 with LSTM achieves superior performance, with an overall accuracy of 0.969, a positive predictive value of 0.972 and a true-positive rate of 0.969, using OCT images enhanced by the generative adversarial network (GAN). The corresponding performance values for the Xception-with-LSTM image captioning model are 0.969, 0.969 and 0.938, respectively. Thus, these two models yield superior performance and have the potential to assist ophthalmologists in making optimal diagnostic decisions.
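A rough sketch of the DenseNet201-with-LSTM pipeline described above is given below: a CNN encoder produces a global image feature that seeds an LSTM decoder over caption tokens. The vocabulary size, embedding width and decoding scheme are assumptions, and the GAN-based image enhancement step is omitted.

```python
# CNN encoder + LSTM decoder captioning sketch using torchvision's DenseNet201.
import torch
import torch.nn as nn
from torchvision.models import densenet201

class OCTCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=256, hidden=512):
        super().__init__()
        self.encoder = densenet201(weights=None).features  # CNN feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.project = nn.Linear(1920, embed_dim)           # DenseNet201 feature width
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.vocab_out = nn.Linear(hidden, vocab_size)

    def forward(self, images, captions):
        feats = self.project(self.pool(self.encoder(images)).flatten(1))  # (B, embed)
        tokens = self.embed(captions)                                     # (B, T, embed)
        seq = torch.cat([feats.unsqueeze(1), tokens], dim=1)              # image feature first
        out, _ = self.lstm(seq)
        return self.vocab_out(out)  # per-step vocabulary logits
```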


Mathematics ◽  
2019 ◽  
Vol 7 (10) ◽  
pp. 883 ◽  
Author(s):  
Shuyu Li ◽  
Sejun Jang ◽  
Yunsick Sung

In traditional music composition, the composer has specialized knowledge of music and combines emotion and creative experience to create music. As computer technology has evolved, various music-related technologies have been developed. Because creating new music requires a considerable amount of time, a system is needed that can automatically compose music from input music. This study proposes a novel melody composition method that enhances the original generative adversarial network (GAN) model to operate on individual bars. Two discriminators form the enhanced GAN model: a long short-term memory (LSTM) model that ensures correlation between the bars, and a convolutional neural network (CNN) model that ensures rationality of the bar structure. Experiments were conducted using bar encoding and the enhanced GAN model to compose a new melody and evaluate the quality of the composed melody. In the evaluation, the TF-IDF algorithm was used to calculate the structural differences between four types of musical instrument digital interface (MIDI) files (a randomly composed melody, a melody composed by the original GAN, a melody composed by the proposed method, and the real melody). Using the TF-IDF algorithm, the structure of the melody composed by the proposed method and the structure of the traditional melody were each compared with the structure of the real melody. The experimental results showed that the melody composed by the proposed method was closer to the real melody structure, with a difference of only 8%, than the traditional melody structure.
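The TF-IDF comparison step can be sketched as follows: each melody is rendered as a sequence of bar-level tokens, and the structural difference is taken as one minus the cosine similarity of the TF-IDF vectors. The bar tokenization shown is a placeholder, not the paper's encoding.

```python
# TF-IDF structural difference between two bar-token sequences (illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def structural_difference(melody_a_tokens, melody_b_tokens):
    """melody_*_tokens: lists of bar tokens such as ['C4_E4_G4', 'F4_A4_C5', ...]."""
    docs = [" ".join(melody_a_tokens), " ".join(melody_b_tokens)]
    tfidf = TfidfVectorizer(token_pattern=r"\S+").fit_transform(docs)
    return 1.0 - cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# e.g. structural_difference(real_melody_bars, generated_melody_bars)
# the paper reports a difference of about 8% for its method against the real melody
```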


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Guisheng Hou ◽  
Shuo Xu ◽  
Nan Zhou ◽  
Lei Yang ◽  
Quanhao Fu

Accurate prediction of the remaining useful life (RUL) of important components plays a crucial role in system reliability and is the basis of prognostics and health management (PHM). This paper proposes an integrated deep learning approach for RUL prediction of a turbofan engine by integrating an autoencoder (AE) with a deep convolutional generative adversarial network (DCGAN). In the pretraining stage, the reconstructed data of the AE not only contribute to its reconstruction error but also serve as the generated data for training the DCGAN's parameters. Through this double-error reconstruction, the capability of feature extraction is enhanced and high-level abstract information is obtained. In the fine-tuning stage, a long short-term memory (LSTM) network extracts sequential information from the features to predict the RUL. The effectiveness of the proposed scheme is verified on the NASA commercial modular aero-propulsion system simulation (C-MAPSS) dataset. The superiority of the proposed method is demonstrated via excellent prediction performance and comparisons with other state-of-the-art prognostic methods. The results of this study suggest that the proposed data-driven prognostic method offers a new and promising prediction approach and an efficient feature extraction scheme.
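The two-stage idea can be sketched compactly: an autoencoder learns sensor-level features (in the paper its reconstructions also feed the DCGAN during pretraining), and an LSTM then maps the encoded sequence to a RUL estimate. The dimensions below are placeholders for the C-MAPSS sensor channels, not the paper's settings.

```python
# Autoencoder feature extractor + LSTM RUL regressor (illustrative sketch).
import torch
import torch.nn as nn

class AEFeatureExtractor(nn.Module):
    def __init__(self, n_sensors=14, latent=8):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_sensors, 32), nn.ReLU(), nn.Linear(32, latent))
        self.decode = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, n_sensors))

    def forward(self, x):                 # x: (batch, time, n_sensors)
        z = self.encode(x)
        return z, self.decode(z)          # latent features + reconstruction

class RULPredictor(nn.Module):
    def __init__(self, latent=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(latent, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, z_seq):
        out, _ = self.lstm(z_seq)
        return self.head(out[:, -1])      # RUL estimate from the last time step

# pretraining minimizes the reconstruction error; fine-tuning regresses RUL on z_seq
```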

