Class Imbalanced Fault Diagnosis via Combining K-Means Clustering Algorithm with Generative Adversarial Networks

Author(s):  
Huifang Li ◽  
Rui Fan ◽  
Qisong Shi ◽  
Zijian Du

Recent advancements in machine learning and communication technologies have enabled new approaches to automated fault diagnosis and detection in industrial systems. Because different classes of faults occur at widely varying frequencies, the class distribution of real-world industrial fault data is usually imbalanced. However, most prior machine learning-based classification methods do not take this imbalance into account, and thus tend to be biased toward the majority classes and yield poor accuracy for minority ones. To solve this problem, we propose a k-means clustering generative adversarial network (KM-GAN)-based fault diagnosis approach that reduces imbalance in fault data and improves diagnostic accuracy for minority classes. First, we design a new oversampling method based on k-means clustering and a GAN to generate diverse minority-class samples that follow a distribution similar to that of the original minority data. The k-means clustering algorithm divides the minority-class samples into k clusters, while a GAN learns the data distribution of the resulting clusters and generates a given number of minority-class samples to supplement the original dataset. Then, we construct a heterogeneous ensemble model based on a deep neural network (DNN) and a deep belief network (DBN) as a fault classifier to improve generalization: the DNN and DBN models are trained separately on the resulting dataset, and their outputs are averaged to give the final diagnostic result. A series of comparative experiments is conducted to verify the effectiveness of the proposed method, and the results show that it can improve diagnostic accuracy for minority-class samples.
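
The two stages described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the minority samples have already been assigned to clusters, splits a generation budget across clusters in proportion to cluster size (the abstract does not state the allocation rule), and averages the class-probability outputs of two classifiers. The names `allocate_quota` and `average_outputs` are hypothetical.

```python
from collections import Counter


def allocate_quota(cluster_labels, n_to_generate):
    """Split a generation budget across clusters in proportion to cluster size.

    Proportional allocation is an assumption made for illustration only.
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    # Start with the floor of each cluster's proportional share.
    quota = {c: (n_to_generate * n) // total for c, n in counts.items()}
    # Give the leftover samples to the largest clusters so the quota sums exactly.
    remainder = n_to_generate - sum(quota.values())
    for c, _ in counts.most_common(remainder):
        quota[c] += 1
    return quota


def average_outputs(probs_a, probs_b):
    """Average two classifiers' class-probability vectors and pick the argmax."""
    avg = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    label = max(range(len(avg)), key=avg.__getitem__)
    return avg, label
```

For example, `allocate_quota([0, 0, 0, 1], 10)` assigns eight synthetic samples to cluster 0 and two to cluster 1; in a full pipeline each quota would be filled by a generator trained on that cluster.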

Author(s):  
Derek Reiman ◽  
Yang Dai

Abstract The microbiome of the human body has been shown to have profound effects on physiological regulation and disease pathogenesis. However, association analysis based on statistical modeling of microbiome data remains a challenge because of inherent noise, the complexity of the data, and the high cost of collecting a large number of samples. To address this challenge, we employed a deep learning framework to construct a data-driven simulation of microbiome data using a conditional generative adversarial network. Conditional generative adversarial networks train two models against each other while leveraging side information learned from a given dataset to compute larger simulated datasets that are representative of the original dataset. In our study, we used a cohort of patients with inflammatory bowel disease to show not only that the generative adversarial network can generate samples representative of the original data according to multiple diversity metrics, but also that training machine learning models on the synthetic samples can improve disease prediction through data augmentation. We also show that the synthetic samples generated from this cohort can boost disease prediction for a different external cohort.


2017 ◽  
Author(s):  
Benjamin Sanchez-Lengeling ◽  
Carlos Outeiral ◽  
Gabriel L. Guimaraes ◽  
Alan Aspuru-Guzik

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches a given set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive, sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.
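
One simple way to bias a generator toward desired attributes is to blend the discriminator's realism score with a domain objective when computing the RL reward. The sketch below assumes a linear blend controlled by a hypothetical weight `lam`; the abstract does not give the exact reward, so this is an illustration under that assumption rather than the framework's definition.

```python
def blended_reward(disc_score, objective_score, lam=0.5):
    """Linear blend of a realism score and an objective score.

    Both scores are assumed to lie in [0, 1]; `lam` is a hypothetical
    trade-off weight (lam=1 rewards realism only, lam=0 the objective only).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * disc_score + (1.0 - lam) * objective_score
```

A larger `lam` keeps generated molecules close to the training distribution, while a smaller one pushes harder toward the target property.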


Author(s):  
Khaled ELKarazle ◽  
Valliappan Raman ◽  
Patrick Then

Age estimation models can be employed in many applications, including soft biometrics, content access control, and targeted advertising. However, because some facial images are taken in unconstrained conditions, image quality degrades, which results in the loss of several essential ageing features. This study investigates how introducing a new layer of data processing based on a super-resolution generative adversarial network (SRGAN) model can influence the accuracy of age estimation by enhancing the quality of both the training and testing samples. Additionally, we introduce a novel convolutional neural network (CNN) classifier to distinguish between several age classes. We train one of our classifiers on a reconstructed version of the original dataset and compare its performance with an identical classifier trained on the original version of the same dataset. Our findings reveal that the classifier trained on the reconstructed dataset achieves better classification accuracy, opening the door for more research into building data-centric machine learning systems.


2020 ◽  
Author(s):  
Belén Vega-Márquez ◽  
Cristina Rubio-Escudero ◽  
Isabel Nepomuceno-Chamorro

Abstract The generation of synthetic data is becoming a fundamental task in the daily life of any organization because of the new data protection laws that are emerging. With the rise in the use of Artificial Intelligence, one of the most recent proposals to address this problem is the use of Generative Adversarial Networks (GANs). These networks have demonstrated a great capacity to create synthetic data with very good performance. The goal of synthetic data generation is to create data that will perform similarly to the original dataset for many analysis tasks, such as classification. The problem with GANs is that, in a classification problem, they do not take class labels into account when generating new data; the label is treated like any other attribute. This research work has focused on the creation of new synthetic data from datasets with different characteristics using a Conditional Generative Adversarial Network (CGAN). CGANs are an extension of GANs in which the class label is taken into account when the new data is generated. Performance has been measured in two ways: first, by comparing the results obtained with classification algorithms on both the original datasets and the generated data; second, by checking that the correlation between the original data and the generated data is minimal.
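
A minimal sketch of how a class label can be supplied to a generator: the label is one-hot encoded and concatenated with the noise vector. This concatenation scheme is the common CGAN formulation and is assumed here, since the abstract does not describe the authors' architecture. `one_hot` and `conditional_input` are hypothetical helper names.

```python
import random


def one_hot(label, n_classes):
    """Return a one-hot vector with a 1.0 at index `label`."""
    vec = [0.0] * n_classes
    vec[label] = 1.0
    return vec


def conditional_input(label, n_classes, noise_dim, rng=random):
    """Build a generator input: Gaussian noise followed by the one-hot label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, n_classes)
```

Because the label is part of the input, the same generator can be asked for samples of a specific class, which is what distinguishes this setup from an unconditional GAN that treats the label as just another attribute.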


Energies ◽  
2019 ◽  
Vol 12 (3) ◽  
pp. 527 ◽  
Author(s):  
Chaowen Zhong ◽  
Ke Yan ◽  
Yuting Dai ◽  
Ning Jin ◽  
Bing Lou

Automated fault diagnosis (AFD) for various energy consumption components is one of the main topics in energy efficiency solutions. However, the lack of faulty samples in the training process remains a difficulty for data-driven AFD of heating, ventilation and air conditioning (HVAC) subsystems, such as air handling units (AHU). Existing works show that semi-supervised learning can effectively alleviate the issue by iteratively inserting newly tested faulty data samples into the training pool when the same fault happens again. However, a research gap exists between theoretical AFD algorithms and real-world applications. First, for real-world AFD applications, it is hard to predict when the same fault will happen again. Second, the training set must be pre-defined and fixed before being packed into the building management system (BMS) for automatic HVAC fault diagnosis. The semi-supervised learning process of iteratively absorbing testing data into the training pool can therefore be irrelevant for industrial use of AFD methods. The generative adversarial network (GAN) is well known as an unsupervised learning technique for enriching the training pool with fake samples that are close to real faulty samples. In this study, a hybrid GAN is proposed that combines Wasserstein GAN with traditional classifiers to perform fault diagnosis, mimicking real-world scenarios with limited faulty samples in the training process. Experimental results on real-world datasets demonstrate the effectiveness of the proposed approach for fault diagnosis of the AHU subsystem.
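
For readers unfamiliar with Wasserstein GAN, the sketch below shows the standard WGAN objectives computed from critic scores. It is a generic illustration, not the hybrid model from this study: the critic scores are passed in as plain lists, and the function names are hypothetical.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def critic_loss(real_scores, fake_scores):
    """Standard WGAN critic loss: minimizing it raises real scores over fake ones."""
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    """Standard WGAN generator loss: minimizing it raises the critic's score on fakes."""
    return -mean(fake_scores)
```

Unlike the original GAN's classification-style loss, these scores are unbounded real numbers, so the critic must be kept approximately Lipschitz (for example by weight clipping or a gradient penalty) for the objective to be meaningful.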


Author(s):  
R Wisnu Prio Pamungkas ◽  
Rakhmi Khalida ◽  
Siti Setiawati

ABSTRACT   Recently, computers have become able to produce realistic photos from text. This is one of the ways machine learning can be used creatively. Machine learning is the field of solving problems that require an understanding comparable to human intelligence. This study uses the Generative Adversarial Networks (GAN) algorithm to create images from text descriptions. The basic GAN architecture consists of two networks, called the Generator and the Discriminator. The resulting images still lack detail in interpreting a text description, but the authors aim to produce images that inspire; the images can be more poetic when generated from poetry, lyrics, or book quotes. Keywords: GAN, Image Synthesis, Text Description


2021 ◽  
Author(s):  
Shucong Liu ◽  
Hongjun Wang ◽  
Fengxia Han ◽  
Xiang Zhang

Abstract In gas turbine rotor system fault diagnosis, data-driven intelligent methods are an important means of monitoring the health status of the gas turbine, and sufficient effective fault data are needed to train the intelligent diagnosis model. In actual operation, the collected gas turbine fault data are limited, and small, imbalanced fault samples seriously affect the accuracy of fault diagnosis methods. To address the imbalance of gas turbine fault data, an Improved Deep Convolutional Generative Adversarial Network (Improved DCGAN) suited to gas turbine signals is proposed; a structural optimization of the DCGAN generator and a gradient penalty on the loss function are introduced to generate effective fault data and improve classification accuracy. Experimental results on a gas turbine test bench demonstrate that the proposed method generates effective fault samples as a supplementary set to balance the dataset and effectively improves fault classification and diagnosis performance for the gas turbine rotor in the small-sample case. The proposed method can serve as a solution to the problem of small, unbalanced fault samples and provides an effective method for gas turbine fault diagnosis.
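
The gradient penalty mentioned in this abstract is not specified further, so the sketch below assumes the widely used WGAN-GP form: the penalty is the mean squared deviation of the discriminator's input-gradient norm from 1, scaled by a weight `lam`. The gradients are passed in as plain lists, and the default `lam=10.0` is an assumed conventional value, not one reported by the authors.

```python
import math


def gradient_penalty(gradients, lam=10.0):
    """WGAN-GP style penalty: lam * mean((||g||_2 - 1)^2) over a batch.

    `gradients` holds one gradient vector per sample, taken with respect
    to the discriminator's input.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in gradients]
    return lam * sum((n - 1.0) ** 2 for n in norms) / len(norms)
```

The penalty is added to the discriminator loss; it is zero when every gradient has unit norm and grows quadratically as norms drift away from 1.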


Author(s):  
Jinrui Wang ◽  
Baokun Han ◽  
Huaiqian Bao ◽  
Mingyan Wang ◽  
Zhenyun Chu ◽  
...  

As a useful data augmentation technique, generative adversarial networks have been successfully applied in the field of fault diagnosis. However, traditional generative adversarial networks can generate fault signals of only one category at a time, which is time-consuming and costly. To overcome this weakness, we develop a novel fault diagnosis method that combines conditional generative adversarial networks and stacked autoencoders, both built by stacking one-dimensional fully connected layers. First, a conditional generative adversarial network is used to generate artificial samples based on the frequency samples, with category labels adopted as the conditional information so that signals of different categories are generated simultaneously. Meanwhile, spectral normalization is added to the discriminator of the conditional generative adversarial network to stabilize model training. Then, the augmented training samples are passed to stacked autoencoders for feature extraction and fault classification. Finally, two datasets, of bearings and gearboxes, are employed to investigate the effectiveness of the proposed conditional generative adversarial network–stacked autoencoder method.
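
The normalization applied to the discriminator can be sketched as follows, assuming it refers to spectral normalization in the usual sense: each weight matrix is divided by an estimate of its largest singular value, obtained by power iteration. This is a generic illustration with hypothetical function names, not the authors' code.

```python
import math


def _matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def _normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_normalize(w, n_iter=50):
    """Return (w / sigma, sigma), where sigma estimates the top singular value.

    Assumes w is non-zero and that the all-ones start vector is not
    orthogonal to the leading right singular vector.
    """
    wt = [list(col) for col in zip(*w)]  # transpose of w
    v = _normalize([1.0] * len(w[0]))
    for _ in range(n_iter):
        u = _normalize(_matvec(w, v))
        v = _normalize(_matvec(wt, u))
    # sigma = u^T W v for the converged singular vectors.
    sigma = sum(ui * x for ui, x in zip(u, _matvec(w, v)))
    return [[x / sigma for x in row] for row in w], sigma
```

In training frameworks the vector `u` is typically kept between steps and refined with a single iteration per update, which is much cheaper than the 50 iterations used here for clarity.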


2021 ◽  
Vol 11 (22) ◽  
pp. 10823
Author(s):  
Der-Chiang Li ◽  
Szu-Chou Chen ◽  
Yao-San Lin ◽  
Kuan-Cheng Huang

In recent years, generative adversarial networks (GANs) have been proposed to generate simulated images, and some studies have applied GANs to the analysis of numerical data in many fields, such as the prediction of building energy consumption and the prediction and identification of liver cancer stages. However, these studies rely on a sufficient volume of data. In the current era of globalization, the demand for rapid decision-making is increasing, but the data available in a short period of time are scarce. As a result, machine learning may not provide precise results. Obtaining more information from a small number of samples has become an important issue. Therefore, this study aimed to modify the generative adversarial network structure for learning with small numerical datasets, starting with the Wasserstein GAN (WGAN) as the GAN architecture and using mega-trend-diffusion (MTD) to bound the range of the virtual samples that the GAN generates. The proposed structure was verified with two datasets from the UC Irvine Machine Learning Repository, and performance was evaluated using three criteria: accuracy, standard deviation, and p-value. The experimental results show that, with this improved GAN architecture (WGAN_MTD), small datasets can also be used to generate virtual samples that are similar to real samples.
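
One simple way to keep generated values inside a permitted range is to clamp each attribute to a lower and upper bound. The sketch below assumes the bounds are already available as plain numbers; it does not reproduce the MTD computation, and the abstract does not say whether the authors clamp, reject, or otherwise constrain out-of-range samples. `clip_to_bounds` is a hypothetical name.

```python
def clip_to_bounds(samples, lower, upper):
    """Clamp every attribute of every sample to its [lower, upper] interval.

    `lower` and `upper` hold one bound per attribute and are assumed to be
    supplied by a separate bounding step.
    """
    return [
        [min(max(x, lo), hi) for x, lo, hi in zip(row, lower, upper)]
        for row in samples
    ]
```

With bounds `[0.0, 0.0]` and `[1.0, 3.0]`, a generated sample `[-1.0, 5.0]` is pulled back to `[0.0, 3.0]`, while `[0.5, 2.0]` is left unchanged.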

