GANs and VAEs As Methods of Synthetic Data Generation and Augmentation to Enhance Heart Disease Prediction

Heart disease instances are rising at an alarming rate, and it is critical to predict such ailments in advance. This is a challenging diagnostic task that must be performed accurately and swiftly. Lack of relevant data is often the limiting factor in many areas of research. Data augmentation is a strategy for improving the training of discriminative models and can be accomplished in a variety of ways. Recent advances in deep generative models now provide new approaches to enrich existing datasets. Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are frequently used to generate high-quality, realistic synthetic data for machine learning algorithms, which play a critical role in many classification problems. In our case, we were provided with 304 rows of heart disease data with which to build a robust model for predicting the presence of an ailment in a patient. However, identifying heart disease would not be efficient given the small amount of available training data. To tackle this problem, we used GAN, CGAN, and VAE to generate data and thereby augment the original dataset. This additional data is intended to increase the accuracy of the models built on the new dataset. We applied classification-based machine learning models, namely Logistic Regression, Decision Trees, KNN, and Random Forest, and compared the accuracy of each model when trained on the original dataset and on each of the augmented datasets produced by the data generation techniques mentioned above. Our research suggests that using data generation techniques significantly boosts the accuracy of the machine learning models applied to them.
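One detail that decides whether such an accuracy comparison is meaningful is that synthetic rows enter only the training split, while every model is scored on held-out real rows. The abstract does not describe the authors' split, so the sketch below is an assumption: a simple holdout, a hypothetical `generate` callback standing in for a trained GAN, CGAN or VAE, and a per-class Gaussian placeholder generator (explicitly not a deep generative model) so the example runs on its own. The 304 rows match the abstract; the 13 feature columns and the random data are illustrative only.

```python
import numpy as np

def split_then_augment(X, y, generate, n_synthetic, test_frac=0.2, seed=0):
    """Hold out a real-only test set, then append synthetic rows to the training part.

    `generate(X_train, y_train, n, rng)` is a hypothetical stand-in for a trained
    GAN, CGAN or VAE sampler and must return (X_syn, y_syn).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_frac * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, y_train = X[train_idx], y[train_idx]
    # The generator only sees training rows, so no test information leaks into it.
    X_syn, y_syn = generate(X_train, y_train, n_synthetic, rng)
    X_aug = np.vstack([X_train, X_syn])
    y_aug = np.concatenate([y_train, y_syn])
    return (X_aug, y_aug), (X[test_idx], y[test_idx])


def gaussian_per_class(X, y, n, rng):
    """Placeholder generator: samples each class from a diagonal Gaussian fitted to it.
    It is NOT a GAN or VAE; swap in a trained deep generative model in practice."""
    classes, counts = np.unique(y, return_counts=True)
    labels = rng.choice(classes, size=n, p=counts / counts.sum())
    X_syn = np.empty((n, X.shape[1]))
    for c in classes:
        mask = labels == c
        Xc = X[y == c]
        X_syn[mask] = rng.normal(Xc.mean(axis=0), Xc.std(axis=0) + 1e-6,
                                 size=(mask.sum(), X.shape[1]))
    return X_syn, labels


# Random placeholder data with 304 rows; the 13 feature columns are an assumption.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(304, 13))
y = (X[:, 0] > 0).astype(int)
(train_X, train_y), (test_X, test_y) = split_then_augment(
    X, y, gaussian_per_class, n_synthetic=500)
# Each classifier would then be fit on (train_X, train_y) and scored on (test_X, test_y).
```

Running the same split with `n_synthetic=0` gives the "original data only" baseline, so both arms of the comparison are scored on identical real test rows.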


Generation of Synthetic Data with Conditional Generative Adversarial Networks

Logic Journal of IGPL ◽

10.1093/jigpal/jzaa059 ◽

2020 ◽

Author(s):

Belén Vega-Márquez ◽

Cristina Rubio-Escudero ◽

Isabel Nepomuceno-Chamorro

Keyword(s):

Research Work ◽

Synthetic Data ◽

Original Data ◽

Classification Problem ◽

Generative Adversarial Networks ◽

Data Generation ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Original Dataset

Abstract The generation of synthetic data is becoming a fundamental task in the daily life of any organization due to the new data protection laws that are emerging. With the rise in the use of Artificial Intelligence, one of the most recent proposals to address this problem is the use of Generative Adversarial Networks (GANs). These networks have demonstrated a great capacity to create synthetic data with very good performance. The goal of synthetic data generation is to create data that will perform similarly to the original dataset in many analysis tasks, such as classification. The problem with GANs in a classification problem is that they do not take class labels into account when generating new data; the label is treated as any other attribute. This research work has focused on creating new synthetic data from datasets with different characteristics using a Conditional Generative Adversarial Network (CGAN). CGANs are an extension of GANs in which the class label is taken into account when new data is generated. The results have been evaluated in two ways: first, by comparing the results of classification algorithms on the original datasets and on the generated data; second, by checking that the correlation between the original data and the generated data is minimal.
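The difference between a GAN and a CGAN described above comes down to what the networks receive as input: in a CGAN the class label is fed to the generator alongside the noise vector (and to the discriminator alongside the data row), so the label is chosen rather than generated. The sketch below shows only this conditioning step, assuming one-hot labels concatenated with Gaussian noise, which is the common construction; the two-layer network has random, untrained weights and is a hypothetical placeholder, not the architecture used in the paper.

```python
import numpy as np

def one_hot(labels, n_classes):
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def conditional_generator_input(labels, n_classes, noise_dim, rng):
    """Concatenate a noise vector with a one-hot class label (CGAN-style conditioning)."""
    z = rng.standard_normal((len(labels), noise_dim))
    return np.hstack([z, one_hot(labels, n_classes)])

def tiny_generator(inputs, W1, b1, W2, b2):
    """Toy 2-layer MLP standing in for a trained generator (weights here are random)."""
    h = np.maximum(0.0, inputs @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2                      # linear output: one synthetic row per input

rng = np.random.default_rng(0)
n_classes, noise_dim, hidden, n_features = 2, 8, 16, 5
W1 = rng.standard_normal((noise_dim + n_classes, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, n_features)) * 0.1
b2 = np.zeros(n_features)

# Request three rows of class 0 and three of class 1.
requested = np.array([0, 0, 0, 1, 1, 1])
gen_in = conditional_generator_input(requested, n_classes, noise_dim, rng)
synthetic_rows = tiny_generator(gen_in, W1, b1, W2, b2)

# The discriminator is conditioned the same way: data row plus its one-hot label.
disc_in = np.hstack([synthetic_rows, one_hot(requested, n_classes)])
```

Because the label is an input rather than an output, a trained CGAN can be asked for exactly as many rows of each class as needed, which a plain GAN cannot guarantee.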


Image Augmentation based on GAN deep learning approach with Textual Content Descriptors

Journal of Information Technology and Digital World - September 2019 ◽

10.36548/jitdw.2021.3.005 ◽

2021 ◽

Vol 3 (3) ◽

pp. 210-225

Author(s):

Judy Simon

Keyword(s):

Computer Vision ◽

Synthetic Data ◽

Original Data ◽

Generative Models ◽

Training Data ◽

Generative Adversarial Networks ◽

Data Sets ◽

Biological Vision ◽

Adversarial Networks ◽

Textual Content

Computer vision, also known as computational visual perception, is a branch of artificial intelligence that allows computers to interpret digital pictures and videos in a manner comparable to biological vision. It entails the development of techniques for simulating biological vision. The aim of computer vision is to extract more meaningful information from visual input than biological vision does. Computer vision is expanding rapidly because of the avalanche of data being produced today. Powerful generative models, such as Generative Adversarial Networks (GANs), are responsible for significant advances in image generation. This research focuses on textual content descriptors in the images used by GANs to generate synthetic data from the MNIST dataset, either to supplement or to replace the original data when training classifiers. This can outperform traditional image augmentation procedures because the synthetic data is handled well. The study shows that training classifiers on synthetic data is as effective as training them on real data alone, and that, for small training datasets, supplementing the data by first training GANs on it may significantly increase classifier performance.


Boosting Segmentation Accuracy of the Deep Learning Models Based on the Synthetic Data Generation

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliv-2-w1-2021-33-2021 ◽

2021 ◽

Vol XLIV-2/W1-2021 ◽

pp. 33-40

Author(s):

V. V. Danilov ◽

O. M. Gerget ◽

D. Y. Kolpashchikov ◽

N. V. Laptev ◽

R. A. Manakov ◽

...

Keyword(s):

Machine Learning ◽

Coordinate System ◽

Learning Algorithms ◽

Synthetic Data ◽

Real Data ◽

Machine Learning Algorithms ◽

Dice Similarity Coefficient ◽

Data Generation ◽

Heart Chamber ◽

Echocardiographic Images

Abstract. In the era of data-driven machine learning algorithms, data is the new oil. Applications of machine learning algorithms show that they need large, heterogeneous datasets that, crucially, are correctly labeled. However, data collection and labeling are time-consuming and labor-intensive processes. A particular task we solve using machine learning is the segmentation of medical devices in echocardiographic images during minimally invasive surgery. The lack of data motivated us to develop an algorithm that generates synthetic samples based on real datasets. The concept of this algorithm is to place a medical device (a catheter) in an empty cavity of an anatomical structure, for example a heart chamber, and then transform it. To create random transformations of the catheter, the algorithm uses a coordinate system that uniquely identifies each point regardless of the bend and shape of the object. We propose taking a cylindrical coordinate system as a basis and modifying it by replacing the Z-axis with a spline along which the h-coordinate is measured. Using the proposed algorithm, we generated new images with the catheter inserted into different heart cavities while varying its location and shape. We then compared deep neural networks trained on datasets composed of real and synthetic data. The network trained on both real and synthetic data performed more accurate segmentation than the model trained only on real data. For instance, a modified U-net trained on the combined datasets achieved a Dice similarity coefficient of 92.6±2.2%, while the same model trained only on real samples achieved 86.5±3.6%. Using a synthetic dataset reduced the spread in accuracy and improved the generalization of the model.
It is worth noting that the proposed algorithm reduces subjectivity, minimizes the labeling routine, increases the number of samples, and improves heterogeneity.
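The modified cylindrical coordinate system can be read as follows: a point is given by (r, θ, h), where h selects a position along a curved centreline instead of a straight Z-axis, and (r, θ) are polar coordinates in the plane perpendicular to the curve at that position. The sketch below is an interpretation under stated assumptions: it uses an arbitrary parametric curve instead of a fitted spline, estimates the tangent by a central difference, and builds the perpendicular frame by projecting a fixed reference vector, whereas the paper may use a different frame (for example a Frenet or rotation-minimizing frame). The `dice` helper computes the Dice similarity coefficient 2|A∩B| / (|A| + |B|) used to report the segmentation results. All function names are hypothetical.

```python
import numpy as np

def curve_frame(curve, h, eps=1e-5, ref=np.array([1.0, 0.0, 0.0])):
    """Unit tangent T and two unit normals N, B of a 3-D curve at parameter h."""
    T = (curve(h + eps) - curve(h - eps)) / (2 * eps)   # central-difference tangent
    T /= np.linalg.norm(T)
    # Assumption: project a fixed reference vector onto the plane orthogonal to T
    # (ref must never be parallel to the tangent).
    N = ref - np.dot(ref, T) * T
    N /= np.linalg.norm(N)
    B = np.cross(T, N)
    return T, N, B

def to_cartesian(curve, r, theta, h):
    """Map (r, theta, h) to a 3-D point; h runs along the curve, not a straight Z-axis."""
    _, N, B = curve_frame(curve, h)
    return curve(h) + r * (np.cos(theta) * N + np.sin(theta) * B)

def dice(a, b):
    """Dice similarity coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total

# A straight centreline reproduces ordinary cylindrical coordinates.
straight = lambda h: np.array([0.0, 0.0, h])
p = to_cartesian(straight, r=2.0, theta=np.pi / 2, h=5.0)   # approximately [0, 2, 5]

# A bent centreline: the same (r, theta, h) now follows the bend.
bent = lambda h: np.array([np.sin(h), 0.0, h])
q = to_cartesian(bent, r=1.0, theta=0.0, h=0.0)
```

Because (r, θ, h) stay meaningful however the centreline bends, random catheter transformations can be drawn in these coordinates and mapped back to image space.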


A Survey on Generative Adversarial Networks: Variants, Applications, and Training

ACM Computing Surveys ◽

10.1145/3463475 ◽

2022 ◽

Vol 54 (8) ◽

pp. 1-49

Author(s):

Abdul Jabbar ◽

Xi Li ◽

Bourahla Omar

Keyword(s):

Machine Learning ◽

Computer Vision ◽

Nash Equilibrium ◽

Generative Models ◽

Generative Adversarial Networks ◽

Data Generation ◽

Crucial Issue ◽

Practical Applications ◽

Adversarial Networks ◽

And Training

Generative models have gained considerable attention in unsupervised learning through a new and practical framework called Generative Adversarial Networks (GANs), owing to their outstanding data generation capability. Many GAN models have been proposed, and several practical applications have emerged in various domains of computer vision and machine learning. Despite the excellent success of GANs, there are still obstacles to stable training: reaching Nash equilibrium, internal covariate shift, mode collapse, vanishing gradients, and the lack of proper evaluation metrics. Stable training is therefore crucial to the success of GANs across applications. Herein, we survey several training solutions proposed by different researchers to stabilize GAN training. We discuss (I) the original GAN model and its modified versions, (II) a detailed analysis of various GAN applications in different domains, and (III) a detailed study of the various GAN training obstacles and their solutions. Finally, we highlight several open issues and outline research directions on the topic.
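The vanishing-gradient obstacle listed above can be seen with one line of calculus. Writing the discriminator output on a generated sample as D = sigmoid(l), the original minimax generator objective log(1 − D) has derivative −sigmoid(l) with respect to the logit l, which goes to zero exactly when the discriminator confidently rejects fakes, as it tends to early in training. The non-saturating alternative −log D, also proposed in the original GAN paper, has derivative −(1 − sigmoid(l)), which stays near −1 in that regime. The sketch below only illustrates this effect numerically; it is not taken from the survey.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grad_minimax(logit):
    """d/dlogit of the original generator objective log(1 - D(G(z))), D = sigmoid(logit)."""
    return -sigmoid(logit)

def grad_non_saturating(logit):
    """d/dlogit of the non-saturating generator loss -log D(G(z))."""
    return -(1.0 - sigmoid(logit))

# Logits from "fake confidently rejected" (-10) to "discriminator undecided" (0).
for logit in (-10.0, -2.0, 0.0):
    d = sigmoid(logit)
    print(f"D(G(z))={d:.5f}  minimax grad={grad_minimax(logit):.5f}  "
          f"non-saturating grad={grad_non_saturating(logit):.5f}")
```

Both losses push the generator in the same direction, since their derivatives have the same sign, but only the non-saturating form keeps a usable gradient magnitude when D(G(z)) is close to 0.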


Domain Adaptation of Synthetic Images for Wheat Head Detection

Plants ◽

10.3390/plants10122633 ◽

2021 ◽

Vol 10 (12) ◽

pp. 2633

Author(s):

Zane K. J. Hartley ◽

Andrew P. French

Keyword(s):

Large Scale ◽

Synthetic Data ◽

Training Data ◽

Support Network ◽

Generative Adversarial Networks ◽

Plant Phenotyping ◽

Target Domain ◽

Original Dataset ◽

Limited Effectiveness ◽

Head Detection

Wheat head detection is a core computer vision problem related to plant phenotyping that has seen increased interest in recent years as large-scale datasets have been made available for research. In deep learning problems with limited training data, synthetic data have been shown to improve performance by increasing the number of available training examples, but their effectiveness has been limited by domain shift. To overcome this, many adversarial approaches such as Generative Adversarial Networks (GANs) have been proposed, which better align the distribution of synthetic data with that of real images through domain augmentation. In this paper, we examine the impact of using synthetic data to supplement the original dataset when performing wheat head detection on the global wheat head challenge dataset. Through our experiments, we demonstrate the challenges of performing domain augmentation when the target domain is large and diverse. We then present a novel approach to improving scores that uses heatmap regression as a support network, together with clustering to combat the high variation of the target domain.
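Heatmap regression turns object annotations into a dense target image that a network learns to reproduce. The abstract does not say how the support network's targets are built; rendering a Gaussian peak at the centre of each bounding box, as sketched below, is a common choice and is an assumption here, as are the box format `(x_min, y_min, x_max, y_max)`, the `sigma` value, and the helper names.

```python
import numpy as np

def gaussian_heatmap(shape, centers, sigma=2.0):
    """Render point annotations as a heatmap target.
    Each centre becomes a Gaussian peak of height 1; overlapping peaks keep the max."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros(shape)
    for cy, cx in centers:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)
    return heatmap

def box_centers(boxes):
    """(x_min, y_min, x_max, y_max) boxes -> (row, col) centres."""
    return [((y0 + y1) / 2, (x0 + x1) / 2) for x0, y0, x1, y1 in boxes]

# Two hypothetical wheat-head boxes on a 64 x 64 image.
boxes = [(10, 10, 20, 20), (40, 30, 50, 44)]
target = gaussian_heatmap((64, 64), box_centers(boxes), sigma=3.0)
```

Taking the maximum rather than the sum keeps every peak at height 1 even where heads overlap, so densely packed heads do not produce inflated target values.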


Improving quality prediction in radial-axial ring rolling using a semi-supervised approach and generative adversarial networks for synthetic data generation

Production Engineering ◽

10.1007/s11740-021-01075-x ◽

2021 ◽

Author(s):

Simon Fahle ◽

Thomas Glaser ◽

Andreas Kneißler ◽

Bernd Kuhlenkötter

Keyword(s):

Machine Learning ◽

Synthetic Data ◽

Ring Rolling ◽

Supervised Machine Learning ◽

Generative Adversarial Networks ◽

Quality Prediction ◽

Data Generation ◽

Adversarial Networks ◽

Synthetic Data Generation ◽

Axial Ring

Abstract As artificial intelligence, and especially machine learning, has gained a lot of attention in the last few years, methods and models have been improving and are becoming easily applicable. This possibility was used to develop a quality prediction system that uses supervised machine learning methods, in the form of time series classification models, to predict ovality in radial-axial ring rolling. Different preprocessing steps and model implementations have been used to improve quality prediction. A semi-supervised approach is used to improve the prediction and to analyze to what extent it can advance current research in machine learning for quality prediction. Moreover, first research steps are taken towards synthetic data generation within the radial-axial ring rolling domain using generative adversarial networks.
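The abstract does not name the semi-supervised approach. One common scheme is self-training: fit a model on the labeled data, pseudo-label the unlabeled samples it is confident about, add them to the training set, and repeat. The sketch below assumes this scheme, assumes each rolling time series has already been reduced to a fixed-length feature vector, and uses a nearest-centroid classifier with a distance margin as the confidence score purely to keep the example self-contained; the data and the `margin` threshold are hypothetical.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    """Predicted labels plus a confidence margin (gap between the two nearest centroids).
    Assumes at least two classes."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    nearest_two = np.sort(d, axis=1)
    return classes[d.argmin(axis=1)], nearest_two[:, 1] - nearest_two[:, 0]

def self_train(X_lab, y_lab, X_unlab, margin=1.0, rounds=5):
    """Self-training: repeatedly add confidently pseudo-labelled samples to the labelled set."""
    X_l, y_l, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        pred, conf = nearest_centroid_predict(nearest_centroid_fit(X_l, y_l), pool)
        keep = conf >= margin
        if not keep.any():
            break
        X_l = np.vstack([X_l, pool[keep]])
        y_l = np.concatenate([y_l, pred[keep]])
        pool = pool[~keep]
    return nearest_centroid_fit(X_l, y_l), len(X_l) - len(X_lab)

# Hypothetical feature vectors: one labeled example per class, 40 unlabeled ones.
rng = np.random.default_rng(0)
X_lab = np.array([[0.0, 0.0], [5.0, 5.0]])
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
model, n_added = self_train(X_lab, y_lab, X_unlab)
labels, _ = nearest_centroid_predict(model, np.array([[0.2, -0.1], [4.8, 5.3]]))
```

The margin threshold trades coverage against the risk of reinforcing wrong pseudo-labels; in a real ovality setting it would need tuning on a labeled validation set.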


Creating Artificial Human Genomes Using Generative Models

10.1101/769091 ◽

2019 ◽

Cited By ~ 6

Author(s):

Burak Yelmen ◽

Aurélien Decelle ◽

Linda Ongaro ◽

Davide Marnetto ◽

Corentin Tallec ◽

...

Keyword(s):

Wide Spectrum ◽

Synthetic Data ◽

Low Frequency ◽

Generative Models ◽

Machine Learning Algorithms ◽

Generative Adversarial Networks ◽

Restricted Boltzmann Machines ◽

High Quality ◽

Genetic Studies ◽

Individual Privacy

Abstract Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation of this field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although these databases would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the high-dimensional distributions of real genomic datasets and create high-quality artificial genomes (AGs) with little to no privacy loss. To illustrate the promising outcomes of our method, we showed that (i) imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs, (ii) scores obtained from selection tests on AGs and real genomes are highly correlated, and (iii) AGs can inherit genotype-phenotype associations. AGs have the potential to become valuable assets in genetic studies by providing high-quality anonymous substitutes for private databases.
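An RBM can model genomes once each genome is encoded as a binary vector, for example one 0/1 allele per SNP position (an assumption here; the paper's encoding and model sizes are not given in the abstract). The sketch below is a minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1) and sampled by Gibbs sampling to produce artificial sequences. The toy "haplotypes", hidden-unit count, learning rate, and step counts are hypothetical, and the sketch does not address the privacy evaluation described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BernoulliRBM:
    """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0, 0.01, (n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible biases
        self.b = np.zeros(n_hidden)    # hidden biases

    def _sample(self, p):
        return (self.rng.random(p.shape) < p).astype(float)

    def fit(self, V, epochs=200, lr=0.1):
        n = len(V)
        for _ in range(epochs):
            ph0 = sigmoid(V @ self.W + self.b)        # positive phase
            h0 = self._sample(ph0)
            pv1 = sigmoid(h0 @ self.W.T + self.a)     # one-step reconstruction
            ph1 = sigmoid(pv1 @ self.W + self.b)      # negative phase
            self.W += lr * (V.T @ ph0 - pv1.T @ ph1) / n
            self.a += lr * (V - pv1).mean(axis=0)
            self.b += lr * (ph0 - ph1).mean(axis=0)
        return self

    def sample(self, n, gibbs_steps=500):
        """Run a Gibbs chain from random visible states and return binary samples."""
        v = self._sample(np.full((n, len(self.a)), 0.5))
        for _ in range(gibbs_steps):
            h = self._sample(sigmoid(v @ self.W + self.b))
            v = self._sample(sigmoid(h @ self.W.T + self.a))
        return v

# Toy "real genomes": two 8-SNP haplotype patterns with 5% allele flips.
rng = np.random.default_rng(1)
patterns = np.array([[1, 1, 1, 0, 0, 0, 1, 0], [0, 0, 0, 1, 1, 1, 0, 1]], dtype=float)
real = patterns[rng.integers(0, 2, 200)]
real = np.abs(real - (rng.random(real.shape) < 0.05))   # flip selected alleles
rbm = BernoulliRBM(n_visible=8, n_hidden=4).fit(real)
artificial = rbm.sample(50)
```

Comparing allele frequencies or pairwise correlations between `real` and `artificial` is a simple first check that the model has captured the structure of the data.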


Optimization of Diabetes Training DATA using Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i2.283286 ◽

2018 ◽

Vol 6 (2) ◽

pp. 283-286

Author(s):

M. Samba Siva Rao ◽

M. Yaswanth ◽

K. Raghavendra Swamy ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data


A Cloud Based Four-Tier Architecture for Early Detection of Heart Disease with Machine Learning Algorithms

2018 IEEE 4th International Conference on Computer and Communications (ICCC) ◽

10.1109/compcomm.2018.8781022 ◽

2018 ◽

Author(s):

Md. Razu Ahmed ◽

S M Hasan Mahmud ◽

Md Altab Hossin ◽

Hosney Jahan ◽

Sheak Rashed Haider Noori

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Early Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms


Intrusion detection of railway clearance from infrared images using generative adversarial networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-192141 ◽

2020 ◽

pp. 1-13

Author(s):

Yundong Li ◽

Yi Liu ◽

Han Dong ◽

Wei Hu ◽

Chen Lin

Keyword(s):

Intrusion Detection ◽

Synthetic Data ◽

Generative Adversarial Networks ◽

Generation Model ◽

Single Shot ◽

Data Generation ◽

Infrared Images ◽

Adversarial Networks ◽

Training Samples ◽

Rgb Images

The intrusion detection of railway clearance is crucial for avoiding railway accidents caused by the intrusion of abnormal objects, such as pedestrians, falling rocks, and animals. However, detecting intrusions with deep learning methods in infrared images captured at night remains challenging because of the lack of sufficient training samples. To address this issue, this study proposes a transfer strategy that converts daytime RGB images into the nighttime style of infrared images. The proposed method consists of two stages. In the first stage, a data generation model based on generative adversarial networks is trained using RGB images and a small number of infrared images, and synthetic samples are then generated with the trained model. In the second stage, a single shot multibox detector (SSD) model is trained on the synthetic data and used to detect abnormal objects in nighttime infrared images. To validate the effectiveness of the proposed method, two groups of experiments, on railway and non-railway scenes, are conducted. The experimental results demonstrate the effectiveness of the proposed method, with an improvement of 17.8% in nighttime object detection.
