Generating Synthetic ECGs Using GANs for Anonymizing Healthcare Data

Esteban Piacentino; Alvaro Guarner; Cecilio Angulo

doi:10.3390/electronics10040389

Electronics ◽

10.3390/electronics10040389 ◽

2021 ◽

Vol 10 (4) ◽

pp. 389

Author(s):

Esteban Piacentino ◽

Alvaro Guarner ◽

Cecilio Angulo

Keyword(s):

Original Data ◽

Generative Adversarial Networks ◽

Sensitive Data ◽

Data Anonymization ◽

Healthcare Data ◽

Research Areas ◽

Adversarial Networks ◽

Video Frames ◽

Private Data ◽

Privacy Issues

In personalized healthcare, an ecosystem for handling private data reliably and safely should be orchestrated. This paper describes an approach for generating synthetic electrocardiograms (ECGs) based on Generative Adversarial Networks (GANs), with the objective of anonymizing users’ information to address privacy concerns. The aim is to create valuable data that can be used in both educational and research areas while avoiding the risk of sensitive data leakage. As GANs are mainly applied to images and video frames, we propose a general procedure in which raw data are transformed into an image so that they can be processed by a GAN and then decoded back to the original data domain. The feasibility of this transformation and processing hypothesis is demonstrated first. Next, the main drawbacks of each step in the proposed procedure are addressed for the particular case of ECGs. Hence, a novel research pathway on health data anonymization using GANs is opened, and further straightforward developments are expected.
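
The abstract does not say how a raw signal is turned into an image, so the following is only a minimal sketch of one possible round trip: samples are min-max quantized to 8-bit pixel values, folded into fixed-width rows, and later unfolded and rescaled. The function names `signal_to_image` and `image_to_signal`, the row width, and the toy sine waveform are all assumptions made for illustration, not the authors' encoding.

```python
import math

def signal_to_image(signal, width):
    """Quantize a 1D signal to 8-bit pixels and fold it into rows of `width`."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    pixels = [round(255 * (x - lo) / span) for x in signal]
    pixels += [0] * ((-len(pixels)) % width)  # zero-pad the last row
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
    # The metadata is needed to invert the scaling and drop the padding.
    return rows, (lo, hi, len(signal))

def image_to_signal(rows, meta):
    """Unfold the image and rescale pixels back to the original amplitude range."""
    lo, hi, n = meta
    span = (hi - lo) or 1.0
    flat = [p for row in rows for p in row][:n]
    return [lo + span * p / 255 for p in flat]

# Toy waveform standing in for an ECG trace (assumption, not real data).
ecg = [math.sin(2 * math.pi * t / 50) for t in range(230)]
image, meta = signal_to_image(ecg, width=16)
restored = image_to_signal(image, meta)
max_error = max(abs(a - b) for a, b in zip(ecg, restored))
```

In this sketch the only loss comes from 8-bit quantization, so `max_error` is bounded by half a quantization step. In a real pipeline a GAN would sit between the two calls, and the scaling metadata would have to be chosen or generated rather than copied from a real record.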

Handling Imbalanced Data in Intrusion Detection Systems using Generative Adversarial Networks

Research and Development on Information and Communication Technology ◽

10.32913/mic-ict-research.v2020.n1.894 ◽

2020 ◽

Vol 2020 (1) ◽

pp. 1-13

Author(s):

Ly Vu ◽

Quang Uy Nguyen

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Detection System ◽

Imbalanced Data ◽

Original Data ◽

Machine Learning Techniques ◽

Generative Adversarial Networks ◽

Detection Systems ◽

Adversarial Networks ◽

Attack Data

Machine learning-based intrusion detection has become more popular in the research community thanks to its capability in discovering unknown attacks. To develop a good detection model for an intrusion detection system (IDS) using machine learning, a great number of attack and normal data samples are required in the learning process. While normal data can be relatively easy to collect, attack data is much rarer and harder to gather. Consequently, IDS datasets are often dominated by normal data, and machine learning models trained on those imbalanced datasets are ineffective in detecting attacks. In this paper, we propose a novel solution to this problem by using generative adversarial networks to generate synthesized attack data for IDS. The synthesized attacks are merged with the original data to form the augmented dataset. Three popular machine learning techniques are trained on the augmented dataset. The experiments conducted on three common IDS datasets and one of our own datasets show that machine learning algorithms achieve better performance when trained on the augmented dataset of the generative adversarial networks compared to those trained on the original dataset and on datasets produced by other sampling techniques. A visualization technique was also used to analyze the properties of the synthesized data of the generative adversarial networks and the others.
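
The merging step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the problem is treated as binary (normal vs. attack), classes are balanced exactly, and `toy_generator` is a placeholder that jitters real attack samples where a trained GAN generator would be used. None of these names or choices come from the paper.

```python
import random

def augment_with_synthetic(samples, labels, minority_label, generate):
    """Append generated minority-class samples until both classes are the same size."""
    minority_count = sum(1 for y in labels if y == minority_label)
    majority_count = len(labels) - minority_count
    needed = max(0, majority_count - minority_count)
    synthetic = [generate() for _ in range(needed)]
    # Return new lists so the original dataset is left untouched.
    return samples + synthetic, labels + [minority_label] * needed

rng = random.Random(0)
normal = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
attacks = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(4)]
samples = normal + attacks
labels = [0] * 20 + [1] * 4

def toy_generator():
    # Placeholder for a trained GAN generator (assumption): jitter a real attack.
    base = rng.choice(attacks)
    return [v + rng.gauss(0.0, 0.1) for v in base]

aug_samples, aug_labels = augment_with_synthetic(samples, labels, 1, toy_generator)
```

A classifier would then be trained on `aug_samples` and `aug_labels` and compared against one trained on the original, imbalanced data.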

Study on Optimal Generative Network for Synthesizing Brain Tumor-Segmented MR Images

Mathematical Problems in Engineering ◽

10.1155/2020/8273173 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Hyunhee Lee ◽

Jaechoon Jo ◽

Heuiseok Lim

Keyword(s):

Brain Tumor ◽

Medical Imaging ◽

Image Synthesis ◽

Generative Adversarial Networks ◽

Imaging Data ◽

MR Images ◽

Style Transfer ◽

Adversarial Networks ◽

Robust Model ◽

Privacy Issues

Due to institutional and privacy issues, medical imaging research is confronted with serious data scarcity. Image synthesis using generative adversarial networks provides a generic solution to the lack of medical imaging data. We synthesize high-quality brain tumor-segmented MR images, a process that consists of two tasks: synthesis and segmentation. We performed experiments with two different generative networks, the first using the ResNet model, which has significant advantages for style transfer, and the second using the U-Net model, one of the most powerful models for segmentation. We compare the performance of each model and propose a more robust model for synthesizing brain tumor-segmented MR images. Although ResNet produced better-quality images than did U-Net for the same samples, it used a great deal of memory and took much longer to train. U-Net, meanwhile, segmented the brain tumors more accurately than did ResNet.

A Comprehensive Survey on Data Utility and Privacy: Taking Indian Healthcare System as a Potential Case Study

Inventions ◽

10.3390/inventions6030045 ◽

2021 ◽

Vol 6 (3) ◽

pp. 45

Author(s):

Prathamesh Churi ◽

Ambika Pawar ◽

Antonio-José Moreno-Guerrero

Keyword(s):

Healthcare System ◽

Large Population ◽

Healthcare Systems ◽

Healthcare Management ◽

Healthcare Sector ◽

Sensitive Data ◽

Data Breaches ◽

Healthcare Data ◽

Privacy Issues

Background: According to the renowned and Oscar award-winning American actor and film director Marlon Brando, “privacy is not something that I am merely entitled to, it is an absolute prerequisite.” Privacy threats and data breaches occur daily, and countries are mitigating the consequences caused by privacy and data breaches. The Indian healthcare industry is one of the largest and most rapidly developing industries. Overall, healthcare management is changing from disease-centric to patient-centric systems. Healthcare data analysis also plays a crucial role in healthcare management, and the privacy of patient records must receive equal attention. Purpose: This paper mainly presents the utility and privacy factors of Indian healthcare data and discusses the utility aspect and privacy problems concerning Indian healthcare systems. It defines policies that reform Indian healthcare systems. The case study of the NITI Aayog report is presented to explain how reformation occurs in Indian healthcare systems. Findings: It is found that there have been numerous research studies conducted on Indian healthcare data across all dimensions; however, privacy problems in healthcare, specifically in India, are caused by prevalent complacency, culture, politics, budget limitations, large population, and existing infrastructures. This paper reviews the Indian healthcare system and the applications that drive it. Additionally, the paper maps how privacy issues arise in every healthcare sector in India. Originality/Value: To understand these factors and gain insights, understanding Indian healthcare systems first is crucial. To the best of our knowledge, we found no recent papers that thoroughly reviewed the Indian healthcare system and its privacy issues. The paper is original in terms of its overview of the healthcare system and privacy issues. Social Implications: Privacy has been the most ignored part of the Indian healthcare system. With India being a country with a population of roughly 1.3 billion, much healthcare data are generated every day. The chances of data breaches and other privacy violations on such sensitive data cannot be avoided, and they cause severe concerns for individuals. This paper segregates the healthcare system’s advances and lists the privacy issues that need to be addressed first.

Generation of Synthetic Data with Conditional Generative Adversarial Networks

Logic Journal of IGPL ◽

10.1093/jigpal/jzaa059 ◽

2020 ◽

Author(s):

Belén Vega-Márquez ◽

Cristina Rubio-Escudero ◽

Isabel Nepomuceno-Chamorro

Keyword(s):

Research Work ◽

Synthetic Data ◽

Original Data ◽

Classification Problem ◽

Generative Adversarial Networks ◽

Data Generation ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Original Dataset

The generation of synthetic data is becoming a fundamental task in the daily life of any organization due to the new data protection laws that are emerging. Because of the rise in the use of Artificial Intelligence, one of the most recent proposals to address this problem is the use of Generative Adversarial Networks (GANs). These types of networks have demonstrated a great capacity to create synthetic data with very good performance. The goal of synthetic data generation is to create data that will perform similarly to the original dataset for many analysis tasks, such as classification. The problem with GANs is that, in a classification problem, they do not take class labels into account when generating new data; the label is treated as any other attribute. This research work has focused on the creation of new synthetic data from datasets with different characteristics with a Conditional Generative Adversarial Network (CGAN). CGANs are an extension of GANs where the class label is taken into account when the new data is generated. The performance of our results has been measured in two different ways: firstly, by comparing the results obtained with classification algorithms, both in the original datasets and in the data generated; secondly, by checking that the correlation between the original data and those generated is minimal.
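
A common way to make a generator label-aware, and the one assumed in this sketch, is to concatenate a one-hot encoding of the class label to the noise vector before it enters the network. The abstract does not describe the authors' architecture, so the helper names `one_hot` and `conditioned_input` and the dimensions used here are illustrative only.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def conditioned_input(label, num_classes, noise_dim, rng):
    """Build a conditional generator input: Gaussian noise followed by the label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)

rng = random.Random(42)
z = conditioned_input(label=2, num_classes=3, noise_dim=5, rng=rng)
```

Here `z` has eight entries: five noise values and a three-element label code. In a conditional GAN the discriminator typically receives the same label code alongside each real or generated sample, which is what lets the generator be asked for data of a specific class.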

Deep learning for 3D seismic compressive-sensing technique: A novel approach

The Leading Edge ◽

10.1190/tle38090698.1 ◽

2019 ◽

Vol 38 (9) ◽

pp. 698-705

Author(s):

Ping Lu ◽

Yuan Xiao ◽

Yanyan Zhang ◽

Nikolaos Mitsakos

Keyword(s):

Deep Learning ◽

Compressive Sensing ◽

Field Data ◽

Original Data ◽

Generative Adversarial Networks ◽

Sampled Data ◽

Computationally Efficient ◽

Efficient Manner ◽

Adversarial Networks ◽

Novel Approach

A deep-learning-based compressive-sensing technique for reconstruction of missing seismic traces is introduced. The agility of the proposed approach lies in its ability to perfectly resolve the optimization limitation of conventional algorithms that solve inversion problems. It demonstrates how deep generative adversarial networks, equipped with an appropriate loss function that essentially leverages the distribution of the entire survey, can serve as an alternative approach for tackling compressive-sensing problems with high precision and in a computationally efficient manner. The method can be applied on both prestack and poststack seismic data, allowing for superior imaging quality with well-preconditioned and well-sampled field data, during the processing stage. To validate the robustness of the proposed approach on field data, the extent to which amplitudes and phase variations in original data are faithfully preserved is established, while subsurface consistency is also achieved. Several applications to acquisition and processing, such as decreasing bin size, increasing offset and azimuth sampling, or increasing the fold, can directly and immediately benefit from adopting the proposed technique. Furthermore, interpolation based on generative adversarial networks has been found to produce better-sampled data sets, with stronger regularization and attenuated aliasing phenomenon, while providing greater fidelity on steep-dip events and amplitude-variation-with-offset analysis with migration.

Local Data Debiasing for Fairness Based on Generative Adversarial Training

Algorithms ◽

10.3390/a14030087 ◽

2021 ◽

Vol 14 (3) ◽

pp. 87

Author(s):

Ulrich Aïvodji ◽

François Bidet ◽

Sébastien Gambs ◽

Rosin Claude Ngueveu ◽

Alain Tapp

Keyword(s):

Ethical Issues ◽

Original Data ◽

Decision Processes ◽

Generative Adversarial Networks ◽

Local Data ◽

Trade Off ◽

Adversarial Networks ◽

Training Approach ◽

Sensitive Attribute ◽

Adversarial Training

The widespread use of automated decision processes in many areas of our society raises serious ethical issues with respect to the fairness of the process and the possible resulting discrimination. To solve this issue, we propose a novel adversarial training approach called GANSan for learning a sanitizer whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our method GANSan is partially inspired by the powerful framework of generative adversarial networks (in particular Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data by only modifying the other attributes as little as possible, thus preserving the interpretability of the sanitized data. Consequently, once the sanitizer is trained, it can be applied to new data locally by an individual on their profile before releasing it. Finally, experiments on real datasets demonstrate the effectiveness of the approach as well as the achievable trade-off between fairness and utility.

Using GANs with adaptive training data to search for new molecules

Journal of Cheminformatics ◽

10.1186/s13321-021-00494-3 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Andrew E. Blanchard ◽

Christopher Stanley ◽

Debsindhu Bhowmik

Keyword(s):

Drug Discovery ◽

Chemical Space ◽

Traditional Approach ◽

Chemical Compounds ◽

Original Data ◽

Training Data ◽

Generative Adversarial Networks ◽

Small Subset ◽

Adversarial Networks ◽

Potential Applications

The process of drug discovery involves a search over the space of all possible chemical compounds. Generative Adversarial Networks (GANs) provide a valuable tool towards exploring chemical space and optimizing known compounds for a desired functionality. Standard approaches to training GANs, however, can result in mode collapse, in which the generator primarily produces samples closely related to a small subset of the training data. In contrast, the search for novel compounds necessitates exploration beyond the original data. Here, we present an approach to training GANs that promotes incremental exploration and limits the impacts of mode collapse using concepts from Genetic Algorithms. In our approach, valid samples from the generator are used to replace samples from the training data. We consider both random and guided selection along with recombination during replacement. By tracking the number of novel compounds produced during training, we show that updates to the training data drastically outperform the traditional approach, increasing potential applications for GANs in drug discovery.
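
The replacement step can be sketched as below. This covers only the random-selection variant, without guided selection or recombination, and it relies on a toy validity check in place of a real chemistry toolkit. The names `update_training_data` and `toy_is_valid`, and the SMILES-like strings, are assumptions made for illustration.

```python
import random

def update_training_data(training, generated, is_valid, rng):
    """Replace randomly chosen training items with valid, novel generated samples."""
    updated = list(training)  # keep the dataset size fixed, leave the input intact
    for sample in generated:
        if is_valid(sample) and sample not in updated:
            updated[rng.randrange(len(updated))] = sample
    return updated

def toy_is_valid(smiles):
    # Hypothetical validity check; a real pipeline would parse the molecule.
    return len(smiles) > 0 and all(ch in "CNO=()" for ch in smiles)

rng = random.Random(1)
training = ["CCO", "CCN", "CCC", "C=O"]
generated = ["CCO", "C#X", "CC(C)O"]  # already known, invalid, valid and novel
updated = update_training_data(training, generated, toy_is_valid, rng)
```

Repeating this update between training rounds shifts the training set toward what the generator has recently produced, which is the mechanism the abstract credits with encouraging exploration beyond the original data.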

Image Augmentation based on GAN deep learning approach with Textual Content Descriptors

Journal of Information Technology and Digital World - September 2019 ◽

10.36548/jitdw.2021.3.005 ◽

2021 ◽

Vol 3 (3) ◽

pp. 210-225

Author(s):

Judy Simon

Keyword(s):

Computer Vision ◽

Synthetic Data ◽

Original Data ◽

Generative Models ◽

Training Data ◽

Generative Adversarial Networks ◽

Data Sets ◽

Biological Vision ◽

Adversarial Networks ◽

Textual Content

Computer vision, also known as computational visual perception, is a branch of artificial intelligence that allows computers to interpret digital pictures and videos in a manner comparable to biological vision. It entails the development of techniques for simulating biological vision. The aim of computer vision is to extract more meaningful information from visual input than biological vision can. Computer vision is growing rapidly due to the avalanche of data being produced today. Powerful generative models, such as Generative Adversarial Networks (GANs), are responsible for significant advances in the field of picture creation. This research focuses on textual content descriptors in the images used by GANs to generate synthetic data from the MNIST dataset, which either supplement or replace the original data while training classifiers. This can provide better performance than other traditional image enlarging procedures due to the good handling of synthetic data. It shows that training classifiers on synthetic data is as effective as training them on pure data alone, and it also reveals that, for small training data sets, supplementing the dataset by first training GANs on the data may lead to a significant increase in classifier performance.

GEAC: Generating and Evaluating Handwritten Arabic Characters Using Generative Adversarial Networks

10.31224/osf.io/ea4pb ◽

2020 ◽

Author(s):

Tarik Alafif

Keyword(s):

Research Work ◽

Generative Adversarial Networks ◽

Great Success ◽

Generative Adversarial Network ◽

Research Areas ◽

Adversarial Network ◽

Adversarial Networks ◽

A Value ◽

Arabic Characters ◽

Handwritten Arabic

Generative Adversarial Networks (GANs) have achieved a breakthrough and great success in many research areas in computer vision. Different GANs generate different outputs. In this research work, we apply different GANs to generate handwritten Arabic characters. A basic GAN, Vanilla GAN, Deep Convolutional GAN (DCGAN), Bidirectional GAN (BiGAN), and Wasserstein GAN (WGAN) are used. Then, the generated images are evaluated by native Arabic-speaking human evaluators and by Fréchet Inception Distance (FID). The qualitative and quantitative results are provided for the image generation and evaluation. In the experimental evaluation, WGAN achieves better results in FID with a value of 96.007. On the other hand, DCGAN achieves better results in native-Arabic human evaluation with a value of 35%.
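
For readers unfamiliar with the metric, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols below are the standard ones, not notation taken from this paper: the mean and covariance with subscript r belong to the real images, and those with subscript g to the generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one, which is why a lower FID counts as the better result.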

DeepFake electrocardiograms using generative adversarial networks are the beginning of the end for privacy issues in medicine

Scientific Reports ◽

10.1038/s41598-021-01295-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Vajira Thambawita ◽

Jonas L. Isaksen ◽

Steven A. Hicks ◽

Jonas Ghouse ◽

Gustav Ahlberg ◽

...

Keyword(s):

Population Studies ◽

Medical Science ◽

Synthetic Data ◽

Real Data ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

Similar Information ◽

Normal ECGs ◽

Privacy Issues ◽

Python Package

Recent global developments underscore the prominent role big data have in modern medical science. But privacy issues constitute a prevalent problem for collecting and sharing data between researchers. However, synthetic data generated to represent real data carrying similar information and distribution may alleviate the privacy issue. In this study, we present generative adversarial networks (GANs) capable of generating realistic synthetic DeepFake 10-s 12-lead electrocardiograms (ECGs). We have developed and compared two methods, named WaveGAN* and Pulse2Pulse. We trained the GANs with 7,233 real normal ECGs to produce 121,977 DeepFake normal ECGs. By verifying the ECGs using a commercial ECG interpretation program (MUSE 12SL, GE Healthcare), we demonstrate that the Pulse2Pulse GAN was superior to WaveGAN* at producing realistic ECGs. ECG intervals and amplitudes were similar between the DeepFake and real ECGs. Although these synthetic ECGs mimic the dataset used for creation, the ECGs are not linked to any individuals and may thus be used freely. The synthetic dataset will be available as open access for researchers at OSF.io and the DeepFake generator available at the Python Package Index (PyPI) for generating synthetic ECGs. In conclusion, we were able to generate realistic synthetic ECGs using generative adversarial neural networks on normal ECGs from two population studies, thereby addressing the relevant privacy issues in medical datasets.
