Image Augmentation based on GAN deep learning approach with Textual Content Descriptors

Author(s):  
Judy Simon

Computer vision, also known as computational visual perception, is a branch of artificial intelligence that allows computers to interpret digital pictures and videos in a manner comparable to biological vision. It entails the development of techniques for simulating biological vision. The aim of computer vision is to extract more meaningful information from visual input than biological vision can. Computer vision is expanding rapidly due to the avalanche of data being produced today. Powerful generative models, such as Generative Adversarial Networks (GANs), are responsible for significant advances in the field of image generation. This research focuses on textual content descriptors in the images used by GANs to generate synthetic data from the MNIST dataset, either to supplement or to replace the original data while training classifiers. This can provide better performance than traditional dataset enlargement procedures because the synthetic data are handled well. It shows that training classifiers on synthetic data is as effective as training them on pure data alone, and it also reveals that, for small training data sets, supplementing the dataset by first training GANs on the data may lead to a significant increase in classifier performance.
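As a rough illustration of the workflow this abstract describes (not the author's implementation), the sketch below builds a training loader from a mix of real MNIST digits and samples drawn from a stand-in PyTorch generator; the Generator class, its (assumed prior) training, and the placeholder labels are all assumptions made for the example.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader
from torchvision import datasets, transforms

# Tiny fully connected generator standing in for whatever GAN was trained on
# MNIST; a real pipeline would first train it adversarially against a
# discriminator.
class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),   # pixel values in [0, 1]
        )

    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)

generator = Generator()  # assumed already trained for this sketch

real = datasets.MNIST("data", train=True, download=True,
                      transform=transforms.ToTensor(),
                      target_transform=torch.tensor)  # labels as tensors

# Draw synthetic digits; real labels would come from a conditional GAN or a
# separate labelling classifier; zeros are placeholders for the sketch.
with torch.no_grad():
    fake_images = generator(torch.randn(10_000, generator.latent_dim))
fake_labels = torch.zeros(10_000, dtype=torch.long)
synthetic = TensorDataset(fake_images, fake_labels)

# Supplement the real data (or pass only `synthetic` to replace it entirely).
train_loader = DataLoader(ConcatDataset([real, synthetic]),
                          batch_size=128, shuffle=True)
```

A standard CNN classifier can then be trained on this loader, on synthetic data alone, or on real data alone for comparison, matching the three regimes the abstract discusses.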

2019
Vol 2019 (4)
pp. 232-249
Author(s):
Benjamin Hilprecht
Martin Härterich
Daniel Bernau

Abstract We present two information leakage attacks that outperform previous work on membership inference against generative models. The first attack allows membership inference without assumptions on the type of the generative model. Contrary to previous evaluation metrics for generative models, such as Kernel Density Estimation, it only considers samples of the model that are close to training data records. The second attack specifically targets Variational Autoencoders, achieving high membership inference accuracy. Furthermore, previous work mostly considers membership inference adversaries who perform single-record membership inference. We argue for considering regulatory actors who perform set membership inference to identify the use of specific datasets for training. The attacks are evaluated on two generative model architectures, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), trained on standard image datasets. Our results show that the two attacks yield success rates superior to previous work on most data sets while at the same time making only very mild assumptions. We envision the two attacks, in combination with the membership inference attack type formalization, as especially useful, for example, to enforce data privacy standards and to automatically assess model quality in machine-learning-as-a-service setups. In practice, our work motivates the use of GANs, since they prove less vulnerable to information leakage attacks while producing detailed samples.
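A minimal sketch of the distance-based idea behind the first attack, under the assumption that membership evidence is simply the fraction of generated samples falling within an epsilon-ball of a candidate record; the function names, the epsilon value, and the stand-in samples are illustrative, not the authors' exact estimator.

```python
import numpy as np

def membership_score(candidate, generated_samples, epsilon=0.1):
    """Fraction of generated samples within an epsilon-ball of the candidate.

    A higher score suggests the generative model concentrates probability
    mass near the candidate record, which is taken as evidence of
    training-set membership.
    """
    distances = np.linalg.norm(generated_samples - candidate, axis=1)
    return np.mean(distances < epsilon)

def set_membership_score(candidate_set, generated_samples, epsilon=0.1):
    """Average per-record score, as a crude set membership statistic."""
    return np.mean([membership_score(c, generated_samples, epsilon)
                    for c in candidate_set])

# Usage sketch: `samples` would be drawn from the target generative model;
# here random vectors stand in for model output.
rng = np.random.default_rng(0)
samples = rng.normal(size=(50_000, 16))
record = rng.normal(size=16)
print(membership_score(record, samples, epsilon=2.0))
```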


2021
Vol 21 (1)
Author(s):
Stefan Lenz
Moritz Hess
Harald Binder

Abstract
Background: The best way to calculate statistics from medical data is to use the data of individual patients. In some settings, this data is difficult to obtain due to privacy restrictions. In Germany, for example, it is not possible to pool routine data from different hospitals for research purposes without the consent of the patients.
Methods: The DataSHIELD software provides an infrastructure and a set of statistical methods for joint, privacy-preserving analyses of distributed data. The contained algorithms are reformulated to work with aggregated data from the participating sites instead of the individual data. If a desired algorithm is not implemented in DataSHIELD or cannot be reformulated in such a way, using artificial data is an alternative. Generating artificial data is possible using so-called generative models, which are able to capture the distribution of given data. Here, we employ deep Boltzmann machines (DBMs) as generative models. For the implementation, we use the package “BoltzmannMachines” from the Julia programming language and wrap it for use with DataSHIELD, which is based on R.
Results: We present a methodology together with a software implementation that builds on DataSHIELD to create artificial data that preserve complex patterns from distributed individual patient data. Such data sets of artificial patients, which are not linked to real patients, can then be used for joint analyses. As an exemplary application, we conduct a distributed analysis with DBMs on a synthetic data set, which simulates genetic variant data. Patterns from the original data can be recovered in the artificial data using hierarchical clustering of the virtual patients, demonstrating the feasibility of the approach. Additionally, we compare DBMs, variational autoencoders, generative adversarial networks, and multivariate imputation as generative approaches by assessing the utility and disclosure of synthetic data generated from real genetic variant data in a distributed setting with data of a small sample size.
Conclusions: Our implementation adds to DataSHIELD the ability to generate artificial data that can be used for various analyses, e.g., for pattern recognition with deep learning. This also demonstrates more generally how DataSHIELD can be flexibly extended with advanced algorithms from languages other than R.
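The sketch below is a loose Python analogue (not the paper's Julia/R implementation) of the workflow: fit a simple Bernoulli restricted Boltzmann machine as a stand-in generative model, sample artificial patients via Gibbs sampling, and hierarchically cluster them; the random data, hyperparameters, and single-layer RBM are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Stand-in for binary genetic variant data (patients x variants).
real = (rng.random((200, 50)) < 0.3).astype(float)

# A single restricted Boltzmann machine as a stand-in generative model;
# the paper stacks such layers into deep Boltzmann machines.
rbm = BernoulliRBM(n_components=20, learning_rate=0.05, n_iter=50,
                   random_state=0)
rbm.fit(real)

# Draw artificial patients by running Gibbs sampling from random starts.
v = (rng.random((200, 50)) < 0.5).astype(float)
for _ in range(500):
    v = rbm.gibbs(v)
artificial = v

# Hierarchical clustering of the artificial patients, as used in the paper
# to check that patterns from the original data are recovered.
clusters = fcluster(linkage(artificial, method="ward"), t=2,
                    criterion="maxclust")
print(np.bincount(clusters))
```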


PLoS ONE
2021
Vol 16 (11)
pp. e0260308
Author(s):
Mauro Castelli
Luca Manzoni
Tatiane Espindola
Aleš Popovič
Andrea De Lorenzo

Wireless networks are among the fundamental technologies used to connect people. Considering the constant advancements in the field, telecommunication operators must guarantee a high-quality service to keep their customer portfolio. To ensure this high-quality service, it is common to establish partnerships with specialized technology companies that deliver software services to monitor the networks and identify faults and their respective solutions. A common barrier faced by these specialized companies is the lack of data to develop and test their products. This paper investigates the use of generative adversarial networks (GANs), which are state-of-the-art generative models, for generating synthetic telecommunication data related to Wi-Fi signal quality. We developed, trained, and compared two of the most used GAN architectures: the Vanilla GAN and the Wasserstein GAN (WGAN). Both models presented satisfactory results and were able to generate synthetic data similar to the real data. In particular, the distribution of the synthetic data overlaps the distribution of the real data for all of the considered features. Moreover, the considered generative models reproduce in the synthetic features the same associations observed among the real features. We chose the WGAN as the final model, but both models are suitable for addressing the problem at hand.
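As a hedged sketch of what distinguishes the WGAN from the Vanilla GAN, the snippet below implements the Wasserstein critic and generator updates (with the original weight-clipping Lipschitz constraint) on stand-in tabular features; the network sizes, learning rates, and feature count are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

latent_dim, n_features = 32, 8     # e.g. a handful of Wi-Fi quality features

generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                          nn.Linear(64, n_features))
critic = nn.Sequential(nn.Linear(n_features, 64), nn.LeakyReLU(0.2),
                       nn.Linear(64, 1))        # no sigmoid: WGAN critic

opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def critic_step(real_batch):
    """One WGAN critic update: minimise E[critic(fake)] - E[critic(real)]."""
    z = torch.randn(real_batch.size(0), latent_dim)
    fake = generator(z).detach()
    loss = critic(fake).mean() - critic(real_batch).mean()
    opt_c.zero_grad(); loss.backward(); opt_c.step()
    # Weight clipping enforces the Lipschitz constraint of the original WGAN.
    for p in critic.parameters():
        p.data.clamp_(-0.01, 0.01)
    return loss.item()

def generator_step(batch_size):
    """One WGAN generator update: maximise E[critic(fake)]."""
    z = torch.randn(batch_size, latent_dim)
    loss = -critic(generator(z)).mean()
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()

# Usage: several critic steps per generator step, on batches of real records.
real = torch.randn(128, n_features)   # stand-in for a real Wi-Fi data batch
print(critic_step(real), generator_step(128))
```

The Vanilla GAN would instead end the discriminator with a sigmoid and use binary cross-entropy; the critic losses above are the defining difference.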


Image colorization is the process of taking an input grayscale (black and white) image and producing an output colorized image that represents the semantic color tones of the input. Over the past few years, automatic image colorization has been of significant interest, and a lot of progress has been made in the field by various researchers. Image colorization finds application in many domains, including medical imaging and the restoration of historical documents. There have been different approaches to solving this problem using Convolutional Neural Networks as well as Generative Adversarial Networks. These colorization networks are not only based on different architectures but are also tested on varied data sets. This paper covers some of these proposed approaches and the techniques behind them. The results of the generative models and of traditional deep neural networks are compared, and their current limitations are presented. The paper offers a summarized view of past and current advances in the field of image colorization contributed by different authors and researchers.
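To make the task concrete, here is a minimal, assumed sketch of the regression formulation used by many CNN-based colorization approaches: predict the two chrominance channels of a Lab image from its lightness channel; the tiny network and loss are placeholders, not any specific surveyed architecture.

```python
import torch
import torch.nn as nn

# Minimal colorization network sketch: map the lightness (L) channel of a
# Lab image to its two chrominance (ab) channels. Surveyed systems are far
# deeper and often trained adversarially rather than with plain regression.
class Colorizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),  # ab scaled to [-1, 1]
        )

    def forward(self, lightness):
        return self.net(lightness)

model = Colorizer()
gray = torch.rand(4, 1, 64, 64)          # batch of grayscale (L) inputs
ab = model(gray)                          # predicted chrominance channels
loss = nn.functional.mse_loss(ab, torch.zeros_like(ab))  # regression target is a placeholder
print(ab.shape, loss.item())
```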


2020
Author(s):
Belén Vega-Márquez
Cristina Rubio-Escudero
Isabel Nepomuceno-Chamorro

Abstract The generation of synthetic data is becoming a fundamental task in the daily life of any organization due to the new data protection laws that are emerging. Because of the rise in the use of Artificial Intelligence, one of the most recent proposals to address this problem is the use of Generative Adversarial Networks (GANs). These types of networks have demonstrated a great capacity to create synthetic data with very good performance. The goal of synthetic data generation is to create data that will perform similarly to the original dataset for many analysis tasks, such as classification. The problem with GANs in a classification setting is that they do not take class labels into account when generating new data; the label is treated like any other attribute. This research work has focused on the creation of new synthetic data from datasets with different characteristics using a Conditional Generative Adversarial Network (CGAN). CGANs are an extension of GANs in which the class label is taken into account when the new data are generated. The performance of our results has been measured in two different ways: first, by comparing the results obtained with classification algorithms, both on the original datasets and on the generated data; second, by checking that the correlation between the original data and the generated data is minimal.
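A minimal sketch of the label-conditioning idea that separates a CGAN from a plain GAN: the class label is concatenated (one-hot) to both the generator input and the discriminator input; the layer sizes and feature counts below are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, n_features, n_classes = 32, 10, 3

# Conditional generator: the one-hot class label is concatenated to the
# latent vector, so samples can be generated for a requested class.
generator = nn.Sequential(
    nn.Linear(latent_dim + n_classes, 64), nn.ReLU(),
    nn.Linear(64, n_features),
)

# Conditional discriminator: the label is concatenated to the sample, so
# realness is judged jointly with label consistency.
discriminator = nn.Sequential(
    nn.Linear(n_features + n_classes, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)

def sample(labels):
    """Generate one synthetic record per requested class label."""
    z = torch.randn(labels.size(0), latent_dim)
    y = F.one_hot(labels, n_classes).float()
    return generator(torch.cat([z, y], dim=1))

labels = torch.tensor([0, 1, 2])
fake = sample(labels)
score = discriminator(torch.cat([fake, F.one_hot(labels, n_classes).float()],
                                dim=1))
print(fake.shape, score.shape)
```

An unconditional GAN would omit the label inputs entirely, which is exactly why, as the abstract notes, it treats the class as just another attribute.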


2021
Vol 13 (1)
Author(s):
Andrew E. Blanchard
Christopher Stanley
Debsindhu Bhowmik

Abstract The process of drug discovery involves a search over the space of all possible chemical compounds. Generative Adversarial Networks (GANs) provide a valuable tool towards exploring chemical space and optimizing known compounds for a desired functionality. Standard approaches to training GANs, however, can result in mode collapse, in which the generator primarily produces samples closely related to a small subset of the training data. In contrast, the search for novel compounds necessitates exploration beyond the original data. Here, we present an approach to training GANs that promotes incremental exploration and limits the impacts of mode collapse using concepts from Genetic Algorithms. In our approach, valid samples from the generator are used to replace samples from the training data. We consider both random and guided selection along with recombination during replacement. By tracking the number of novel compounds produced during training, we show that updates to the training data drastically outperform the traditional approach, increasing potential applications for GANs in drug discovery.
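The snippet below sketches the data-replacement step described here under stated assumptions: valid generator samples replace a fraction of the training records, with optional fitness-guided selection and recombination; the validity check, fitness function, replacement fraction, and numeric data are placeholders, since the paper works with chemical compounds rather than random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_valid(sample):
    """Placeholder validity check; for molecules this would parse SMILES."""
    return np.all(np.isfinite(sample))

def fitness(sample):
    """Placeholder scoring function used for guided selection."""
    return -np.abs(sample).sum()

def update_training_data(train, generated, guided=True, crossover_rate=0.2):
    """Replace part of the training set with valid generated samples."""
    valid = np.array([g for g in generated if is_valid(g)])
    if len(valid) == 0:
        return train
    if guided:
        # Keep the highest-scoring valid samples (guided selection).
        valid = valid[np.argsort([fitness(v) for v in valid])[::-1]]
    n = min(len(valid), len(train) // 10)     # replace up to 10% per update
    replace_idx = rng.choice(len(train), size=n, replace=False)
    new = valid[:n].copy()
    # Optional recombination: swap tails of adjacent pairs of new samples.
    for i in range(0, n - 1, 2):
        if rng.random() < crossover_rate:
            cut = rng.integers(1, new.shape[1])
            new[i, cut:], new[i + 1, cut:] = (new[i + 1, cut:].copy(),
                                              new[i, cut:].copy())
    train[replace_idx] = new
    return train

train = rng.normal(size=(100, 16))            # stand-in training data
generated = rng.normal(size=(50, 16))         # stand-in generator output
train = update_training_data(train, generated)
print(train.shape)
```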


2021
Author(s):
Saman Motamed
Patrik Rogalla
Farzad Khalvati

Abstract Successful training of convolutional neural networks (CNNs) requires a substantial amount of data. With small datasets, networks generalize poorly. Data augmentation techniques improve the generalizability of neural networks by using existing training data more effectively. Standard data augmentation methods, however, produce limited plausible alternative data. Generative Adversarial Networks (GANs) have been utilized to generate new data and improve the performance of CNNs. Nevertheless, data augmentation techniques for training GANs are under-explored compared to those for CNNs. In this work, we propose a new GAN architecture for augmentation of chest X-rays for semi-supervised detection of pneumonia and COVID-19. We show that the proposed GAN can be used to effectively augment data and improve classification accuracy of disease in chest X-rays for pneumonia and COVID-19. We compare our augmentation GAN model with the Deep Convolutional GAN and traditional augmentation methods (rotation, zoom, etc.) on two different X-ray datasets and show that our GAN-based augmentation method surpasses the other augmentation methods for training a GAN to detect anomalies in X-ray images.
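For context, a small sketch contrasting the traditional augmentation baseline (rotation, zoom, flips) with GAN-based sample injection; the transform parameters and the optional generator hook are assumptions for illustration and do not reproduce the proposed architecture.

```python
import torch
from torchvision import transforms

# Traditional augmentation pipeline of the kind compared against in the
# paper: small rotations, zoom via random resized crops, horizontal flips.
standard_aug = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=224, scale=(0.9, 1.0)),
    transforms.RandomHorizontalFlip(),
])

def augment_batch(xray_batch, gan_generator=None, latent_dim=100, n_fake=16):
    """Classically augment real X-rays and optionally add GAN samples."""
    augmented = torch.stack([standard_aug(img) for img in xray_batch])
    if gan_generator is None:
        return augmented
    with torch.no_grad():
        fake = gan_generator(torch.randn(n_fake, latent_dim))
    return torch.cat([augmented, fake], dim=0)

batch = torch.rand(8, 1, 224, 224)   # stand-in batch of chest X-ray tensors
print(augment_batch(batch).shape)    # pass a trained generator to mix in fakes
```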


2020
Vol 10 (24)
pp. 9133
Author(s):
Lloyd A. Courtenay
Diego González-Aguilera

The fossil record is notorious for being incomplete and distorted, frequently conditioning the type of knowledge that can be extracted from it. This often leads to issues when performing complex statistical analyses, such as classification tasks, predictive modelling, and the variance analyses used in Geometric Morphometrics. Here, different Generative Adversarial Network architectures are experimented with, testing the effects of sample size and domain dimensionality on model performance. For model evaluation, robust statistical methods were used. Each of the algorithms was observed to produce realistic data. Generative Adversarial Networks using different loss functions produced multidimensional synthetic data statistically equivalent to the original training data. Conditional Generative Adversarial Networks were not as successful. The methods proposed are likely to reduce the impact of sample size and bias on a number of statistical learning applications. While Generative Adversarial Networks are not the solution to all sample-size-related issues, combined with other pre-processing steps these limitations may be overcome. This presents a valuable means of augmenting geometric morphometric datasets for greater predictive visualization.
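As an illustration of the kind of check implied by "statistically equivalent", the sketch below runs per-dimension two-sample Kolmogorov-Smirnov tests between real and synthetic landmark coordinates; the data are random stand-ins, and the KS test is only a simple proxy for the robust statistics actually used in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
real = rng.normal(size=(40, 12))        # stand-in landmark coordinate data
synthetic = rng.normal(size=(200, 12))  # stand-in GAN output

# Per-dimension two-sample tests as a crude check that each synthetic
# coordinate distribution is consistent with the real one.
for j in range(real.shape[1]):
    ks = stats.ks_2samp(real[:, j], synthetic[:, j])
    print(f"dim {j:2d}  KS statistic={ks.statistic:.3f}  p={ks.pvalue:.3f}")
```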


Mathematics
2021
Vol 10 (1)
pp. 4
Author(s):
Mobeen Ahmad
Usman Cheema
Muhammad Abdullah
Seungbin Moon
Dongil Han

Applications for facial recognition have eased the process of personal identification. However, there are increasing concerns about the performance of these systems against the challenges of presentation attacks, spoofing, and disguises. One of the reasons for the lack of robustness of facial recognition algorithms to these challenges is the limited amount of suitable training data. This lack of training data can be addressed by creating a database in which the subjects wear several disguises, but this is an expensive process. Another approach is to use generative adversarial networks to synthesize facial images with the required disguise add-ons. In this paper, we present a synthetic disguised face database for the training and evaluation of robust facial recognition algorithms. Furthermore, we present a methodology for generating synthetic facial images with the desired disguise add-ons. Cycle-consistency loss is used to generate facial images with disguises, e.g., fake beards, makeup, and glasses, from normal face images. Additionally, an automated scheme is presented for filtering the synthesized faces. Finally, facial recognition experiments are performed on the proposed synthetic data to show the efficacy of the proposed methodology and the presented database. Training on the proposed database achieves an improvement in the rank-1 recognition rate (68.3%) over a model trained on the original nondisguised face images.
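A brief sketch of the cycle-consistency constraint mentioned here, with two toy image-to-image generators mapping between normal and disguised faces; the architectures, loss weight, and tensors are assumptions, illustrating only the loss term rather than the full training pipeline.

```python
import torch
import torch.nn as nn

# Two toy image-to-image generators standing in for the CycleGAN-style
# mappings G (normal face -> disguised face) and F (disguised -> normal).
def make_generator():
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
    )

G, F = make_generator(), make_generator()
l1 = nn.L1Loss()

def cycle_consistency_loss(normal_faces, disguised_faces, lam=10.0):
    """L1 penalty that mapping to the other domain and back reproduces the
    input; this is the cycle-consistency constraint used for disguise synthesis."""
    forward_cycle = F(G(normal_faces))      # normal -> disguised -> normal
    backward_cycle = G(F(disguised_faces))  # disguised -> normal -> disguised
    return lam * (l1(forward_cycle, normal_faces)
                  + l1(backward_cycle, disguised_faces))

normal = torch.rand(2, 3, 64, 64)      # stand-in face images in [0, 1]
disguised = torch.rand(2, 3, 64, 64)
print(cycle_consistency_loss(normal, disguised).item())
```

In full CycleGAN-style training this term is added to the adversarial losses of the two domains, which is what makes unpaired normal-to-disguised translation possible.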

