Equivariant Adversarial Network for Image-to-image Translation

Image-to-image translation aims to learn a mapping of images from a source domain to a target domain. Three main challenges are associated with this problem and need to be addressed: the lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their strong performance in many computer vision tasks, fail to detect the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representational model we are looking for. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be inserted into the convolutional layers of the generative model, and it allows the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module in the generative model, our architecture incorporates a new loss function that facilitates effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.
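
The abstract does not say how the trainable transformer is built. As a rough illustration only, one common differentiable module for spatial manipulation is an affine grid sampler with bilinear interpolation. The sketch below assumes that design; the function names `bilinear_sample` and `affine_warp` and the 2x3 parameter matrix `theta` are hypothetical and are not taken from the paper.

```python
import math

def bilinear_sample(image, x, y):
    """Sample a 2-D grayscale image (list of rows) at a real-valued pixel
    position (x, y). Positions outside the image contribute zero."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < w and 0 <= yy < h:
                # Bilinear weight: product of the two 1-D tent weights.
                weight = (1 - abs(x - xx)) * (1 - abs(y - yy))
                value += weight * image[yy][xx]
    return value

def affine_warp(image, theta):
    """Warp an image (at least 2x2) with a 2x3 affine matrix `theta` that
    acts on coordinates normalized to [-1, 1]. The output is piecewise
    linear in `theta`, which is what would let an autodiff framework
    train the parameters (an assumption about the module, not the paper)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Normalized coordinates of the output pixel.
            xn = 2 * j / (w - 1) - 1
            yn = 2 * i / (h - 1) - 1
            # Source location given by the affine transform.
            xs = theta[0][0] * xn + theta[0][1] * yn + theta[0][2]
            ys = theta[1][0] * xn + theta[1][1] * yn + theta[1][2]
            # Back to pixel coordinates, then interpolate.
            px = (xs + 1) * (w - 1) / 2
            py = (ys + 1) * (h - 1) / 2
            out[i][j] = bilinear_sample(image, px, py)
    return out

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flip_x = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
same = affine_warp(image, identity)    # leaves the image unchanged
mirrored = affine_warp(image, flip_x)  # reverses each row
```

In a generative model, `theta` would typically be predicted by a small network from the current feature map, so the warp is learned together with the convolutional layers.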

Multi-Attribute Transfer via Disentangled Representation

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33019195 ◽ 2019 ◽ Vol 33 ◽ pp. 9195-9202 ◽ Cited By ~ 4

Author(s): Jianfu Zhang ◽ Yuanyuan Huang ◽ Yaoyi Li ◽ Weijie Zhao ◽ Liqing Zhang

Keyword(s): Neural Network ◽ Facial Expression ◽ Generative Adversarial Networks ◽ Significant Progress ◽ Target Domain ◽ Adversarial Networks ◽ Proposed Model ◽ Image Translation ◽ Realistic Images ◽ Novel Model

Recent studies show significant progress in the image-to-image translation task, especially as facilitated by Generative Adversarial Networks. These models can synthesize highly realistic images and alter the attribute labels of the images. However, these works employ attribute vectors to specify the target domain, which diminishes image-level attribute diversity. In this paper, we propose a novel model that formulates disentangled representations by projecting images to latent units, which are grouped feature channels of a convolutional neural network, to disassemble the information between different attributes. Thanks to the disentangled representation, we can transfer attributes according to the attribute labels and, moreover, retain the diversity beyond the labels, namely, the styles inside each image. This is achieved by specifying some attributes and swapping the corresponding latent units to “swap” the attributes' appearance, or by applying channel-wise interpolation to blend different attributes. To verify the motivation of our proposed model, we train and evaluate it on the face dataset CelebA. Furthermore, evaluation on another facial expression dataset, RaFD, demonstrates the generalizability of our proposed model.
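
The swapping and interpolation of latent units described above can be sketched on toy data. This is a minimal illustration, assuming each latent unit is a contiguous, equally sized group of channels and using one scalar per channel instead of a full feature map; the helper names are hypothetical and do not come from the paper.

```python
def split_into_units(channels, num_units):
    """Group a flat list of feature channels into equally sized latent units."""
    size = len(channels) // num_units
    return [channels[k * size:(k + 1) * size] for k in range(num_units)]

def swap_units(units_a, units_b, attribute_ids):
    """Return copies of both codes with the selected latent units exchanged."""
    new_a = [list(u) for u in units_a]
    new_b = [list(u) for u in units_b]
    for k in attribute_ids:
        new_a[k], new_b[k] = new_b[k], new_a[k]
    return new_a, new_b

def interpolate_units(units_a, units_b, attribute_ids, alpha):
    """Blend the selected units channel-wise: (1 - alpha) * a + alpha * b."""
    out = [list(u) for u in units_a]
    for k in attribute_ids:
        out[k] = [(1 - alpha) * ca + alpha * cb
                  for ca, cb in zip(units_a[k], units_b[k])]
    return out

# Two toy codes with 4 channels each, grouped into 2 latent units.
code_a = split_into_units([1.0, 2.0, 3.0, 4.0], num_units=2)
code_b = split_into_units([5.0, 6.0, 7.0, 8.0], num_units=2)

swapped_a, swapped_b = swap_units(code_a, code_b, attribute_ids=[1])
blended = interpolate_units(code_a, code_b, attribute_ids=[0], alpha=0.5)
```

In the full model a decoder would turn the edited code back into an image; only the bookkeeping on the latent units is shown here.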

Multimodal Structure-Consistent Image-to-Image Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i07.6814 ◽ 2020 ◽ Vol 34 (07) ◽ pp. 11490-11498

Author(s): Che-Tsung Lin ◽ Yen-Yi Wu ◽ Po-Hao Hsu ◽ Shang-Hong Lai

Keyword(s): Data Augmentation ◽ Network Models ◽ Detection Accuracy ◽ Target Domain ◽ Generative Adversarial Network ◽ Image Objects ◽ Perceptual Distance ◽ Adversarial Network ◽ Image Translation ◽ Realistic Images

Unpaired image-to-image translation has proven quite effective in boosting a CNN-based object detector for a different domain by means of data augmentation that preserves the image objects well in the translated images. Recently, multimodal GAN (Generative Adversarial Network) models have been proposed and were expected to further boost detector accuracy by generating a diverse collection of images in the target domain, given only a single/labelled image in the source domain. However, images generated by multimodal GANs yield even worse detection accuracy than those generated by a unimodal GAN with better object preservation. In this work, we introduce cycle-structure consistency for generating diverse and structure-preserved translated images across complex domains, such as between day and night, for object detector training. Qualitative results show that our model, Multimodal AugGAN, can generate diverse and realistic images for the target domain. For quantitative comparisons, we evaluate other competing methods and ours by using the generated images to train YOLO, Faster R-CNN and FCN models, and show that our model achieves significant improvement and outperforms other methods in detection accuracy and FCN scores. We also demonstrate that our model provides more diverse object appearances in the target domain through a comparison on the perceptual distance metric.

Generative Model for Skeletal Human Movements Based on Conditional DC-GAN Applied to Pseudo-Images

Algorithms ◽ 10.3390/a13120319 ◽ 2020 ◽ Vol 13 (12) ◽ pp. 319

Author(s): Wang Xi ◽ Guillaume Devineau ◽ Fabien Moutarde ◽ Jie Yang

Keyword(s): Data Augmentation ◽ Human Movement ◽ Generative Models ◽ Generative Model ◽ Great Success ◽ Generative Adversarial Network ◽ Adversarial Network ◽ Human Movements ◽ Action Type ◽ Skeleton Image

Generative models for images, audio, text, and other low-dimensional data have achieved great success in recent years. Generating artificial human movements can also be useful for many applications, including improving data augmentation methods for human gesture recognition. The objective of this research is to develop a generative model for skeletal human movement that allows control of the action type of the generated motion while keeping the authenticity of the result and the natural style variability of gesture execution. We propose to use a conditional Deep Convolutional Generative Adversarial Network (DC-GAN) applied to pseudo-images representing skeletal pose sequences in the Tree Structure Skeleton Image format. We evaluate our approach on the 3D skeletal data provided in the large public NTU RGB+D dataset. Our generative model can output qualitatively correct skeletal human movements for any of the 60 action classes. We also quantitatively evaluate the performance of our model by computing Fréchet inception distances, which correlate strongly with human judgement. To the best of our knowledge, our work is the first successful class-conditioned generative model for human skeletal motions based on a pseudo-image representation of skeletal pose sequences.
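
For reference, the Fréchet inception distance mentioned above is commonly defined as the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated samples. The symbols below are notation introduced here, not taken from the abstract: $\mu_r, \Sigma_r$ are the mean and covariance of the real-sample features and $\mu_g, \Sigma_g$ those of the generated samples. The abstract does not state which feature extractor is used for skeleton pseudo-images.

```latex
\mathrm{FID}
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
      - 2\,\bigl(\Sigma_r \Sigma_g\bigr)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one; the distance is zero when both means and covariances coincide.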

Semi Supervised Generative Adversarial Network for Automated Glaucoma Diagnosis with Stacked Discriminator Models

Journal of Medical Imaging and Health Informatics ◽ 10.1166/jmihi.2021.3787 ◽ 2021 ◽ Vol 11 (5) ◽ pp. 1334-1340

Author(s): K. Gokul Kannan ◽ T. R. Ganesh Babu

Keyword(s): Network Architecture ◽ Super Resolution ◽ Unsupervised Classification ◽ Learning Task ◽ Generative Models ◽ Generative Adversarial Network ◽ Adversarial Network ◽ Glaucoma Diagnosis ◽ Image Translation

A Generative Adversarial Network (GAN) is a neural network architecture widely used in computer vision applications such as super-resolution image generation, art creation and image-to-image translation. A conventional GAN consists of two sub-models: a generative model and a discriminative model. The former generates new samples through an unsupervised learning task, and the latter classifies them as real or fake. Although GANs are most commonly used for training generative models, they can also be used to develop a classifier. The main objective here is to extend the effectiveness of GANs to semi-supervised learning, specifically the classification of fundus images to diagnose glaucoma. The discriminator of the conventional GAN is improved via transfer learning to predict n + 1 classes by training the model for both supervised classification (n classes) and unsupervised classification (fake or real). Both models share all feature extraction layers and differ only in the output layers. Thus, any update to one of the models affects both. Results show that the semi-supervised GAN performs better than a standalone Convolutional Neural Network (CNN) model.
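
One common way to realize an n + 1 class discriminator is a single softmax over n real-class logits plus one extra "fake" logit. The sketch below assumes that formulation, which may differ from the stacked output layers used in the paper; all function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_outputs(logits):
    """Interpret n + 1 logits: the first n are real classes, the last is 'fake'.
    Returns the class posterior given that the image is real, and P(real)."""
    probs = softmax(logits)
    p_real = 1.0 - probs[-1]
    class_probs = [p / p_real for p in probs[:-1]]
    return class_probs, p_real

def supervised_loss(logits, label):
    """Cross-entropy on a labelled real image (label in 0..n-1)."""
    return -math.log(softmax(logits)[label])

def unsupervised_loss(logits, is_real):
    """Real-vs-fake loss on an unlabelled or generated image."""
    p_fake = softmax(logits)[-1]
    return -math.log(1.0 - p_fake) if is_real else -math.log(p_fake)

# n = 2 classes (e.g. glaucoma / healthy) plus the fake class.
class_probs, p_real = discriminator_outputs([0.0, 0.0, 0.0])
loss_real = unsupervised_loss([0.0, 0.0, 0.0], is_real=True)
```

Because both losses are computed from the same logits, a gradient step on either one updates the shared feature extraction layers, which matches the coupling between the two models described above.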

Quantum-assisted associative adversarial network: applying quantum annealing in deep learning

Quantum Machine Intelligence ◽ 10.1007/s42484-021-00047-9 ◽ 2021 ◽ Vol 3 (1)

Author(s): Max Wilson ◽ Thomas Vandal ◽ Tad Hogg ◽ Eleanor G. Rieffel

Keyword(s): Deep Learning ◽ Latent Variable ◽ Graphical Model ◽ Generative Models ◽ Generative Model ◽ Feature Representation ◽ Boltzmann Machine ◽ Adversarial Network ◽ Uniform Noise ◽ Quantum Processor

Generative models have the capacity to model and generate new examples from a dataset and have an increasingly diverse set of applications driven by commercial and academic interest. In this work, we present an algorithm for learning a latent variable generative model via generative adversarial learning where the canonical uniform noise input is replaced by samples from a graphical model. This graphical model is learned by a Boltzmann machine which learns a low-dimensional feature representation of data extracted by the discriminator. A quantum processor can be used to sample from the model to train the Boltzmann machine. This novel hybrid quantum-classical algorithm joins a growing family of algorithms that use a quantum processor sampling subroutine in deep learning, and provides a scalable framework to test the advantages of quantum-assisted learning. For the latent space model, fully connected, symmetric bipartite and Chimera graph topologies are compared on a reduced stochastically binarized MNIST dataset, for both classical and quantum sampling methods. The quantum-assisted associative adversarial network successfully learns a generative model of the MNIST dataset for all topologies. Evaluated using the Fréchet inception distance and inception score, the quantum and classical versions of the algorithm are found to have equivalent performance for learning an implicit generative model of the MNIST dataset. Classical sampling is used to demonstrate the algorithm on the LSUN bedrooms dataset, indicating scalability to larger and color datasets. Though the quantum processor used here is a quantum annealer, the algorithm is general enough that any quantum processor, such as a gate model quantum computer, may be substituted as a sampler.
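
To make the sampling step concrete, the sketch below draws binary latent vectors from a bipartite (restricted) Boltzmann machine by classical block Gibbs sampling. It is an illustration under stated assumptions: the weights are fixed toy values rather than learned ones, the visible units are taken as the generator input, and the function names are hypothetical. In the paper's setting a quantum annealer could stand in for this sampler.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_layer(inputs, weights, biases, rng):
    """Sample one layer of binary units given the other layer.
    weights[i][j] couples input unit i to output unit j."""
    out = []
    for j, b in enumerate(biases):
        activation = b + sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        out.append(1 if rng.random() < sigmoid(activation) else 0)
    return out

def gibbs_sample(weights, visible_bias, hidden_bias, steps, rng):
    """Run `steps` rounds of block Gibbs sampling and return the visible units.
    `weights` has shape (n_visible, n_hidden)."""
    v = [rng.randint(0, 1) for _ in visible_bias]
    weights_t = [list(col) for col in zip(*weights)]  # shape (n_hidden, n_visible)
    for _ in range(steps):
        h = sample_layer(v, weights, hidden_bias, rng)
        v = sample_layer(h, weights_t, visible_bias, rng)
    return v

rng = random.Random(0)
toy_weights = [[0.5, -0.3, 0.1], [-0.2, 0.4, 0.0]]  # 2 visible, 3 hidden units
latent = gibbs_sample(toy_weights, [0.0, 0.0], [0.0, 0.0, 0.0], steps=10, rng=rng)
# `latent` would replace the uniform noise vector fed to the GAN generator.
```

The design point is that the generator's input distribution becomes a learned, structured prior rather than fixed noise, so any sampler for that prior, classical or quantum, can be plugged in.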

Diversity-Generated Image Inpainting with Style Extraction

10.20944/preprints201912.0028.v1 ◽ 2019

Author(s): Weiwei Cai ◽ Zhanguo Wei

Keyword(s): Deep Learning ◽ State Of The Art ◽ Image Inpainting ◽ Ground Truth ◽ Generative Model ◽ Input Noise ◽ Latent Vector ◽ Proposed Model ◽ Ground Truth Image

The latest methods based on deep learning have achieved impressive results on the complex task of inpainting large missing areas in an image. Such methods generally attempt to generate one single "optimal" inpainting result, ignoring many other plausible results. However, considering the uncertainty of the inpainting task, a single result can hardly be regarded as the desired regeneration of the missing area. In view of this weakness, which is related to the design of previous algorithms, we propose a novel deep generative model equipped with a new style extractor that extracts the style noise (a latent vector) from the ground truth image. Once obtained, the extracted style noise and the ground truth image are both input into the generator. We also craft a consistency loss that guides the generated image to approximate the ground truth. Meanwhile, the same extractor captures the style noise from the generated image, which is forced to approach the input noise according to the consistency loss. After training, our generator is able to learn the styles corresponding to multiple sets of noise. The proposed model can generate a (sufficiently large) number of inpainting results consistent with the context semantics of the image. Moreover, we check the effectiveness of our model on three databases, i.e., CelebA, Agricultural Disease, and MauFlex. Compared to state-of-the-art inpainting methods, this model is able to offer desirable inpainting results with both better quality and higher diversity. The code and model will be made available on https://github.com/vivitsai/SEGAN.

Benign Examples: Imperceptible Changes Can Enhance Image Translation Performance

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i04.6042 ◽ 2020 ◽ Vol 34 (04) ◽ pp. 5842-5850

Author(s): Vignesh Srinivasan ◽ Klaus-Robert Müller ◽ Wojciech Samek ◽ Shinichi Nakajima

Keyword(s): User Study ◽ State Of The Art ◽ Score Function ◽ Generative Adversarial Networks ◽ Target Domain ◽ Image Domain ◽ Subtle Change ◽ Input Space ◽ Image Translation ◽ Consistency Constraint

Unpaired image-to-image domain translation is the task of transferring an image from one domain to another without paired data for supervision. Several methods have been proposed to address this task using Generative Adversarial Networks (GANs) and a cycle consistency constraint that requires the translated image to be mapped back to the original domain. In this way, a Deep Neural Network (DNN) learns a mapping such that the input training distribution, transferred to the target domain, matches the target training distribution. However, not all test images can be expected to fall inside the data manifold in the input space where the DNN has learned to perform the mapping well. Such images can be mapped poorly to the target domain. In this paper, we propose to perform Langevin dynamics, which makes subtle changes to such images in the input space, bringing them close to the data manifold and producing benign examples. The effect is a significant improvement of the mapped image in the target domain. We also show that the score function estimate from a denoising autoencoder (DAE) can in practice be replaced by that of any autoencoding structure, which most image-to-image translation methods contain intrinsically due to the cycle consistency constraint. Thus, no additional training is required. We show the advantages of our approach for several state-of-the-art image-to-image domain translation models. Quantitative evaluation shows that our proposed method leads to a substantial increase in accuracy with respect to the target label on multiple state-of-the-art image classifiers, while a qualitative user study shows that our method better represents the target domain, achieving better human preference scores.
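
The input refinement described above can be sketched as follows. This is a toy illustration that assumes the standard denoising-autoencoder score estimate, (r(x) - x) / sigma^2, and an unadjusted Langevin update; the stand-in `toy_reconstruct` function, the step sizes, and all names are hypothetical. In the paper's setting, r would be the autoencoding path already present in the translation model.

```python
import math
import random

def score_from_autoencoder(x, reconstruct, sigma):
    """Approximate the score (gradient of log-density) as (r(x) - x) / sigma^2."""
    r = reconstruct(x)
    return [(ri - xi) / sigma ** 2 for ri, xi in zip(r, x)]

def langevin_refine(x, reconstruct, sigma, step, n_steps, noise_scale, rng):
    """Move x toward high-density regions with Langevin updates:
    x <- x + (step / 2) * score(x) + noise_scale * sqrt(step) * N(0, I)."""
    x = list(x)
    for _ in range(n_steps):
        s = score_from_autoencoder(x, reconstruct, sigma)
        x = [xi + 0.5 * step * si + noise_scale * math.sqrt(step) * rng.gauss(0.0, 1.0)
             for xi, si in zip(x, s)]
    return x

# Hypothetical stand-in for an autoencoder: pulls inputs halfway toward a
# fixed point that plays the role of the data manifold.
CENTER = [1.0, -2.0]

def toy_reconstruct(x):
    return [xi + 0.5 * (ci - xi) for xi, ci in zip(x, CENTER)]

def distance_to_center(x):
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER)))

start = [5.0, 5.0]
refined = langevin_refine(start, toy_reconstruct, sigma=1.0, step=0.1,
                          n_steps=100, noise_scale=0.0, rng=random.Random(0))
```

Setting `noise_scale` to 0 gives plain gradient ascent on the estimated log-density, which makes the toy run deterministic; a value of 1 recovers the usual stochastic Langevin step.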

Generative Model for Skeletal Human Movements based on conditional DC-GAN applied to pseudo-images

10.20944/preprints202011.0039.v1 ◽ 2020

Author(s): Wang Xi ◽ Guillaume Devineau ◽ Fabien Moutarde ◽ Jie Yang

Keyword(s): Data Augmentation ◽ Human Movement ◽ Generative Models ◽ Generative Model ◽ Great Success ◽ Generative Adversarial Network ◽ Adversarial Network ◽ Human Movements ◽ Action Type ◽ Skeleton Image

Generative models for images, audio, text and other low-dimensional data have achieved great success in recent years. Generating artificial human movements can also be useful for many applications, including improving data augmentation methods for human gesture recognition. The objective of this research is to develop a generative model for skeletal human movement that allows control of the action type of the generated motion while keeping the authenticity of the result and the natural style variability of gesture execution. We propose to use a conditional Deep Convolutional Generative Adversarial Network (DC-GAN) applied to pseudo-images representing skeletal pose sequences in the Tree Structure Skeleton Image format. We evaluate our approach on the 3D skeleton data provided in the large public NTU RGB+D dataset. Our generative model can output qualitatively correct skeletal human movements for any of the 60 action classes. We also quantitatively evaluate the performance of our model by computing Fréchet inception distances, which correlate strongly with human judgement. To the best of our knowledge, our work is the first successful class-conditioned generative model for human skeletal motions based on a pseudo-image representation of skeletal pose sequences.

Informative Multimodal Unsupervised Image-to-Image Translation

10.5121/csit.2021.110503 ◽ 2021

Author(s): Tien Tai Doan ◽ Guillaume Ghyselinck ◽ Blaise Hanczar

Keyword(s): Image Quality ◽ Mutual Information ◽ State Of The Art ◽ The State ◽ New Method ◽ Target Domain ◽ Multiple Images ◽ Source Domain ◽ Multiple Image ◽ Image Translation

We propose a new method for multimodal image translation, called InfoMUNIT, which is an extension of the state-of-the-art method MUNIT. Our method allows the style of the generated images to be controlled and improves their quality and diversity. It learns to maximize the mutual information between a subset of the style code and the distribution of the output images. Experiments show that our model can not only translate one image from the source domain to multiple images in the target domain but also explore and manipulate features of the outputs without annotation. Furthermore, it achieves superior diversity and competitive image quality compared with state-of-the-art methods in multiple image translation tasks.

Unsupervised Exemplar-Domain Aware Image-to-Image Translation

Entropy ◽ 10.3390/e23050565 ◽ 2021 ◽ Vol 23 (5) ◽ pp. 565

Author(s): Yuanbin Fu ◽ Jiayi Ma ◽ Xiaojie Guo

Keyword(s): Feature Extraction ◽ State Of The Art ◽ Training Phase ◽ Target Domain ◽ Style Transfer ◽ Network Partition ◽ Logical Network ◽ Image Translation ◽ Multiple Domains ◽ Logical Perspective

Image-to-image translation converts an image of a certain style to another of the target style while preserving the original content. A desirable translator should be capable of generating diverse results in a controllable many-to-many fashion. To this end, we design a novel deep translator, namely the exemplar-domain aware image-to-image translator (EDIT for short). From a logical perspective, the translator needs to perform two main functions, i.e., feature extraction and style transfer. Following this logical network partition, the generator of our EDIT comprises some blocks configured by shared parameters and others configured by varied parameters exported by an exemplar-domain aware parameter network, so as to explicitly imitate the functionalities of extraction and mapping. The principle behind this is that, for images from multiple domains, the content features can be obtained by an extractor, while (re-)stylization is achieved by mapping the extracted features specifically to different purposes (domains and exemplars). In addition, a discriminator is used during the training phase to ensure that the output follows the distribution of the target domain. Our EDIT can work flexibly and effectively on multiple domains and arbitrary exemplars in a neat, unified model. We conduct experiments to show the efficacy of our design and reveal its advantages over other state-of-the-art methods both quantitatively and qualitatively.
