MEGAN: Mixture of Experts of Generative Adversarial Networks for Multimodal Image Generation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/122 ◽

2018 ◽

Cited By ~ 2

Author(s):

David Keetae Park ◽

Seungjoo Yoo ◽

Hyojin Bahng ◽

Jaegul Choo ◽

Noseong Park

Keyword(s):

Structural Similarity ◽

Poor Quality ◽

Generative Adversarial Networks ◽

Mixture Of Experts ◽

Multiple Modalities ◽

Adversarial Networks ◽

Novel Approach ◽

Proposed Model ◽

Separate Step ◽

Realistic Images

Recently, generative adversarial networks (GANs) have shown promising performance in generating realistic images. However, they often struggle to learn complex underlying modalities in a given dataset, resulting in poor-quality generated images. To mitigate this problem, we present a novel approach called mixture of experts GAN (MEGAN), an ensemble approach of multiple generator networks. Each generator network in MEGAN specializes in generating images with a particular subset of modalities, e.g., an image class. Instead of incorporating a separate step of handcrafted clustering of multiple modalities, our proposed model is trained through end-to-end learning of multiple generators via gating networks, which are responsible for choosing the appropriate generator network for a given condition. We adopt the categorical reparameterization trick so that a categorical decision can be made in selecting a generator while maintaining the flow of gradients. We demonstrate that individual generators learn different and salient subparts of the data and achieve a multiscale structural similarity (MS-SSIM) score of 0.2470 for CelebA and a competitive unsupervised inception score of 8.33 in CIFAR-10.
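The categorical reparameterization trick mentioned in this abstract is commonly known as Gumbel-softmax. The following is a minimal NumPy sketch of its forward pass only, not the authors' implementation. The function name `gumbel_softmax`, the temperature value, and the example logits are assumptions made for illustration. In an autograd framework, gradients would typically flow through the soft sample while the hard one-hot choice selects the generator.

```python
import numpy as np


def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Draw a relaxed (soft) and a one-hot (hard) categorical sample.

    `logits` has shape (..., K), where K is the number of generators
    (an assumption for this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise obtained from uniform draws.
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Temperature-scaled softmax over the perturbed logits.
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    soft = np.exp(y) / np.exp(y).sum(axis=-1, keepdims=True)
    # Hard one-hot choice at the argmax of the soft sample.
    hard = np.eye(logits.shape[-1])[soft.argmax(axis=-1)]
    return soft, hard


# Hypothetical gating logits over three generators.
gate_logits = np.array([1.0, 2.0, 0.5])
soft_choice, hard_choice = gumbel_softmax(gate_logits, temperature=0.5)
```

Lower temperatures push the soft sample closer to one-hot, at the cost of higher-variance gradients.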


Multi-Attribute Transfer via Disentangled Representation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019195 ◽

2019 ◽

Vol 33 ◽

pp. 9195-9202 ◽

Cited By ~ 4

Author(s):

Jianfu Zhang ◽

Yuanyuan Huang ◽

Yaoyi Li ◽

Weijie Zhao ◽

Liqing Zhang

Keyword(s):

Neural Network ◽

Facial Expression ◽

Generative Adversarial Networks ◽

Significant Progress ◽

Target Domain ◽

Adversarial Networks ◽

Proposed Model ◽

Image Translation ◽

Realistic Images ◽

Novel Model

Recent studies show significant progress in the image-to-image translation task, especially facilitated by Generative Adversarial Networks. They can synthesize highly realistic images and alter the attribute labels of the images. However, these works employ attribute vectors to specify the target domain, which diminishes image-level attribute diversity. In this paper, we propose a novel model that formulates disentangled representations by projecting images to latent units, which are grouped feature channels of a convolutional neural network, to disassemble the information between different attributes. Thanks to the disentangled representation, we can transfer attributes according to the attribute labels and, moreover, retain the diversity beyond the labels, namely, the styles inside each image. This is achieved by specifying some attributes and swapping the corresponding latent units to “swap” the attributes' appearance, or by applying channel-wise interpolation to blend different attributes. To verify the motivation of our proposed model, we train and evaluate our model on the face dataset CelebA. Furthermore, the evaluation on another facial expression dataset, RaFD, demonstrates the generalizability of our proposed model.
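The swapping and channel-wise interpolation of latent units described in this abstract can be sketched on plain arrays. This is an illustration under stated assumptions rather than the paper's model: feature maps are assumed to have shape (channels, height, width), each attribute is assumed to own a contiguous channel group, and the names `swap_units`, `interpolate_unit`, and the attribute labels are hypothetical.

```python
import numpy as np


def swap_units(feat_a, feat_b, unit_slices, attrs):
    """Exchange the channel groups of the named attributes between two feature maps."""
    out_a, out_b = feat_a.copy(), feat_b.copy()
    for name in attrs:
        s = unit_slices[name]
        out_a[s] = feat_b[s]
        out_b[s] = feat_a[s]
    return out_a, out_b


def interpolate_unit(feat_a, feat_b, unit_slice, alpha):
    """Blend one channel group of feat_a toward feat_b by a factor alpha in [0, 1]."""
    out = feat_a.copy()
    out[unit_slice] = (1.0 - alpha) * feat_a[unit_slice] + alpha * feat_b[unit_slice]
    return out


# Hypothetical layout: 8 channels split into two attribute groups.
units = {"hair": slice(0, 4), "smile": slice(4, 8)}
feat_a = np.zeros((8, 2, 2))
feat_b = np.ones((8, 2, 2))
swapped_a, swapped_b = swap_units(feat_a, feat_b, units, ["hair"])
blended = interpolate_unit(feat_a, feat_b, units["smile"], alpha=0.25)
```

In a full model, a decoder would map the edited feature maps back to images.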


Three-Dimensional Liver Image Segmentation Using Generative Adversarial Networks Based on Feature Restoration

Frontiers in Medicine ◽

10.3389/fmed.2021.794969 ◽

2022 ◽

Vol 8 ◽

Author(s):

Runnan He ◽

Shiqi Xu ◽

Yashu Liu ◽

Qince Li ◽

Yang Liu ◽

...

Keyword(s):

Medical Imaging ◽

Random Noise ◽

Three Dimensional ◽

Poor Quality ◽

Training Data ◽

Generative Adversarial Networks ◽

Liver Segmentation ◽

Deep Convolutional Neural Networks ◽

Adversarial Networks ◽

Liver Region

Medical imaging provides a powerful tool for medical diagnosis. In the process of computer-aided diagnosis and treatment of liver cancer based on medical imaging, accurate segmentation of the liver region from abdominal CT images is an important step. However, due to defects of liver tissue and limitations of the CT imaging process, the gray level of the liver region in a CT image is heterogeneous, and the boundary between the liver and adjacent tissues and organs is blurred, which makes liver segmentation an extremely difficult task. In this study, to address the low segmentation accuracy of the original 3D U-Net network, an improved network based on the three-dimensional (3D) U-Net is proposed. Moreover, to address the insufficient training data caused by the difficulty of acquiring labeled 3D data, the improved 3D U-Net network is embedded into the framework of generative adversarial networks (GANs), which establishes a semi-supervised 3D liver segmentation optimization algorithm. Finally, considering the poor quality of 3D abdominal fake images generated from random noise as input, a deep convolutional neural network (DCNN) based on a feature restoration method is designed to generate more realistic fake images. By testing the proposed algorithm on the LiTS-2017 and KiTS19 datasets, experimental results show that the proposed semi-supervised 3D liver segmentation method can greatly improve the segmentation performance for the liver, with a Dice score of 0.9424, outperforming other methods.
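The Dice score reported in this abstract measures the overlap between a predicted mask and a ground-truth mask. The following is a minimal sketch of the standard formula, 2|A ∩ B| / (|A| + |B|), on binary arrays. The function name `dice_score` and the smoothing constant `eps` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


# Toy 3D volumes standing in for a predicted and a reference liver mask.
pred_mask = np.zeros((4, 4, 4), dtype=bool)
true_mask = np.zeros((4, 4, 4), dtype=bool)
pred_mask[1:3, 1:3, 1:3] = True
true_mask[1:3, 1:3, :] = True
score = dice_score(pred_mask, true_mask)
```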


Multi-Turn Chatbot Based on Query-Context Attentions and Dual Wasserstein Generative Adversarial Networks

Applied Sciences ◽

10.3390/app9183908 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3908 ◽

Cited By ~ 3

Author(s):

Jintae Kim ◽

Shinhyeok Oh ◽

Oh-Woog Kwon ◽

Harksoo Kim

Keyword(s):

Performance Measures ◽

State Of The Art ◽

Attention Mechanism ◽

Generative Adversarial Networks ◽

Training Method ◽

Adversarial Networks ◽

Proposed Model ◽

Previous State ◽

Vector Representations

To generate proper responses to user queries, multi-turn chatbot models should selectively consider dialogue histories. However, previous chatbot models have simply concatenated or averaged vector representations of all previous utterances without considering contextual importance. To mitigate this problem, we propose a multi-turn chatbot model in which previous utterances participate in response generation using different weights. The proposed model calculates the contextual importance of previous utterances by using an attention mechanism. In addition, we propose a training method that uses two types of Wasserstein generative adversarial networks to improve the quality of responses. In experiments with the DailyDialog dataset, the proposed model outperformed the previous state-of-the-art models based on various performance measures.
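Weighting previous utterances by contextual importance, as this abstract describes, can be sketched with a simple attention step. The abstract does not specify the scoring function, so dot-product scoring is an assumption here, as are the function name `attend` and the toy vectors.

```python
import numpy as np


def attend(query, utterances):
    """Return attention weights over utterances and the weighted context vector.

    query: shape (d,); utterances: shape (n, d).
    """
    scores = utterances @ query            # one score per previous utterance
    scores = scores - scores.max()         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ utterances         # weighted sum, shape (d,)
    return weights, context


# Toy vectors standing in for encoded dialogue history and the current query.
history = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
current_query = np.array([1.0, 0.0])
weights, context = attend(current_query, history)
```

This contrasts with plain concatenation or averaging, which would give every previous utterance the same influence on the response.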


Harnessing GANs for Zero-Shot Learning of New Classes in Visual Speech Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5649 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2645-2652 ◽

Cited By ~ 2

Author(s):

Yaman Kumar ◽

Dhruva Sahrawat ◽

Shubham Maheshwari ◽

Debanjan Mahata ◽

Amanda Stent ◽

...

Keyword(s):

Speech Recognition ◽

Classification Problem ◽

Visual Speech ◽

Training Data ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

Novel Approach ◽

Visual Speech Recognition ◽

Training Samples ◽

English Training

Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine learning based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracies in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases. We also show that our models are language agnostic and therefore capable of seamlessly generating, using English training data, videos for a new language (Hindi). To the best of our knowledge, this is the first work to show empirical evidence of the use of GANs for generating training samples of unseen classes in the domain of VSR, hence facilitating zero-shot learning. We make the added videos for new classes publicly available along with our code.


CAGAN: Consistent Adversarial Training Enhanced GANs

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/359 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yao Ni ◽

Dandan Song ◽

Xi Zhang ◽

Hao Wu ◽

Lejian Liao

Keyword(s):

Neural Network ◽

Parameter Space ◽

Supervised Classification ◽

State Of The Art ◽

Generative Adversarial Networks ◽

Image Generation ◽

Real Samples ◽

Adversarial Networks ◽

Novel Approach ◽

Adversarial Training

Generative adversarial networks (GANs) have shown impressive results; however, the generator and the discriminator are optimized in a finite parameter space, which means their performance still needs to be improved. In this paper, we propose a novel approach of adversarial training between one generator and an exponential number of critics, which are sampled from the original discriminative neural network via dropout. As the discrepancy between the outputs of different sub-networks for the same sample can measure the consistency of these critics, we encourage the critics to be consistent on real samples and inconsistent on generated samples during training, while the generator is trained to generate consistent samples for different critics. Experimental results demonstrate that our method can obtain state-of-the-art Inception scores of 9.17 and 10.02 on supervised CIFAR-10 and unsupervised STL-10 image generation tasks, respectively, as well as achieve competitive semi-supervised classification results on several benchmarks. Importantly, we demonstrate that our method can maintain stability in training and alleviate mode collapse.
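The idea of measuring the discrepancy between dropout-sampled sub-networks on the same sample can be sketched as follows. This is not the CAGAN loss itself. The two-layer critic, the squared-difference measure, and the names `critic_forward` and `consistency_gap` are all assumptions chosen to keep the example small.

```python
import numpy as np


def critic_forward(x, w1, w2, mask):
    """Tiny critic: one ReLU hidden layer with a fixed dropout mask."""
    hidden = np.maximum(x @ w1, 0.0) * mask
    return hidden @ w2


def consistency_gap(x, w1, w2, keep_prob, rng):
    """Mean squared difference between two dropout sub-networks on the same batch."""
    hidden_units = w1.shape[1]
    # Inverted dropout: surviving units are rescaled by 1 / keep_prob.
    mask_a = rng.binomial(1, keep_prob, size=hidden_units) / keep_prob
    mask_b = rng.binomial(1, keep_prob, size=hidden_units) / keep_prob
    out_a = critic_forward(x, w1, w2, mask_a)
    out_b = critic_forward(x, w1, w2, mask_b)
    return float(np.mean((out_a - out_b) ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))        # batch of 5 samples with 3 features
w1 = rng.normal(size=(3, 16))      # hypothetical critic weights
w2 = rng.normal(size=16)
gap = consistency_gap(x, w1, w2, keep_prob=0.5, rng=rng)
```

Under the scheme the abstract describes, a gap like this would be driven down for real samples and up for generated samples when training the critics.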


Deep image synthesis from intuitive user input: A review and perspectives

Computational Visual Media ◽

10.1007/s41095-021-0234-8 ◽

2021 ◽

Vol 8 (1) ◽

pp. 3-31

Author(s):

Yuan Xue ◽

Yuan-Chen Guo ◽

Han Zhang ◽

Tao Xu ◽

Song-Hai Zhang ◽

...

Keyword(s):

Image Synthesis ◽

Generative Models ◽

Generative Adversarial Networks ◽

Image Generation ◽

Art And Design ◽

User Input ◽

Adversarial Networks ◽

Benchmark Datasets ◽

Deep Image ◽

Realistic Images

In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross fertilization between major image generation paradigms, and evaluation and comparison of generation methods.


Combining Variational Autoencoders & Generative Adversarial Networks to Improve Image Quality

10.31219/osf.io/8bmdu ◽

2019 ◽

Author(s):

Atin Sakkeer Hussain

Keyword(s):

Image Quality ◽

Random Noise ◽

Training Data ◽

Generative Adversarial Networks ◽

Improve Image Quality ◽

Adversarial Networks ◽

Proposed Model ◽

Variational Autoencoder ◽

Proper Training ◽

Better Than

Generative Adversarial Networks (GANs) are trained to generate images from random noise vectors, but these images often turn out poorly for any of several reasons, such as model collapse, lack of proper training data, or lack of training. To combat this issue, this paper makes use of a Variational Autoencoder (VAE). The VAE is trained on a combination of the training and generated data; afterwards, the VAE can be used to map images generated by the GAN to better versions of them. (This is similar to denoising, but with a few variations in the image.) In addition to improving quality, the proposed model is shown to work better than normal WGANs on sparse datasets with higher variety, in an equal number of training epochs.


SARA-GAN: Self-Attention and Relative Average Discriminator Based Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction

Frontiers in Neuroinformatics ◽

10.3389/fninf.2020.611666 ◽

2020 ◽

Vol 14 ◽

Cited By ~ 1

Author(s):

Zhenmou Yuan ◽

Mingfeng Jiang ◽

Yaming Wang ◽

Bo Wei ◽

Yongming Li ◽

...

Keyword(s):

Signal To Noise Ratio ◽

Similarity Index ◽

Structural Similarity ◽

Attention Mechanism ◽

Generative Adversarial Networks ◽

Reconstruction Method ◽

MRI Imaging ◽

Adversarial Networks ◽

Reconstruction Methods ◽

MRI Reconstruction

Research on undersampled magnetic resonance image (MRI) reconstruction can increase the speed of MRI imaging and reduce patient suffering. In this paper, an undersampled MRI reconstruction method based on Generative Adversarial Networks with the Self-Attention mechanism and the Relative Average discriminator (SARA-GAN) is proposed. In our SARA-GAN, the relative average discriminator theory is applied to make full use of the prior knowledge that half of the input data of the discriminator is true and half is fake. At the same time, a self-attention mechanism is incorporated into the high layer of the generator to build long-range dependence within the image, which can overcome the problem of limited convolution kernel size. In addition, spectral normalization is employed to stabilize the training process. Compared with three widely used GAN-based MRI reconstruction methods, i.e., DAGAN, DAWGAN, and DAWGAN-GP, the proposed method can obtain a higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), and the details of the reconstructed image are more abundant and more realistic for further clinical scrutinization and diagnostic tasks.
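The peak signal-to-noise ratio (PSNR) used in this abstract as an evaluation metric follows the standard definition 10 · log10(MAX² / MSE). The following is a minimal sketch. The function name `psnr` and the assumption that images are scaled to a `data_range` of 1.0 are illustrative choices, not details from the paper.

```python
import numpy as np


def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in decibels between two images of the same shape."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Toy example: a reference image and a noisy reconstruction of it.
rng = np.random.default_rng(0)
reference_img = rng.uniform(size=(32, 32))
noisy_img = np.clip(reference_img + rng.normal(scale=0.05, size=(32, 32)), 0.0, 1.0)
value_db = psnr(reference_img, noisy_img)
```

Higher values indicate a reconstruction closer to the reference.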


Conditional Deep 3D-Convolutional Generative Adversarial Nets for RGB-D Generation

Mathematical Problems in Engineering ◽

10.1155/2021/8358314 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Richa Sharma ◽

Manoj Sharma ◽

Ankit Shukla ◽

Santanu Chaudhury

Keyword(s):

Object Tracking ◽

Action Recognition ◽

Synthetic Data ◽

Depth Map ◽

Generative Adversarial Networks ◽

Data Generation ◽

Adversarial Networks ◽

Proposed Model ◽

Spatio Temporal ◽

Class Labels

Generation of synthetic data is a challenging task. There are only a few significant works on RGB video generation and no pertinent works on RGB-D data generation. In the present work, we focus our attention on synthesizing RGB-D data, which can further be used as a dataset for various applications such as object tracking, gesture recognition, and action recognition. This paper proposes a novel architecture that uses conditional deep 3D-convolutional generative adversarial networks to synthesize RGB-D data by exploiting a 3D spatio-temporal convolutional framework. The proposed architecture can be used to generate virtually unlimited data. In this work, we present the architecture to generate RGB-D data conditioned on class labels. In the architecture, two parallel paths are used, one to generate RGB data and the second to synthesize the depth map. The outputs from the two parallel paths are combined to generate RGB-D data. The proposed model is used for video generation at 30 fps (frames per second). Each frame referred to here is an RGB-D image with a spatial resolution of 512 × 512.
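The step of combining the outputs of the RGB path and the depth path can be pictured as a channel-wise concatenation. The abstract does not state how the two outputs are combined, so concatenation along the channel axis is an assumption here, as are the function name `combine_rgbd`, the (frames, height, width, channels) layout, and the small toy shapes.

```python
import numpy as np


def combine_rgbd(rgb, depth):
    """Stack RGB frames (T, H, W, 3) and depth maps (T, H, W, 1) into (T, H, W, 4)."""
    if rgb.shape[:-1] != depth.shape[:-1]:
        raise ValueError("RGB and depth must share frame count and spatial size")
    return np.concatenate([rgb, depth], axis=-1)


# Toy clip: 2 frames of 8 x 8 pixels standing in for generator outputs.
rgb_frames = np.zeros((2, 8, 8, 3))
depth_frames = np.ones((2, 8, 8, 1))
rgbd_clip = combine_rgbd(rgb_frames, depth_frames)
```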


A Comparative Analysis of Unsupervised and Semi-Supervised Representation Learning for Remote Sensing Image Categorization

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-2-w7-167-2019 ◽

2019 ◽

Vol IV-2/W7 ◽

pp. 167-173 ◽

Cited By ~ 1

Author(s):

P. J. Soto ◽

J. D. Bermudez ◽

P. N. Happ ◽

R. Q. Feitosa

Keyword(s):

Remote Sensing ◽

Representation Learning ◽

Fine Tuning ◽

Generative Adversarial Networks ◽

Image Categorization ◽

Absolute Value ◽

Adversarial Networks ◽

Novel Approach ◽

Public Datasets ◽

The Impact

This work aims at investigating unsupervised and semi-supervised representation learning methods based on generative adversarial networks for remote sensing scene classification. The work introduces a novel approach, which consists of a semi-supervised extension of a prior unsupervised method, known as MARTA-GAN. The proposed approach was compared experimentally with two baselines on two public datasets, UC-MERCED and NWPU-RESISC45. The experiments assessed the performance of each approach under different amounts of labeled data. The impact of fine-tuning was also investigated. In our analysis, the proposed method delivered the best overall accuracy under scarce labeled samples, both in terms of absolute value and in terms of variability across multiple runs.
