Utilizing Amari-Alpha Divergence to Stabilize the Training of Generative Adversarial Networks

Generative Adversarial Nets (GANs) are one of the most popular architectures for image generation, which has achieved significant progress in generating high-resolution, diverse image samples. The normal GANs are supposed to minimize the Kullback–Leibler divergence between distributions of natural and generated images. In this paper, we propose the Alpha-divergence Generative Adversarial Net (Alpha-GAN) which adopts the alpha divergence as the minimization objective function of generators. The alpha divergence can be regarded as a generalization of the Kullback–Leibler divergence, Pearson χ 2 divergence, Hellinger divergence, etc. Our Alpha-GAN employs the power function as the form of adversarial loss for the discriminator with two-order indexes. These hyper-parameters make our model more flexible to trade off between the generated and target distributions. We further give a theoretical analysis of how to select these hyper-parameters to balance the training stability and the quality of generated images. Extensive experiments of Alpha-GAN are performed on SVHN and CelebA datasets, and evaluation results show the stability of Alpha-GAN. The generated samples are also competitive compared with the state-of-the-art approaches.

Download Full-text

Fractional Wavelet-Based Generative Scattering Networks

Frontiers in Neurorobotics ◽

10.3389/fnbot.2021.752752 ◽

2021 ◽

Vol 15 ◽

Author(s):

Jiasong Wu ◽

Xiang Qiu ◽

Jing Zhang ◽

Fuzhi Wu ◽

Youyong Kong ◽

...

Keyword(s):

Dimensionality Reduction ◽

Reduction Method ◽

Gaussian White Noise ◽

Principal Component ◽

Experimental Results ◽

Generative Adversarial Networks ◽

Image Generation ◽

Adversarial Networks ◽

Dimensionality Reduction Method

Generative adversarial networks and variational autoencoders (VAEs) provide impressive image generation from Gaussian white noise, but both are difficult to train, since they need a generator (or encoder) and a discriminator (or decoder) to be trained simultaneously, which can easily lead to unstable training. To solve or alleviate these synchronous training problems of generative adversarial networks (GANs) and VAEs, researchers recently proposed generative scattering networks (GSNs), which use wavelet scattering networks (ScatNets) as the encoder to obtain features (or ScatNet embeddings) and convolutional neural networks (CNNs) as the decoder to generate an image. The advantage of GSNs is that the parameters of ScatNets do not need to be learned, while the disadvantage of GSNs is that their ability to obtain representations of ScatNets is slightly weaker than that of CNNs. In addition, the dimensionality reduction method of principal component analysis (PCA) can easily lead to overfitting in the training of GSNs and, therefore, affect the quality of generated images in the testing process. To further improve the quality of generated images while keeping the advantages of GSNs, this study proposes generative fractional scattering networks (GFRSNs), which use more expressive fractional wavelet scattering networks (FrScatNets), instead of ScatNets as the encoder to obtain features (or FrScatNet embeddings) and use similar CNNs of GSNs as the decoder to generate an image. Additionally, this study develops a new dimensionality reduction method named feature-map fusion (FMF) instead of performing PCA to better retain the information of FrScatNets,; it also discusses the effect of image fusion on the quality of the generated image. The experimental results obtained on the CIFAR-10 and CelebA datasets show that the proposed GFRSNs can lead to better generated images than the original GSNs on testing datasets. The experimental results of the proposed GFRSNs with deep convolutional GAN (DCGAN), progressive GAN (PGAN), and CycleGAN are also given.

Download Full-text

Multi-Turn Chatbot Based on Query-Context Attentions and Dual Wasserstein Generative Adversarial Networks

Applied Sciences ◽

10.3390/app9183908 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3908 ◽

Cited By ~ 3

Author(s):

Jintae Kim ◽

Shinhyeok Oh ◽

Oh-Woog Kwon ◽

Harksoo Kim

Keyword(s):

Performance Measures ◽

State Of The Art ◽

Attention Mechanism ◽

Generative Adversarial Networks ◽

Training Method ◽

Adversarial Networks ◽

Proposed Model ◽

Previous State ◽

Vector Representations

To generate proper responses to user queries, multi-turn chatbot models should selectively consider dialogue histories. However, previous chatbot models have simply concatenated or averaged vector representations of all previous utterances without considering contextual importance. To mitigate this problem, we propose a multi-turn chatbot model in which previous utterances participate in response generation using different weights. The proposed model calculates the contextual importance of previous utterances by using an attention mechanism. In addition, we propose a training method that uses two types of Wasserstein generative adversarial networks to improve the quality of responses. In experiments with the DailyDialog dataset, the proposed model outperformed the previous state-of-the-art models based on various performance measures.

Download Full-text

Reconstruction of Generative Adversarial Networks in Cross Modal Image Generation with Canonical Polyadic Decomposition

Wireless Communications and Mobile Computing ◽

10.1155/2021/8868781 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Ruixin Ma ◽

Junying Lou ◽

Peng Li ◽

Jing Gao

Keyword(s):

Mobile Terminal ◽

Experimental Results ◽

Generative Adversarial Networks ◽

Image Generation ◽

Adversarial Networks ◽

Speed Up ◽

Canonical Polyadic Decomposition

Generating pictures from text is an interesting, classic, and challenging task. Benefited from the development of generative adversarial networks (GAN), the generation quality of this task has been greatly improved. Many excellent cross modal GAN models have been put forward. These models add extensive layers and constraints to get impressive generation pictures. However, complexity and computation of existing cross modal GANs are too high to be deployed in mobile terminal. To solve this problem, this paper designs a compact cross modal GAN based on canonical polyadic decomposition. We replace an original convolution layer with three small convolution layers and use an autoencoder to stabilize and speed up training. The experimental results show that our model achieves 20% times of compression in both parameters and FLOPs without loss of quality on generated images.

Download Full-text

CAGAN: Consistent Adversarial Training Enhanced GANs

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/359 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yao Ni ◽

Dandan Song ◽

Xi Zhang ◽

Hao Wu ◽

Lejian Liao

Keyword(s):

Neural Network ◽

Parameter Space ◽

Supervised Classification ◽

State Of The Art ◽

Generative Adversarial Networks ◽

Image Generation ◽

Real Samples ◽

Adversarial Networks ◽

Novel Approach ◽

Adversarial Training

Generative adversarial networks (GANs) have shown impressive results, however, the generator and the discriminator are optimized in finite parameter space which means their performance still need to be improved. In this paper, we propose a novel approach of adversarial training between one generator and an exponential number of critics which are sampled from the original discriminative neural network via dropout. As discrepancy between outputs of different sub-networks of a same sample can measure the consistency of these critics, we encourage the critics to be consistent to real samples and inconsistent to generated samples during training, while the generator is trained to generate consistent samples for different critics. Experimental results demonstrate that our method can obtain state-of-the-art Inception scores of 9.17 and 10.02 on supervised CIFAR-10 and unsupervised STL-10 image generation tasks, respectively, as well as achieve competitive semi-supervised classification results on several benchmarks. Importantly, we demonstrate that our method can maintain stability in training and alleviate mode collapse.

Download Full-text

Successive Image Generation from a Single Sentence

ITM Web of Conferences ◽

10.1051/itmconf/20214003017 ◽

2021 ◽

Vol 40 ◽

pp. 03017

Author(s):

Amogh Parab ◽

Ananya Malik ◽

Arish Damania ◽

Arnav Parekhji ◽

Pranit Bari

Keyword(s):

State Of The Art ◽

Image Synthesis ◽

Image Sequence ◽

Generative Adversarial Networks ◽

Image Generation ◽

Single Sentence ◽

Adversarial Networks ◽

Successive Image ◽

Diagrammatic Representations ◽

Transfer Of Information

Through various examples in history such as the early man’s carving on caves, dependence on diagrammatic representations, the immense popularity of comic books we have seen that vision has a higher reach in communication than written words. In this paper, we analyse and propose a new task of transfer of information from text to image synthesis. Through this paper we aim to generate a story from a single sentence and convert our generated story into a sequence of images. We plan to use state of the art technology to implement this task. With the advent of Generative Adversarial Networks text to image synthesis have found a new awakening. We plan to take this task a step further, in order to automate the entire process. Our system generates a multi-lined story given a single sentence using a deep neural network. This story is then fed into our networks of multiple stage GANs inorder to produce a photorealistic image sequence.

Download Full-text

Black-Box Diagnosis and Calibration on GAN Intra-Mode Collapse: A Pilot Study

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3472768 ◽

2021 ◽

Vol 17 (3s) ◽

pp. 1-18

Author(s):

Zhenyu Wu ◽

Zhaowen Wang ◽

Ye Yuan ◽

Jianming Zhang ◽

Zhangyang Wang ◽

...

Keyword(s):

State Of The Art ◽

Black Box ◽

Training Data ◽

Generative Adversarial Networks ◽

Small Scale ◽

Model Parameters ◽

Original Training ◽

Image Generation ◽

Adversarial Networks ◽

Calibration Techniques

Generative adversarial networks (GANs) nowadays are capable of producing images of incredible realism. Two concerns raised are whether the state-of-the-art GAN’s learned distribution still suffers from mode collapse and what to do if so. Existing diversity tests of samples from GANs are usually conducted qualitatively on a small scale and/or depend on the access to original training data as well as the trained model parameters. This article explores GAN intra-mode collapse and calibrates that in a novel black-box setting: access to neither training data nor the trained model parameters is assumed. The new setting is practically demanded yet rarely explored and significantly more challenging. As a first stab, we devise a set of statistical tools based on sampling that can visualize, quantify, and rectify intra-mode collapse . We demonstrate the effectiveness of our proposed diagnosis and calibration techniques, via extensive simulations and experiments, on unconditional GAN image generation (e.g., face and vehicle). Our study reveals that the intra-mode collapse is still a prevailing problem in state-of-the-art GANs and the mode collapse is diagnosable and calibratable in black-box settings. Our codes are available at https://github.com/VITA-Group/BlackBoxGANCollapse .

Download Full-text

CPGAN : An Efficient Architecture Designing for Text-to-Image Generative Adversarial Networks Based on Canonical Polyadic Decomposition

Scientific Programming ◽

10.1155/2021/5573751 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Ruixin Ma ◽

Junying Lou

Keyword(s):

Mobile Applications ◽

Image Synthesis ◽

Generative Adversarial Networks ◽

Training Process ◽

Adversarial Networks ◽

Canonical Polyadic Decomposition ◽

The Stability ◽

Meaningful Text ◽

General Method

Text-to-image synthesis is an important and challenging application of computer vision. Many interesting and meaningful text-to-image synthesis models have been put forward. However, most of the works pay attention to the quality of synthesis images, but rarely consider the size of these models. Large models contain many parameters and high delay, which makes it difficult to be deployed on mobile applications. To solve this problem, we propose an efficient architecture CPGAN for text-to-image generative adversarial networks (GAN) based on canonical polyadic decomposition (CPD). It is a general method to design the lightweight architecture of text-to-image GAN. To improve the stability of CPGAN, we introduce conditioning augmentation and the idea of autoencoder during the training process. Experimental results prove that our architecture CPGAN can maintain the quality of generated images and reduce at least 20% parameters and flops.

Download Full-text

Perceptual metric-guided human image generation

Integrated Computer-Aided Engineering ◽

10.3233/ica-210672 ◽

2021 ◽

pp. 1-11

Author(s):

Haoran Wu ◽

Fazhi He ◽

Yansong Duan ◽

Xiaohu Yan

Keyword(s):

State Of The Art ◽

Transfer Task ◽

Generative Adversarial Networks ◽

Perceptual Quality ◽

Image Generation ◽

Migration Process ◽

Adversarial Networks ◽

Human Image ◽

Detection Score ◽

Perceptual Metrics

Pose transfer, which synthesizes a new image of a target person in a novel pose, is valuable in several applications. Generative adversarial networks (GAN) based pose transfer is a new way for person re-identification (re-ID). Typical perceptual metrics, like Detection Score (DS) and Inception Score (IS), were employed to assess the visual quality after generation in pose transfer task. Thus, the existing GAN-based methods do not directly benefit from these metrics which are highly associated with human ratings. In this paper, a perceptual metrics guided GAN (PIGGAN) framework is proposed to intrinsically optimize generation processing for pose transfer task. Specifically, a novel and general model-Evaluator that matches well the GAN is designed. Accordingly, a new Sort Loss (SL) is constructed to optimize the perceptual quality. Morevover, PIGGAN is highly flexible and extensible and can incorporate both differentiable and indifferentiable indexes to optimize the attitude migration process. Extensive experiments show that PIGGAN can generate photo-realistic results and quantitatively outperforms state-of-the-art (SOTA) methods.

Download Full-text

Content-Based Attention Network for Person Image Generation

Journal of Circuits System and Computers ◽

10.1142/s0218126620502503 ◽

2020 ◽

Vol 29 (15) ◽

pp. 2050250

Author(s):

Xiongfei Liu ◽

Bengao Li ◽

Xin Chen ◽

Haiyan Zhang ◽

Shu Zhan

Keyword(s):

Major Part ◽

State Of The Art ◽

Attention Mechanism ◽

Experimental Results ◽

Generative Adversarial Networks ◽

Image Generation ◽

Attention Network ◽

Adversarial Networks ◽

Proposed Model ◽

Novel Method

This paper proposes a novel method for person image generation with arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images with the same person but different poses. The Generative Adversarial Networks (GANs) are the major part of the proposed model. Different from the traditional GANs, we add attention mechanism to the generator in order to generate realistic-looking images, we also use content reconstruction with a pretrained VGG16 Net to keep the content consistency between generated images and target images. Furthermore, we test our model on DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.

Download Full-text

WRGAN: Improvement of RelGAN with Wasserstein Loss for Text Generation

Electronics ◽

10.3390/electronics10030275 ◽

2021 ◽

Vol 10 (3) ◽

pp. 275

Author(s):

Ziyun Jiao ◽

Fuji Ren

Keyword(s):

Loss Function ◽

State Of The Art ◽

Wasserstein Distance ◽

Generative Adversarial Networks ◽

Image Generation ◽

Text Generation ◽

Adversarial Networks ◽

Slow Progress ◽

Novel Method ◽

Public Datasets

Generative adversarial networks (GANs) were first proposed in 2014, and have been widely used in computer vision, such as for image generation and other tasks. However, the GANs used for text generation have made slow progress. One of the reasons is that the discriminator’s guidance for the generator is too weak, which means that the generator can only get a “true or false” probability in return. Compared with the current loss function, the Wasserstein distance can provide more information to the generator, but RelGAN does not work well with Wasserstein distance in experiments. In this paper, we propose an improved neural network based on RelGAN and Wasserstein loss named WRGAN. Differently from RelGAN, we modified the discriminator network structure with 1D convolution of multiple different kernel sizes. Correspondingly, we also changed the loss function of the network with a gradient penalty Wasserstein loss. Our experiments on multiple public datasets show that WRGAN outperforms most of the existing state-of-the-art methods, and the Bilingual Evaluation Understudy(BLEU) scores are improved with our novel method.

Download Full-text