WRGAN: Improvement of RelGAN with Wasserstein Loss for Text Generation

Ziyun Jiao; Fuji Ren

doi:10.3390/electronics10030275

Electronics ◽

10.3390/electronics10030275 ◽

2021 ◽

Vol 10 (3) ◽

pp. 275

Author(s):

Ziyun Jiao ◽

Fuji Ren

Keyword(s):

Loss Function ◽

State Of The Art ◽

Wasserstein Distance ◽

Generative Adversarial Networks ◽

Image Generation ◽

Text Generation ◽

Adversarial Networks ◽

Slow Progress ◽

Novel Method ◽

Public Datasets

Generative adversarial networks (GANs) were first proposed in 2014 and have been widely used in computer vision tasks such as image generation. However, GANs for text generation have made slow progress. One reason is that the discriminator’s guidance to the generator is too weak: the generator receives only a “true or false” probability in return. Compared with the current loss function, the Wasserstein distance can provide more information to the generator, but in experiments RelGAN does not work well with the Wasserstein distance. In this paper, we propose WRGAN, an improved neural network based on RelGAN and the Wasserstein loss. Unlike RelGAN, WRGAN modifies the discriminator network structure to use 1D convolutions with multiple different kernel sizes. Correspondingly, we also replace the network’s loss function with a Wasserstein loss with gradient penalty. Our experiments on multiple public datasets show that WRGAN outperforms most existing state-of-the-art methods and improves Bilingual Evaluation Understudy (BLEU) scores.
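
The WRGAN abstract combines two ingredients: a discriminator built from 1D convolutions with several kernel sizes, and a Wasserstein loss with gradient penalty (the WGAN-GP form). The following is a minimal pure-Python sketch of both under stated assumptions, not the authors' implementation: sequences are taken to be already relaxed to real-valued vectors (a simplification; RelGAN feeds Gumbel-softmax outputs through an embedding), each convolution is assumed to be followed by max-over-time pooling, and central finite differences stand in for autodiff. The names `multi_kernel_critic`, `numerical_grad`, and `critic_loss` are hypothetical, and the kernel weights and λ = 10 are illustrative, not the paper's settings.

```python
import math
import random


def conv1d_max(seq, kernel):
    """Valid 1D convolution of a scalar sequence, then max-over-time pooling."""
    k = len(kernel)
    responses = [
        sum(w * seq[i + j] for j, w in enumerate(kernel))
        for i in range(len(seq) - k + 1)
    ]
    return max(responses)


def multi_kernel_critic(seq, kernels, out_weights):
    """Hypothetical critic: one pooled feature per kernel size, then a linear readout."""
    features = [conv1d_max(seq, kernel) for kernel in kernels]
    return sum(w * f for w, f in zip(out_weights, features))


def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient of scalar f at x (stand-in for autodiff)."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def critic_loss(f, reals, fakes, rng, lam=10.0):
    """WGAN-GP critic loss: E[f(fake)] - E[f(real)] + lam * E[(||grad f(x_hat)|| - 1)^2]."""
    n = len(reals)
    wasserstein = sum(f(x) for x in fakes) / n - sum(f(x) for x in reals) / n
    penalty = 0.0
    for x_real, x_fake in zip(reals, fakes):
        eps = rng.random()  # random interpolation weight in [0, 1)
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(x_real, x_fake)]
        grad_norm = math.sqrt(sum(g * g for g in numerical_grad(f, x_hat)))
        penalty += (grad_norm - 1.0) ** 2
    return wasserstein + lam * penalty / n


if __name__ == "__main__":
    kernels = [[0.5], [0.3, 0.3], [0.2, 0.2, 0.2]]  # kernel sizes 1, 2, 3

    def critic(seq):
        return multi_kernel_critic(seq, kernels, [1.0, 1.0, 1.0])

    rng = random.Random(0)
    reals = [[0.9, 0.1, 0.8, 0.2], [0.7, 0.3, 0.6, 0.4]]
    fakes = [[0.2, 0.2, 0.2, 0.2], [0.5, 0.5, 0.5, 0.5]]
    print(critic_loss(critic, reals, fakes, rng))
```

The penalty pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples, which is how WGAN-GP enforces the Lipschitz constraint instead of weight clipping. Max pooling is piecewise linear, so the finite-difference gradient is accurate away from ties between window responses.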

Content-Based Attention Network for Person Image Generation

Journal of Circuits System and Computers ◽

10.1142/s0218126620502503 ◽

2020 ◽

Vol 29 (15) ◽

pp. 2050250

Author(s):

Xiongfei Liu ◽

Bengao Li ◽

Xin Chen ◽

Haiyan Zhang ◽

Shu Zhan

Keyword(s):

Major Part ◽

State Of The Art ◽

Attention Mechanism ◽

Experimental Results ◽

Generative Adversarial Networks ◽

Image Generation ◽

Attention Network ◽

Adversarial Networks ◽

Proposed Model ◽

Novel Method

This paper proposes a novel method for person image generation with an arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images of the same person in different poses. Generative Adversarial Networks (GANs) are the major part of the proposed model. Unlike traditional GANs, our generator includes an attention mechanism to generate realistic-looking images, and we use content reconstruction with a pretrained VGG16 network to keep the content of generated images consistent with that of target images. We test our model on the DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.
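
The content-consistency term compares generated and target images in the feature space of a pretrained network (VGG16 in the abstract). A minimal sketch, assuming a mean-squared error averaged over a list of feature extractors: the toy extractors below stand in for VGG16 layers, and both the function name `content_loss` and the choice of MSE (rather than, say, an L1 distance) are assumptions, not the paper's stated loss.

```python
def content_loss(generated, target, feature_extractors):
    """Average, over layers, of the mean squared error between feature vectors.

    `feature_extractors` is a list of callables mapping an image (here a flat
    list of floats) to a feature vector; in practice these would be frozen
    layers of a pretrained network such as VGG16.
    """
    total = 0.0
    for phi in feature_extractors:
        f_gen, f_tgt = phi(generated), phi(target)
        total += sum((a - b) ** 2 for a, b in zip(f_gen, f_tgt)) / len(f_gen)
    return total / len(feature_extractors)


# Toy stand-ins for two feature layers: identity and a global sum.
layers = [lambda img: list(img), lambda img: [sum(img)]]
print(content_loss([1.0, 2.0], [1.0, 0.0], layers))  # → 3.0
```

Because the feature extractor is frozen, this term only updates the generator, penalizing differences in extracted features rather than in exact pixel values.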

CAGAN: Consistent Adversarial Training Enhanced GANs

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/359 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yao Ni ◽

Dandan Song ◽

Xi Zhang ◽

Hao Wu ◽

Lejian Liao

Keyword(s):

Neural Network ◽

Parameter Space ◽

Supervised Classification ◽

State Of The Art ◽

Generative Adversarial Networks ◽

Image Generation ◽

Real Samples ◽

Adversarial Networks ◽

Novel Approach ◽

Adversarial Training

Generative adversarial networks (GANs) have shown impressive results; however, the generator and the discriminator are optimized in a finite parameter space, which means their performance can still be improved. In this paper, we propose a novel approach of adversarial training between one generator and an exponential number of critics, which are sampled from the original discriminative neural network via dropout. Because the discrepancy between the outputs of different sub-networks on the same sample measures the consistency of these critics, we encourage the critics to be consistent on real samples and inconsistent on generated samples during training, while the generator is trained to generate samples on which the different critics are consistent. Experimental results demonstrate that our method obtains state-of-the-art Inception scores of 9.17 and 10.02 on supervised CIFAR-10 and unsupervised STL-10 image generation tasks, respectively, and achieves competitive semi-supervised classification results on several benchmarks. Importantly, we demonstrate that our method maintains stability in training and alleviates mode collapse.
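
The consistency idea can be illustrated by sampling several dropout masks over one critic's hidden layer and measuring how much the resulting sub-network outputs disagree on a single input. A minimal sketch follows, assuming a one-hidden-layer ReLU critic, inverted dropout, and the variance across masks as the discrepancy measure; all names, and the exact form of the critic and generator terms, are illustrative assumptions rather than CAGAN's published objective.

```python
import random


def critic_output(x, W1, w2, mask):
    """One-hidden-layer ReLU critic; `mask` applies (inverted) dropout to hidden units."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(m * h * v for m, h, v in zip(mask, hidden, w2))


def sample_mask(n_hidden, keep, rng):
    """Inverted dropout mask: kept units are scaled by 1/keep, dropped units are 0."""
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(n_hidden)]


def discrepancy(x, W1, w2, masks):
    """Variance of the sub-critics' outputs on one sample (0 = perfectly consistent)."""
    outs = [critic_output(x, W1, w2, m) for m in masks]
    mean = sum(outs) / len(outs)
    return sum((o - mean) ** 2 for o in outs) / len(outs)


def consistency_terms(real, fake, W1, w2, k=8, keep=0.5, seed=0):
    """Return (critic_term, generator_term), both to be minimized.

    The critic is pushed to be consistent on real samples and inconsistent on
    generated ones; the generator is pushed to make its samples consistent.
    """
    rng = random.Random(seed)
    masks = [sample_mask(len(W1), keep, rng) for _ in range(k)]
    d_real = discrepancy(real, W1, w2, masks)
    d_fake = discrepancy(fake, W1, w2, masks)
    return d_real - d_fake, d_fake


rng = random.Random(42)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]
w2 = [rng.uniform(-1, 1) for _ in range(16)]
print(consistency_terms([0.2, -0.4, 0.9], [0.1, 0.1, 0.1], W1, w2))
```

Each dropout mask selects one sub-network, so the k masks play the role of k sampled critics; in a full GAN these terms would be added to the usual adversarial losses.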

Successive Image Generation from a Single Sentence

ITM Web of Conferences ◽

10.1051/itmconf/20214003017 ◽

2021 ◽

Vol 40 ◽

pp. 03017

Author(s):

Amogh Parab ◽

Ananya Malik ◽

Arish Damania ◽

Arnav Parekhji ◽

Pranit Bari

Keyword(s):

State Of The Art ◽

Image Synthesis ◽

Image Sequence ◽

Generative Adversarial Networks ◽

Image Generation ◽

Single Sentence ◽

Adversarial Networks ◽

Successive Image ◽

Diagrammatic Representations ◽

Transfer Of Information

Examples throughout history, such as early humans’ cave carvings, the reliance on diagrammatic representations, and the immense popularity of comic books, show that vision has a greater reach in communication than written words. In this paper, we analyse and propose a new task: transferring information from text to images through image synthesis. Through this paper, we aim to generate a story from a single sentence and convert the generated story into a sequence of images. We plan to use state-of-the-art technology to implement this task. With the advent of Generative Adversarial Networks, text-to-image synthesis has seen a new awakening. We plan to take this task a step further by automating the entire process. Our system generates a multi-line story from a single sentence using a deep neural network. This story is then fed into our networks of multi-stage GANs in order to produce a photorealistic image sequence.

Utilizing Amari-Alpha Divergence to Stabilize the Training of Generative Adversarial Networks

Entropy ◽

10.3390/e22040410 ◽

2020 ◽

Vol 22 (4) ◽

pp. 410 ◽

Cited By ~ 2

Author(s):

Likun Cai ◽

Yanjie Chen ◽

Ning Cai ◽

Wei Cheng ◽

Hao Wang

Keyword(s):

State Of The Art ◽

Generative Adversarial Networks ◽

Image Generation ◽

Significant Progress ◽

Trade Off ◽

Adversarial Networks ◽

Leibler Divergence ◽

The Stability ◽

Hellinger Divergence

Generative Adversarial Nets (GANs) are one of the most popular architectures for image generation and have made significant progress in generating high-resolution, diverse image samples. Standard GANs are supposed to minimize the Kullback–Leibler divergence between the distributions of natural and generated images. In this paper, we propose the Alpha-divergence Generative Adversarial Net (Alpha-GAN), which adopts the alpha divergence as the minimization objective function of generators. The alpha divergence can be regarded as a generalization of the Kullback–Leibler divergence, the Pearson χ² divergence, the Hellinger divergence, etc. Our Alpha-GAN employs a power function as the form of the adversarial loss for the discriminator, with two-order indexes. These hyper-parameters make our model more flexible in trading off between the generated and target distributions. We further give a theoretical analysis of how to select these hyper-parameters to balance training stability and the quality of generated images. Extensive experiments with Alpha-GAN on the SVHN and CelebA datasets demonstrate its stability, and the generated samples are competitive with those of state-of-the-art approaches.
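
To make the claimed generalization concrete, the sketch below uses one standard parameterization of the alpha divergence between discrete distributions, D_α(p‖q) = (Σᵢ pᵢ^α qᵢ^(1−α) − 1) / (α(α − 1)), and checks three special cases numerically: it tends to the Kullback–Leibler divergence KL(p‖q) as α → 1, equals half the Pearson χ² divergence at α = 2, and equals twice Σᵢ(√pᵢ − √qᵢ)² (a squared-Hellinger quantity) at α = 1/2. This parameterization and the helper names are assumptions; Alpha-GAN's own power-function loss with two indexes may be scaled or indexed differently.

```python
import math


def alpha_divergence(p, q, alpha):
    """(sum p^a q^(1-a) - 1) / (a(a-1)) for normalized p, q; alpha must not be 0 or 1."""
    if alpha in (0.0, 1.0):
        raise ValueError("alpha = 0 and alpha = 1 are KL limits; use kl() instead")
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return (s - 1.0) / (alpha * (alpha - 1.0))


def kl(p, q):
    """Kullback-Leibler divergence KL(p||q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def pearson_chi2(p, q):
    """Pearson chi-squared divergence: sum (p - q)^2 / q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def hellinger_sum(p, q):
    """Sum of squared differences of square roots: sum (sqrt p - sqrt q)^2."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
print(alpha_divergence(p, q, 1.0 + 1e-6), kl(p, q))          # nearly equal
print(alpha_divergence(p, q, 2.0), 0.5 * pearson_chi2(p, q))  # equal up to rounding
print(alpha_divergence(p, q, 0.5), 2.0 * hellinger_sum(p, q))  # equal up to rounding
```

Varying α therefore moves continuously between these familiar divergences, which is the flexibility the abstract attributes to its hyper-parameters.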

Black-Box Diagnosis and Calibration on GAN Intra-Mode Collapse: A Pilot Study

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3472768 ◽

2021 ◽

Vol 17 (3s) ◽

pp. 1-18

Author(s):

Zhenyu Wu ◽

Zhaowen Wang ◽

Ye Yuan ◽

Jianming Zhang ◽

Zhangyang Wang ◽

...

Keyword(s):

State Of The Art ◽

Black Box ◽

Training Data ◽

Generative Adversarial Networks ◽

Small Scale ◽

Model Parameters ◽

Original Training ◽

Image Generation ◽

Adversarial Networks ◽

Calibration Techniques

Generative adversarial networks (GANs) are now capable of producing images of incredible realism. Two concerns arise: whether the learned distribution of a state-of-the-art GAN still suffers from mode collapse, and what to do if so. Existing diversity tests of GAN samples are usually conducted qualitatively on a small scale and/or depend on access to the original training data as well as the trained model parameters. This article explores GAN intra-mode collapse and calibrates it in a novel black-box setting: access to neither the training data nor the trained model parameters is assumed. This setting is in practical demand yet rarely explored, and it is significantly more challenging. As a first attempt, we devise a set of sampling-based statistical tools that can visualize, quantify, and rectify intra-mode collapse. We demonstrate the effectiveness of the proposed diagnosis and calibration techniques, via extensive simulations and experiments, on unconditional GAN image generation (e.g., faces and vehicles). Our study reveals that intra-mode collapse is still a prevailing problem in state-of-the-art GANs and that mode collapse is diagnosable and calibratable in black-box settings. Our code is available at https://github.com/VITA-Group/BlackBoxGANCollapse.
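
Working in a black-box setting means relying only on generated samples. As a hedged illustration of the kind of sampling-based statistic involved (a generic birthday-paradox-style duplicate check, not the authors' tools), the sketch below computes the fraction of sample pairs whose embeddings lie closer than a threshold. The embedding function, the threshold, and the name `near_duplicate_rate` are assumptions; in practice one might embed faces with a pretrained recognition network and compare the rate across generators or against an independently collected reference set. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def near_duplicate_rate(embeddings, threshold):
    """Fraction of unordered sample pairs whose Euclidean distance is below `threshold`.

    A rate well above that of a diverse reference sample suggests that the
    generator concentrates its outputs in small regions (intra-mode collapse).
    """
    pairs = list(combinations(embeddings, 2))
    close = sum(1 for a, b in pairs if math.dist(a, b) < threshold)
    return close / len(pairs)


samples = [[0.0, 0.0], [0.0, 0.01], [5.0, 5.0], [10.0, 10.0]]
print(near_duplicate_rate(samples, threshold=0.1))  # 1 close pair out of 6
```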

Perceptual metric-guided human image generation

Integrated Computer-Aided Engineering ◽

10.3233/ica-210672 ◽

2021 ◽

pp. 1-11

Author(s):

Haoran Wu ◽

Fazhi He ◽

Yansong Duan ◽

Xiaohu Yan

Keyword(s):

State Of The Art ◽

Transfer Task ◽

Generative Adversarial Networks ◽

Perceptual Quality ◽

Image Generation ◽

Migration Process ◽

Adversarial Networks ◽

Human Image ◽

Detection Score ◽

Perceptual Metrics

Pose transfer, which synthesizes a new image of a target person in a novel pose, is valuable in several applications. Pose transfer based on generative adversarial networks (GANs) is a new approach to person re-identification (re-ID). Typical perceptual metrics, such as Detection Score (DS) and Inception Score (IS), have been employed to assess visual quality only after generation in the pose transfer task. Thus, existing GAN-based methods do not directly benefit from these metrics, which are highly correlated with human ratings. In this paper, a perceptual metrics guided GAN (PIGGAN) framework is proposed to intrinsically optimize the generation process for the pose transfer task. Specifically, a novel and general model, the Evaluator, that fits well with the GAN is designed. Accordingly, a new Sort Loss (SL) is constructed to optimize perceptual quality. Moreover, PIGGAN is highly flexible and extensible and can incorporate both differentiable and non-differentiable indexes to optimize the pose transfer process. Extensive experiments show that PIGGAN generates photo-realistic results and quantitatively outperforms state-of-the-art (SOTA) methods.

License Plate Image Generation using Generative Adversarial Networks for End-To-End License Plate Character Recognition from a Small Set of Real Images

Applied Sciences ◽

10.3390/app10082780 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2780

Author(s):

Byung-Gil Han ◽

Jong Taek Lee ◽

Kil-Taek Lim ◽

Doo-Hyun Choi

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Generative Adversarial Networks ◽

License Plate ◽

Image Generation ◽

Adversarial Networks ◽

Vehicle Registration ◽

End To End ◽

Vehicle Information ◽

Public Datasets

License Plate Character Recognition (LPCR) is a technology for reading vehicle registration plates from images and videos using optical character recognition, and it has a long history owing to its usefulness. While LPCR has improved significantly with the advance of deep learning, training deep networks for an LPCR module requires a large number of license plate (LP) images and their annotations. Unlike other public datasets of vehicle information, each LP has a unique combination of characters and numbers depending on the country or region. Therefore, collecting a sufficient number of LP images is extremely difficult for ordinary research. In this paper, we propose LP-GAN, an LP image generation method that applies an ensemble of generative adversarial networks (GANs), and we also propose a modified lightweight YOLOv2 model for an efficient end-to-end LPCR module. With only 159 real LP images available online, thousands of synthetic LP images were generated by LP-GAN. The generated images not only looked similar to real ones but were also shown to be effective for training the LPCR module. In performance tests on 22,117 real LP images, the LPCR module trained only on the generated synthetic dataset achieved 98.72% overall accuracy, which is comparable to training on a real LP image dataset. In addition, the proposed lightweight model makes LPCR processing about 1.7 times faster than the original YOLOv2 model.

Dynamics of Fourier Modes in Torus Generative Adversarial Networks

Mathematics ◽

10.3390/math9040325 ◽

2021 ◽

Vol 9 (4) ◽

pp. 325

Author(s):

Ángel González-Prieto ◽

Alberto Mozo ◽

Edgar Talavera ◽

Sandra Gómez-Canaval

Keyword(s):

Fourier Series ◽

Generative Adversarial Networks ◽

Learning Models ◽

Training Process ◽

Small Perturbations ◽

Adversarial Networks ◽

Novel Method ◽

Truncated Fourier Series ◽

Real Flow ◽

Machine Learning Models

Generative Adversarial Networks (GANs) are powerful machine learning models capable of generating fully synthetic samples of a desired phenomenon at high resolution. Despite their success, the training process of a GAN is highly unstable, and it is typically necessary to add several auxiliary heuristics to the networks to reach acceptable convergence of the model. In this paper, we introduce a novel method to analyze the convergence and stability in the training of generative adversarial networks. For this purpose, we propose to decompose the objective function of the adversarial min–max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series for the continuous alternating gradient descent algorithm, we are able to approximate the real flow and to identify the main features of the convergence of GANs. This approach is confirmed empirically by studying the training flow in a 2-parametric GAN that aims to generate an unknown exponential distribution. As a by-product, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.
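
The periodic behaviour described above already appears in the simplest periodic objective, a single Fourier mode f(θ, φ) = sin θ sin φ on the torus, with the generator parameter θ minimizing and the discriminator parameter φ maximizing. The sketch below is an illustrative assumption, not the paper's 2-parametric exponential-distribution GAN: it runs alternating and simultaneous gradient descent-ascent from the same point near the equilibrium (0, 0). The alternating orbit circles the equilibrium at a nearly constant distance, neither converging nor diverging, while the simultaneous one slowly spirals outward. The step size and step count are arbitrary choices.

```python
import math


def grad_f(theta, phi):
    """Partial derivatives of f(theta, phi) = sin(theta) * sin(phi)."""
    return math.cos(theta) * math.sin(phi), math.sin(theta) * math.cos(phi)


def run_gda(theta, phi, lr, steps, alternating):
    """Gradient descent on theta (generator) and ascent on phi (discriminator).

    Returns the distance from the equilibrium (0, 0) after every step.
    """
    radii = []
    for _ in range(steps):
        g_theta, g_phi = grad_f(theta, phi)
        theta = theta - lr * g_theta
        if alternating:
            # The discriminator sees the generator's freshly updated parameter.
            _, g_phi = grad_f(theta, phi)
        phi = phi + lr * g_phi
        radii.append(math.hypot(theta, phi))
    return radii


alt = run_gda(0.3, 0.0, lr=0.01, steps=1000, alternating=True)
sim = run_gda(0.3, 0.0, lr=0.01, steps=1000, alternating=False)
print(min(alt), max(alt), alt[-1])  # stays close to the starting radius 0.3
print(sim[-1])                       # has drifted outward
```

Near the equilibrium, f ≈ θφ, the classic bilinear game; the linearized alternating update has determinant 1 and trace 2 − lr², so its eigenvalues lie on the unit circle, whereas the simultaneous update's eigenvalues have modulus √(1 + lr²) > 1.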

S2I-Bird: Sound-to-Image Generation of Bird Species using Generative Adversarial Networks

2020 25th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr48806.2021.9412721 ◽

2021 ◽

Author(s):

Joo Yong Shim ◽

Joongheon Kim ◽

Jong-Kook Kim

Keyword(s):

Bird Species ◽

Generative Adversarial Networks ◽

Image Generation ◽

Adversarial Networks

CDL-GAN: Contrastive Distance Learning Generative Adversarial Network for Image Generation

Applied Sciences ◽

10.3390/app11041380 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1380

Author(s):

Yingbo Zhou ◽

Pengcheng Zhao ◽

Weiqin Tong ◽

Yongxin Zhu

Keyword(s):

Distance Learning ◽

Feature Learning ◽

Image Synthesis ◽

Image Feature ◽

Generative Adversarial Networks ◽

Image Generation ◽

Generative Adversarial Network ◽

Feature Representations ◽

Adversarial Network ◽

Public Datasets

While Generative Adversarial Networks (GANs) have shown promising performance in image generation, they suffer from numerous issues such as mode collapse and training instability. To stabilize GAN training and improve image synthesis quality and diversity, in this paper we propose a simple yet effective approach, the Contrastive Distance Learning GAN (CDL-GAN). Specifically, we add a Consistent Contrastive Distance (CoCD) and a Characteristic Contrastive Distance (ChCD) into a principled framework to improve GAN performance. The CoCD explicitly maximizes the ratio of the distance between generated images to the increment between the corresponding noise vectors, strengthening image feature learning for the generator. The ChCD measures the sampling distance of the encoded images in Euclidean space to boost feature representations for the discriminator. We model the framework by incorporating a Siamese network as a module into GANs without any modification to the backbone. Both qualitative and quantitative experiments on three public datasets demonstrate the effectiveness of our method.
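
The CoCD term can be read as a ratio of an image-space distance to a noise-space distance. A minimal sketch, assuming Euclidean distances, a single pair of noise vectors, and a toy generator; the function names, the small `eps` guard, and the use of the negative ratio as the generator's loss are assumptions, and the ChCD term and the Siamese encoder are not shown.

```python
import math


def cocd_ratio(generator, z1, z2, eps=1e-8):
    """Distance between generated images divided by the distance between their noise vectors."""
    image_dist = math.dist(generator(z1), generator(z2))
    noise_dist = math.dist(z1, z2)
    return image_dist / (noise_dist + eps)


def cocd_generator_loss(generator, z1, z2):
    """Maximizing the ratio is written here as minimizing its negative."""
    return -cocd_ratio(generator, z1, z2)


# Toy generator that scales its input by 2: the ratio is (almost exactly) 2.
double = lambda z: [2.0 * v for v in z]
print(cocd_ratio(double, [0.0, 0.0], [3.0, 4.0]))
```

A collapsed generator that maps different noise vectors to the same image drives the ratio to 0, which suggests why maximizing it counteracts mode collapse.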
