WRGAN: Improvement of RelGAN with Wasserstein Loss for Text Generation

Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 275
Author(s):  
Ziyun Jiao ◽  
Fuji Ren

Generative adversarial networks (GANs) were first proposed in 2014 and have been widely used in computer vision, for example in image generation. However, GANs for text generation have made slow progress. One reason is that the discriminator’s guidance for the generator is too weak: the generator receives only a “true or false” probability in return. Compared with the current loss function, the Wasserstein distance can provide more information to the generator, but RelGAN does not work well with the Wasserstein distance in experiments. In this paper, we propose WRGAN, an improved neural network based on RelGAN and the Wasserstein loss. Unlike RelGAN, WRGAN modifies the discriminator network structure to use 1D convolutions with multiple kernel sizes. Correspondingly, we also replace the network's loss function with a Wasserstein loss with gradient penalty. Our experiments on multiple public datasets show that WRGAN outperforms most existing state-of-the-art methods and improves Bilingual Evaluation Understudy (BLEU) scores.
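
For reference, a Wasserstein loss with gradient penalty is commonly written for the critic as E[D(fake)] - E[D(real)] + λ·E[(‖∇D(x̂)‖₂ - 1)²]. The abstract does not give WRGAN's exact formulation, so the following is only a minimal sketch under stated assumptions: a toy linear critic stands in for the convolutional discriminator, and the names `critic`, `wgan_gp_critic_loss`, and the default `lam=10.0` are illustrative choices, not taken from the paper.

```python
import math


def critic(w, x):
    """Toy linear critic D(x) = w . x (a stand-in for a neural network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def wgan_gp_critic_loss(w, real_batch, fake_batch, lam=10.0):
    """Critic loss: E[D(fake)] - E[D(real)] + lam * (||grad D|| - 1)^2.

    For a linear critic the gradient with respect to the input is w at
    every point, so the expectation over interpolated samples reduces
    to a single term. A real network would need automatic differentiation.
    """
    mean_real = sum(critic(w, x) for x in real_batch) / len(real_batch)
    mean_fake = sum(critic(w, x) for x in fake_batch) / len(fake_batch)
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    penalty = (grad_norm - 1.0) ** 2
    return mean_fake - mean_real + lam * penalty


real = [[1.0, 0.0], [1.0, 0.0]]
fake = [[0.0, 0.0], [0.0, 0.0]]
print(wgan_gp_critic_loss([3.0, 4.0], real, fake))  # large penalty: ||w|| = 5
print(wgan_gp_critic_loss([0.6, 0.8], real, fake))  # no penalty: ||w|| = 1
```

In this toy setting the penalty vanishes exactly when the critic's gradient has unit norm, which is the soft 1-Lipschitz constraint the penalty term is meant to encourage.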

2020 ◽  
Vol 29 (15) ◽  
pp. 2050250
Author(s):  
Xiongfei Liu ◽  
Bengao Li ◽  
Xin Chen ◽  
Haiyan Zhang ◽  
Shu Zhan

This paper proposes a novel method for person image generation with an arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images of the same person in different poses. Generative Adversarial Networks (GANs) are the major part of the proposed model. Unlike traditional GANs, we add an attention mechanism to the generator in order to generate realistic-looking images. We also use content reconstruction with a pretrained VGG16 network to keep the content consistent between generated images and target images. Furthermore, we test our model on the DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.


Author(s):  
Yao Ni ◽  
Dandan Song ◽  
Xi Zhang ◽  
Hao Wu ◽  
Lejian Liao

Generative adversarial networks (GANs) have shown impressive results. However, the generator and the discriminator are optimized in a finite parameter space, which means their performance still needs to be improved. In this paper, we propose a novel approach of adversarial training between one generator and an exponential number of critics, which are sampled from the original discriminative neural network via dropout. Because the discrepancy between the outputs of different sub-networks for the same sample can measure the consistency of these critics, we encourage the critics to be consistent on real samples and inconsistent on generated samples during training, while the generator is trained to generate samples on which the different critics are consistent. Experimental results demonstrate that our method can obtain state-of-the-art Inception scores of 9.17 and 10.02 on supervised CIFAR-10 and unsupervised STL-10 image generation tasks, respectively, as well as achieve competitive semi-supervised classification results on several benchmarks. Importantly, we demonstrate that our method can maintain stability in training and alleviate mode collapse.


2021 ◽  
Vol 40 ◽  
pp. 03017
Author(s):  
Amogh Parab ◽  
Ananya Malik ◽  
Arish Damania ◽  
Arnav Parekhji ◽  
Pranit Bari

Examples throughout history, such as early humans' cave carvings, the reliance on diagrammatic representations, and the immense popularity of comic books, show that images have a wider reach in communication than written words. In this paper, we analyse and propose a new task of transferring information from text to image synthesis. We aim to generate a story from a single sentence and convert the generated story into a sequence of images. We plan to use state-of-the-art technology to implement this task. With the advent of Generative Adversarial Networks, text-to-image synthesis has found a new awakening. We plan to take this task a step further in order to automate the entire process. Our system generates a multi-line story from a single sentence using a deep neural network. This story is then fed into our networks of multiple-stage GANs in order to produce a photorealistic image sequence.


Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 410
Author(s):  
Likun Cai ◽  
Yanjie Chen ◽  
Ning Cai ◽  
Wei Cheng ◽  
Hao Wang

Generative Adversarial Nets (GANs) are one of the most popular architectures for image generation and have achieved significant progress in generating high-resolution, diverse image samples. The normal GANs are supposed to minimize the Kullback–Leibler divergence between the distributions of natural and generated images. In this paper, we propose the Alpha-divergence Generative Adversarial Net (Alpha-GAN), which adopts the alpha divergence as the minimization objective function of the generator. The alpha divergence can be regarded as a generalization of the Kullback–Leibler divergence, the Pearson χ² divergence, the Hellinger divergence, etc. Our Alpha-GAN employs the power function as the form of adversarial loss for the discriminator with two-order indexes. These hyper-parameters make our model more flexible in trading off between the generated and target distributions. We further give a theoretical analysis of how to select these hyper-parameters to balance training stability and the quality of generated images. Extensive experiments with Alpha-GAN are performed on the SVHN and CelebA datasets, and evaluation results show the stability of Alpha-GAN. The generated samples are also competitive with state-of-the-art approaches.
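
To make the "generalization" claim concrete, the sketch below checks it numerically for discrete distributions. It assumes Amari's parameterization, D_α(p‖q) = (Σ pᵢ^α qᵢ^(1-α) - 1) / (α(α - 1)), which is one common convention and may differ from the one used in the paper; all function names are illustrative.

```python
import math


def alpha_divergence(p, q, alpha):
    """Amari-style alpha divergence between discrete distributions.

    Assumes strictly positive probabilities and alpha not in {0, 1};
    those two values are defined only as limits.
    """
    if alpha in (0.0, 1.0):
        raise ValueError("alpha = 0 and alpha = 1 are defined only as limits")
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return (total - 1.0) / (alpha * (alpha - 1.0))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def pearson_chi2(p, q):
    """Pearson chi-squared divergence: sum (p - q)^2 / q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def squared_hellinger(p, q):
    """Squared Hellinger distance: 0.5 * sum (sqrt(p) - sqrt(q))^2."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


p = [0.5, 0.5]
q = [0.25, 0.75]
# alpha -> 1 approaches KL(p || q)
print(alpha_divergence(p, q, 1.0 - 1e-4), kl_divergence(p, q))
# alpha = 2 equals half the Pearson chi-squared divergence
print(alpha_divergence(p, q, 2.0), 0.5 * pearson_chi2(p, q))
# alpha = 1/2 equals four times the squared Hellinger distance
print(alpha_divergence(p, q, 0.5), 4.0 * squared_hellinger(p, q))
```

Under this convention the special cases differ from the named divergences only by constant factors, which is why a single parameter can interpolate between them.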


Author(s):  
Zhenyu Wu ◽  
Zhaowen Wang ◽  
Ye Yuan ◽  
Jianming Zhang ◽  
Zhangyang Wang ◽  
...  

Generative adversarial networks (GANs) are now capable of producing images of incredible realism. Two concerns are whether the distribution learned by a state-of-the-art GAN still suffers from mode collapse and, if so, what to do about it. Existing diversity tests of samples from GANs are usually conducted qualitatively on a small scale and/or depend on access to the original training data as well as the trained model parameters. This article explores GAN intra-mode collapse and calibrates it in a novel black-box setting: access to neither training data nor the trained model parameters is assumed. The new setting is practically demanded yet rarely explored and significantly more challenging. As a first stab, we devise a set of statistical tools based on sampling that can visualize, quantify, and rectify intra-mode collapse. We demonstrate the effectiveness of our proposed diagnosis and calibration techniques, via extensive simulations and experiments, on unconditional GAN image generation (e.g., face and vehicle). Our study reveals that intra-mode collapse is still a prevailing problem in state-of-the-art GANs and that mode collapse is diagnosable and calibratable in black-box settings. Our code is available at https://github.com/VITA-Group/BlackBoxGANCollapse.


2021 ◽  
pp. 1-11
Author(s):  
Haoran Wu ◽  
Fazhi He ◽  
Yansong Duan ◽  
Xiaohu Yan

Pose transfer, which synthesizes a new image of a target person in a novel pose, is valuable in several applications. Generative adversarial network (GAN) based pose transfer is a new approach for person re-identification (re-ID). Typical perceptual metrics, like Detection Score (DS) and Inception Score (IS), have been employed to assess visual quality after generation in the pose transfer task. Thus, the existing GAN-based methods do not directly benefit from these metrics, which are highly associated with human ratings. In this paper, a perceptual metrics guided GAN (PIGGAN) framework is proposed to intrinsically optimize the generation process for the pose transfer task. Specifically, a novel and general Evaluator model that matches the GAN well is designed. Accordingly, a new Sort Loss (SL) is constructed to optimize the perceptual quality. Moreover, PIGGAN is highly flexible and extensible and can incorporate both differentiable and non-differentiable indexes to optimize the pose transfer process. Extensive experiments show that PIGGAN can generate photo-realistic results and quantitatively outperforms state-of-the-art (SOTA) methods.


2020 ◽  
Vol 10 (8) ◽  
pp. 2780
Author(s):  
Byung-Gil Han ◽  
Jong Taek Lee ◽  
Kil-Taek Lim ◽  
Doo-Hyun Choi

License Plate Character Recognition (LPCR) is a technology for reading vehicle registration plates using optical character recognition from images and videos, and it has a long history due to its usefulness. While LPCR has been significantly improved with the advance of deep learning, training deep networks for an LPCR module requires a large number of license plate (LP) images and their annotations. Unlike other public datasets of vehicle information, each LP has a unique combination of characters and numbers depending on the country or the region. Therefore, collecting a sufficient number of LP images is extremely difficult for normal research. In this paper, we propose LP-GAN, an LP image generation method that applies an ensemble of generative adversarial networks (GANs), and we also propose a modified lightweight YOLOv2 model for an efficient end-to-end LPCR module. With only 159 real LP images available online, thousands of synthetic LP images were generated by using LP-GAN. The generated images not only looked similar to real ones, but they were also shown to be effective for training the LPCR module. In performance tests with 22,117 real LP images, the LPCR module trained with only the generated synthetic dataset achieved 98.72% overall accuracy, which is comparable to that of training with a real LP image dataset. In addition, the proposed lightweight model makes LPCR processing about 1.7 times faster than the original YOLOv2 model.


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 325
Author(s):  
Ángel González-Prieto ◽  
Alberto Mozo ◽  
Edgar Talavera ◽  
Sandra Gómez-Canaval

Generative Adversarial Networks (GANs) are powerful machine learning models capable of generating fully synthetic samples of a desired phenomenon with a high resolution. Despite their success, the training process of a GAN is highly unstable, and typically, it is necessary to add several accessory heuristics to the networks to reach acceptable convergence of the model. In this paper, we introduce a novel method to analyze the convergence and stability in the training of generative adversarial networks. For this purpose, we propose to decompose the objective function of the adversarial min–max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series for the continuous alternating gradient descent algorithm, we are able to approximate the real flow and to identify the main features of the convergence of the GAN. This approach is confirmed empirically by studying the training flow in a 2-parametric GAN, aiming to generate an unknown exponential distribution. As a by-product, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.


2021 ◽  
Vol 11 (4) ◽  
pp. 1380
Author(s):  
Yingbo Zhou ◽  
Pengcheng Zhao ◽  
Weiqin Tong ◽  
Yongxin Zhu

While Generative Adversarial Networks (GANs) have shown promising performance in image generation, they suffer from numerous issues such as mode collapse and training instability. To stabilize GAN training and improve both the quality and the diversity of synthesized images, we propose a simple yet effective approach called Contrastive Distance Learning GAN (CDL-GAN) in this paper. Specifically, we add Consistent Contrastive Distance (CoCD) and Characteristic Contrastive Distance (ChCD) into a principled framework to improve GAN performance. The CoCD explicitly maximizes the ratio of the distance between generated images to the increment between noise vectors to strengthen image feature learning for the generator. The ChCD measures the sampling distance of the encoded images in Euler space to boost feature representations for the discriminator. We build the framework by adding a Siamese Network as a module to GANs without any modification of the backbone. Both qualitative and quantitative experiments conducted on three public datasets demonstrate the effectiveness of our method.
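
The ratio that CoCD maximizes can be illustrated with a small sketch. The abstract does not specify the distance measure or how the ratio enters the loss, so the Euclidean distance, the `eps` stabilizer, and the hypothetical `toy_generator` below are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def toy_generator(z):
    """Hypothetical stand-in for a GAN generator: scales the noise by 2."""
    return [2.0 * zi for zi in z]


def distance_ratio(generator, z1, z2, eps=1e-12):
    """Distance between generated outputs divided by distance between noises.

    eps guards against division by zero when z1 and z2 coincide.
    """
    return euclidean(generator(z1), generator(z2)) / (euclidean(z1, z2) + eps)


ratio = distance_ratio(toy_generator, [0.0, 0.0], [3.0, 4.0])
print(ratio)  # close to 2 for this linear toy generator
```

A generator that maps distinct noise vectors to nearly identical outputs would give a ratio near zero, so pushing this ratio up presumably discourages mode collapse.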

