Successive Image Generation from a Single Sentence

2021 ◽  
Vol 40 ◽  
pp. 03017
Author(s):  
Amogh Parab ◽  
Ananya Malik ◽  
Arish Damania ◽  
Arnav Parekhji ◽  
Pranit Bari

Examples throughout history, such as early humans' cave carvings, the reliance on diagrammatic representations, and the immense popularity of comic books, show that visual content reaches a wider audience than written words. In this paper, we analyse and propose a new task: transferring information from text into synthesized images. We aim to generate a story from a single sentence and convert the generated story into a sequence of images, and we plan to use state-of-the-art techniques to implement this task. With the advent of Generative Adversarial Networks (GANs), text-to-image synthesis has seen a new awakening. We plan to take this task a step further by automating the entire process. Our system uses a deep neural network to generate a multi-line story from a single sentence. This story is then fed into our multi-stage GAN networks in order to produce a photorealistic image sequence.


Author(s):  
Yao Ni ◽  
Dandan Song ◽  
Xi Zhang ◽  
Hao Wu ◽  
Lejian Liao

Generative adversarial networks (GANs) have shown impressive results; however, the generator and the discriminator are optimized in a finite parameter space, which means their performance can still be improved. In this paper, we propose a novel approach of adversarial training between one generator and an exponential number of critics, which are sampled from the original discriminative neural network via dropout. Since the discrepancy between the outputs of different sub-networks on the same sample measures the consistency of these critics, we encourage the critics to be consistent on real samples and inconsistent on generated samples during training, while the generator is trained to generate samples on which the different critics are consistent. Experimental results demonstrate that our method obtains state-of-the-art Inception scores of 9.17 and 10.02 on supervised CIFAR-10 and unsupervised STL-10 image generation tasks, respectively, and achieves competitive semi-supervised classification results on several benchmarks. Importantly, we demonstrate that our method maintains stability in training and alleviates mode collapse.
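
The consistency idea can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code: the tiny two-layer critic, the use of per-sample score variance across dropout masks as the discrepancy measure, and the loss forms `d_loss` and `g_loss` are all hypothetical choices, and the paper's exact discrepancy measure and objectives may differ. In real training these quantities would be minimized with automatic differentiation.

```python
import numpy as np

def sampled_critic_scores(x, w1, w2, rng, p_drop=0.5):
    """Score a batch with one dropout-sampled sub-network ("critic") of a tiny 2-layer MLP.

    Every call draws a fresh Bernoulli mask, so each call acts as a different critic.
    """
    h = np.maximum(0.0, x @ w1)              # hidden ReLU activations, shape (batch, hidden)
    keep = rng.random(h.shape) >= p_drop     # keep each unit with probability 1 - p_drop
    h = h * keep / (1.0 - p_drop)            # inverted-dropout rescaling
    return (h @ w2).ravel()                  # one scalar score per sample

def critic_discrepancy(x, w1, w2, rng, n_critics=8, p_drop=0.5):
    """Per-sample variance of the scores given by n_critics sampled critics.

    Low variance means the sampled critics agree (are consistent) on that sample.
    """
    scores = np.stack([sampled_critic_scores(x, w1, w2, rng, p_drop) for _ in range(n_critics)])
    return scores.var(axis=0)                # shape (batch,)

# Toy usage with random (untrained) weights and stand-in "real" and "fake" batches.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 16))
w2 = rng.normal(size=(16, 1))
real = rng.normal(size=(32, 4))
fake = rng.normal(loc=2.0, size=(32, 4))

d_real = critic_discrepancy(real, w1, w2, rng)
d_fake = critic_discrepancy(fake, w1, w2, rng)
d_loss = d_real.mean() - d_fake.mean()  # hypothetical critic objective: agree on real, disagree on fake
g_loss = d_fake.mean()                  # hypothetical generator objective: make critics agree on fakes
```

Setting `p_drop=0.0` collapses all sampled critics into the same network, so the discrepancy becomes zero; the spread across masks is what carries the consistency signal.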



2021 ◽  
Vol 8 (1) ◽  
pp. 3-31
Author(s):  
Yuan Xue ◽  
Yuan-Chen Guo ◽  
Han Zhang ◽  
Tao Xu ◽  
Song-Hai Zhang ◽  
...  

In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. Classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, but recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works on image synthesis from intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and the evaluation and comparison of generation methods.



Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 410 ◽  
Author(s):  
Likun Cai ◽  
Yanjie Chen ◽  
Ning Cai ◽  
Wei Cheng ◽  
Hao Wang

Generative Adversarial Nets (GANs) are among the most popular architectures for image generation and have achieved significant progress in generating high-resolution, diverse image samples. Standard GANs are supposed to minimize the Kullback–Leibler divergence between the distributions of natural and generated images. In this paper, we propose the Alpha-divergence Generative Adversarial Net (Alpha-GAN), which adopts the alpha divergence as the minimization objective of the generator. The alpha divergence can be regarded as a generalization of the Kullback–Leibler divergence, Pearson χ² divergence, Hellinger divergence, etc. Our Alpha-GAN employs a power function as the form of the discriminator's adversarial loss, with two order indices. These hyper-parameters make our model more flexible in trading off between the generated and target distributions. We further give a theoretical analysis of how to select these hyper-parameters to balance training stability and the quality of generated images. Extensive experiments with Alpha-GAN are performed on the SVHN and CelebA datasets, and the evaluation results show the stability of Alpha-GAN. The generated samples are also competitive with those of state-of-the-art approaches.
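
To make the "generalization" claim concrete, the short Python sketch below evaluates one common (Amari-type) parameterization of the alpha divergence on two toy discrete distributions and checks the special cases named above: α → 1 recovers KL(p‖q), α = 2 gives half the Pearson χ² divergence, and α = 0.5 gives four times the squared Hellinger distance. The parameterization and the toy distributions are our assumptions; Alpha-GAN's discriminator loss with two order indices may be parameterized differently, so this is illustrative only.

```python
import math

def alpha_divergence(p, q, alpha):
    """Alpha-divergence D_alpha(p || q) between two discrete distributions.

    Assumed form: (sum_i p_i**alpha * q_i**(1 - alpha) - 1) / (alpha * (alpha - 1)),
    which requires p and q to be normalized and strictly positive.
    The alpha -> 1 and alpha -> 0 limits are the two KL divergences.
    """
    if math.isclose(alpha, 1.0, abs_tol=1e-12):
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # KL(p || q)
    if math.isclose(alpha, 0.0, abs_tol=1e-12):
        return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))  # KL(q || p)
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return (s - 1.0) / (alpha * (alpha - 1.0))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
pearson = sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))
hellinger_sq = 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

print(alpha_divergence(p, q, 1.0), kl)                  # alpha = 1: KL(p || q)
print(alpha_divergence(p, q, 2.0), 0.5 * pearson)       # alpha = 2: half the Pearson chi^2
print(alpha_divergence(p, q, 0.5), 4.0 * hellinger_sq)  # alpha = 0.5: 4 x squared Hellinger
```

Varying α between these anchors is what gives an alpha-divergence objective its extra knob for trading off how strongly mismatches in low-density regions of p or q are penalized.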



Author(s):  
Zhenyu Wu ◽  
Zhaowen Wang ◽  
Ye Yuan ◽  
Jianming Zhang ◽  
Zhangyang Wang ◽  
...  

Generative adversarial networks (GANs) are now capable of producing images of incredible realism. Two concerns arise: whether the learned distribution of state-of-the-art GANs still suffers from mode collapse, and what to do if it does. Existing diversity tests of GAN samples are usually conducted qualitatively on a small scale and/or depend on access to the original training data as well as the trained model parameters. This article explores GAN intra-mode collapse and calibrates it in a novel black-box setting: access to neither the training data nor the trained model parameters is assumed. This setting is in practical demand, yet rarely explored and significantly more challenging. As a first stab, we devise a set of sampling-based statistical tools that can visualize, quantify, and rectify intra-mode collapse. We demonstrate the effectiveness of our proposed diagnosis and calibration techniques via extensive simulations and experiments on unconditional GAN image generation (e.g., faces and vehicles). Our study reveals that intra-mode collapse is still a prevailing problem in state-of-the-art GANs, and that mode collapse is diagnosable and calibratable in black-box settings. Our code is available at https://github.com/VITA-Group/BlackBoxGANCollapse.
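
As a rough illustration of a sampling-based, black-box diversity check (not the authors' tools, which are in the linked repository), one can embed many generated samples and look for unusually dense neighborhoods: under intra-mode collapse, many samples cluster around near-duplicates. In the NumPy sketch below, the random vectors stand in for embeddings that would, by assumption, come from some pretrained network (for example, a face-recognition model for face images). The `radius` and `min_neighbors` thresholds are hypothetical.

```python
import numpy as np

def neighbor_counts(emb, radius):
    """For each embedding, count the other embeddings within `radius` (Euclidean distance)."""
    sq = (emb ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                         # ignore each sample's match with itself
    return (d2 <= radius ** 2).sum(axis=1)

def dense_fraction(emb, radius, min_neighbors):
    """Fraction of samples sitting in suspiciously dense neighborhoods (hypothetical criterion)."""
    return float((neighbor_counts(emb, radius) >= min_neighbors).mean())

rng = np.random.default_rng(0)
diverse = rng.normal(size=(500, 16))  # stand-in embeddings of well-spread samples
collapsed = diverse.copy()
collapsed[:100] = diverse[0] + 0.01 * rng.normal(size=(100, 16))  # 100 near-duplicates of one sample

print(dense_fraction(diverse, radius=1.0, min_neighbors=5))
print(dense_fraction(collapsed, radius=1.0, min_neighbors=5))
```

In a true black-box setting there is no training data to compare against, so such counts would have to be judged against a reference that does not require it; choosing that reference is the hard part the article addresses.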



2021 ◽  
pp. 1-11
Author(s):  
Haoran Wu ◽  
Fazhi He ◽  
Yansong Duan ◽  
Xiaohu Yan

Pose transfer, which synthesizes a new image of a target person in a novel pose, is valuable in several applications. Pose transfer based on generative adversarial networks (GANs) is a new approach for person re-identification (re-ID). Typical perceptual metrics, such as Detection Score (DS) and Inception Score (IS), have been employed only to assess visual quality after generation in the pose transfer task. As a result, existing GAN-based methods do not directly benefit from these metrics, which are closely associated with human ratings. In this paper, a perceptual metrics guided GAN (PIGGAN) framework is proposed to intrinsically optimize the generation process for the pose transfer task. Specifically, a novel and general model, the Evaluator, which matches well with the GAN, is designed. Accordingly, a new Sort Loss (SL) is constructed to optimize perceptual quality. Moreover, PIGGAN is highly flexible and extensible and can incorporate both differentiable and non-differentiable indices to optimize the pose transfer process. Extensive experiments show that PIGGAN can generate photo-realistic results and quantitatively outperforms state-of-the-art (SOTA) methods.
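
The abstract does not spell out the Evaluator or the Sort Loss, but one generic way a differentiable model can learn from metrics that are themselves non-differentiable is a pairwise ranking loss: the evaluator's scores are trained to reproduce the ordering that the metric assigns to a batch of images. The NumPy sketch below shows such a hinge-style loss. The function name, `margin`, and the toy scores are hypothetical, and this is not claimed to be PIGGAN's Sort Loss.

```python
import numpy as np

def pairwise_ranking_loss(pred, target, margin=0.1):
    """Hinge loss asking `pred` to order samples the same way `target` does.

    pred:   (n,) differentiable scores from an evaluator network
    target: (n,) scores from a metric that may be non-differentiable
    """
    diff_pred = pred[:, None] - pred[None, :]       # s_i - s_j for every pair (i, j)
    should_win = target[:, None] > target[None, :]  # pairs where the metric ranks i above j
    if not should_win.any():
        return 0.0                                  # no ordered pairs (all targets tied)
    hinge = np.maximum(0.0, margin - diff_pred)     # penalize unless s_i beats s_j by `margin`
    return float(hinge[should_win].mean())

metric = np.array([0.9, 0.4, 0.7])  # hypothetical per-image metric scores
good = np.array([2.0, 0.0, 1.0])    # evaluator scores in the same order as the metric
bad = np.array([0.0, 2.0, 1.0])     # evaluator scores in a contradictory order

print(pairwise_ranking_loss(good, metric))  # 0.0
print(pairwise_ranking_loss(bad, metric))
```

Only the ordering of the metric values is used, which is why such a loss can absorb metrics that have no gradient.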



In the recent past, text-to-image translation has been an active field of research. The ability of a network to understand a sentence's context and to create a specific picture that represents the sentence demonstrates the model's ability to think more like humans. Common text-to-image translation methods employ Generative Adversarial Networks to generate high-quality images, but the images produced do not always represent the meaning of the phrase provided to the model as input. We tackle this problem by using a captioning network to caption the generated images, and we exploit the gap between ground-truth captions and generated captions to further enhance the network. We present detailed comparisons between our system and existing methods. Text-to-image synthesis is a difficult problem with plenty of room for progress despite current state-of-the-art results. Images synthesized by current methods give a rough sketch of the described scene but do not capture the true essence of what the text describes. The recent success of Generative Adversarial Networks (GANs) suggests that they are a strong candidate architecture for approaching this problem.
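
One plausible way to turn the gap between ground-truth and generated captions into a training signal is a token-level cross-entropy between the captioning network's predictions for a generated image and the caption that conditioned the generator. The NumPy sketch below computes only that loss value. The logits, token ids, and vocabulary size are hypothetical, and the paper's exact formulation is not specified in the abstract. In training, the gradient of such a loss would have to flow back through the captioner into the generator.

```python
import numpy as np

def caption_consistency_loss(logits, target_ids):
    """Mean token-level cross-entropy between a captioner's predictions and the input caption.

    logits:     (T, V) unnormalized scores the captioning network assigns to each of V
                vocabulary tokens at each of T positions, given the *generated* image
    target_ids: (T,) token ids of the ground-truth caption that conditioned the generator
    """
    shifted = logits - logits.max(axis=1, keepdims=True)                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    return float(-log_probs[np.arange(len(target_ids)), target_ids].mean())

vocab_size = 6
target = np.array([3, 1, 4])               # hypothetical token ids of a 3-word caption

faithful = np.full((3, vocab_size), -5.0)  # captioner confidently recovers the caption
faithful[np.arange(3), target] = 5.0
unfaithful = np.zeros((3, vocab_size))     # captioner finds no evidence for any token

print(caption_consistency_loss(faithful, target))    # close to 0
print(caption_consistency_loss(unfaithful, target))  # log(6), about 1.79
```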



2020 ◽  
Vol 34 (07) ◽  
pp. 10981-10988
Author(s):  
Mengxiao Hu ◽  
Jinlong Li ◽  
Maolin Hu ◽  
Tao Hu

In conditional Generative Adversarial Networks (cGANs), when two different initial noise vectors are concatenated with the same conditional information, the distance between their outputs is relatively small, which makes minor modes likely to collapse into large modes. To prevent this, we propose a hierarchical mode-exploring method that alleviates mode collapse in cGANs by introducing a diversity measurement into the objective function as a regularization term. We also introduce the Expected Ratios of Expansion (ERE) into the regularization term: by minimizing the sum of differences between the real change of distance and the ERE, we can control the diversity of generated images with respect to features at specific levels. We validate the proposed algorithm on four conditional image synthesis tasks: categorical generation, paired and unpaired image translation, and text-to-image generation. Both qualitative and quantitative results show that the proposed method effectively alleviates the mode collapse problem in cGANs and can control the diversity of output images with respect to features at specific levels.
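
A minimal sketch of the kind of regularizer described above, assuming that the "real change of distance" is measured as the ratio between the distance of two outputs' features and the distance of their noise vectors (as in mode-seeking regularization), computed separately at several feature levels. The function names, the per-level targets passed as `expected_ratios`, and the toy features are hypothetical; the paper's exact distance measures and weighting may differ. In training, this term (with some weight) would be added to the cGAN generator objective.

```python
import numpy as np

def expansion_ratio(feat1, feat2, z1, z2, eps=1e-8):
    """Ratio of the output-feature distance to the input-noise distance at one level."""
    return np.linalg.norm(feat1 - feat2) / (np.linalg.norm(z1 - z2) + eps)

def ere_regularizer(feats1, feats2, z1, z2, expected_ratios):
    """Sum over feature levels of |observed expansion ratio - expected ratio|.

    feats1, feats2:  lists of feature arrays (one per level) for G(c, z1) and G(c, z2),
                     i.e. two outputs generated with the same condition c
    expected_ratios: hypothetical per-level targets playing the role of the ERE
    """
    return float(sum(
        abs(expansion_ratio(f1, f2, z1, z2) - ere)
        for f1, f2, ere in zip(feats1, feats2, expected_ratios)
    ))

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=8), rng.normal(size=8)

# Toy "collapsed" generator: ignores the noise, so both outputs are identical.
collapsed = [np.ones(4), np.ones(16)]
print(ere_regularizer(collapsed, collapsed, z1, z2, expected_ratios=[0.5, 1.0]))  # 1.5

# Toy generator whose single feature level is 0.5 * z: the ratio is about 0.5, matching the target.
print(ere_regularizer([0.5 * z1], [0.5 * z2], z1, z2, expected_ratios=[0.5]))
```

A collapsed generator pays the full target ratio at every level, which is how the term pushes distinct noise vectors toward distinct outputs.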



Author(s):  
Run Wang ◽  
Felix Juefei-Xu ◽  
Lei Ma ◽  
Xiaofei Xie ◽  
Yihao Huang ◽  
...  

In recent years, generative adversarial networks (GANs) and their variants have achieved unprecedented success in image synthesis. They are widely adopted for synthesizing facial images, which raises potential security concerns as the fakes spread and fuel misinformation. However, robust detectors of these AI-synthesized fake faces are still in their infancy and are not ready to fully tackle this emerging challenge. In this work, we propose a novel approach, named FakeSpotter, that monitors neuron behaviors to spot AI-synthesized fake faces. Studies on neuron coverage and interactions have shown that these can serve as testing criteria for deep learning systems, especially when the systems are exposed to adversarial attacks. Here, we conjecture that monitoring neuron behavior can also serve as an asset in detecting fake faces, since layer-by-layer neuron activation patterns may capture more subtle features that are important for the fake detector. Experimental results on detecting four types of fake faces synthesized with state-of-the-art GANs, and on evading four perturbation attacks, show the effectiveness and robustness of our approach.
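
FakeSpotter's core signal, layer-wise neuron activation patterns, can be illustrated with a toy neuron-coverage descriptor. The NumPy sketch below is a simplification under stated assumptions: a random two-layer ReLU network stands in for the face-recognition model being monitored, and each layer's threshold is its mean activation over a reference batch (our assumption, not necessarily the paper's criterion). The resulting per-layer fraction of activated neurons would then be fed to a simple real/fake classifier, which is not shown.

```python
import numpy as np

def layer_activations(x, weights):
    """Forward pass through a toy ReLU network; returns each hidden layer's activations."""
    acts = []
    h = x
    for w in weights:
        h = np.maximum(0.0, h @ w)
        acts.append(h)
    return acts

def coverage_features(x, weights, thresholds):
    """Per-layer fraction of neurons whose activation exceeds that layer's threshold.

    Returns an array of shape (batch, n_layers): a compact neuron-behavior descriptor
    that a simple classifier could use to separate real from synthesized faces.
    """
    acts = layer_activations(x, weights)
    return np.stack([(a > t).mean(axis=1) for a, t in zip(acts, thresholds)], axis=1)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(32, 64)), rng.normal(size=(64, 64))]  # stand-in for a face-recognition network
reference = rng.normal(size=(100, 32))                             # stand-in for reference face inputs
# Hypothetical threshold rule: each layer's mean activation over the reference batch.
thresholds = [a.mean() for a in layer_activations(reference, weights)]
features = coverage_features(rng.normal(size=(5, 32)), weights, thresholds)
print(features.shape)
```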



2020 ◽  
Vol 29 (15) ◽  
pp. 2050250
Author(s):  
Xiongfei Liu ◽  
Bengao Li ◽  
Xin Chen ◽  
Haiyan Zhang ◽  
Shu Zhan

This paper proposes a novel method for person image generation with an arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images of the same person in different poses. Generative Adversarial Networks (GANs) are the major part of the proposed model. Unlike traditional GANs, we add an attention mechanism to the generator in order to generate realistic-looking images, and we also use content reconstruction with a pretrained VGG16 network to keep the content consistent between generated images and target images. We test our model on the DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.
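
The content-reconstruction term is a perceptual (feature-space) loss. The sketch below keeps the feature extractor pluggable. In a PyTorch setup one would typically take intermediate activations of a pretrained VGG16 (for example from `torchvision.models.vgg16`); here, a hypothetical `toy_features` function (raw pixels plus 2×2 average pooling) stands in so the example runs with NumPy alone. The abstract does not state which VGG16 layers are used or how the terms are weighted.

```python
import numpy as np

def content_loss(generated, target, feature_extractor):
    """Sum over feature levels of the mean squared difference between deep features.

    feature_extractor: any callable mapping an image batch to a list of feature arrays
    (a pretrained VGG16 in the paper; a hypothetical stand-in below).
    """
    loss = 0.0
    for fg, ft in zip(feature_extractor(generated), feature_extractor(target)):
        loss += float(np.mean((fg - ft) ** 2))
    return loss

def toy_features(images):
    """Hypothetical stand-in for VGG16 layers: raw pixels and 2x2 average-pooled pixels.

    images: array of shape (batch, height, width, channels) with even height and width.
    """
    n, h, w, c = images.shape
    pooled = images.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))
    return [images, pooled]

rng = np.random.default_rng(0)
target = rng.random((2, 8, 8, 3))
print(content_loss(target, target, toy_features))        # identical images: zero loss
print(content_loss(target + 0.1, target, toy_features))  # uniform shift of 0.1 at both levels
```

Comparing features rather than raw pixels tolerates small spatial misalignments while still penalizing changes in content, which matters when the pose changes between input and output.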



Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 275
Author(s):  
Ziyun Jiao ◽  
Fuji Ren

Generative adversarial networks (GANs) were first proposed in 2014 and have been widely used in computer vision, for example for image generation. However, GANs for text generation have made slow progress. One of the reasons is that the discriminator's guidance for the generator is too weak: the generator only receives a “true or false” probability in return. Compared with the current loss function, the Wasserstein distance can provide more information to the generator, but in experiments RelGAN does not work well with the Wasserstein distance. In this paper, we propose an improved neural network based on RelGAN and the Wasserstein loss, named WRGAN. Unlike RelGAN, we modify the discriminator network structure with 1D convolutions of multiple different kernel sizes. Correspondingly, we also change the network's loss function to a Wasserstein loss with gradient penalty. Our experiments on multiple public datasets show that WRGAN outperforms most existing state-of-the-art methods and improves Bilingual Evaluation Understudy (BLEU) scores.
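
The discriminator change can be pictured as a bank of 1D convolutions of different widths sliding over the embedded token sequence, each followed by max-pooling over time, in the style of CNN text classifiers. The NumPy sketch below is an illustration under assumptions: the kernel widths (2 to 5), filter count, embedding size, and random weights are hypothetical, and WRGAN's actual layer configuration may differ. For a Wasserstein critic, a linear layer without a sigmoid would map the concatenated features to an unbounded score.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """'Valid' 1D convolution (cross-correlation, as in deep-learning libraries).

    x:      (T, D) sequence of token embeddings
    kernel: (k, D, F) filters spanning k consecutive tokens, F output channels
    returns (T - k + 1, F); requires T >= k
    """
    k = kernel.shape[0]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])  # (T-k+1, k, D)
    return np.einsum("tkd,kdf->tf", windows, kernel)

def multi_kernel_features(x, kernels):
    """Run filters of several widths, apply ReLU, max-pool over time, and concatenate."""
    pooled = [np.maximum(0.0, conv1d_valid(x, kern)).max(axis=0) for kern in kernels]
    return np.concatenate(pooled)  # fixed-length vector regardless of sequence length T

rng = np.random.default_rng(0)
emb_dim, n_filters = 16, 4
kernels = [rng.normal(size=(k, emb_dim, n_filters)) for k in (2, 3, 4, 5)]  # hypothetical widths
sentence = rng.normal(size=(20, emb_dim))  # 20 embedded tokens (stand-in data)
features = multi_kernel_features(sentence, kernels)
print(features.shape)
```

The gradient penalty of WGAN-GP, λ(‖∇D(x̂)‖₂ − 1)² evaluated at random interpolates x̂ between real and generated inputs, requires automatic differentiation and is therefore not shown here.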


