A SURVEY OF METHODS OF TEXT-TO-IMAGE TRANSLATION

2019, Vol 2 (93), pp. 64-68
Author(s): I. Konarieva, D. Pydorenko, O. Turuta

This work considers existing methods of text compression (extracting keywords or creating a summary) using the RAKE, LexRank, Luhn, LSA, and TextRank algorithms; image generation; and text-to-image and image-to-image translation, including generative adversarial networks (GANs). Different types of GANs are described, such as StyleGAN, GauGAN, Pix2Pix, CycleGAN, BigGAN, and AttnGAN. This work aims to show ways to create illustrations for a text. First, key information should be obtained from the text. Second, this key information should be transformed into images. Several ways of transforming keywords into images are proposed: generating images, or selecting them from a dataset with further transformation, such as generating new images based on the selected ones or combining selected images, e.g., by applying the style of one image to another. Based on the results, possibilities for further improving the quality of image generation are also outlined: combining image generation with image selection from a dataset, and limiting the topics of image generation.
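A minimal Python sketch of this keyword-to-illustration pipeline may help make it concrete: a toy frequency-based extractor stands in for RAKE/LexRank/TextRank, and retrieve_image / generate_image are hypothetical placeholders for dataset lookup and GAN-based synthesis (the names and logic are assumptions, not taken from the paper).

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "with", "at"}

def extract_keywords(text: str, k: int = 5) -> list[str]:
    """Toy keyword extractor: rank words by frequency, ignoring stopwords.
    A crude stand-in for RAKE, LexRank, Luhn, LSA, or TextRank."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def retrieve_image(keyword: str) -> str:
    """Hypothetical: look up the closest matching image in a curated dataset."""
    return f"<dataset image for '{keyword}'>"

def generate_image(keywords: list[str]) -> str:
    """Hypothetical: condition a text-to-image GAN (e.g. AttnGAN) on the keywords."""
    return f"<generated image for {', '.join(keywords)}>"

def illustrate(text: str):
    keywords = extract_keywords(text)
    # Either select from a dataset or generate; combining both is the
    # improvement direction the survey proposes.
    return [retrieve_image(kw) for kw in keywords], generate_image(keywords)

if __name__ == "__main__":
    selected, generated = illustrate("A red fox jumps over the lazy dog near the river at sunset.")
    print(selected)
    print(generated)
```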

2021, Vol 15
Author(s): Jiasong Wu, Xiang Qiu, Jing Zhang, Fuzhi Wu, Youyong Kong, et al.

Generative adversarial networks (GANs) and variational autoencoders (VAEs) provide impressive image generation from Gaussian white noise, but both are difficult to train, since they need a generator (or encoder) and a discriminator (or decoder) to be trained simultaneously, which can easily lead to unstable training. To solve or alleviate these synchronous training problems of GANs and VAEs, researchers recently proposed generative scattering networks (GSNs), which use wavelet scattering networks (ScatNets) as the encoder to obtain features (or ScatNet embeddings) and convolutional neural networks (CNNs) as the decoder to generate an image. The advantage of GSNs is that the parameters of ScatNets do not need to be learned, while the disadvantage is that the representational ability of ScatNets is slightly weaker than that of CNNs. In addition, the dimensionality reduction method of principal component analysis (PCA) can easily lead to overfitting during the training of GSNs and therefore affect the quality of the generated images at test time. To further improve the quality of generated images while keeping the advantages of GSNs, this study proposes generative fractional scattering networks (GFRSNs), which use the more expressive fractional wavelet scattering networks (FrScatNets) instead of ScatNets as the encoder to obtain features (or FrScatNet embeddings) and use CNN decoders similar to those of GSNs to generate an image. Additionally, this study develops a new dimensionality reduction method named feature-map fusion (FMF) as an alternative to PCA to better retain the information of FrScatNets; it also discusses the effect of image fusion on the quality of the generated images. The experimental results obtained on the CIFAR-10 and CelebA datasets show that the proposed GFRSNs generate better images than the original GSNs on the test sets. Comparative results of the proposed GFRSNs with deep convolutional GAN (DCGAN), progressive GAN (PGAN), and CycleGAN are also given.
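A minimal PyTorch sketch of the GSN/GFRSN training setup, under the assumption that the encoder is any fixed, non-learned transform: here a frozen random convolution stands in for the ScatNet/FrScatNet features, and only the small CNN decoder is optimized with a reconstruction loss.

```python
import torch
import torch.nn as nn

class FixedEncoder(nn.Module):
    """Stand-in for a (fractional) wavelet scattering transform: a frozen,
    randomly initialized convolution whose parameters are never trained."""
    def __init__(self, out_ch: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(3, out_ch, kernel_size=8, stride=8)
        for p in self.parameters():
            p.requires_grad = False  # like ScatNet, the encoder is not learned

    def forward(self, x):
        return torch.relu(self.conv(x))

class Decoder(nn.Module):
    """Small CNN decoder mapping embeddings back to an image, as in GSNs."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)

encoder, decoder = FixedEncoder(), Decoder()
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)  # only the decoder is trained

x = torch.rand(8, 3, 32, 32)  # dummy batch standing in for CIFAR-10 images
opt.zero_grad()
recon = decoder(encoder(x))   # encode with the fixed transform, decode with the CNN
loss = nn.functional.mse_loss(recon, x)
loss.backward()
opt.step()
```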


2021, Vol 2021, pp. 1-9
Author(s): Ruixin Ma, Junying Lou, Peng Li, Jing Gao

Generating pictures from text is an interesting, classic, and challenging task. Benefiting from the development of generative adversarial networks (GANs), the generation quality for this task has been greatly improved, and many excellent cross-modal GAN models have been put forward. These models add extensive layers and constraints to obtain impressive generated pictures. However, the complexity and computational cost of existing cross-modal GANs are too high for deployment on mobile terminals. To solve this problem, this paper designs a compact cross-modal GAN based on canonical polyadic decomposition. We replace an original convolution layer with three small convolution layers and use an autoencoder to stabilize and speed up training. The experimental results show that our model achieves a compression rate of 20% in both parameters and FLOPs without loss of quality in the generated images.
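The abstract does not specify the exact factorization, so the sketch below shows one common CP-style replacement of a single k x k convolution by three small layers (a 1x1 projection, a depthwise k x k convolution over the rank channels, and a 1x1 expansion); the rank and layer shapes are illustrative assumptions.

```python
import torch.nn as nn

def cp_factorized_conv(in_ch: int, out_ch: int, k: int, rank: int) -> nn.Sequential:
    """Replace a single k x k convolution (in_ch -> out_ch) with three small layers,
    in the spirit of a rank-`rank` canonical polyadic (CP) factorization:
      1) 1x1 conv projecting in_ch -> rank,
      2) k x k depthwise conv acting on the rank channels,
      3) 1x1 conv expanding rank -> out_ch.
    The weight count drops from in_ch*out_ch*k*k to
    in_ch*rank + rank*k*k + rank*out_ch."""
    return nn.Sequential(
        nn.Conv2d(in_ch, rank, kernel_size=1, bias=False),
        nn.Conv2d(rank, rank, kernel_size=k, padding=k // 2, groups=rank, bias=False),
        nn.Conv2d(rank, out_ch, kernel_size=1, bias=False),
    )

# Example: a 3x3 conv with 256 -> 256 channels (~590K weights) becomes a
# rank-64 factorization with ~33K weights.
block = cp_factorized_conv(256, 256, k=3, rank=64)
```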


Entropy, 2020, Vol 22 (4), pp. 410
Author(s): Likun Cai, Yanjie Chen, Ning Cai, Wei Cheng, Hao Wang

Generative Adversarial Nets (GANs) are among the most popular architectures for image generation and have achieved significant progress in generating high-resolution, diverse image samples. Standard GANs are intended to minimize the Kullback–Leibler divergence between the distributions of natural and generated images. In this paper, we propose the Alpha-divergence Generative Adversarial Net (Alpha-GAN), which adopts the alpha divergence as the minimization objective function of the generator. The alpha divergence can be regarded as a generalization of the Kullback–Leibler divergence, Pearson χ² divergence, Hellinger divergence, etc. Our Alpha-GAN employs a power function as the form of the adversarial loss for the discriminator, with two order indices as hyper-parameters. These hyper-parameters make our model more flexible in trading off between the generated and target distributions. We further give a theoretical analysis of how to select these hyper-parameters to balance training stability and the quality of generated images. Extensive experiments with Alpha-GAN are performed on the SVHN and CelebA datasets, and the evaluation results show the stability of Alpha-GAN. The generated samples are also competitive with those of state-of-the-art approaches.
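For reference, one standard parameterization of the alpha-divergence family this work builds on (the paper's exact normalization may differ) is

D_\alpha(P \,\|\, Q) = \frac{1}{\alpha(\alpha - 1)} \left( \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx - 1 \right),

which recovers the Kullback–Leibler divergence as α → 1, the reverse Kullback–Leibler divergence as α → 0, and divergences proportional to the Pearson χ² divergence at α = 2 and to the squared Hellinger distance at α = 1/2.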


2020, Vol 2020, pp. 1-8
Author(s): Zhiyi Cao, Shaozhang Niu, Jiwei Zhang

Generative Adversarial Networks (GANs) have achieved significant success in unsupervised image-to-image translation between given categories (e.g., zebras to horses). Previous GAN models assume that the shared latent space between different categories can be captured from the given categories. Unfortunately, besides well-designed datasets from given categories, many examples come from different wild categories (e.g., cats to dogs) with unusual shapes and sizes (referred to as adversarial examples for short), so the shared latent space is difficult to capture, which causes these models to collapse. For this problem, we assume the shared latent space can be divided into global and local parts and design a weakly supervised Similar GAN (Sim-GAN) to capture the local shared latent space rather than the global one. For well-designed datasets, the local shared latent space is close to the global shared latent space. For wild datasets, capturing the local shared latent space keeps the model from collapsing. Experiments on four public datasets show that our model significantly outperforms state-of-the-art baseline methods.


2021, Vol 54 (2), pp. 1-38
Author(s): Zhengwei Wang, Qi She, Tomás E. Ward

Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision, where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation, and similar domains. Despite the significant successes achieved to date, applying GANs to real-world problems still poses significant challenges, three of which we focus on here: (1) the generation of high-quality images, (2) diversity of image generation, and (3) stabilizing training. Focusing on the degree to which popular GAN technologies have made progress against these challenges, we provide a detailed review of the state of the art in GAN-related research in the published scientific literature. We further structure this review through a convenient taxonomy we have adopted based on variations in GAN architectures and loss functions. While several reviews of GANs have been presented to date, none have considered the status of this field based on progress toward addressing practical challenges relevant to computer vision. Accordingly, we review and critically discuss the most popular architecture-variant and loss-variant GANs for tackling these challenges. Our objective is to provide an overview as well as a critical analysis of the status of GAN research in terms of relevant progress toward critical computer vision application requirements. In doing so, we also discuss the most compelling applications in computer vision in which GANs have demonstrated considerable success, along with some suggestions for future research directions. Code related to the GAN variants studied in this work is summarized at https://github.com/sheqi/GAN_Review.


2020, Vol 34 (07), pp. 10981-10988
Author(s): Mengxiao Hu, Jinlong Li, Maolin Hu, Tao Hu

In conditional Generative Adversarial Networks (cGANs), when two different initial noise vectors are concatenated with the same conditional information, the distance between their outputs is relatively small, which makes minor modes likely to collapse into large modes. To prevent this from happening, we propose a hierarchical mode-exploring method that alleviates mode collapse in cGANs by introducing a diversity measurement into the objective function as a regularization term. We also introduce the Expected Ratios of Expansion (ERE) into the regularization term; by minimizing the sum of differences between the actual change in distance and the ERE, we can control the diversity of generated images with respect to specific-level features. We validate the proposed algorithm on four conditional image synthesis tasks, including categorical generation, paired and unpaired image translation, and text-to-image generation. Both qualitative and quantitative results show that the proposed method is effective in alleviating the mode collapse problem in cGANs and can control the diversity of output images with respect to specific-level features.
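The ERE term itself is not given in the abstract; the sketch below shows a mode-seeking-style diversity regularizer of the kind this method builds on, assuming a conditional generator G(z, c) (the exact form and weighting are illustrative, not the authors' formulation).

```python
import torch

def diversity_regularizer(G, c: torch.Tensor, z_dim: int, eps: float = 1e-8) -> torch.Tensor:
    """Mode-seeking-style diversity term for a conditional generator G(z, c):
    when two different noise vectors are paired with the same condition c,
    penalize outputs that are too close by maximizing the ratio of the output
    distance to the noise distance. The negative mean ratio is returned so it
    can be added (scaled by a weight) to the generator loss."""
    z1 = torch.randn(c.size(0), z_dim, device=c.device)
    z2 = torch.randn(c.size(0), z_dim, device=c.device)
    out1, out2 = G(z1, c), G(z2, c)
    num = (out1 - out2).flatten(1).norm(dim=1)   # change in output space
    den = (z1 - z2).norm(dim=1) + eps            # change in noise space
    return -(num / den).mean()                   # lambda * this term joins the G loss
```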


2018, Vol 35 (12), pp. 2141-2149
Author(s): Hao Yuan, Lei Cai, Zhengyang Wang, Xia Hu, Shaoting Zhang, et al.

Motivation: A cell's function is closely related to the localizations of its sub-cellular structures. It is, however, challenging to experimentally label all sub-cellular structures simultaneously in the same cell. This raises the need to build a computational model that learns the relationships among these sub-cellular structures and uses reference structures to infer the localizations of other structures.
Results: We formulate such a task as a conditional image generation problem and propose to use conditional generative adversarial networks (cGANs) to tackle it. We employ an encoder-decoder network as the generator and propose to use skip connections between the encoder and decoder to provide spatial information to the decoder. To incorporate the conditional information in a variety of different ways, we develop three different types of skip connections, known as the self-gated connection, encoder-gated connection, and label-gated connection. The proposed skip connections are built on the conditional information using gating mechanisms. By learning a gating function, the network is able to control what information should be passed through the skip connections from the encoder to the decoder. Since the gate parameters are also learned automatically, we expect that only useful spatial information is transmitted to the decoder to help image generation. We perform both qualitative and quantitative evaluations to assess the effectiveness of our proposed approaches. Experimental results show that our cGAN-based approaches are able to generate the desired sub-cellular structures correctly. Our results also demonstrate that the proposed approaches outperform the existing approach based on adversarial auto-encoders, and the new skip connections lead to improved performance. In addition, the localizations of sub-cellular structures generated by our approaches are consistent with observations from biological experiments.
Availability and implementation: The source code and more results are available at https://github.com/divelab/cgan/.
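The authors' gating layers are in the linked repository; purely as an illustration of the general idea, the following sketch shows a label-gated skip connection with hypothetical shapes: a sigmoid gate computed from the condition vector scales the encoder feature map channel-wise before it is passed to the decoder.

```python
import torch
import torch.nn as nn

class LabelGatedSkip(nn.Module):
    """Sketch of a conditional (label-gated) skip connection: a gate in [0, 1],
    computed from the condition vector, decides per channel how much of the
    encoder feature map is forwarded to the decoder."""
    def __init__(self, channels: int, cond_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(cond_dim, channels), nn.Sigmoid())

    def forward(self, enc_feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        g = self.gate(cond).unsqueeze(-1).unsqueeze(-1)  # shape (B, C, 1, 1)
        return g * enc_feat  # gated feature is then concatenated/added in the decoder

# Example with hypothetical sizes: 64-channel encoder feature, 10-dim condition.
skip = LabelGatedSkip(channels=64, cond_dim=10)
feat = torch.randn(4, 64, 32, 32)
cond = torch.randn(4, 10)
gated = skip(feat, cond)  # same shape as feat
```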


2020, Vol 1, pp. 6
Author(s): Alexander Hepburn, Valero Laparra, Ryan McConville, Raul Santos-Rodriguez

In recent years there has been a growing interest in image generation through deep learning. While an important part of the evaluation of the generated images usually involves visual inspection, the inclusion of human perception as a factor in the training process is often overlooked. In this paper we propose an alternative perceptual regulariser for image-to-image translation using conditional generative adversarial networks (cGANs). To do so automatically (avoiding visual inspection), we use the Normalised Laplacian Pyramid Distance (NLPD) to measure the perceptual similarity between the generated image and the original image. The NLPD is based on the principle of normalising the value of coefficients with respect to a local estimate of mean energy at different scales and has already been successfully tested in different experiments involving human perception. We compare this regulariser with the originally proposed L1 distance and note that when using NLPD the generated images contain more realistic values for both local and global contrast.
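A simplified sketch of a normalised-Laplacian-pyramid distance is shown below; this is an approximation for illustration only, as the published NLPD uses specific filters and divisive-normalisation constants rather than the simple local-mean normalisation used here.

```python
import torch
import torch.nn.functional as F

def _laplacian_pyramid(x: torch.Tensor, levels: int = 4):
    """Decompose an image batch into band-pass residuals plus a low-pass residual."""
    bands = []
    for _ in range(levels):
        down = F.avg_pool2d(x, kernel_size=2)
        up = F.interpolate(down, size=x.shape[-2:], mode="bilinear", align_corners=False)
        bands.append(x - up)   # band-pass detail at this scale
        x = down
    bands.append(x)            # low-pass residual
    return bands

def nlpd(x: torch.Tensor, y: torch.Tensor, levels: int = 4, eps: float = 1e-6) -> torch.Tensor:
    """Simplified NLPD-style distance: divide each pyramid band by a local
    estimate of mean absolute energy, then compare the normalised bands."""
    total = x.new_zeros(())
    for bx, by in zip(_laplacian_pyramid(x, levels), _laplacian_pyramid(y, levels)):
        norm_x = F.avg_pool2d(bx.abs(), 3, stride=1, padding=1) + eps
        norm_y = F.avg_pool2d(by.abs(), 3, stride=1, padding=1) + eps
        total = total + ((bx / norm_x - by / norm_y) ** 2).mean()
    return total / (levels + 1)
```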


2020, Vol 8 (6), pp. 3492-3495

Mobile photography has been brought to a significantly new level in the last several years. The quality of images taken by the compact lenses of a smartphone has now appreciably increased. Even some low-end phones on the market can take exceedingly good photos when suitable lighting is available, thanks to advances in numerous software methods for post-capture image processing. However, despite these tools, these cameras still fall behind the aesthetic capabilities of their DSLR counterparts. In the quest to achieve high-quality images with a smartphone camera, various image semantics are inadvertently ignored, leading to a less artistic image quality than a professional camera. Although numerous techniques for manual as well as automated image enhancement do exist, they generally focus only on brightness, contrast, and other such global parameters of the image; they do not improve the content or texture of the image, nor do they take its various semantics into account. Moreover, they are usually based on a predetermined set of rules that never considers the specifics of the actual device capturing the image: the smartphone camera. For our enhancement, we have employed a deep learning technique to transform lower-quality images from a smartphone camera into DSLR-quality images. To enhance image sharpness, we have used an error function that combines three losses: the content, texture, and color loss of the given image. By training on the large-scale DSLR Photo Enhancement Dataset, we have optimized the loss function using Generative Adversarial Networks. The end results produced after testing on a number of smartphone images are enhanced images of quality comparable to DSLR images, with an average SSIM score of approximately 0.95.
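A rough PyTorch sketch of such a combined objective is given below; the feature extractor, discriminator, and loss weights are hypothetical placeholders rather than the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(x: torch.Tensor, k: int = 21, sigma: float = 3.0) -> torch.Tensor:
    """Depthwise Gaussian blur, so the color loss compares overall color and
    brightness rather than fine texture."""
    coords = torch.arange(k, dtype=x.dtype, device=x.device) - (k - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] * g[None, :]).expand(x.size(1), 1, k, k).contiguous()
    return F.conv2d(x, kernel, padding=k // 2, groups=x.size(1))

def enhancement_loss(enhanced, target, feat_extractor, discriminator,
                     w_content=1.0, w_texture=0.4, w_color=0.1):
    """Sketch of a combined enhancement objective with hypothetical weights:
      - content loss: distance between deep features of enhanced and target images,
      - texture loss: adversarial (non-saturating) loss from a realism discriminator,
      - color loss: distance between blurred versions of the two images."""
    content = F.mse_loss(feat_extractor(enhanced), feat_extractor(target))
    texture = -torch.log(discriminator(enhanced) + 1e-8).mean()
    color = F.mse_loss(gaussian_blur(enhanced), gaussian_blur(target))
    return w_content * content + w_texture * texture + w_color * color
```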

