CPGAN : An Efficient Architecture Designing for Text-to-Image Generative Adversarial Networks Based on Canonical Polyadic Decomposition

Text-to-image synthesis is an important and challenging application of computer vision. Many interesting and meaningful text-to-image synthesis models have been put forward. However, most works focus on the quality of the synthesized images and rarely consider the size of the models. Large models have many parameters and high latency, which makes them difficult to deploy in mobile applications. To solve this problem, we propose CPGAN, an efficient architecture for text-to-image generative adversarial networks (GANs) based on canonical polyadic decomposition (CPD). It is a general method for designing lightweight text-to-image GAN architectures. To improve the stability of CPGAN, we introduce conditioning augmentation and the idea of an autoencoder during training. Experimental results show that CPGAN maintains the quality of generated images while reducing parameters and FLOPs by at least 20%.

Reconstruction of Generative Adversarial Networks in Cross Modal Image Generation with Canonical Polyadic Decomposition

Wireless Communications and Mobile Computing ◽

10.1155/2021/8868781 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Ruixin Ma ◽

Junying Lou ◽

Peng Li ◽

Jing Gao

Keyword(s):

Mobile Terminal ◽

Experimental Results ◽

Generative Adversarial Networks ◽

Image Generation ◽

Adversarial Networks ◽

Speed Up ◽

Canonical Polyadic Decomposition

Generating pictures from text is an interesting, classic, and challenging task. Benefiting from the development of generative adversarial networks (GANs), the generation quality for this task has greatly improved. Many excellent cross-modal GAN models have been put forward. These models add extensive layers and constraints to produce impressive generated pictures. However, the complexity and computational cost of existing cross-modal GANs are too high for deployment on mobile terminals. To solve this problem, this paper designs a compact cross-modal GAN based on canonical polyadic decomposition. We replace each original convolution layer with three small convolution layers and use an autoencoder to stabilize and speed up training. The experimental results show that our model achieves 20% times of compression in both parameters and FLOPs without loss of quality on generated images.
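To make the idea of replacing one convolution with three small ones concrete, the following sketch compares weight counts. It assumes a common CP-style factorization (a 1x1 convolution into `rank` channels, a k x k depthwise convolution, and a 1x1 convolution out); the abstract does not say which three layers the authors use, and the function names, channel sizes, and rank are illustrative only. Biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def cp_conv_params(c_in, c_out, k, rank):
    """Weights in an assumed three-layer CP-style replacement."""
    pointwise_in = c_in * rank    # 1x1 conv: c_in -> rank channels
    depthwise = rank * k * k      # k x k depthwise conv over rank channels
    pointwise_out = rank * c_out  # 1x1 conv: rank -> c_out channels
    return pointwise_in + depthwise + pointwise_out


# Hypothetical layer sizes, chosen only for illustration.
full = conv_params(256, 256, 3)
cp = cp_conv_params(256, 256, 3, rank=64)
print(full, cp, round(cp / full, 4))
```

The rank controls the trade-off: a smaller rank gives fewer weights but a coarser approximation of the original kernel.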

Utilizing Amari-Alpha Divergence to Stabilize the Training of Generative Adversarial Networks

Entropy ◽

10.3390/e22040410 ◽

2020 ◽

Vol 22 (4) ◽

pp. 410 ◽

Cited By ~ 2

Author(s):

Likun Cai ◽

Yanjie Chen ◽

Ning Cai ◽

Wei Cheng ◽

Hao Wang

Keyword(s):

State Of The Art ◽

Generative Adversarial Networks ◽

Image Generation ◽

Significant Progress ◽

Trade Off ◽

Adversarial Networks ◽

Leibler Divergence ◽

The Stability ◽

Hellinger Divergence

Generative Adversarial Nets (GANs) are one of the most popular architectures for image generation and have achieved significant progress in generating high-resolution, diverse image samples. The normal GANs are supposed to minimize the Kullback–Leibler divergence between distributions of natural and generated images. In this paper, we propose the Alpha-divergence Generative Adversarial Net (Alpha-GAN), which adopts the alpha divergence as the minimization objective function of generators. The alpha divergence can be regarded as a generalization of the Kullback–Leibler divergence, Pearson χ² divergence, Hellinger divergence, etc. Our Alpha-GAN employs the power function as the form of adversarial loss for the discriminator with two-order indexes. These hyper-parameters make our model more flexible to trade off between the generated and target distributions. We further give a theoretical analysis of how to select these hyper-parameters to balance the training stability and the quality of generated images. Extensive experiments with Alpha-GAN are performed on the SVHN and CelebA datasets, and evaluation results show the stability of Alpha-GAN. The generated samples are also competitive with state-of-the-art approaches.
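For reference, one common parametrization of the Amari alpha divergence between normalized densities p and q is given below. This is a standard textbook form and an assumption here; the paper may use a different but equivalent parametrization.

```latex
D_{\alpha}(p \,\|\, q) = \frac{1}{\alpha(\alpha - 1)}
  \left( \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx - 1 \right),
  \qquad \alpha \notin \{0, 1\}
```

In this form, the limit α → 1 recovers the Kullback–Leibler divergence KL(p‖q), α = 2 gives half the Pearson χ² divergence, and α = 1/2 is proportional to the squared Hellinger distance.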

Real Sample Consistency Regularization for GANs

Entropy ◽

10.3390/e23091231 ◽

2021 ◽

Vol 23 (9) ◽

pp. 1231

Author(s):

Xiangde Zhang ◽

Jian Zhang

Keyword(s):

Fundamental Problem ◽

Average Distance ◽

Synthetic Data ◽

Real Sample ◽

Generative Adversarial Networks ◽

Training Process ◽

Real Samples ◽

Theoretical Loss ◽

Adversarial Networks

Mode collapse has always been a fundamental problem in generative adversarial networks. The recently proposed Zero Gradient Penalty (0GP) regularization can alleviate mode collapse, but it exacerbates the discriminator’s misjudgment problem, that is, the discriminator judges some generated samples to be more real than real samples. In actual training, the discriminator directs the generated samples toward samples with higher discriminator outputs. A serious misjudgment problem in the discriminator causes the generator to produce unnatural images and reduces generation quality. This paper proposes Real Sample Consistency (RSC) regularization. In the training process, we randomly divided the samples into two parts and minimized the loss of the discriminator’s outputs corresponding to these two parts, forcing the discriminator to output the same value for all real samples. We analyzed the effectiveness of our method. The experimental results showed that our method can alleviate the discriminator’s misjudgment and perform better, with a more stable training process, than 0GP regularization. Our real sample consistency regularization improved the FID score for the conditional generation of Fake-As-Real GAN (FARGAN) from 14.28 to 9.8 on CIFAR-10. Our RSC regularization improved the FID score from 23.42 to 17.14 on CIFAR-100 and from 53.79 to 46.92 on ImageNet2012. Our RSC regularization improved the average distance between the generated and real samples from 0.028 to 0.025 on synthetic data. The loss of the generator and discriminator in standard GAN with our regularization was close to the theoretical loss and remained stable during the training process.
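The split-and-compare step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the loss, so a mean squared difference between the two halves is assumed, and `rsc_penalty` is a hypothetical name. The input is the vector of discriminator outputs on a batch of real samples.

```python
import numpy as np


def rsc_penalty(d_real, rng):
    """Assumed consistency penalty on discriminator outputs for real samples.

    Randomly splits the batch into two equal halves and returns the mean
    squared difference between paired outputs (an assumed loss form).
    """
    d_real = np.asarray(d_real, dtype=float)
    perm = rng.permutation(d_real.shape[0])   # random split of the batch
    half = d_real.shape[0] // 2
    part_a = d_real[perm[:half]]
    part_b = d_real[perm[half:2 * half]]      # drops one sample if batch is odd
    return float(np.mean((part_a - part_b) ** 2))


rng = np.random.default_rng(0)
constant = rsc_penalty([0.7, 0.7, 0.7, 0.7], rng)  # identical outputs
varied = rsc_penalty([0.1, 0.9, 0.2, 0.8], rng)    # inconsistent outputs
print(constant, varied)
```

The penalty is zero only when the discriminator assigns the same value to the paired real samples, which matches the stated goal of forcing one output value for all real samples.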

SemGAN: Text to Image Synthesis from Text Semantics using Attentional Generative Adversarial Networks

2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE) ◽

10.1109/iccceee49695.2021.9429602 ◽

2021 ◽

Author(s):

Ammar Nasr ◽

Ruba Mutasim ◽

Hiba Imam

Keyword(s):

Image Synthesis ◽

Generative Adversarial Networks ◽

Adversarial Networks

SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis

Neural Networks ◽

10.1016/j.neunet.2021.01.023 ◽

2021 ◽

Vol 138 ◽

pp. 57-67

Author(s):

Dunlu Peng ◽

Wuchen Yang ◽

Cong Liu ◽

Shuairui Lü

Keyword(s):

Image Synthesis ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

Multi Stage

Drawgan: Text to Image Synthesis with Drawing Generative Adversarial Networks

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414166 ◽

2021 ◽

Author(s):

Zhiqiang Zhang ◽

Jinjia Zhou ◽

Wenxin Yu ◽

Ning Jiang

Keyword(s):

Image Synthesis ◽

Generative Adversarial Networks ◽

Adversarial Networks

Dynamics of Fourier Modes in Torus Generative Adversarial Networks

Mathematics ◽

10.3390/math9040325 ◽

2021 ◽

Vol 9 (4) ◽

pp. 325

Author(s):

Ángel González-Prieto ◽

Alberto Mozo ◽

Edgar Talavera ◽

Sandra Gómez-Canaval

Keyword(s):

Fourier Series ◽

Generative Adversarial Networks ◽

Learning Models ◽

Training Process ◽

Small Perturbations ◽

Adversarial Networks ◽

Novel Method ◽

Truncated Fourier Series ◽

Real Flow ◽

Machine Learning Models

Generative Adversarial Networks (GANs) are powerful machine learning models capable of generating fully synthetic samples of a desired phenomenon with a high resolution. Despite their success, the training process of a GAN is highly unstable, and typically, it is necessary to implement several accessory heuristics to the networks to reach acceptable convergence of the model. In this paper, we introduce a novel method to analyze the convergence and stability in the training of generative adversarial networks. For this purpose, we propose to decompose the objective function of the adversary min–max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series for the continuous alternating gradient descent algorithm, we are able to approximate the real flow and to identify the main features of the convergence of GAN. This approach is confirmed empirically by studying the training flow in a 2-parametric GAN, aiming to generate an unknown exponential distribution. As a by-product, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.

mustGAN: multi-stream Generative Adversarial Networks for MR Image Synthesis

Medical Image Analysis ◽

10.1016/j.media.2020.101944 ◽

2021 ◽

pp. 101944

Author(s):

Mahmut Yurt ◽

Salman U.H. Dar ◽

Aykut Erdem ◽

Erkut Erdem ◽

Kader K Oguz ◽

...

Keyword(s):

Image Synthesis ◽

Generative Adversarial Networks ◽

Mr Image ◽

Adversarial Networks

Towards Accuracy Enhancement of Age Group Classification Using Generative Adversarial Networks

Journal of Integrated Design and Process Science ◽

10.3233/jid-210019 ◽

2021 ◽

pp. 1-17

Author(s):

Khaled ELKarazle ◽

Valliappan Raman ◽

Patrick Then

Keyword(s):

Age Estimation ◽

Super Resolution ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Original Dataset ◽

Age Group Classification ◽

Facial Images

Age estimation models can be employed in many applications, including soft biometrics, content access control, targeted advertising, and many more. However, as some facial images are taken in unrestrained conditions, image quality degrades, which results in the loss of several essential ageing features. This study investigates how introducing a new layer of data processing based on a super-resolution generative adversarial network (SRGAN) model can influence the accuracy of age estimation by enhancing the quality of both the training and testing samples. Additionally, we introduce a novel convolutional neural network (CNN) classifier to distinguish between several age classes. We train one of our classifiers on a reconstructed version of the original dataset and compare its performance with an identical classifier trained on the original version of the same dataset. Our findings reveal that the classifier trained on the reconstructed dataset produces better classification accuracy, opening the door for more research into building data-centric machine learning systems.

Fractional Wavelet-Based Generative Scattering Networks

Frontiers in Neurorobotics ◽

10.3389/fnbot.2021.752752 ◽

2021 ◽

Vol 15 ◽

Author(s):

Jiasong Wu ◽

Xiang Qiu ◽

Jing Zhang ◽

Fuzhi Wu ◽

Youyong Kong ◽

...

Keyword(s):

Dimensionality Reduction ◽

Reduction Method ◽

Gaussian White Noise ◽

Principal Component ◽

Experimental Results ◽

Generative Adversarial Networks ◽

Image Generation ◽

Adversarial Networks ◽

Dimensionality Reduction Method

Generative adversarial networks and variational autoencoders (VAEs) provide impressive image generation from Gaussian white noise, but both are difficult to train, since they need a generator (or encoder) and a discriminator (or decoder) to be trained simultaneously, which can easily lead to unstable training. To solve or alleviate these synchronous training problems of generative adversarial networks (GANs) and VAEs, researchers recently proposed generative scattering networks (GSNs), which use wavelet scattering networks (ScatNets) as the encoder to obtain features (or ScatNet embeddings) and convolutional neural networks (CNNs) as the decoder to generate an image. The advantage of GSNs is that the parameters of ScatNets do not need to be learned, while the disadvantage of GSNs is that their ability to obtain representations of ScatNets is slightly weaker than that of CNNs. In addition, the dimensionality reduction method of principal component analysis (PCA) can easily lead to overfitting in the training of GSNs and, therefore, affect the quality of generated images in the testing process. To further improve the quality of generated images while keeping the advantages of GSNs, this study proposes generative fractional scattering networks (GFRSNs), which use more expressive fractional wavelet scattering networks (FrScatNets) instead of ScatNets as the encoder to obtain features (or FrScatNet embeddings) and use CNNs similar to those of GSNs as the decoder to generate an image. Additionally, this study develops a new dimensionality reduction method named feature-map fusion (FMF) instead of performing PCA to better retain the information of FrScatNets; it also discusses the effect of image fusion on the quality of the generated image. The experimental results obtained on the CIFAR-10 and CelebA datasets show that the proposed GFRSNs can lead to better generated images than the original GSNs on testing datasets. The experimental results of the proposed GFRSNs with deep convolutional GAN (DCGAN), progressive GAN (PGAN), and CycleGAN are also given.
