Semantically consistent text to fashion image synthesis with an enhanced attentional generative adversarial network

2020 ◽  
Vol 135 ◽  
pp. 22-29 ◽  
Author(s):  
Kenan E. Ak ◽  
Joo Hwee Lim ◽  
Jo Yew Tham ◽  
Ashraf A. Kassim

2021 ◽  
Vol 11 (4) ◽  
pp. 1380
Author(s):  
Yingbo Zhou ◽  
Pengcheng Zhao ◽  
Weiqin Tong ◽  
Yongxin Zhu

While Generative Adversarial Networks (GANs) have shown promising performance in image generation, they suffer from issues such as mode collapse and training instability. To stabilize GAN training and improve image synthesis quality and diversity, we propose a simple yet effective approach, Contrastive Distance Learning GAN (CDL-GAN), in this paper. Specifically, we add a Consistent Contrastive Distance (CoCD) and a Characteristic Contrastive Distance (ChCD) into a principled framework to improve GAN performance. The CoCD explicitly maximizes the ratio of the distance between generated images to the increment between noise vectors, strengthening image feature learning for the generator. The ChCD measures the sampling distance of the encoded images in Euler space to boost feature representations for the discriminator. We model the framework by employing a Siamese network as a module within GANs, without any modification to the backbone. Both qualitative and quantitative experiments on three public datasets demonstrate the effectiveness of our method.
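The CoCD term can be illustrated with a minimal PyTorch sketch (not the authors' released code); the generator interface G, the paired noise vectors z1/z2, and the L1 distances used for the ratio are assumptions based on the description above.

import torch

def cocd_loss(G, z1, z2, eps=1e-8):
    # Hypothetical Consistent Contrastive Distance (CoCD) term: encourage the
    # distance between two generated images to grow with the increment between
    # their noise vectors, which is one reading of the ratio described above.
    img1, img2 = G(z1), G(z2)
    img_dist = (img1 - img2).abs().flatten(1).mean(dim=1)  # per-sample image distance
    z_dist = (z1 - z2).abs().flatten(1).mean(dim=1)        # per-sample noise increment
    ratio = img_dist / (z_dist + eps)
    return -ratio.mean()  # maximizing the ratio = minimizing its negative

In training, such a term would be added to the usual generator loss with a weighting coefficient; the exact form and weight used in CDL-GAN may differ.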


2020 ◽  
Vol 34 (05) ◽  
pp. 8830-8837
Author(s):  
Xin Sheng ◽  
Linli Xu ◽  
Junliang Guo ◽  
Jingchang Liu ◽  
Ruoyu Zhao ◽  
...  

We propose a novel introspective model for variational neural machine translation (IntroVNMT) in this paper, inspired by the recent successful application of the introspective variational autoencoder (IntroVAE) to high-quality image synthesis. Unlike the vanilla variational NMT model, IntroVNMT is capable of improving itself introspectively by evaluating the quality of the generated target sentences according to the high-level latent variables of the real and generated target sentences. As a consequence of introspective training, the proposed model is able to discriminate between generated and real sentences of the target language via the latent variables produced by its encoder. In this way, IntroVNMT is able to generate more realistic target sentences in practice. Meanwhile, IntroVNMT inherits the advantages of variational autoencoders (VAEs), and its training process is more stable than that of generative adversarial network (GAN) based models. Experimental results on different translation tasks demonstrate that the proposed model achieves significant improvements over the vanilla variational NMT model.
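The introspective mechanism can be sketched with an IntroVAE-style margin objective on the KL divergence of the latent variables; the following PyTorch illustration is a rough sketch, where the function names, the margin m, and the simple KL margin are assumptions, not the paper's exact formulation.

import torch

def kl_to_prior(mu, logvar):
    # KL divergence between N(mu, exp(logvar)) and the standard normal prior.
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=-1)

def introspective_terms(mu_real, logvar_real, mu_fake, logvar_fake, m=5.0):
    # Encoder side: keep latents of real target sentences close to the prior,
    # while pushing latents of generated sentences at least a margin m away.
    kl_real = kl_to_prior(mu_real, logvar_real).mean()
    kl_fake = kl_to_prior(mu_fake, logvar_fake).mean()
    encoder_term = kl_real + torch.clamp(m - kl_fake, min=0.0)
    # Decoder/generator side: produce sentences whose latents look "real",
    # i.e., whose KL the encoder cannot push past the margin.
    decoder_term = kl_fake
    return encoder_term, decoder_term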


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Linyan Li ◽  
Yu Sun ◽  
Fuyuan Hu ◽  
Tao Zhou ◽  
Xuefeng Xi ◽  
...  

In this paper, we propose an Attentional Concatenation Generative Adversarial Network (ACGAN) aimed at generating 1024 × 1024 high-resolution images. First, we propose a multilevel cascade structure for text-to-image synthesis. During training, we gradually add new layers and, at the same time, use the results and word vectors from the previous layer as inputs to the next layer to generate high-resolution images with photo-realistic details. Second, the deep attentional multimodal similarity model is introduced into the network, and we match word vectors with images in a common semantic space to compute a fine-grained matching loss for training the generator. In this way, we can attend to fine-grained, word-level information in the semantics. Finally, a diversity measure is added to the discriminator, which enables the generator to obtain more diverse gradient directions and improves the diversity of generated samples. The experimental results show that the inception scores of the proposed model on the CUB and Oxford-102 datasets reach 4.48 and 4.16, improvements of 2.75% and 6.42% over Attentional Generative Adversarial Networks (AttnGAN). The ACGAN model performs better on text-to-image generation, and the resulting images are closer to real images.
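A simplified PyTorch sketch of a fine-grained word-to-region matching score in the spirit of the deep attentional multimodal similarity model mentioned above; the tensor shapes, the temperature gamma, and the mean aggregation are illustrative assumptions rather than the exact DAMSM formulation.

import torch
import torch.nn.functional as F

def word_region_matching(word_feats, region_feats, gamma=5.0):
    # word_feats:   (B, T, D) word embeddings mapped into the common semantic space
    # region_feats: (B, R, D) image region features mapped into the same space
    w = F.normalize(word_feats, dim=-1)
    r = F.normalize(region_feats, dim=-1)
    sim = torch.bmm(w, r.transpose(1, 2))                   # (B, T, R) word-region similarities
    attn = F.softmax(gamma * sim, dim=-1)                   # attend each word over image regions
    context = torch.bmm(attn, r)                            # (B, T, D) region context per word
    word_scores = F.cosine_similarity(w, context, dim=-1)   # (B, T) per-word relevance
    return word_scores.mean(dim=1)                          # (B,) image-sentence matching score

Such a score would typically feed a contrastive matching loss over matched and mismatched image-text pairs to supervise the generator at the word level.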


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Huan Yang ◽  
Pengjiang Qian ◽  
Chao Fan

Multimodal registration is a challenging task due to the significant variations exhibited by images of different modalities. CT and MRI are two of the most commonly used medical images in clinical diagnosis, since MRI with multicontrast images, together with CT, can provide complementary auxiliary information. Deformable image registration between MRI and CT is essential for analyzing the relationships among different modality images. Here, we propose an indirect multimodal image registration method, i.e., an sCT-guided multimodal image registration and problematic image completion method. In addition, we design a deep learning-based generative network, the Conditional Auto-Encoder Generative Adversarial Network (CAE-GAN), which combines the ideas of VAE and GAN under a conditional process to tackle the problem of synthetic CT (sCT) synthesis. Our main contributions in this work can be summarized in three aspects: (1) We design a new generative network, CAE-GAN, which incorporates the advantages of two popular image synthesis methods, i.e., VAE and GAN, and produces high-quality synthetic images with limited training data. (2) We use the sCT generated from multicontrast MRI as an intermediary to transform multimodal MRI-CT registration into monomodal sCT-CT registration, which greatly reduces the registration difficulty. (3) Using normal CT as guidance and reference, we repair the abnormal MRI while registering the MRI to the normal CT.
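A compact sketch of how a conditional VAE objective (reconstruction plus KL) can be combined with an adversarial term, in the spirit of the CAE-GAN described above; the loss weights, the L1 reconstruction, and the function interface are placeholders, not the paper's exact settings.

import torch
import torch.nn.functional as F

def cae_gan_generator_loss(sct_pred, ct_target, mu, logvar, d_fake_logits,
                           w_rec=10.0, w_kl=0.1):
    # Reconstruct the target CT from MRI (VAE reconstruction), keep the latent
    # code close to the prior (VAE KL), and fool the discriminator (GAN part).
    rec = F.l1_loss(sct_pred, ct_target)
    kl = 0.5 * torch.mean(mu.pow(2) + logvar.exp() - logvar - 1.0)
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return w_rec * rec + w_kl * kl + adv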


2021 ◽  
Vol 30 ◽  
pp. 1275-1290
Author(s):  
Hongchen Tan ◽  
Xiuping Liu ◽  
Meng Liu ◽  
Baocai Yin ◽  
Xin Li

2021 ◽  
Author(s):  
Ziyu Li ◽  
Qiyuan Tian ◽  
Chanon Ngamsombat ◽  
Samuel Cartmell ◽  
John Conklin ◽  
...  

Purpose: To improve the signal-to-noise ratio (SNR) of highly accelerated volumetric MRI while preserving realistic textures using a generative adversarial network (GAN). Methods: A hybrid GAN for denoising, entitled "HDnGAN", with a 3D generator and a 2D discriminator was proposed to denoise 3D T2-weighted fluid-attenuated inversion recovery (FLAIR) images acquired in 2.75 minutes (R=3×2) using wave-controlled aliasing in parallel imaging (Wave-CAIPI). HDnGAN was trained on data from 25 multiple sclerosis patients by minimizing a combined mean squared error and adversarial loss with adjustable weight λ. Results were evaluated on eight separate patients by comparison with standard T2-SPACE FLAIR images acquired in 7.25 minutes (R=2×2), using mean absolute error (MAE), peak SNR (PSNR), the structural similarity index (SSIM), and VGG perceptual loss, and by two neuroradiologists using a five-point score regarding gray-white matter contrast, sharpness, SNR, lesion conspicuity, and overall quality. Results: HDnGAN (λ=0) produced the lowest MAE and the highest PSNR and SSIM. HDnGAN (λ=10⁻³) produced the lowest VGG loss. In the reader study, HDnGAN (λ=10⁻³) significantly improved the gray-white matter contrast and SNR of Wave-CAIPI images and outperformed BM4D and HDnGAN (λ=0) regarding image sharpness. The overall quality score from HDnGAN (λ=10⁻³) was significantly higher than those from Wave-CAIPI, BM4D, and HDnGAN (λ=0), with no significant difference compared to standard images. Conclusion: HDnGAN concurrently benefits from the improved image synthesis performance of 3D convolution and from the increased number of training samples available to the 2D discriminator on limited data. HDnGAN generates images with high SNR and realistic textures, similar to those acquired with longer scan times and preferred by neuroradiologists.
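The combined objective (mean squared error plus an adversarial term weighted by λ) can be sketched as follows; the slice-wise handling for the 2D discriminator and the binary cross-entropy adversarial loss are assumptions for illustration, not necessarily the paper's exact implementation.

import torch
import torch.nn.functional as F

def hdngan_generator_loss(denoised_vol, target_vol, discriminator_2d, lam=1e-3):
    # denoised_vol, target_vol: (B, 1, D, H, W) volumes from the 3D generator.
    mse = F.mse_loss(denoised_vol, target_vol)
    # Split the 3D output into a batch of 2D slices so the 2D discriminator
    # sees many training samples per volume, which helps with limited data.
    b, c, d, h, w = denoised_vol.shape
    slices = denoised_vol.permute(0, 2, 1, 3, 4).reshape(b * d, c, h, w)
    logits = discriminator_2d(slices)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return mse + lam * adv

Setting lam=0 reduces this to plain MSE training, corresponding to the λ=0 configuration compared in the results; lam=1e-3 corresponds to the λ=10⁻³ setting preferred by the readers.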

