Fine-Grained Semantic Image Synthesis with Object-Attention Generative Adversarial Network

Semantic image synthesis is an emerging and challenging vision problem, driven by recent advances in generative adversarial networks. Existing semantic image synthesis methods consider only the global information provided by the semantic segmentation mask, such as class label, global layout, and location, so the generative models cannot capture the rich local fine-grained information of the images (e.g., object structure, contour, and texture). To address this issue, we adopt a multi-scale feature fusion algorithm that refines the generated images by learning the fine-grained information of the local objects. We propose OA-GAN, a novel object-attention generative adversarial network that allows attention-driven, multi-fusion refinement for fine-grained semantic image synthesis. Specifically, the proposed model first generates multi-scale global image features and local object features separately; the local object features are then fused into the global image features to improve the correlation between the local and the global. During feature fusion, the global image features and the local object features are fused through a channel-spatial-wise fusion block that learns ‘what’ and ‘where’ to attend in the channel and spatial axes, respectively. The fused features are used to construct correlation filters that produce feature response maps, which determine the locations, contours, and textures of the objects. Extensive quantitative and qualitative experiments on the COCO-Stuff, ADE20K, and Cityscapes datasets demonstrate that OA-GAN significantly outperforms state-of-the-art methods.
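
The channel-spatial fusion idea described above can be illustrated with a minimal sketch. It is not the OA-GAN block: the abstract does not give the layer design, so the parameter-free sigmoid gates, the element-wise sum, and the function name `channel_spatial_fuse` are all assumptions made for illustration. A learned block would replace the simple means with trainable layers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_spatial_fuse(global_feat, local_feat):
    """Fuse two C x H x W feature maps (nested lists) with parameter-free
    channel and spatial gates. Hypothetical stand-in for a learned block."""
    C = len(global_feat)
    H = len(global_feat[0])
    W = len(global_feat[0][0])
    # Element-wise sum of global and local features (assumed fusion rule).
    fused = [[[global_feat[c][h][w] + local_feat[c][h][w]
               for w in range(W)] for h in range(H)] for c in range(C)]
    # Channel gate ("what"): one weight per channel from its spatial mean.
    channel_gate = [sigmoid(sum(sum(row) for row in fused[c]) / (H * W))
                    for c in range(C)]
    # Spatial gate ("where"): one weight per location from its channel mean.
    spatial_gate = [[sigmoid(sum(fused[c][h][w] for c in range(C)) / C)
                     for w in range(W)] for h in range(H)]
    # Apply both gates to the fused features.
    return [[[fused[c][h][w] * channel_gate[c] * spatial_gate[h][w]
              for w in range(W)] for h in range(H)] for c in range(C)]
```

The channel gate rescales whole feature maps, while the spatial gate rescales individual locations across all channels, which is the 'what' versus 'where' split the abstract refers to.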

Text to Realistic Image Generation with Attentional Concatenation Generative Adversarial Networks

Discrete Dynamics in Nature and Society ◽

10.1155/2020/6452536 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Linyan Li ◽

Yu Sun ◽

Fuyuan Hu ◽

Tao Zhou ◽

Xuefeng Xi ◽

...

Keyword(s):

High Resolution ◽

Image Synthesis ◽

Semantic Space ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Fine Grained ◽

Adversarial Network ◽

Adversarial Networks ◽

Cascade Structure ◽

High Resolution Images

In this paper, we propose an Attentional Concatenation Generative Adversarial Network (ACGAN) that aims to generate 1024 × 1024 high-resolution images. First, we propose a multilevel cascade structure for text-to-image synthesis. During training, we gradually add new layers and, at the same time, use the results and word vectors from the previous layer as inputs to the next layer to generate high-resolution images with photo-realistic details. Second, the deep attentional multimodal similarity model is introduced into the network, and we match word vectors with images in a common semantic space to compute a fine-grained matching loss for training the generator. In this way, the model can attend to fine-grained, word-level semantic information. Finally, a measure of diversity is added to the discriminator, which gives the generator more diverse gradient directions and improves the diversity of generated samples. The experimental results show that the inception scores of the proposed model on the CUB and Oxford-102 datasets reach 4.48 and 4.16, improvements of 2.75% and 6.42% over Attentional Generative Adversarial Networks (AttnGAN). ACGAN performs better at text-to-image generation, and the resulting images are closer to real images.
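
As a worked check of the reported figures, the baseline inception scores implied by the stated improvements can be recovered with a short calculation. This assumes the percentages are relative improvements over the baseline score, which the abstract does not state explicitly; the helper name is hypothetical.

```python
def baseline_from_improvement(score, improvement_pct):
    """Baseline b such that score = b * (1 + improvement_pct / 100)."""
    return score / (1.0 + improvement_pct / 100.0)

# Scores and percentages as reported in the abstract.
cub_baseline = baseline_from_improvement(4.48, 2.75)
oxford_baseline = baseline_from_improvement(4.16, 6.42)

print(round(cub_baseline, 2), round(oxford_baseline, 2))  # prints: 4.36 3.91
```

Under that assumption, the implied baseline scores are about 4.36 on CUB and about 3.91 on Oxford-102.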

Multi-scale Hierarchy Feature Fusion Generative Adversarial Network for Low-Dose CT Denoising

2020 9th International Conference on Bioinformatics and Biomedical Science ◽

10.1145/3431943.3432286 ◽

2020 ◽

Author(s):

Ying Bai ◽

Haifeng Zhao ◽

Shaojie Zhang ◽

Dong Nie ◽

Zhenyu Tang

Keyword(s):

Low Dose ◽

Feature Fusion ◽

Generative Adversarial Network ◽

Low Dose Ct ◽

Multi Scale ◽

Adversarial Network ◽

Scale Hierarchy

CDL-GAN: Contrastive Distance Learning Generative Adversarial Network for Image Generation

Applied Sciences ◽

10.3390/app11041380 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1380

Author(s):

Yingbo Zhou ◽

Pengcheng Zhao ◽

Weiqin Tong ◽

Yongxin Zhu

Keyword(s):

Distance Learning ◽

Feature Learning ◽

Image Synthesis ◽

Image Feature ◽

Generative Adversarial Networks ◽

Image Generation ◽

Generative Adversarial Network ◽

Feature Representations ◽

Adversarial Network ◽

Public Datasets

While Generative Adversarial Networks (GANs) have shown promising performance in image generation, they suffer from issues such as mode collapse and training instability. To stabilize GAN training and improve both the quality and the diversity of synthesized images, we propose a simple yet effective approach, Contrastive Distance Learning GAN (CDL-GAN). Specifically, we add Consistent Contrastive Distance (CoCD) and Characteristic Contrastive Distance (ChCD) into a principled framework to improve GAN performance. The CoCD explicitly maximizes the ratio of the distance between generated images to the increment between noise vectors to strengthen image feature learning for the generator. The ChCD measures the sampling distance of the encoded images in Euler space to boost feature representations for the discriminator. We build the framework by adding a Siamese network as a module to GANs without modifying the backbone. Both qualitative and quantitative experiments conducted on three public datasets demonstrate the effectiveness of our method.
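
The ratio that CoCD is described as maximizing can be sketched for a single pair of noise vectors. This is only an illustration: the abstract does not specify the distance measure or any normalisation, so the L2 distance, the `toy_generator`, and the function names below are assumptions.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def toy_generator(z, scale=3.0):
    # Hypothetical stand-in generator: a linear map producing a flat "image".
    return [scale * v for v in z]

def distance_ratio(generator, z1, z2, eps=1e-12):
    """Ratio of output distance to noise distance for one pair of samples."""
    return l2_distance(generator(z1), generator(z2)) / (l2_distance(z1, z2) + eps)

def cocd_style_loss(generator, z1, z2):
    # Minimising the negative ratio is the same as maximising the ratio.
    return -distance_ratio(generator, z1, z2)
```

A generator that maps every noise vector to the same image has a ratio of zero, so a term of this kind penalises exactly the behaviour associated with mode collapse.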

GANai: Standardizing CT Images using Generative Adversarial Network with Alternative Improvement

10.1101/460188 ◽

2018 ◽

Author(s):

Gongbo Liang ◽

Sajjad Fouladvand ◽

Jie Zhang ◽

Michael A. Brooks ◽

Nathan Jacobs ◽

...

Keyword(s):

Large Scale ◽

Model Performance ◽

Image Synthesis ◽

Ct Images ◽

Image Features ◽

Training Data ◽

Ct Image ◽

Generative Adversarial Network ◽

Training Strategy ◽

Adversarial Network

Computed tomography (CT) is a widely used diagnostic image modality routinely used for assessing anatomical tissue characteristics. However, non-standardized imaging protocols are commonplace, which poses a fundamental challenge in large-scale cross-center CT image analysis. One approach to address the problem is to standardize CT images using generative adversarial network (GAN) models. A GAN learns the data distribution of training images and generates synthesized images under the same distribution. However, existing GAN models are not directly applicable to this task, mainly due to the lack of constraints on the mode of data to generate. Furthermore, they treat every image equally, but in real applications some images are more difficult to standardize than others. All of this may lead to a lack-of-detail problem in CT image synthesis. We present a new GAN model called GANai to mitigate the differences in radiomic features across CT images captured using non-standard imaging protocols. Given source images, GANai composes new images by specifying a high-level goal: the image features of the synthesized images should be similar to those of the standard images. GANai introduces an alternative improvement training strategy to alternately and steadily improve model performance. The new training strategy enables a series of technical improvements, including phase-specific loss functions, phase-specific training data, and the adoption of ensemble learning, leading to better model performance. The experimental results show that GANai is significantly better than existing state-of-the-art image synthesis algorithms on CT image standardization. It also significantly improves the efficiency and stability of GAN model training.

BEGAN v3: Avoiding Mode Collapse in GANs Using Variational Inference

Electronics ◽

10.3390/electronics9040688 ◽

2020 ◽

Vol 9 (4) ◽

pp. 688

Author(s):

Sung-Wook Park ◽

Jun-Ho Huh ◽

Jong-Chan Kim

Keyword(s):

Image Synthesis ◽

Generative Model ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Compression Performance ◽

Adversarial Network ◽

Adversarial Networks ◽

Latent Space ◽

Boundary Equilibrium ◽

Space Compression

In the field of deep learning, generative models did not attract much attention until GANs (generative adversarial networks) appeared. In 2014, Ian Goodfellow and colleagues proposed the GAN generative model. GANs use different structures and objective functions from earlier generative models. For example, GANs use two neural networks: a generator that creates a realistic image, and a discriminator that distinguishes whether the input is real or synthetic. If the training process goes well, GANs can generate images whose authenticity is difficult even for experts to judge. Currently, GANs are among the most researched subjects in computer vision, covering image style translation, synthesis, and generation, and various models have been unveiled. The issues raised are also being resolved one by one. In image synthesis, BEGAN (Boundary Equilibrium Generative Adversarial Network), which outperforms previously announced GANs, learns the latent space of the image while balancing the generator and discriminator. Nonetheless, BEGAN also suffers from mode collapse, in which the generator produces only a few images or a single one. BEGAN-CS (Boundary Equilibrium Generative Adversarial Network with Constrained Space), which improved the loss function, did not solve the mode collapse either. The discriminator of BEGAN-CS is an AE (AutoEncoder), which cannot create a particularly useful or structured latent space, and its compression performance is not good either. In this paper, this characteristic of the AE is considered to be related to the occurrence of mode collapse. We therefore used a VAE (Variational AutoEncoder), which adds statistical techniques to the AE. In the experiments, the proposed model did not exhibit mode collapse and converged to a better state than BEGAN-CS.
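
The balancing of generator and discriminator mentioned above can be sketched with the update rule from the original BEGAN formulation. The abstract itself gives no equations, so treat this as background rather than the method of this paper; the VAE-based discriminator it proposes is not shown. The function name and the default values of `gamma` and `lambda_k` are illustrative assumptions.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN balancing step given autoencoder reconstruction losses.

    loss_real is L(x) on real images, loss_fake is L(G(z)) on generated ones.
    """
    d_loss = loss_real - k * loss_fake        # discriminator objective
    g_loss = loss_fake                        # generator objective
    balance = gamma * loss_real - loss_fake   # zero at equilibrium
    # k is kept in [0, 1] and moves towards restoring the balance.
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    # Global convergence measure: lower means closer to equilibrium.
    convergence = loss_real + abs(balance)
    return d_loss, g_loss, k_next, convergence
```

When the generator is ahead (`loss_fake` is small relative to `gamma * loss_real`), `k` grows and the discriminator puts more weight on rejecting generated images.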

MapGAN: An Intelligent Generation Model for Network Tile Maps

Sensors ◽

10.3390/s20113119 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3119 ◽

Cited By ~ 1

Author(s):

Jingtao Li ◽

Zhanlong Chen ◽

Xiaozhen Zhao ◽

Lijia Shao

Keyword(s):

Image Inpainting ◽

Super Resolution ◽

Image Synthesis ◽

Generative Adversarial Networks ◽

Great Success ◽

Generation Model ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Image Translation ◽

Map Generation

In recent years, generative adversarial network (GAN)-based image translation models have achieved great success in image synthesis, image inpainting, image super-resolution, and other tasks. However, the images generated by these models often suffer from insufficient detail and low quality. For map generation in particular, the generated electronic maps cannot match industrial production in accuracy or aesthetics. This paper proposes a model called Map Generative Adversarial Networks (MapGAN) for generating multitype electronic maps accurately and quickly based on both remote sensing images and render matrices. MapGAN improves the generator architecture of Pix2pixHD and adds a classifier to enhance the model, enabling it to learn the characteristics and style differences of different types of maps. Using datasets from Google Maps, Baidu Maps, and Map World, we compare MapGAN with some recent image translation models on one-to-one map generation and one-to-many domain map generation. The results show that the electronic maps generated by MapGAN have the best quality among the compared models, both in visual inspection and on classic evaluation indicators.

SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6773 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11157-11164

Author(s):

Sheng Jin ◽

Shangchen Zhou ◽

Yao Liu ◽

Chao Chen ◽

Xiaoshuai Sun ◽

...

Keyword(s):

Large Scale ◽

Semantic Information ◽

Unified Framework ◽

Generative Adversarial Network ◽

Fine Grained ◽

Multi Scale ◽

Deep Hashing ◽

Adversarial Network ◽

Improve State

Deep hashing methods have proven effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is usually a crucial limitation in practice. Current solutions to this issue use generative adversarial networks (GANs) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generation and hashing learning as two isolated processes, which makes generation ineffective. In addition, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH, to solve these problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of generated images, first, the A-Net learns hard samples with multi-scale occlusions and multi-angle rotated deformations, which compete against the learning of accurate hashing codes. Second, we design a novel self-paced hard generation policy to gradually increase the hashing difficulty of generated samples. To make use of the semantic information in unlabeled data, we propose a semi-supervised consistent loss. The experimental results show that our method significantly improves on state-of-the-art models on both widely used hashing datasets and fine-grained datasets.
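
For readers unfamiliar with hashing-based retrieval, the sketch below shows the generic idea of turning real-valued features into binary codes and ranking by Hamming distance. It is background only and does not reproduce SSAH: the sign-threshold binarisation and both function names are assumptions, and the A-Net, the H-Net, and the self-paced policy are not modelled.

```python
def to_hash_code(features):
    """Binarise real-valued network outputs by sign (assumed rule)."""
    return [1 if v >= 0 else 0 for v in features]

def hamming_distance(code_a, code_b):
    """Number of bit positions where two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))

query = to_hash_code([0.3, -1.2, 0.0, 2.0])
candidate = to_hash_code([0.9, 0.4, 0.1, -0.5])
print(hamming_distance(query, candidate))  # prints: 2
```

A hard sample, in this setting, is one whose code is pushed far from the codes of semantically similar images, which is what makes it useful for training.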

Unsupervised Generation and Synthesis of Facial Images via an Auto-Encoder-Based Deep Generative Adversarial Network

Applied Sciences ◽

10.3390/app10061995 ◽

2020 ◽

Vol 10 (6) ◽

pp. 1995 ◽

Cited By ~ 1

Author(s):

Jeong gi Kwak ◽

Hanseok Ko

Keyword(s):

Image Synthesis ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Training Scheme ◽

Adversarial Networks ◽

High Level ◽

Facial Images ◽

And Training ◽

Double Constraint

The processing of facial images is an important task, because it is required for a large number of real-world applications. As deep-learning models evolve, they require a huge number of images for training. In reality, however, the number of images available is limited. Generative adversarial networks (GANs) have thus been used for database (DB) augmentation, but they suffer from unstable training, low visual quality, and a lack of diversity. In this paper, we propose an auto-encoder-based GAN with an enhanced network structure and training scheme for DB augmentation and image synthesis. Our generator and decoder are each divided into two separate modules that take input vectors for low-level and high-level features; these input vectors affect all layers within the generator and decoder. The effectiveness of the proposed method is demonstrated by comparing it with baseline methods. In addition, we introduce a new scheme that combines two existing images without extra networks, based on the auto-encoder structure of the discriminator in our model. We add a novel double-constraint loss to make the encoded latent vectors equal to the input vectors.

Pal-GAN: Palette-conditioned Generative Adversarial Networks

Journal of Computational Vision and Imaging Systems ◽

10.15353/jcvis.v6i1.3536 ◽

2021 ◽

Vol 6 (1) ◽

pp. 1-5

Author(s):

Adam Balint ◽

Graham Taylor

Keyword(s):

Domain Knowledge ◽

Image Synthesis ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

High Quality Image ◽

Common Technique ◽

Class Labels ◽

Low Dimensional

Recent advances in Generative Adversarial Networks (GANs) have shown great progress on a large variety of tasks. A common technique used to yield greater diversity of samples is conditioning on class labels. Conditioning on high-dimensional structured or unstructured information has also been shown to improve generation results, e.g. Image-to-Image translation. The conditioning information is provided in the form of human annotations, which can be expensive and difficult to obtain in cases where domain knowledge experts are needed. In this paper, we present an alternative: conditioning on low-dimensional structured information that can be automatically extracted from the input without the need for human annotators. Specifically, we propose a Palette-conditioned Generative Adversarial Network (Pal-GAN), an architecture-agnostic model that conditions on both a colour palette and a segmentation mask for high quality image synthesis. We show improvements on conditional consistency, intersection-over-union, and Fréchet inception distance scores. Additionally, we show that sampling colour palettes significantly changes the style of the generated images.
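
To illustrate conditioning information that can be extracted automatically, here is a minimal palette-extraction sketch. The abstract does not say how Pal-GAN obtains its palettes, so the coarse histogram quantisation, the bucket size `step`, and the function name `extract_palette` are assumptions chosen for simplicity.

```python
from collections import Counter

def extract_palette(pixels, n_colours=3, step=64):
    """Return the n most common colours after coarse quantisation.

    pixels: iterable of (r, g, b) tuples with integer channels in 0-255.
    """
    def quantise(colour):
        # Snap each channel to the centre of its bucket.
        return tuple(min(255, (v // step) * step + step // 2) for v in colour)

    counts = Counter(quantise(p) for p in pixels)
    return [colour for colour, _ in counts.most_common(n_colours)]

image = [(250, 10, 10)] * 5 + [(10, 10, 250)] * 3 + [(12, 8, 255)]
print(extract_palette(image, n_colours=2))
# prints: [(224, 32, 32), (32, 32, 224)]
```

No human annotation is involved: the palette is computed directly from the pixel values, which is the property the abstract emphasises.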

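Synthetic Image Generation from Text Descriptions with the Generative Adversarial Network Algorithm (PEMBUATAN GAMBAR SINTESIS DARI DEKSRIPSI TEKS DENGAN ALGORITMA GENERATIVE ADVERSARIAL NETWORK)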

Aisyah Journal Of Informatics and Electrical Engineering (A.J.I.E.E) ◽

10.30604/jti.v2i2.31 ◽

2020 ◽

Vol 2 (2) ◽

pp. 111-114

Author(s):

R Wisnu Prio Pamungkas ◽

Rakhmi Khalida ◽

Siti Setiawati

Keyword(s):

Machine Learning ◽

Image Synthesis ◽

Generative Adversarial Networks ◽

Human Intelligence ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks

Computers have recently become able to produce realistic photos from text. This is one of the ways machine learning can be used creatively. Machine learning is a field that solves problems requiring an understanding comparable to human intelligence. This study uses the Generative Adversarial Networks (GAN) algorithm to create images from text descriptions. The basic GAN architecture consists of two networks, a generator and a discriminator. The resulting images still lack detail in interpreting a text description, but the authors aim to produce images that inspire; the images can be more poetic when generated from poetry, lyrics, or book quotes. Keywords: GAN, Image Synthesis, Text Description
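
The two-network setup described above can be made concrete with the standard GAN losses, computed here for a single real sample and a single generated sample. This is a generic sketch, not the training code of this study: the function names are hypothetical, and the non-saturating generator loss is one common choice among several.

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real samples labelled 1, generated labelled 0.

    d_real and d_fake are the discriminator's probabilities of "real".
    """
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: lower when the discriminator is fooled."""
    return -math.log(d_fake + eps)
```

A discriminator that outputs 0.5 for everything has a loss of about 1.386 (twice the natural log of 2), and the generator's loss falls as `d_fake` rises towards 1.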
