ClawGAN: Claw connection-based Generative Adversarial Networks for Facial Image Translation in Thermal to RGB Visible Light

2021
pp. 116269
Author(s):
Yi Luo
Dechang Pi
Yue Pan
Lingqiang Xie
Wen Yu
...

Author(s):
Chien-Yu Lu
Min-Xin Xue
Chia-Che Chang
Che-Rung Lee
Li Su

Style transfer of polyphonic music recordings is a challenging task: the model must produce diverse, imaginative, and yet plausible music pieces in a style different from the original one. Achieving this requires learning, in an unsupervised manner, stable multi-modal representations of both the domain-variant (i.e., style) and domain-invariant (i.e., content) information of music. In this paper, we propose an unsupervised music style transfer method that needs no parallel data. To characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system, which allows diverse outputs to be generated from the learned latent distributions representing content and style. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuances of instrument-specific performance, cognitively plausible features including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope are combined with the widely used mel-spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Network (RaGAN) loss is also utilized to achieve fast convergence and high training stability. We conduct experiments on bilateral style transfer tasks among three genres: piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer, with improved sound quality and user control over the output.
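Since the abstract describes stacking mel-spectrogram, MFCC, spectral difference, and spectral envelope into a multi-channel input, a minimal sketch of such a representation is shown below. This is an illustration under our own assumptions: the feature sizes, the frame-difference stand-in for spectral difference, and the smoothing-based envelope approximation are not taken from the paper.

```python
import numpy as np
import librosa
from scipy.ndimage import uniform_filter1d

def timbre_enhanced_input(path, sr=22050, n_bins=80):
    """Stack mel-spectrogram, MFCC, spectral difference, and an
    approximate spectral envelope into a (4, n_bins, T) array.

    The exact feature definitions in the paper may differ; the
    difference and envelope channels here are simple stand-ins.
    """
    y, sr = librosa.load(path, sr=sr)

    # Channel 1: log mel-spectrogram.
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bins))

    # Channel 2: MFCCs, computed with n_bins coefficients so that
    # all channels share the same (n_bins, T) shape.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_bins)

    # Channel 3: spectral difference (frame-to-frame change),
    # prepending one frame to keep the time dimension intact.
    diff = np.diff(mel, axis=1, prepend=mel[:, :1])

    # Channel 4: crude spectral envelope via smoothing along frequency.
    env = uniform_filter1d(mel, size=9, axis=0)

    return np.stack([mel, mfcc, diff, env], axis=0)
```

The resulting (4, n_bins, T) array can then be consumed like a multi-channel image by MUNIT-style content and style encoders.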


Author(s):  
Jianfu Zhang
Yuanyuan Huang
Yaoyi Li
Weijie Zhao
Liqing Zhang

Recent studies show significant progress on the image-to-image translation task, especially when facilitated by Generative Adversarial Networks: they can synthesize highly realistic images and alter the attribute labels of those images. However, these works employ attribute vectors to specify the target domain, which diminishes image-level attribute diversity. In this paper, we propose a novel model that forms disentangled representations by projecting images onto latent units, i.e., grouped feature channels of a Convolutional Neural Network, to disassemble the information belonging to different attributes. Thanks to the disentangled representation, we can transfer attributes according to the attribute labels while retaining the diversity beyond the labels, namely the styles within each image. This is achieved by specifying some attributes and swapping the corresponding latent units to "swap" the attributes' appearance, or by applying channel-wise interpolation to blend different attributes. To verify the motivation of our proposed model, we train and evaluate it on the face dataset CelebA. Furthermore, evaluation on another facial expression dataset, RaFD, demonstrates the generalizability of our proposed model.
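To make the unit-swapping and channel-wise interpolation operations concrete, here is a hedged sketch, not the authors' implementation: the latent shape (C, H, W), the contiguous grouping of channels into units, and the function names are all our assumptions.

```python
import torch

def swap_units(z_a, z_b, unit_idx, n_units):
    """Swap one latent unit (a contiguous group of channels) between
    two latent codes of shape (C, H, W), exchanging the attribute the
    unit encodes while leaving the other units untouched."""
    c = z_a.shape[0] // n_units                  # channels per unit
    lo, hi = unit_idx * c, (unit_idx + 1) * c
    z_a, z_b = z_a.clone(), z_b.clone()
    z_a[lo:hi], z_b[lo:hi] = z_b[lo:hi].clone(), z_a[lo:hi].clone()
    return z_a, z_b

def blend_unit(z_a, z_b, unit_idx, n_units, alpha=0.5):
    """Channel-wise interpolation inside one unit, blending the
    corresponding attribute between two images."""
    c = z_a.shape[0] // n_units
    lo, hi = unit_idx * c, (unit_idx + 1) * c
    z = z_a.clone()
    z[lo:hi] = alpha * z_a[lo:hi] + (1 - alpha) * z_b[lo:hi]
    return z
```

The edited code would then be decoded by the generator to produce an image with the swapped or blended attribute.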


Symmetry
2020
Vol 12 (10)
pp. 1705
Author(s):  
Aziz Alotaibi

Many image processing, computer graphics, and computer vision problems can be treated as image-to-image translation tasks. Such translation entails learning to map one visual representation of a given input to another. Image-to-image translation with generative adversarial networks (GANs) has been intensively studied and applied to various tasks, such as multimodal image-to-image translation, super-resolution translation, and object transfiguration-related translation. However, image-to-image translation techniques suffer from problems such as mode collapse, instability, and a lack of diversity. This article provides a comprehensive overview of image-to-image translation based on GAN algorithms and their variants. It also discusses and analyzes current state-of-the-art image-to-image translation techniques that are based on multimodal and multidomain representations. Finally, open issues and future research directions utilizing reinforcement learning and three-dimensional (3D) modal translation are summarized and discussed.
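For orientation, most of the surveyed methods build on the adversarial objective of a cross-domain GAN; in generic notation (not tied to any single surveyed model), a generator G : X -> Y and a discriminator D_Y on the target domain play the minimax game

```latex
\mathcal{L}_{\mathrm{GAN}}(G, D_Y) =
  \mathbb{E}_{y \sim p(y)}\bigl[\log D_Y(y)\bigr] +
  \mathbb{E}_{x \sim p(x)}\bigl[\log\bigl(1 - D_Y(G(x))\bigr)\bigr]
```

with G minimizing and D_Y maximizing this loss; the mode collapse and instability discussed above arise from the dynamics of this game.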

