Cross-Lingual Voice Conversion With Controllable Speaker Individuality Using Variational Autoencoder and Star Generative Adversarial Network

IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 47503-47515
Author(s):  
Tuan Vu Ho ◽  
Masato Akagi
2021 ◽  
Vol 11 (16) ◽  
pp. 7489
Author(s):  
Mohammed Salah Al-Radhi ◽  
Tamás Gábor Csapó ◽  
Géza Németh

Voice conversion (VC) transforms the speaking style of a source speaker into that of a target speaker while keeping the linguistic information unchanged. Traditional VC techniques rely on parallel recordings in which multiple speakers utter the same sentences; earlier approaches mainly learn a mapping between a given source–target speaker pair from such pairs of corresponding utterances. However, parallel data are expensive and difficult to collect, and non-parallel VC remains an interesting but challenging speech processing task. To address this limitation, we propose a method that enables non-parallel many-to-many voice conversion using a generative adversarial network. To the best of the authors’ knowledge, this study is the first to employ a sinusoidal model with continuous parameters to generate the converted speech signal. Our method requires only a few minutes of training examples, without parallel utterances or time-alignment procedures, and the source–target speakers need not appear in the training dataset. An empirical study was carried out on the publicly available CSTR VCTK corpus. Our conclusions indicate that the proposed method achieves state-of-the-art speaker similarity to utterances produced by the target speaker, while also revealing important structural aspects that merit further expert analysis.
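The abstract does not detail the continuous sinusoidal vocoder; as a rough, hypothetical sketch of the underlying synthesis idea (function and parameter names are ours, not the paper's), a speech frame can be resynthesized as a sum of sinusoids with per-component amplitude, frequency, and phase:

```python
import numpy as np

def synthesize_sinusoidal(amps, freqs, phases, sr=16000, dur=0.01):
    """Sum-of-sinusoids synthesis: s(t) = sum_k A_k * sin(2*pi*f_k*t + phi_k)."""
    t = np.arange(int(sr * dur)) / sr
    return sum(a * np.sin(2 * np.pi * f * t + p)
               for a, f, p in zip(amps, freqs, phases))

# A toy harmonic stack at 200 Hz with decaying partial amplitudes
frame = synthesize_sinusoidal(amps=[1.0, 0.5, 0.25],
                              freqs=[200.0, 400.0, 600.0],
                              phases=[0.0, 0.0, 0.0])
print(frame.shape)   # (160,) -- one 10 ms frame at 16 kHz
```

In the actual system, these continuous parameters would be predicted by the conversion network rather than set by hand.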


2020 ◽  
Vol 10 (13) ◽  
pp. 4528
Author(s):  
Je-Yeol Lee ◽  
Sang-Il Choi 

In this paper, we propose a new network model that uses variational learning to improve the learning stability of generative adversarial networks (GANs). The proposed method can easily be applied to GAN-based models developed for various purposes, since a variational autoencoder (VAE) is used as a secondary network while the basic GAN structure is maintained. When the generator’s gradient vanishes during GAN training, the proposed method receives gradient information from the decoder of the VAE, which maintains stable gradients, so that learning of the generator and discriminator does not halt. Experimental results on the MNIST and CelebA datasets verify that the proposed method improves learning stability by overcoming the generator’s vanishing-gradient problem, while maintaining the excellent data quality of conventional GAN-based generative models.
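The abstract omits equations; a minimal numpy sketch of the two standard VAE ingredients involved (the reparameterization trick, which keeps sampling differentiable and so keeps a gradient path open through the decoder, and the closed-form KL term) might look as follows, with all names being our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    Sampling stays differentiable with respect to (mu, log_var)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) in closed form, summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu = np.zeros(4)
log_var = np.zeros(4)               # sigma = 1 everywhere
z = reparameterize(mu, log_var)
print(kl_divergence(mu, log_var))   # 0.0 for a standard-normal posterior
```

In the paper's setting, the decoder trained with this objective supplies usable gradients to the generator when the adversarial gradient vanishes.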


2021 ◽  
Vol 13 (16) ◽  
pp. 3316
Author(s):  
Zhitao Chen ◽  
Lei Tong ◽  
Bin Qian ◽  
Jing Yu ◽  
Chuangbai Xiao

Hyperspectral classification is an important technique for remote-sensing image analysis, but limited training data constrain the results of current classification methods. Recently, the Conditional Variational Autoencoder Generative Adversarial Network (CVAEGAN) has been used to generate virtual samples that augment the training data and improve classification performance. Building on CVAEGAN, we propose a Self-Attention-Based Conditional Variational Autoencoder Generative Adversarial Network (SACVAEGAN). Compared with CVAEGAN, we first use random latent vectors to obtain a richer set of virtual samples, which improves generalization. We then introduce a self-attention mechanism that forces training to pay more attention to global information, achieving better classification accuracy. Finally, we improve model stability by incorporating the WGAN-GP loss function into our model, reducing the probability of mode collapse. Experiments on three datasets show that SACVAEGAN offers clear accuracy advantages over state-of-the-art HSI classification methods.
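The WGAN-GP loss mentioned above penalizes the critic's input-gradient norm for deviating from 1 at points interpolated between real and generated samples. A toy numpy illustration, using a linear critic so the gradient is available in closed form (all names are ours, not the paper's; a real implementation differentiates through the network with autograd):

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_penalty(w, real, fake, lam=10.0):
    """WGAN-GP penalty lam * (||grad D(x_hat)|| - 1)^2 for a linear critic
    D(x) = w @ x, whose input gradient is simply w everywhere."""
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake   # random interpolates
    grad = np.broadcast_to(w, x_hat.shape)    # dD/dx is constant for linear D
    norms = np.linalg.norm(grad, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

real = rng.standard_normal((8, 3))
fake = rng.standard_normal((8, 3))
w = np.array([1.0, 0.0, 0.0])                 # ||w|| = 1 -> zero penalty
print(gradient_penalty(w, real, fake))        # 0.0
```

A critic with ||w|| = 2 would instead incur a penalty of lam * (2 - 1)^2 = 10, pushing the critic back toward the 1-Lipschitz regime.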


Author(s):  
Ruoqi Sun ◽  
Chen Huang ◽  
Hengliang Zhu ◽  
Lizhuang Ma

The technique of facial attribute manipulation has found increasing application, but it remains challenging to restrict editing of attributes so that a face’s unique details are preserved. In this paper, we introduce our method, which we call a mask-adversarial autoencoder (M-AAE). It combines a variational autoencoder (VAE) and a generative adversarial network (GAN) for photorealistic image generation. We use partial dilated layers to modify a few pixels in the feature maps of an encoder, changing the attribute strength continuously without hindering global information. Our training objectives for the VAE and GAN are reinforced by supervision of face recognition loss and cycle consistency loss, to faithfully preserve facial details. Moreover, we generate facial masks to enforce background consistency, which allows our training to focus on the foreground face rather than the background. Experimental results demonstrate that our method can generate high-quality images with varying attributes, and outperforms existing methods in detail preservation.
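The abstract does not specify the background-consistency term; one plausible, simplified reading is a masked L1 penalty that charges the generator only for changes outside the face mask (names and formulation are ours, not the paper's):

```python
import numpy as np

def background_consistency_loss(generated, original, face_mask):
    """Mean L1 penalty on pixels outside the face mask, so editing is
    confined to the foreground face (mask is 1 on the face, 0 elsewhere)."""
    background = 1.0 - face_mask
    return np.sum(background * np.abs(generated - original)) / max(background.sum(), 1.0)

img = np.ones((4, 4))
edited = img.copy()
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0   # a 2x2 "face" region
edited[1:3, 1:3] = 0.0                           # edit only inside the mask
print(background_consistency_loss(edited, img, mask))   # 0.0 -- background untouched
```

Any edit leaking outside the mask would make this term positive, steering training back toward a stable background.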


Author(s):  
Naoki Nozawa ◽  
Hubert P. H. Shum ◽  
Qi Feng ◽  
Edmond S. L. Ho ◽  
Shigeo Morishima

3D car models are heavily used in computer games, visual effects, and even automotive design. As a result, producing such models with minimal labour cost is increasingly important. To tackle this challenge, we propose a novel system that reconstructs a 3D car from a single sketch image. The system learns from a synthetic database of 3D car models and their corresponding 2D contour sketches and segmentation masks, allowing effective training with minimal data-collection cost. The core of the system is a machine-learning pipeline that combines a generative adversarial network (GAN) with lazy learning. The GAN, being a deep-learning method, is capable of modelling complicated data distributions, enabling the effective modelling of a large variety of cars; its major weakness is that, as a global method, it struggles to model fine details in local regions. Lazy learning works well to preserve local features by generating a local subspace from relevant data samples. We demonstrate that the combined use of GAN and lazy learning is able to produce high-quality results, in which different types of cars with complicated local features can be generated effectively from a single sketch. Our method outperforms existing ones based on other machine-learning structures such as the variational autoencoder.
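The lazy-learning step described above amounts to retrieving, at query time, the database samples nearest to the input and refining within the local subspace they span. A minimal numpy sketch (names are ours; the paper's pipeline operates on learned features, not raw coordinates):

```python
import numpy as np

def local_subspace(query, samples, k=3):
    """Lazy-learning step: pick the k nearest database samples to the query
    and return them as a basis for local refinement."""
    dists = np.linalg.norm(samples - query, axis=1)
    idx = np.argsort(dists)[:k]
    return samples[idx], idx

db = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [0.1, 0.1]])
neighbors, idx = local_subspace(np.array([0.0, 0.0]), db, k=2)
print(idx)    # [0 3] -- the two samples nearest the origin
```

Because nothing is fit until a query arrives, the local model adapts to each sketch, which is what lets it preserve fine local features the global GAN smooths over.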


2021 ◽  
Vol 5 (45) ◽  
pp. 736-748
Author(s):  
A.S. Konushin ◽  
B.V. Faizov ◽  
V.I. Shakhuro

Traffic sign recognition is a well-researched problem in computer vision. However, state-of-the-art methods work only for frequent sign classes that are well represented in training datasets. We consider the task of rare traffic sign detection and classification, and aim to solve it using synthetic training data obtained by embedding synthetic sign images into real photos. We propose three methods, based on modern generative adversarial network (GAN) architectures, for making the synthetic signs visually consistent with the scene. Our methods allow realistic embedding of rare traffic sign classes that are absent from the training set. We also adapt a variational autoencoder to sample plausible locations for new traffic signs in images. We demonstrate that using a mixture of our synthetic data with real data improves the accuracy of both the classifier and the detector.
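The GAN-based appearance harmonization is learned, but the embedding step itself can be illustrated as alpha blending a synthetic sign patch into a photo at a sampled location (a hypothetical sketch; names are ours, not the paper's):

```python
import numpy as np

def embed_sign(photo, sign, alpha_mask, top, left):
    """Paste a synthetic sign into a real photo by alpha blending:
    out = alpha * sign + (1 - alpha) * photo inside the patch region."""
    out = photo.copy()
    h, w = sign.shape[:2]
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha_mask * sign + (1.0 - alpha_mask) * region
    return out

photo = np.zeros((6, 6))
sign = np.ones((2, 2))
alpha = np.ones((2, 2))          # fully opaque sign
result = embed_sign(photo, sign, alpha, top=2, left=3)
print(result[2:4, 3:5])          # the pasted 2x2 block of ones
```

In the paper, the (top, left) placement would come from the adapted variational autoencoder, and a GAN would then adjust the patch's lighting and blending to match the scene.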

