Novel Adaptive Generative Adversarial Network for Voice Conversion

Author(s):  
Maitreya Patel ◽  
Mihir Parmar ◽  
Savan Doshi ◽  
Nirmesh J. Shah ◽  
Hemant A. Patil
2021 ◽  
Vol 11 (16) ◽  
pp. 7489
Author(s):  
Mohammed Salah Al-Radhi ◽  
Tamás Gábor Csapó ◽  
Géza Németh

Voice conversion (VC) transforms the speaking style of a source speaker into that of a target speaker while keeping the linguistic information unchanged. Traditional VC techniques rely on parallel recordings of multiple speakers uttering the same sentences. Earlier approaches mainly learn a mapping between given source–target speakers from parallel data, which contain pairs of similar utterances spoken by different speakers. However, parallel data are computationally expensive and difficult to collect. Non-parallel VC remains an interesting but challenging speech processing task. To address this limitation, we propose a method that performs non-parallel many-to-many voice conversion using a generative adversarial network. To the best of the authors’ knowledge, our study is the first to employ a sinusoidal model with continuous parameters to generate converted speech signals. Our method requires only several minutes of training examples, without parallel utterances or time alignment procedures, and the source–target speakers are entirely unseen in the training dataset. Moreover, an empirical study is carried out on the publicly available CSTR VCTK corpus. Our conclusions indicate that the proposed method achieves state-of-the-art results in speaker similarity to the utterance produced by the target speaker, while suggesting important structural ones to be further analyzed by experts.


Author(s):  
Zhaojie Luo ◽  
Jinhui Chen ◽  
Tetsuya Takiguchi ◽  
Yasuo Ariki

In this paper, we propose a novel neutral-to-emotional voice conversion (VC) model that can effectively learn a mapping from neutral to emotional speech with limited emotional voice data. Although conventional VC techniques have achieved tremendous success in spectral conversion, the lack of representations of fundamental frequency (F0), which explicitly represents prosody information, is still a major limiting factor for emotional VC. To overcome this limitation, we outline the practical elements of the cross-wavelet transform (XWT) method and highlight how it is applied to synthesize diverse representations of F0 features in emotional VC. The idea is (1) to decompose F0 into different temporal-level representations using the continuous wavelet transform (CWT); (2) to use XWT to combine different CWT-F0 features to synthesize interaction XWT-F0 features; and (3) to use both the CWT-F0 and the corresponding XWT-F0 features to train the emotional VC model. Moreover, to better measure similarities between the converted and real F0 features, we apply a VA-GAN training model, which combines a variational autoencoder (VAE) with a generative adversarial network (GAN). In the VA-GAN model, the VAE learns the latent representations of high-dimensional features (CWT-F0, XWT-F0), while the discriminator of the GAN can use the learned feature representations as a basis for a VAE reconstruction objective.
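Steps (1) and (2) above can be illustrated with a minimal sketch. It is not the authors' implementation: the Mexican-hat wavelet, the dyadic scales, the synthetic log-F0 contours, and the function names `ricker`, `cwt`, and `xwt` are all assumptions made for illustration. The cross-wavelet transform is taken here in its textbook form, the CWT of one signal multiplied by the complex conjugate of the CWT of the other.

```python
import numpy as np


def ricker(points, scale):
    """Mexican-hat (Ricker) wavelet sampled at `points` positions (assumed wavelet)."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - (t / scale) ** 2) * np.exp(-0.5 * (t / scale) ** 2)


def cwt(signal, scales):
    """Decompose a 1-D contour into one row per scale, shape (len(scales), len(signal))."""
    rows = []
    for s in scales:
        # Kernel support grows with the scale but never exceeds the signal length.
        n = min(int(10 * s) + 1, len(signal))
        rows.append(np.convolve(signal, ricker(n, s), mode="same"))
    return np.array(rows)


def xwt(a, b, scales):
    """Cross-wavelet transform: CWT of `a` times the conjugate of the CWT of `b`."""
    return cwt(a, scales) * np.conj(cwt(b, scales))


# Hypothetical, zero-mean log-F0 contours standing in for real speech features.
frames = np.arange(200)
neutral = 0.10 * np.sin(2 * np.pi * frames / 50.0)
emotional = 0.25 * np.sin(2 * np.pi * frames / 50.0 + 0.5) + 0.05 * np.sin(2 * np.pi * frames / 8.0)

scales = [1, 2, 4, 8, 16]                 # assumed dyadic temporal levels
w_neutral = cwt(neutral, scales)          # CWT-F0 features
w_xy = xwt(neutral, emotional, scales)    # XWT-F0 interaction features
```

Each row of `w_neutral` describes the contour at one temporal level, and each row of `w_xy` is large where both contours carry energy at that level. How the paper pairs CWT-F0 features before applying XWT is not stated in the abstract, so the pairing shown here is only one possibility.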


2017 ◽  
Author(s):  
Benjamin Sanchez-Lengeling ◽  
Carlos Outeiral ◽  
Gabriel L. Guimaraes ◽  
Alan Aspuru-Guzik

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive, sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.


Author(s):  
Annapoorani Gopal ◽  
Lathaselvi Gandhimaruthian ◽  
Javid Ali

Deep neural networks have gained prominence in the biomedical domain, becoming the most commonly used networks after machine learning technology. Mammograms can be used to detect breast cancer with high precision with the help of a Convolutional Neural Network (CNN), which is a deep learning technology. Exhaustive labeled data are required to train a CNN from scratch. This requirement can be eased by deploying a Generative Adversarial Network (GAN), which needs comparatively less training data during mammogram screening. In this study, the applications of GANs in estimating breast density, high-resolution mammogram synthesis for clustered microcalcification analysis, effective segmentation of breast tumors, analysis of breast tumor shape, feature extraction, and image augmentation during mammogram classification are extensively reviewed.


2019 ◽  
Vol 52 (21) ◽  
pp. 291-296 ◽  
Author(s):  
Minsung Sung ◽  
Jason Kim ◽  
Juhwan Kim ◽  
Son-Cheol Yu
