Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes

Author(s):  
Ling-Hui Chen ◽  
Zhen-Hua Ling ◽  
Li-Rong Dai
2020 ◽  
Vol 17 (1) ◽  
pp. 316-321
Author(s):  
V. Naveena ◽  
Susmitha Vekkot ◽  
K. Jeeva Priya

The paper focuses on the use of deep neural networks to convert one person’s voice into another person’s voice, analogous to a mimic. It introduces the concept of neural networks and deploys multi-layer deep neural networks to build a framework for voice conversion. The spectral Mel-Frequency Cepstral Coefficients (MFCCs) are converted using a 10-layer deep network, while fundamental frequency (F0) conversion is accomplished by a logarithmic Gaussian normalized transformation. For re-synthesis, the MFCCs are subjected to inverse cepstral filtering, and the changes in F0 are incorporated using the Pitch Synchronous OverLap Add (PSOLA) algorithm. The results are evaluated objectively using Mel Cepstral Distortion (MCD) and subjectively using an ABX listening test. The maximum improvement in MCD, 13.87%, is obtained for female-to-male conversion, and the ABX listening test indicates that female-to-male conversion is closest to the target, with an agreement of 76.2%. The method achieves reasonably good performance compared to the state of the art while using optimal resources and avoiding the need for highly complex computations.
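As a rough illustration of two quantities named in the abstract, the sketch below shows the commonly used form of the logarithmic Gaussian normalized F0 transformation and of Mel Cepstral Distortion. It is not the authors' implementation. The function names, the use of population statistics over voiced frames, the convention that unvoiced frames carry F0 = 0, and the exclusion of the 0th cepstral coefficient from MCD are all assumptions made here for illustration; the frames passed to the MCD function are assumed to be time-aligned already.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_values):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    log_f0 = [math.log(f) for f in f0_values if f > 0]
    return mean(log_f0), pstdev(log_f0)


def convert_f0(f0_source, source_stats, target_stats):
    """Map source F0 values onto the target speaker's log-F0 distribution.

    Assumes sigma_src is non-zero, i.e. the source statistics were computed
    from at least two distinct voiced F0 values.
    """
    mu_src, sigma_src = source_stats
    mu_tgt, sigma_tgt = target_stats
    converted = []
    for f in f0_source:
        if f <= 0:
            converted.append(0.0)  # unvoiced frame (assumed convention): keep as 0
        else:
            # Normalize in the log domain, then rescale to the target statistics.
            z = (math.log(f) - mu_src) / sigma_src
            converted.append(math.exp(mu_tgt + sigma_tgt * z))
    return converted


def mel_cepstral_distortion(converted, target):
    """Average MCD in dB over aligned frames, skipping coefficient 0 (assumed)."""
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    per_frame = []
    for conv_frame, tgt_frame in zip(converted, target):
        squared_error = sum(
            (c - t) ** 2 for c, t in zip(conv_frame[1:], tgt_frame[1:])
        )
        per_frame.append(scale * math.sqrt(squared_error))
    return mean(per_frame)


# Hypothetical toy data, for illustration only.
source_f0 = [100.0, 0.0, 200.0]
target_f0 = [200.0, 400.0]
converted_f0 = convert_f0(
    source_f0, log_f0_stats(source_f0), log_f0_stats(target_f0)
)
```

With these toy values the voiced source frames are shifted up by one octave to match the target statistics, and the unvoiced frame stays at zero. A lower MCD between converted and target cepstra indicates a closer spectral match.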


Author(s):  
Michael Gian V. Gonzales ◽  
Crisron Rudolf G. Lucas ◽  
Michael Gringo Angelo R. Bayona ◽  
Franz A. De Leon

2014 ◽  
Vol 22 (12) ◽  
pp. 1859-1872 ◽  
Author(s):  
Ling-Hui Chen ◽  
Zhen-Hua Ling ◽  
Li-Juan Liu ◽  
Li-Rong Dai

Author(s):  
Alex Hernández-García ◽  
Johannes Mehrer ◽  
Nikolaus Kriegeskorte ◽  
Peter König ◽  
Tim C. Kietzmann
