Voice conversion based on Gaussian processes by using kernels modeling the spectral density with Gaussian mixture models

2018 ◽  
Vol 32 (34n36) ◽  
pp. 1840096
Author(s):  
Jingyi Bao ◽  
Ning Xu

Voice conversion (VC) is a technique that aims to transform the individuality of source speech so that it mimics that of target speech while keeping the message unaltered. In our previous work, the Gaussian process (GP) was introduced into the VC literature for the first time to overcome the “over-fitting” problem inherent in state-of-the-art VC methods, with very promising results. However, a standard GP usually acts more as a smoothing device than as a universal approximator. In this paper, we further attempt to improve the flexibility of GP-based VC by resorting to expressive kernels that are derived by modeling the spectral density with a Gaussian mixture model (GMM). Our new method benefits from the expressiveness of the new kernel while GP inference remains simple and analytic as usual. Experiments demonstrate both objectively and subjectively that the individuality of the converted speech is much closer to that of the target, while the speech quality obtained is comparable to that of the standard GP-based method.
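As an illustration of what a kernel that models the spectral density with a GMM can look like, the following is a minimal sketch of a one-dimensional spectral mixture kernel. It assumes the standard form obtained by taking the inverse Fourier transform of a Gaussian mixture over frequencies, k(τ) = Σ_q w_q exp(-2π² τ² v_q) cos(2π τ μ_q). The function names and hyperparameter values are hypothetical, and the sketch is not the authors' implementation.

```python
import math

def spectral_mixture_kernel(tau, weights, means, variances):
    """1-D spectral mixture kernel evaluated at lag tau = x - x'.

    The spectral density is modeled as a Gaussian mixture with the given
    weights, mean frequencies, and variances. Its inverse Fourier transform
    is a weighted sum of Gaussian-windowed cosines.
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v)
        * math.cos(2.0 * math.pi * tau * m)
        for w, m, v in zip(weights, means, variances)
    )

def gram_matrix(xs, weights, means, variances):
    """Covariance (Gram) matrix of the kernel over the inputs xs."""
    return [
        [spectral_mixture_kernel(a - b, weights, means, variances) for b in xs]
        for a in xs
    ]

# Hypothetical hyperparameters: two spectral components (assumed values).
weights = [1.0, 0.5]
means = [0.0, 2.0]
variances = [0.05, 0.01]

xs = [0.0, 0.25, 0.5, 1.0]
K = gram_matrix(xs, weights, means, variances)
```

With a single component centered at frequency zero, this expression reduces to a squared-exponential kernel, which is one way to see the smoothing behavior of a standard GP as a special case. The matrix `K` would then be used in the usual closed-form GP regression equations.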

2010 ◽  
Vol E93-D (9) ◽  
pp. 2472-2482 ◽  
Author(s):  
Hironori DOI ◽  
Keigo NAKAMURA ◽  
Tomoki TODA ◽  
Hiroshi SARUWATARI ◽  
Kiyohiro SHIKANO

2015 ◽  
Vol 30 (1) ◽  
pp. 3-15 ◽  
Author(s):  
Daniel Erro ◽  
Agustin Alonso ◽  
Luis Serrano ◽  
Eva Navas ◽  
Inma Hernaez

2015 ◽  
Vol 66 (4) ◽  
pp. 194-202
Author(s):  
Jiří Přibil ◽  
Anna Přibilová ◽  
Daniela Ďuračková

In the development of voice conversion and personification of text-to-speech (TTS) systems, it is necessary to have feedback about users’ opinions of the resulting synthetic speech quality. Therefore, the main aim of the experiments described in this paper was to find out whether a classifier based on Gaussian mixture models (GMM) could be applied to the evaluation of different storytelling voices created by transforming sentences generated by the Czech and Slovak TTS system. We suppose that this GMM-based statistical evaluation can be combined with the classical evaluation in the form of listening tests, or can replace it. The results obtained in this way correlated well with the results of the conventional listening test, which confirms the practical usability of the developed GMM classifier. Based on the analysis performed, the optimal setting of the initial parameters and the structure of the input feature set for recognition of the storytelling voices were finally determined.
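A GMM-based classifier of this kind is commonly built by fitting one mixture per class and assigning an utterance to the class whose mixture gives the highest log-likelihood. The sketch below shows only that decision rule, assuming already-trained diagonal-covariance mixtures. The class labels, feature vectors, and parameter values are hypothetical and are not taken from the paper.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one GMM.

    gmm is a list of (weight, mean, variance) tuples. The log-sum-exp
    trick keeps the per-frame mixture density numerically stable.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def classify(frames, models):
    """Return the label whose GMM scores the frames highest."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))

# Hypothetical pre-trained models for two voice classes (assumed values).
models = {
    "voice_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "voice_b": [(0.7, [4.0, 4.0], [1.0, 1.0]), (0.3, [6.0, 5.0], [1.0, 1.0])],
}

frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.6]]
predicted = classify(frames, models)
```

In practice the mixture parameters would be estimated from labeled training features, for example with expectation maximization.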


2003 ◽  
Vol 15 (2) ◽  
pp. 469-485 ◽  
Author(s):  
J. J. Verbeek ◽  
N. Vlassis ◽  
B. Kröse

This article concerns the greedy learning of Gaussian mixtures. In the greedy approach, mixture components are inserted into the mixture one after the other. We propose a heuristic for searching for the optimal component to insert. In a randomized manner, a set of candidate new components is generated. For each of these candidates, we find the locally optimal new component and insert it into the existing mixture. The resulting algorithm resolves the sensitivity to initialization of state-of-the-art methods, like expectation maximization, and has running time linear in the number of data points and quadratic in the (final) number of mixture components. Due to its greedy nature, the algorithm can be particularly useful when the optimal number of mixture components is unknown. Experimental results comparing the proposed algorithm to other methods on density estimation and texture segmentation are provided.
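The general idea of greedy insertion can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: candidates are centered on randomly sampled data points with an assumed fixed variance and mixing weight, the best candidate is chosen by log-likelihood, and the enlarged mixture is then refined with ordinary EM. The abstract does not specify the candidate generation or local search, so those details (and all names and defaults here) are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, mixture):
    """Log-likelihood of data under a list of (weight, mean, var) tuples."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in mixture))
        for x in data
    )

def run_em(data, mixture, iterations=50, var_floor=1e-6):
    """Refine all components of the mixture with standard EM updates."""
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in mixture]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance per component.
        updated = []
        for k in range(len(mixture)):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
            updated.append((nk / len(data), mean, max(var, var_floor)))
        mixture = updated
    return mixture

def greedy_gmm(data, max_components, n_candidates=5, seed=0):
    """Grow a 1-D mixture one component at a time (simplified sketch)."""
    rng = random.Random(seed)
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data)
    mixture = [(1.0, mean, var)]  # start from a single Gaussian
    while len(mixture) < max_components:
        alpha = 1.0 / (len(mixture) + 1)  # assumed weight of the new component
        shrunk = [(w * (1.0 - alpha), m, v) for w, m, v in mixture]
        # Randomly generated candidates, each centered on a sampled data point.
        candidates = [
            shrunk + [(alpha, c, var / 4.0)]
            for c in rng.sample(data, n_candidates)
        ]
        best = max(candidates, key=lambda mix: log_likelihood(data, mix))
        mixture = run_em(data, best)
    return mixture

# Synthetic bimodal data: two tight clusters around -5 and +5.
data = [-5.0 + 0.1 * i for i in range(-5, 6)] + [5.0 + 0.1 * i for i in range(-5, 6)]
fitted = greedy_gmm(data, max_components=2)
```

Because each new component is placed where it most improves the likelihood of the current mixture, the result depends less on a single random initialization than running EM once from random starting values.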


Author(s):  
Ana-Maria Simionovici ◽  
Alexandru Adrian Tantar ◽  
Pascal Bouvry ◽  
Andrei Tchernykh ◽  
Jorge M. Cortes-Mendoza ◽  
...  
