Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models

2010 ◽  
Vol E93-D (9) ◽  
pp. 2472-2482 ◽  
Author(s):  
Hironori DOI ◽  
Keigo NAKAMURA ◽  
Tomoki TODA ◽  
Hiroshi SARUWATARI ◽  
Kiyohiro SHIKANO

2015 ◽  
Vol 30 (1) ◽  
pp. 3-15 ◽  
Author(s):  
Daniel Erro ◽  
Agustin Alonso ◽  
Luis Serrano ◽  
Eva Navas ◽  
Inma Hernaez

2015 ◽  
Vol 66 (4) ◽  
pp. 194-202
Author(s):  
Jiří Přibil ◽  
Anna Přibilová ◽  
Daniela Ďuračková

Abstract
In the development of voice conversion and the personification of text-to-speech (TTS) systems, feedback on users' opinions of the resulting synthetic speech quality is essential. The main aim of the experiments described in this paper was therefore to determine whether a classifier based on Gaussian mixture models (GMMs) could be applied to evaluate different storytelling voices created by transforming sentences generated by the Czech and Slovak TTS system. We hypothesize that this GMM-based statistical evaluation can either be combined with the classical evaluation by listening tests or replace it. The results obtained in this way correlated well with those of the conventional listening test, confirming the practical usability of the developed GMM classifier. The analysis also determined the optimal initial parameter settings and the structure of the input feature set for recognizing the storytelling voices.
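The abstract does not spell out the classifier, so the following is only a minimal sketch of a standard GMM-based classifier, assuming one diagonal-covariance GMM per voice class is trained with EM on frame-level feature vectors and an utterance is assigned to the class whose model yields the highest average log-likelihood. The class labels (`voice_A`, `voice_B`), the 2-D toy features, the number of mixture components, and all function names are hypothetical; the paper's actual feature set and parameter settings are not reproduced here.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


class DiagGMM:
    """Diagonal-covariance GMM trained with the EM algorithm."""

    def __init__(self, n_components, n_iter=30, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor  # keeps variances from collapsing to zero
        self.rng = random.Random(seed)

    def _component_logs(self, x):
        # log(w_c) + log N(x; mean_c, var_c) for every component c
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, data):
        n, d = len(data), len(data[0])
        # Initialise means from random training vectors, variances from the global variance.
        self.means = [list(x) for x in self.rng.sample(data, self.k)]
        mu = [sum(x[j] for x in data) / n for j in range(d)]
        gvar = [max(sum((x[j] - mu[j]) ** 2 for x in data) / n, self.var_floor)
                for j in range(d)]
        self.vars = [list(gvar) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior probability of each component for each vector.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                norm = logsumexp(logs)
                resp.append([math.exp(lp - norm) for lp in logs])
            # M-step: re-estimate weights, means and variances.
            for c in range(self.k):
                nc = sum(r[c] for r in resp) + 1e-10
                self.weights[c] = nc / n
                self.means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                                 for j in range(d)]
                self.vars[c] = [max(sum(r[c] * (x[j] - self.means[c][j]) ** 2
                                        for r, x in zip(resp, data)) / nc, self.var_floor)
                                for j in range(d)]
        return self

    def avg_log_likelihood(self, data):
        """Mean per-vector log-likelihood of a sequence of feature vectors."""
        return sum(logsumexp(self._component_logs(x)) for x in data) / len(data)


def train_classifier(labelled, n_components=2):
    """Fit one GMM per class label; labelled maps label -> list of feature vectors."""
    return {label: DiagGMM(n_components).fit(vecs) for label, vecs in labelled.items()}


def classify(models, utterance):
    """Return the label whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda label: models[label].avg_log_likelihood(utterance))


# Toy usage with synthetic 2-D "features" for two hypothetical voice classes.
rng = random.Random(1)
train = {
    "voice_A": [[rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)] for _ in range(200)],
    "voice_B": [[rng.gauss(-1.0, 0.3), rng.gauss(-1.0, 0.3)] for _ in range(200)],
}
models = train_classifier(train)
test_utterance = [[rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)] for _ in range(20)]
print(classify(models, test_utterance))
```

Averaging the log-likelihood over frames, rather than summing it, keeps scores comparable between utterances of different lengths; a real evaluation would compare these classifier decisions with listening-test results.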


2018 ◽  
Vol 32 (34n36) ◽  
pp. 1840096
Author(s):  
Jingyi Bao ◽  
Ning Xu

Voice conversion (VC) is a technique that aims to transform the individuality of a source speaker's speech so that it mimics that of a target speaker while keeping the message unaltered. In our previous work, we introduced the Gaussian process (GP) into VC for the first time to overcome the “over-fitting” problem inherent in state-of-the-art VC methods, with very promising results. However, a standard GP tends to act more as a smoothing device than as a universal approximator. In this paper, we further improve the flexibility of GP-based VC by using expressive kernels derived by modeling the spectral density with a Gaussian mixture model (GMM). The new method benefits from the expressiveness of these kernels while GP inference remains simple and analytic as usual. Experiments demonstrate, both objectively and subjectively, that the individuality of the converted speech is much closer to that of the target, while the speech quality is comparable to that of the standard GP-based method.
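A kernel "derived to model the spectral density with a GMM" matches the form of the spectral mixture kernel introduced by Wilson and Adams. The sketch below assumes the authors' kernel takes that standard form and shows GP regression with it for one output dimension of a source-to-target feature mapping. The toy 1-D features, the hand-picked hyperparameters, and all function names are hypothetical; in practice the hyperparameters would be learned, for example by maximizing the marginal likelihood, and one GP would typically be fitted per target feature dimension.

```python
import math


def sm_kernel(x1, x2, weights, means, variances):
    """Spectral mixture kernel (assumed form):
    k(tau) = sum_q w_q * prod_p exp(-2 pi^2 tau_p^2 v_qp) * cos(2 pi tau_p mu_qp),
    the covariance whose spectral density is a mixture of Gaussians with
    frequency means mu_q and frequency variances v_q."""
    tau = [a - b for a, b in zip(x1, x2)]
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        term = w
        for t, m, v in zip(tau, mu, var):
            term *= math.exp(-2 * math.pi ** 2 * t * t * v) * math.cos(2 * math.pi * t * m)
        total += term
    return total


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_predict_mean(train_x, train_y, test_x, params, noise_var=1e-4):
    """GP posterior mean k_*^T (K + noise_var * I)^{-1} y for one output dimension."""
    n = len(train_x)
    K = [[sm_kernel(train_x[i], train_x[j], *params) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, train_y)
    return [sum(sm_kernel(x, xi, *params) * a for xi, a in zip(train_x, alpha))
            for x in test_x]


# Toy usage: a 1-D stand-in for one source -> target feature dimension.
src = [[0.5 * i] for i in range(13)]          # hypothetical source features
tgt = [math.sin(x[0]) for x in src]           # hypothetical aligned target features
params = (
    [0.5, 0.5],                               # mixture weights w_q
    [[0.0], [1.0 / (2 * math.pi)]],           # spectral means mu_q (cycles per unit)
    [[1.0 / (4 * math.pi ** 2)],              # component 1: RBF-like, length-scale 1
     [1.0 / (4 * math.pi ** 2 * 25.0)]],      # component 2: cos(tau) damped, length-scale 5
)
print(gp_predict_mean(src, tgt, [[2.25], [4.75]], params))
```

With a single component whose spectral mean is zero, the kernel reduces to the familiar squared-exponential kernel; adding components with non-zero spectral means is what gives the extra expressiveness, while prediction is still the closed-form posterior mean.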


2017 ◽  
Vol 34 (10) ◽  
pp. 1399-1414 ◽  
Author(s):  
Wanxia Deng ◽  
Huanxin Zou ◽  
Fang Guo ◽  
Lin Lei ◽  
Shilin Zhou ◽  
...  
