Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models

Abstract: In the development of voice conversion and personification of text-to-speech (TTS) systems, feedback on users’ opinion of the resulting synthetic speech quality is essential. The main aim of the experiments described in this paper was therefore to find out whether a classifier based on Gaussian mixture models (GMM) could be applied to the evaluation of different storytelling voices created by transforming sentences generated by the Czech and Slovak TTS system. We suppose that this GMM-based statistical evaluation can be combined with the classical evaluation by listening tests, or can replace it. The results obtained in this way correlated well with the results of the conventional listening test, which confirms the practical usability of the developed GMM classifier. The analysis was also used to determine the optimal setting of the initial parameters and the structure of the input feature set for recognition of the storytelling voices.
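As a rough illustration of how a GMM-based classifier of this kind scores an utterance, the sketch below computes the log-likelihood of feature frames under one diagonal-covariance GMM per voice class and picks the class with the highest total. This is a minimal sketch, not the authors' implementation: the class labels, the two-dimensional toy features, and all model parameters are hypothetical, and a real system would estimate the parameters from training data (for example with the EM algorithm).

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of one feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


def classify(frames, models):
    """Return the label whose GMM gives the highest total log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(x, *params) for x in frames)
        for label, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models: (weights, means, variances) per voice class.
models = {
    "neutral": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "storytelling": ([0.5, 0.5], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
# Hypothetical feature frames extracted from one utterance.
frames = [[4.2, 3.9], [5.1, 4.8], [4.6, 4.4]]
label, scores = classify(frames, models)
print(label)
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplifying assumption in GMM-based classification.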

Speech enhancement using Maximum A-Posteriori and Gaussian Mixture Models for speech and noise Periodogram estimation

Computer Speech & Language ◽ 10.1016/j.csl.2015.09.001 ◽ 2016 ◽ Vol 36 ◽ pp. 58-71 ◽ Cited By ~ 14

Author(s): Sarang Chehrehsa ◽ Tom James Moir

Keyword(s): Mixture Models ◽ Speech Enhancement ◽ Gaussian Mixture Models ◽ Gaussian Mixture ◽ Maximum A Posteriori ◽ A Posteriori

Voice conversion using Gaussian Mixture Models

2015 International Conference on Communication, Information & Computing Technology (ICCICT) ◽ 10.1109/iccict.2015.7045743 ◽ 2015

Author(s): Kevin D'souza ◽ K.T.V Talele

Keyword(s): Mixture Models ◽ Gaussian Mixture Models ◽ Gaussian Mixture ◽ Voice Conversion

Voice conversion based on Gaussian processes by using kernels modeling the spectral density with Gaussian mixture models

Modern Physics Letters B ◽ 10.1142/s0217984918400961 ◽ 2018 ◽ Vol 32 (34n36) ◽ pp. 1840096

Author(s): Jingyi Bao ◽ Ning Xu

Keyword(s): Spectral Density ◽ Gaussian Process ◽ Mixture Models ◽ Gaussian Processes ◽ State Of The Art ◽ Gaussian Mixture Models ◽ Gaussian Mixture ◽ Voice Conversion ◽ First Time ◽ Fitting Problem

Voice conversion (VC) is a technique that aims to transform the individuality of source speech so that it mimics that of target speech while keeping the message unaltered. In our previous work, the Gaussian process (GP) was introduced into the VC literature for the first time to overcome the “over-fitting” problem inherent in state-of-the-art VC methods, with very promising results. However, a standard GP usually acts more as a smoothing device than as a universal approximator. In this paper, we further attempt to improve the flexibility of GP-based VC by resorting to expressive kernels derived by modeling the spectral density with a Gaussian mixture model (GMM). Our new method benefits from the expressiveness of the new kernel, while GP inference remains simple and analytic as usual. Experiments demonstrate, both objectively and subjectively, that the individuality of the converted speech is much closer to that of the target, while the speech quality obtained is comparable to that of the standard GP-based method.
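To make the idea of a kernel built from a Gaussian mixture over the spectral density more concrete, the sketch below implements a one-dimensional spectral mixture kernel, k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q), and the closed-form GP regression mean. This is a minimal sketch under stated assumptions: the abstract does not give the exact kernel form, so the spectral mixture kernel is used here as a well-known example of this family; the hyperparameters, the noise variance, the toy sine data, and the helper names are all hypothetical, and real VC systems work on multidimensional spectral features with learned hyperparameters.

```python
import math


def spectral_mixture_kernel(tau, weights, means, variances):
    """1-D spectral mixture kernel: a Gaussian mixture in the frequency domain."""
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * tau * mu)
        for w, mu, v in zip(weights, means, variances)
    )


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, kernel, noise_var=1e-2):
    """Analytic GP regression mean: k_*^T (K + noise_var * I)^{-1} y."""
    n = len(x_train)
    gram = [
        [kernel(x_train[i] - x_train[j]) + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(gram, y_train)
    return [sum(kernel(xs - xt) * a for xt, a in zip(x_train, alpha)) for xs in x_test]


# Hypothetical single-component kernel centred on frequency 0.5.
def kernel(tau):
    return spectral_mixture_kernel(tau, [1.0], [0.5], [0.01])


# Toy 1-D regression data (not speech features).
x_train = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]
y_train = [math.sin(math.pi * x) for x in x_train]
mean = gp_posterior_mean(x_train, y_train, [1.0], kernel)
print(mean)
```

Inference stays a single linear solve regardless of how many mixture components the kernel has, which is the sense in which GP inference remains analytic with the more expressive kernel.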
