Voice conversion based on Gaussian mixture modules with Minimum Distance Spectral Mapping

Author(s): Gui Jin, Michael T. Johnson, Jia Liu, Xiaokang Lin
2013, Vol 38 (1), pp. 39-45
Author(s): Peng Song, Li Zhao, Yongqiang Bao

Abstract The Gaussian mixture model (GMM) method is popular and efficient for voice conversion (VC), but it is prone to overfitting. In this paper, principal component regression (PCR) is adopted for the spectral mapping between source and target speech, and the number of principal components is tuned to prevent overfitting. To better model the nonlinear relationship between source and target speech, a kernel principal component regression (KPCR) method is then proposed, and a method that combines KPCR with GMM is further proposed to improve conversion accuracy. The discontinuity and oversmoothing problems of the traditional GMM method are also addressed. To reduce discontinuity, an adaptive median filter is applied to smooth the posterior probabilities; to reduce oversmoothing, only the two mixture components with the highest posterior probabilities in each frame are used for conversion. Finally, objective and subjective experiments are carried out, and the results show that the proposed approach performs substantially better than the GMM method. In the objective tests, the proposed method yields lower cepstral distances and higher identification rates than the GMM method; in the subjective tests, it obtains higher preference and perceptual quality scores.
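As a rough illustration of the PCR mapping described above, the sketch below fits a linear map from source to target spectral features using only the first `n_components` principal components of the source features. It assumes time-aligned parallel feature matrices and plain NumPy; the function names `fit_pcr` and `convert` are hypothetical, and the KPCR, GMM combination, median filtering and two-component selection steps are not shown.

```python
import numpy as np

def fit_pcr(X, Y, n_components):
    """Fit a PCR mapping from source features X (N x D) to target features Y (N x D').

    Hypothetical helper: assumes rows of X and Y are already time-aligned frames.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    # Principal directions of the centred source features via SVD.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T              # D x k projection onto retained components
    Z = Xc @ V                           # N x k component scores
    # Ordinary least squares of the centred targets on the retained scores.
    B, *_ = np.linalg.lstsq(Z, Yc, rcond=None)   # k x D'
    W = V @ B                            # D x D' mapping back in feature space
    return x_mean, y_mean, W

def convert(X, x_mean, y_mean, W):
    """Apply the fitted mapping to new source frames."""
    return (X - x_mean) @ W + y_mean
```

Keeping fewer components restricts the mapping `W` to rank at most `n_components`, which is the knob the abstract describes for limiting overfitting.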


Author(s): Yuki Takashima, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki

Abstract Voice conversion (VC) is a technique that converts only the speaker-specific information in the source speech while preserving its phonemic information. Non-negative matrix factorization (NMF)-based VC has been widely researched because it produces more natural-sounding speech than conventional Gaussian mixture model-based VC. In conventional NMF-VC, models are trained on parallel data, so the speech data require elaborate preprocessing to generate that parallel data. NMF-VC also tends to produce large models, because the dictionary matrix consists of multiple parallel exemplars, which leads to a high computational cost. In this study, a new parallel dictionary-learning method based on non-negative Tucker decomposition (NTD) is proposed. The method uses tensor decomposition to decompose an input observation into a set of mode matrices and one core tensor, and it estimates the dictionary matrix for NMF-VC without using parallel data. Experimental results show that the proposed method outperforms other methods in both parallel and non-parallel settings.
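The conversion step that conventional NMF-VC relies on can be sketched as follows: the source spectrogram is decomposed with a fixed source dictionary, and the resulting activations are multiplied by the paired target dictionary. This is not the proposed NTD dictionary-learning method; it assumes both dictionaries are already given with column-aligned exemplars, and it uses generalized KL-divergence multiplicative updates as one common choice. The function names are hypothetical.

```python
import numpy as np

def estimate_activations(X, D, n_iter=200, eps=1e-12, seed=0):
    """Find non-negative H with X ≈ D @ H while keeping dictionary D fixed.

    X: (F, T) non-negative magnitude spectrogram; D: (F, J) dictionary.
    Uses multiplicative updates for the generalized KL divergence.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((D.shape[1], X.shape[1])) + eps
    norm = D.sum(axis=0)[:, None] + eps   # column sums of D, shape (J, 1)
    for _ in range(n_iter):
        ratio = X / (D @ H + eps)
        H *= (D.T @ ratio) / norm
    return H

def nmf_convert(X_src, D_src, D_tgt, n_iter=200):
    """Reuse the source activations with the target dictionary (hypothetical helper)."""
    # Assumption: column j of D_src and column j of D_tgt are parallel exemplars.
    H = estimate_activations(X_src, D_src, n_iter=n_iter)
    return D_tgt @ H
```

The cost the abstract mentions is visible here: every update multiplies by the full dictionary, so a dictionary with many exemplar columns makes each iteration expensive.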


2010, Vol 18 (5), pp. 954-964
Author(s): Srinivas Desai, Alan W Black, B Yegnanarayana, Kishore Prahallad

Author(s): Srinivasan Kannan, Pooja. R. Raju, R. Sai Surya Madhav, Shikha Tripathi

2014, Vol 1049-1050, pp. 1622-1625
Author(s): Hai Miao Ge, Li Guo Wang

One-against-rest (OAR) is a well-known multiclass classification structure that extends binary classifiers to multiple classes. It has shown great potential in pattern recognition and hyperspectral data processing. However, the existence of unclassified regions limits its application. In this paper, a new multiclassifier that combines the OAR structure with the one-against-one (OAO) structure is proposed, in which OAO is used to classify samples in the unclassified regions in order to improve the performance of OAR. The formation of unclassified regions is also discussed, and a scheme for selecting classifiers for secondary classification in these regions is proposed. To compare secondary classifiers and validate the conclusion, six other classifiers are selected: decision tree (DT), minimum distance (MD) based on Euclidean distance, MD based on Euclidean distance with a kernel function, MD based on Mahalanobis distance, the spectral mapping classifier (SMC), and the maximum likelihood classifier (MLC). In the experiments, support vector machines (SVMs) are used for OAR, OAO, and DT, and a hyperspectral remote sensing image is used as the test data.
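The OAR-then-OAO decision rule described above can be sketched as follows: a sample accepted by exactly one OAR classifier gets that label; otherwise it lies in an unclassified region and is passed to pairwise OAO classifiers that vote. Restricting the vote to the classes whose OAR classifiers accepted the sample (or to all classes when none did) is an assumption, not necessarily the selection pattern proposed in the paper, and the decision functions are hypothetical stand-ins for trained SVMs.

```python
from itertools import combinations

def oar_oao_predict(x, oar, oao):
    """Classify x with OAR first, falling back to OAO voting in unclassified regions.

    oar: dict label -> f(x); f(x) > 0 means "this class, not the rest".
    oao: dict sorted label pair (a, b) -> g(x); g(x) > 0 votes a, otherwise b.
    """
    claimed = [c for c, f in oar.items() if f(x) > 0]
    if len(claimed) == 1:
        # Ordinary OAR decision: exactly one binary classifier accepts x.
        return claimed[0]
    # Unclassified region: no classifier, or several classifiers, accept x.
    # Assumption: vote among the accepting classes, or among all classes if none accepts.
    candidates = sorted(claimed if claimed else oar)
    votes = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        winner = a if oao[(a, b)](x) > 0 else b
        votes[winner] += 1
    # Ties go to the smallest label, since max() returns the first maximum it meets.
    return max(votes, key=votes.get)
```

For example, with three classes, a point where no OAR classifier fires is labelled by the three pairwise classifiers, while a point where two OAR classifiers fire is settled by the single OAO classifier for that pair.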

