Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus

Author(s):  
Tetsuya Hashimoto ◽  
Hidetsugu Uchida ◽  
Daisuke Saito ◽  
Nobuaki Minematsu

Author(s):  
Xiaohai Tian ◽  
Junchao Wang ◽  
Haihua Xu ◽  
Eng-Siong Chng ◽  
Haizhou Li

Author(s):  
Yuki Takashima ◽  
Toru Nakashika ◽  
Tetsuya Takiguchi ◽  
Yasuo Ariki

Abstract: Voice conversion (VC) is a technique for converting only the speaker-specific information in source speech while preserving the associated phonemic information. Non-negative matrix factorization (NMF)-based VC has been widely researched because it produces more natural-sounding speech than conventional Gaussian mixture model-based VC. In conventional NMF-VC, models are trained on parallel data, so the speech data require elaborate pre-processing to generate that parallel data. NMF-VC also tends to yield a large model, because the dictionary matrix holds several parallel exemplars, leading to a high computational cost. In this study, an innovative parallel dictionary-learning method using non-negative Tucker decomposition (NTD) is proposed. The proposed method uses tensor decomposition to decompose an input observation into a set of mode matrices and one core tensor. The proposed NTD-based dictionary-learning method estimates the dictionary matrix for NMF-VC without using parallel data. The experimental results show that the proposed method outperforms other methods in both parallel and non-parallel settings.
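
For orientation, the conventional parallel-dictionary NMF-VC that the abstract takes as its starting point can be sketched roughly as follows. This is a minimal illustration, not the authors' NTD method. It assumes Euclidean multiplicative updates with no sparsity term, and the names `estimate_activations`, `convert`, `W_src`, and `W_tgt` are hypothetical. The source dictionary is held fixed, only the activations are estimated, and the same activations are then applied to the paired target dictionary.

```python
import numpy as np


def estimate_activations(X, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H such that X is approximately W @ H.

    Assumption: plain multiplicative updates for the Euclidean cost;
    published NMF-VC systems may use other costs or sparsity penalties.
    """
    rng = np.random.default_rng(seed)
    # Random positive start; W has shape (features, exemplars).
    H = rng.random((W.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        # The dictionary W stays fixed; only H is updated.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return H


def convert(X_src, W_src, W_tgt, n_iter=200):
    """Reuse source activations with the paired target dictionary."""
    H = estimate_activations(X_src, W_src, n_iter=n_iter)
    return W_tgt @ H, H
```

Because every column of `W_src` must be paired with a column of `W_tgt`, this scheme needs time-aligned parallel exemplars, and the cost of each update grows with the number of exemplars. Those are the two limitations the abstract raises.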


Corpora ◽  
2008 ◽  
Vol 3 (2) ◽  
pp. 213-225 ◽  
Author(s):  
Yogendra P. Yadava ◽  
Andrew Hardie ◽  
Ram Raj Lohani ◽  
Bhim N. Regmi ◽  
Srishtee Gurung ◽  
...  

In this paper, we describe the construction of the 14-million-word Nepali National Corpus (NNC). This corpus includes both spoken and written data, the latter incorporating a Nepali match for FLOB and a broader collection of text. Additional resources within the NNC include parallel data (English–Nepali and Nepali–English) and a speech corpus. The NNC is encoded as Unicode text and marked up in CES-compatible XML. The whole corpus is also annotated with part-of-speech tags. We describe the process of devising a tagset and retraining tagger software for the Nepali language, for which there were no existing corpus resources. Finally, we explore some present and future applications of the corpus, including lexicography, NLP, and grammatical research.


2010 ◽  
Author(s):  
Zhi-Zheng Wu ◽  
Tomi Kinnunen ◽  
Eng Siong Chng ◽  
Haizhou Li
