Selection of Training Data for HMM-based Speech Synthesis from Prosodic Features - Use of Generation Process Model of Fundamental Frequency Contours

Author(s):  
Tomoyuki Mizukami ◽  
Hiroya Hashimoto ◽  
Keikichi Hirose ◽  
Daisuke Saito ◽  
Nobuaki Minematsu
2011 ◽  
Author(s):  
Keikichi Hirose ◽  
Keiko Ochi ◽  
Ryusuke Mihara ◽  
Hiroya Hashimoto ◽  
Daisuke Saito ◽  
...  

Author(s):  
Keikichi Hirose

Speech synthesis began as an effort to mimic the human process of speech sound generation, and the quality of synthetic speech has now reached a level at which it is difficult to notice that the speech is synthetic. This is due to the development of waveform concatenation methods, which select the most appropriate speech segments from a huge speech corpus. Although the lack of flexibility in producing various speech qualities and styles has been pointed out, this problem is about to be solved by introducing statistical frameworks into parametric speech synthesis. A speaker can now even speak a foreign language in his/her own voice using advanced voice-conversion techniques. However, current technologies are not suited to handling the hierarchical structure of prosodic features over a long time span, so prosody modelling needs to be introduced into the speech-synthesis process. This chapter first reviews the history of voice/speech synthesis and then explains the technologies, starting from text-to-speech and concept-to-speech conversion. Methods of sound generation are introduced next. Statistical parametric speech synthesis, especially HMM-based speech synthesis, is introduced as a technology that enables flexible speech synthesis, that is, synthetic speech with various qualities and styles from a smaller speech corpus. After that, the problem of frame-by-frame processing of prosodic features is addressed and the importance of prosody modelling is pointed out. Prosodic (fundamental frequency) modelling is surveyed and, finally, the generation process model is introduced, with some experimental results from its application to HMM-based speech synthesis.

