prosody prediction
Recently Published Documents


Total documents: 24 (five years: 3)
H-index: 6 (five years: 0)

2021, Vol 11 (19), pp. 9010
Author(s): Feiyu Shen, Chenpeng Du, Kai Yu

Most recent end-to-end speech synthesis systems use phonemes as acoustic input tokens and ignore which word each phoneme comes from. However, many words have a specific prosody type, which may significantly affect naturalness. Prior works have used pre-trained linguistic word embeddings as TTS system input. However, since linguistic information is not directly relevant to how words are pronounced, the resulting improvement in TTS quality is mild. In this paper, we propose a novel and effective way of jointly training acoustic phone and word embeddings for end-to-end TTS systems. Experiments on the LJSpeech dataset show that the acoustic word embeddings dramatically decrease both the training and validation loss in phone-level prosody prediction. Subjective evaluations of naturalness demonstrate that incorporating acoustic word embeddings significantly outperforms both the pure phone-based system and the TTS system with pre-trained linguistic word embeddings.
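The idea of giving each phone access to its parent word can be illustrated with a minimal sketch. This is not the authors' architecture: the toy vocabularies, the embedding sizes, and the choice to combine the two embeddings by simple concatenation are all assumptions made for illustration. In a real system both tables would be trainable parameters updated jointly by the TTS loss; here they are just random numbers.

```python
import random

rng = random.Random(0)

# Hypothetical toy vocabularies (not taken from the paper).
PHONE_VOCAB = {"HH": 0, "AH": 1, "L": 2, "OW": 3, "W": 4, "ER": 5, "D": 6}
WORD_VOCAB = {"hello": 0, "world": 1}

PHONE_DIM = 4  # assumed size of a phone embedding
WORD_DIM = 3   # assumed size of a word embedding


def make_table(n_rows, dim):
    """Random stand-in for an embedding table that would normally be learned."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_rows)]


phone_table = make_table(len(PHONE_VOCAB), PHONE_DIM)
word_table = make_table(len(WORD_VOCAB), WORD_DIM)


def encode(words_with_phones):
    """Return one vector per phone: phone embedding followed by its word's embedding."""
    rows = []
    for word, phones in words_with_phones:
        word_vec = word_table[WORD_VOCAB[word]]
        for phone in phones:
            # Every phone of the same word shares the same word vector.
            rows.append(phone_table[PHONE_VOCAB[phone]] + word_vec)
    return rows


utterance = [
    ("hello", ["HH", "AH", "L", "OW"]),
    ("world", ["W", "ER", "L", "D"]),
]
features = encode(utterance)
print(len(features), len(features[0]))
```

The phone "L" appears in both words, so its phone part is identical in both rows while the word part differs. That difference is the extra signal a phone-only input cannot carry.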


2019
Author(s): Elshadai Tesfaye Biru, Yishak Tofik Mohammed, David Tofu, Erica Cooper, Julia Hirschberg

2019
Author(s): Rose Sloan, Syed Sarfaraz Akhtar, Bryan Li, Ritvik Shrivastava, Agustin Gravano, ...

Author(s): Vaibhavi Rajendran, G Bharadwaja Kumar

A speech synthesizer that sounds like a human voice is preferred over one that sounds robotic, so an effective prosody model is essential for increasing the naturalness of synthesized speech. This paper therefore focuses on developing a prosody prediction model using sentiment analysis for a Tamil speech synthesizer. Two variants of the prosody prediction model using SentiWordNet are evaluated: one without a stemmer and one with a stemmer. The model with a stemmer performs much better than the one without, because it handles the highly agglutinative and inflectional words of the Tamil language more effectively, as the paper illustrates. The prosody prediction model with a stemmer achieves a classification accuracy of 77% on the test set, compared with 57% for the model without a stemmer.
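Why a stemmer helps a lexicon-based sentiment lookup can be shown with a minimal sketch. Everything here is hypothetical: the tiny lexicon stands in for SentiWordNet, the suffix-stripping function stands in for a real Tamil stemmer, English-like toy forms stand in for inflected Tamil words, and the three output classes are an assumed label set rather than the one used in the paper.

```python
# Hypothetical polarity lexicon keyed by root forms (stand-in for SentiWordNet).
LEXICON = {"joy": 0.8, "grief": -0.7}

# Hypothetical suffix list, checked in order so longer suffixes are tried first.
SUFFIXES = ("fully", "ful", "ing", "s")


def toy_stem(token):
    """Strip the first matching suffix, keeping a root of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def sentence_class(tokens, use_stemmer):
    """Sum lexicon scores over the tokens and map the total to a coarse class."""
    score = 0.0
    for token in tokens:
        key = toy_stem(token) if use_stemmer else token
        score += LEXICON.get(key, 0.0)  # unknown words contribute nothing
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tokens = ["the", "joyful", "crowd"]
print(sentence_class(tokens, use_stemmer=False))  # neutral
print(sentence_class(tokens, use_stemmer=True))   # positive
```

Without stemming, the inflected form misses the lexicon and the sentence falls back to the neutral class. With stemming, the root is found and the polarity is recovered. The more heavily a language inflects its words, the more often this kind of miss occurs.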

