Evaluation of the Impact of Corpus Phonetic Alignment on the HMM-Based Speech Synthesis Quality

Author(s):  
Marc Evrard ◽  
Albert Rilliard ◽  
Christophe d’Alessandro
2018 ◽  
Vol 7 (2.28) ◽  
pp. 234 ◽  
Author(s):  
Karolina Kuligowska ◽  
Paweł Kisielewicz ◽  
Aleksandra Włodarz

Present-day speech synthesis systems can be used successfully for a wide range of purposes. However, the various synthesizers have serious limitations, many of which can be identified and resolved. The aim of this paper is to present the current state of development of speech synthesis systems and to examine their drawbacks and limitations. The paper discusses the current classification, construction, and functioning of speech synthesis systems, which gives an insight into the synthesizers implemented so far. The analysis of disadvantages and limitations focuses on identifying the weak points of these systems, namely: the impact of emotions and prosody, spontaneous speech in terms of naturalness and intelligibility, preprocessing and text analysis, the problem of ambiguity, natural sounding, adaptation to the situation, variety of systems, sparsely spoken languages, speech synthesis for older people, and some other minor limitations. Solving these problems stimulates further development of the speech synthesis domain.


2003 ◽  
Vol 40 (4) ◽  
pp. 503-515 ◽  
Author(s):  
F. Malfrère ◽  
O. Deroo ◽  
T. Dutoit ◽  
C. Ris

Author(s):  
Feras Mohammed AL-Madani

This study assessed students' perceptions of the traditionally used CLT approach to teaching English and compared it with modern technology-based teaching methods. A quantitative survey was carried out among 200 students of English language teaching institutes that currently use the CLT approach. Responses were collected before and after exposure to technology-based ELT methods. The analysis used the Wilcoxon test, which revealed the impact of modern technological tools used in language teaching, such as video conferencing, audio CDs, online oral versions, text-to-speech synthesis, interactive books, digital game-based learning, and computer-assisted language learning (CALL).


2021 ◽  
Vol 11 (19) ◽  
pp. 9056
Author(s):  
Guolun Sun ◽  
Zhihua Huang ◽  
Li Wang ◽  
Pengyuan Zhang

Articulatory features have proved effective in speech recognition and speech synthesis. However, acquiring articulatory features remains a difficult and actively researched problem, so a lightweight and accurate articulatory model is of significant value. In this study, we propose a novel temporal convolution network-based acoustic-to-articulatory inversion system. The acoustic feature is converted into a high-dimensional hidden-space feature map through temporal convolution, with frame-level feature correlations taken into account. We also construct a two-part objective function combining the prediction's Root Mean Square Error (RMSE) and the sequences' Pearson Correlation Coefficient (PCC) to jointly optimize the inversion model from both aspects, and we analyze how the weight between the two parts affects the final performance of the inversion model. Extensive experiments show that our temporal convolution network (TCN) model outperformed the Bidirectional Long Short-Term Memory model by 1.18 mm in RMSE and 0.845 in PCC with 14 model parameters when optimizing evenly with respect to RMSE and PCC.
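
The two-part objective described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form below is an assumption: a weighted sum of RMSE and (1 - PCC), with a hypothetical weight `alpha`, where `alpha = 0.5` corresponds to optimizing both parts evenly. The function names are illustrative and are not taken from the paper.

```python
import math

def rmse(pred, target):
    """Root Mean Square Error between two equal-length sequences."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)

def pcc(pred, target):
    """Pearson Correlation Coefficient; returns 0.0 if either sequence is constant."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    return cov / denom if denom > 0 else 0.0

def combined_loss(pred, target, alpha=0.5):
    # Assumed form, not the paper's stated formula:
    # alpha weights the RMSE term, (1 - alpha) weights the correlation term.
    # (1 - PCC) is used so that a higher correlation lowers the loss.
    return alpha * rmse(pred, target) + (1.0 - alpha) * (1.0 - pcc(pred, target))
```

Under this assumed form, a perfect prediction gives a loss of zero, and changing `alpha` shifts the emphasis between absolute error and trajectory shape.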

