Evaluation of the Impact of Corpus Phonetic Alignment on the HMM-Based Speech Synthesis Quality

Present-day speech synthesis systems can be used successfully for a wide range of purposes. However, the use of various synthesizers is subject to serious limitations, many of which can be identified and resolved. The aim of this paper is to present the current state of development of speech synthesis systems and to examine their drawbacks and limitations. The paper discusses the current classification, construction and functioning of speech synthesis systems, which gives an insight into the synthesizers implemented so far. The analysis of the disadvantages and limitations of speech synthesis systems focuses on identifying their weak points, namely: the impact of emotions and prosody, spontaneous speech in terms of naturalness and intelligibility, preprocessing and text analysis, the problem of ambiguity, natural sounding, adaptation to the situation, variety of systems, sparsely spoken languages, speech synthesis for older people, and some other minor limitations. Solving these problems stimulates further development of the speech synthesis domain.

Excitation modeling for HMM-based speech synthesis: Breaking down the impact of periodic and aperiodic components

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2014.6853598 ◽

2014 ◽

Cited By ~ 11

Author(s):

Thomas Drugman ◽

Tuomo Raitio

Keyword(s):

Speech Synthesis ◽

The Impact

Phonetic alignment: speech synthesis-based vs. Viterbi-based

Speech Communication ◽

10.1016/s0167-6393(02)00131-0 ◽

2003 ◽

Vol 40 (4) ◽

pp. 503-515 ◽

Cited By ~ 31

Author(s):

F. Malfrère ◽

O. Deroo ◽

T. Dutoit ◽

C. Ris

Keyword(s):

Speech Synthesis ◽

Phonetic Alignment

On the impact of phoneme alignment in DNN-based speech synthesis

10.21437/ssw.2016-32 ◽

2016 ◽

Cited By ~ 1

Author(s):

Mei Li ◽

Zhizheng Wu ◽

Lei Xie

Keyword(s):

Speech Synthesis ◽

The Impact

On the impact of labialization contexts on unit selection speech synthesis

2012 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) ◽

10.1109/isspit.2012.6621284 ◽

2012 ◽

Author(s):

Daniel Tihelka ◽

Zdenek Hanzlicek ◽

Pavel Machac ◽

Radek Skarnitzl ◽

Jindrich Matousek

Keyword(s):

Speech Synthesis ◽

Unit Selection ◽

The Impact

On the Impact of Annotation Errors on Unit-Selection Speech Synthesis

Text, Speech and Dialogue - Lecture Notes in Computer Science ◽

10.1007/978-3-642-32790-2_55 ◽

2012 ◽

pp. 456-463 ◽

Cited By ~ 6

Author(s):

Jindřich Matoušek ◽

Daniel Tihelka ◽

Luboš Šmídl

Keyword(s):

Speech Synthesis ◽

Unit Selection ◽

The Impact ◽

Annotation Errors

Student Perception of Traditional English Teaching Methods (CLT approach) and Comparison to Modern Methods (Using Technology)

International Journal of Education and Information Technologies ◽

10.46300/9109.2021.15.5 ◽

2021 ◽

Vol 15 ◽

pp. 35-43

Author(s):

Feras Mohammed AL-Madani

Keyword(s):

Language Learning ◽

Teaching Methods ◽

Speech Synthesis ◽

English Language ◽

Language Teaching ◽

Student Perception ◽

English Language Teaching ◽

Computer Assisted ◽

Before And After ◽

The Impact

This study aimed to assess students' perception of the traditionally used CLT approach to teaching the English language and to compare it with modern, technology-based teaching methods. A quantitative survey was carried out on 200 students of English language teaching institutes that currently use the CLT approach. Pre- and post-response surveys were conducted, in which the students' perspectives were assessed before and after exposure to technology-based ELT methods. The analysis used the Wilcoxon test, which revealed the impact of modern technological tools used in language teaching, such as video conferencing, audio CDs, online oral versions, text-to-speech synthesis, interactive books, digital game-based learning and computer assisted language learning (CALL).

Improving the Accuracy of the Speech Synthesis Based Phonetic Alignment Using Multiple Acoustic Features

Lecture Notes in Computer Science - Computational Processing of the Portuguese Language ◽

10.1007/3-540-45011-4_5 ◽

2003 ◽

pp. 31-39 ◽

Cited By ~ 2

Author(s):

Sérgio Paulo ◽

Luís C. Oliveira

Keyword(s):

Speech Synthesis ◽

Acoustic Features ◽

Phonetic Alignment

Temporal Convolution Network Based Joint Optimization of Acoustic-to-Articulatory Inversion

Applied Sciences ◽

10.3390/app11199056 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9056

Author(s):

Guolun Sun ◽

Zhihua Huang ◽

Li Wang ◽

Pengyuan Zhang

Keyword(s):

Speech Synthesis ◽

Short Term Memory ◽

Pearson Correlation ◽

Target Function ◽

Model Parameters ◽

Acoustic Feature ◽

Inversion Model ◽

Articulatory Features ◽

Articulatory Inversion ◽

The Impact

Articulatory features have proved to be effective in speech recognition and speech synthesis. However, acquiring articulatory features has always been a difficult research problem, so a lightweight and accurate articulatory model is of significant value. In this study, we propose a novel temporal convolution network-based acoustic-to-articulatory inversion system. The acoustic feature is converted into a high-dimensional hidden space feature map through temporal convolution, with frame-level feature correlations taken into account. Meanwhile, we construct a two-part target function combining the prediction's Root Mean Square Error (RMSE) and the sequences' Pearson Correlation Coefficient (PCC) to jointly optimize the performance of the specific inversion model from both aspects. We also analyzed the impact of the weight between the two parts on the final performance of the inversion model. Extensive experiments have shown that our temporal convolution network (TCN) model outperformed the Bi-directional Long Short Term Memory model by 1.18 mm in RMSE and 0.845 in PCC with 14 model parameters when optimizing evenly with RMSE and PCC aspects.
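The two-part target function described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the weighted-sum form below, the `alpha` weight, and the function names `rmse`, `pearson` and `joint_loss` are all assumptions made for illustration, not the authors' implementation. The PCC term is written as `1 - PCC` so that a lower value is better for both parts.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equal-length sequences."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)


def pearson(pred, target):
    """Pearson correlation coefficient; returns 0.0 if either sequence is constant."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    return cov / denom if denom > 0 else 0.0


def joint_loss(pred, target, alpha=0.5):
    """Hypothetical weighted sum of RMSE and (1 - PCC).

    alpha is an assumed weight: alpha=1.0 uses RMSE only, alpha=0.0 uses
    the correlation term only, and alpha=0.5 weights both parts evenly.
    """
    return alpha * rmse(pred, target) + (1.0 - alpha) * (1.0 - pearson(pred, target))
```

With `alpha=0.5` the two terms contribute evenly, which loosely corresponds to the "optimizing evenly" setting mentioned in the abstract. In practice the two terms have different scales (millimetres versus a unitless correlation), so the weight would need tuning.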

The impact of speech recognition on speech synthesis

Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002. ◽

10.1109/wss.2002.1224382 ◽

2004 ◽

Cited By ~ 10

Author(s):

M. Ostendorf ◽

I. Bulyko

Keyword(s):

Speech Recognition ◽

Speech Synthesis ◽

The Impact
