Integrating Rule and Template-Based Approaches to Prosody Generation for Emotional BODO Speech Synthesis

Author(s):  
Laba Kr. Thakuria ◽  
Purnendu Acharjee ◽  
Akalpita Das ◽  
P.H. Thakdar
Author(s):  
Pongsathon Janyoi ◽  
Pusadee Seresangtakul

This paper describes a speech synthesis system for Isarn, a regional dialect spoken in the Northeast of Thailand. In this study, we focus on improving the prosody generation of the system by using additional context features. To develop the system, the speech parameters (Mel-cepstrum and fundamental frequencies of phonemes within different phonetic contexts) were modelled using Hidden Markov Models (HMMs). Synthetic speech was generated by converting the input text into context-dependent phonemes. Speech parameters were then generated from the trained HMMs according to the context-dependent phonemes and synthesized through a speech vocoder. Systems were trained using three different feature sets: basic contextual features, tonal features, and syllable-context features. Objective and subjective tests were conducted to determine the performance of the proposed system. The results indicated that the addition of the syllable-context features significantly improved the naturalness of synthesized speech.
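As a rough illustration of the step in which input text is converted into context-dependent phonemes, the following minimal sketch builds one label per phoneme under each of the three feature sets. The label syntax, the `Phone` fields, the tone numbering, and the assumption that each feature set extends the previous one are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phone:
    symbol: str      # phoneme symbol
    tone: int        # tone index of the enclosing syllable (hypothetical numbering)
    pos_in_syl: int  # 1-based position of the phoneme within its syllable
    syl_len: int     # number of phonemes in the enclosing syllable


def context_label(phones: List[Phone], i: int, feature_set: str = "basic") -> str:
    """Build a context-dependent label for phones[i].

    feature_set is one of "basic", "tonal", "syllable"; here each set is
    assumed to add fields on top of the previous one.
    """
    if feature_set not in ("basic", "tonal", "syllable"):
        raise ValueError(f"unknown feature set: {feature_set}")
    prev_sym = phones[i - 1].symbol if i > 0 else "sil"
    next_sym = phones[i + 1].symbol if i + 1 < len(phones) else "sil"
    cur = phones[i]
    # Basic context: previous, current, and next phoneme.
    label = f"{prev_sym}-{cur.symbol}+{next_sym}"
    if feature_set in ("tonal", "syllable"):
        label += f"/T:{cur.tone}"  # tone of the enclosing syllable
    if feature_set == "syllable":
        label += f"/S:{cur.pos_in_syl}_{cur.syl_len}"  # position within the syllable
    return label


# A single two-phoneme syllable carrying (hypothetical) tone 2.
phones = [Phone("k", 2, 1, 2), Phone("a", 2, 2, 2)]
labels = [context_label(phones, i, "syllable") for i in range(len(phones))]
```

In an HMM-based system, labels of this kind typically select which context-dependent model supplies the spectral and fundamental-frequency parameters for each phoneme; richer labels let the models distinguish more prosodic contexts.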


Author(s):  
Thierry Dutoit ◽  
Yannis Stylianou

This article gives an introduction to state-of-the-art text-to-speech (TTS) synthesis systems, showing both the natural language processing and the digital signal processing problems involved. TTS synthesis is the art of designing talking machines. The article begins with a brief user-oriented description of a general TTS system and comments on its commercial applications. It then gives a functional diagram of a modern TTS system, highlighting its components, and describes its morphosyntactic module. Furthermore, it examines why sentence-level phonetization cannot be achieved by a sequence of dictionary look-ups, and describes possible implementations of the phonetizer. Finally, the article describes prosody generation, outlining how intonation and duration can be approximately computed from text. Prosody refers to certain properties of the speech signal that are related to audible changes in pitch, loudness, and syllable length. The article also introduces the two main existing categories of techniques for waveform generation: synthesis by rule and concatenative synthesis.


2020 ◽  
Author(s):  
Shubhi Tyagi ◽  
Marco Nicolis ◽  
Jonas Rohnke ◽  
Thomas Drugman ◽  
Jaime Lorenzo-Trueba

Author(s):  
Mumtaz Begum ◽  
Raja N. Ainon ◽  
Roziati Zainuddin ◽  
Zuraidah M. Don ◽  
Gerry Knowles

2009 ◽  
Author(s):  
Robert E. Remez ◽  
Kathryn R. Dubowski ◽  
Morgana L. Davids ◽  
Emily F. Thomas ◽  
Nina Paddu ◽  
...  

2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty is that feature recognition during text-to-speech conversion produces large errors, which makes it hard to apply English text-to-speech conversion algorithms in a working system. To improve the efficiency of English text-to-speech conversion, this article takes a machine learning approach: after the original voice waveform is labeled with pitch, the rhythm is modified through PSOLA, and the C4.5 algorithm is used to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of the pronunciation discrimination method based on part-of-speech rules and of HMM-based prosody hierarchy prediction in speech synthesis systems, the study constructs a system model. In addition, the waveform stitching method and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, labels can be learned by machine learning methods. Finally, the study evaluates and analyzes the performance of the algorithm through controlled experiments. The results show that the proposed algorithm performs well and has some practical value.
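The decision-tree step can be illustrated with the split criterion that C4.5 uses, the gain ratio (information gain divided by split information). The sketch below computes it for a toy polyphone dataset; the feature names, the example word, and the data are invented for illustration and do not come from the article, and a full C4.5 implementation would also handle continuous attributes, missing values, and pruning.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: List[Dict[str, str]], labels: List[str], feature: str) -> float:
    """Gain ratio of splitting on a categorical feature."""
    n = len(rows)
    groups: Dict[str, List[str]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    # Expected entropy of the labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): stress pattern of the word "record" in four contexts.
rows = [
    {"pos": "noun", "sentence_initial": "yes"},
    {"pos": "noun", "sentence_initial": "no"},
    {"pos": "verb", "sentence_initial": "yes"},
    {"pos": "verb", "sentence_initial": "no"},
]
labels = ["REC-ord", "REC-ord", "re-CORD", "re-CORD"]

# The feature with the highest gain ratio would be chosen for the root split.
best = max(["pos", "sentence_initial"], key=lambda f: gain_ratio(rows, labels, f))
```

In this toy data the part of speech separates the two pronunciations perfectly while sentence position carries no information, so a tree built with this criterion would split on part of speech first.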

