The Theory behind Controllable Expressive Speech Synthesis: A Cross-Disciplinary Approach

Human 4.0 - From Biology to Cybernetic ◽

10.5772/intechopen.89849 ◽

2021 ◽

Cited By ~ 1

Author(s):

Noé Tits ◽

Kevin El Haddad ◽

Thierry Dutoit

Keyword(s):

Recurrent Neural Networks ◽

Speech Synthesis ◽

Text To Speech ◽

Expressive Speech ◽

Rich Domain ◽

Interaction Field ◽

Audio Features ◽

History Of ◽

Text To Speech Synthesis ◽

Statistical Parametric Speech Synthesis

As part of the Human-Computer Interaction field, expressive speech synthesis is a very rich domain, as it requires knowledge in areas such as machine learning, signal processing, sociology, and psychology. In this chapter, we focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will get an overview of the main paradigms used in this field through some of the most prominent systems and methods. We explain how speech can be represented and encoded with audio features. We present a history of the main methods of Text-to-Speech synthesis: concatenative, parametric, and statistical parametric speech synthesis. Finally, we focus on the last of these and on the most recent techniques, which model Text-to-Speech synthesis as a sequence-to-sequence problem. This enables the use of Deep Learning blocks such as Convolutional and Recurrent Neural Networks as well as attention mechanisms. The last part of the chapter assembles the different aspects of the theory and summarizes the concepts.

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks

10.21437/interspeech.2016-159 ◽

2016 ◽

Cited By ~ 10

Author(s):

Cassia Valentini-Botinhao ◽

Xin Wang ◽

Shinji Takaki ◽

Junichi Yamagishi

Keyword(s):

Neural Networks ◽

Speech Enhancement ◽

Recurrent Neural Networks ◽

Speech Synthesis ◽

Text To Speech ◽

Synthesis System ◽

Text To Speech Synthesis ◽

Noise Robust

Text to Speech Synthesis System for Punjabi language using Statistical Parametric Speech Synthesis Technique

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i1042.0789s19 ◽

2019 ◽

Vol 8 (9S) ◽

pp. 268-272

Keyword(s):

Speech Synthesis ◽

Gaussian Mixture ◽

Indian Languages ◽

Text To Speech ◽

Synthesis Technique ◽

Text To Speech Synthesis ◽

Statistical Parametric Speech Synthesis ◽

Parametric Speech Synthesis ◽

Traditional Approaches

Statistical Parametric Speech Synthesis (SPSS) has been growing faster than the traditional approaches used to synthesize speech, and the latest statistical techniques overcome the shortcomings of those approaches. The main advantages of SPSS over traditional synthesis techniques are that it offers more flexibility to change voice characteristics, supports multiple languages (i.e., it is multilingual), and provides good acoustic coverage and robustness. It generates high-quality speech from a small training database. Deep neural networks and hidden Markov models are the basic statistical parametric speech synthesis techniques; the Gaussian mixture model and the sinusoidal model also fall into this category. Two types of features were extracted: spectral features, such as spectral bandwidth and spectral centroid, and excitation features, such as F0. We use 722 Punjabi phonemes. Using Sound Forge software, we extracted 200 wave files related to those phonemes from a 1-hour pre-recorded wave file. The features of every phoneme were extracted and saved in a database; we extracted 28 features for each phoneme. The text-to-speech (TTS) system generates speech as output when given Punjabi text. Many TTS systems have already been developed for different Indian languages. The system we are building is based only on the Punjabi language.
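
The abstract above names spectral centroid and spectral bandwidth among the extracted features but does not say how they were computed. As an illustration only, and not the authors' implementation, the sketch below uses one common convention: the centroid is the magnitude-weighted mean frequency of a frame, and the bandwidth is the magnitude-weighted standard deviation around that centroid. The function names, the 8 kHz sample rate, the 256-sample frame, and the test tone are all assumptions chosen for the example. F0 estimation is not covered.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real frame.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectral_centroid_and_bandwidth(frame, sample_rate):
    """Return (centroid_hz, bandwidth_hz) for one frame.

    Assumed definitions (one common convention, not taken from the paper):
    centroid  = sum(f * |X(f)|) / sum(|X(f)|)
    bandwidth = sqrt(sum((f - centroid)^2 * |X(f)|) / sum(|X(f)|))
    """
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)


# Hypothetical test signal: a 1000 Hz sine sampled at 8 kHz.
# With 256 samples the tone falls exactly on DFT bin 32, so there is no leakage.
sample_rate = 8000
n = 256
tone_hz = 1000.0
frame = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]

centroid, bandwidth = spectral_centroid_and_bandwidth(frame, sample_rate)
print(f"centroid = {centroid:.1f} Hz, bandwidth = {bandwidth:.3f} Hz")
```

For a single pure tone the centroid should land at the tone frequency and the bandwidth should be close to zero. A practical pipeline would more likely apply a window function and an FFT routine from a numerical library to each frame of every phoneme recording.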

Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp40776.2020.9053678 ◽

2020 ◽

Author(s):

Vatsal Aggarwal ◽

Marius Cotescu ◽

Nishant Prateek ◽

Jaime Lorenzo-Trueba ◽

Roberto Barra-Chicote

Keyword(s):

Speech Synthesis ◽

Text To Speech ◽

Expressive Speech ◽

Text To Speech Synthesis

Parameter Generation Algorithms for Text-To-Speech Synthesis with Recurrent Neural Networks

2018 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt.2018.8639626 ◽

2018 ◽

Author(s):

Viacheslav Klimkov ◽

Alexis Moinet ◽

Adam Nadolski ◽

Thomas Drugman

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Speech Synthesis ◽

Text To Speech ◽

Text To Speech Synthesis

Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis

10.21437/interspeech.2017-1762 ◽

2017 ◽

Cited By ~ 1

Author(s):

Beiming Cao ◽

Myungjong Kim ◽

Jan van Santen ◽

Ted Mau ◽

Jun Wang

Keyword(s):

Deep Learning ◽

Speech Synthesis ◽

Text To Speech ◽

Text To Speech Synthesis

Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis

10.21437/ssw.2019-37 ◽

2019 ◽

Author(s):

Elshadai Tesfaye Biru ◽

Yishak Tofik Mohammed ◽

David Tofu ◽

Erica Cooper ◽

Julia Hirschberg

Keyword(s):

Speech Synthesis ◽

Subset Selection ◽

Text To Speech ◽

Text To Speech Synthesis ◽

Prosody Prediction

“I Can’t Talk Now”: Speaking with Voice Output Communication Aid Using Text-to-Speech Synthesis During Multiparty Video Conference

Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems ◽

10.1145/3411763.3451745 ◽

2021 ◽

Author(s):

Wooseok Kim ◽

Sangsu Lee

Keyword(s):

Speech Synthesis ◽

Video Conference ◽

Text To Speech ◽

Voice Output Communication Aid ◽

Communication Aid ◽

Text To Speech Synthesis ◽

Voice Output

Comparative Study on Neural Vocoders for Multispeaker Text-To-Speech Synthesis

2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS) ◽

10.1109/raics51191.2020.9332514 ◽

2020 ◽

Author(s):

Rajeev Rajan ◽

Ashish Roopan ◽

Sachin Prakash ◽

Elisa Jose ◽

Sati P.

Keyword(s):

Comparative Study ◽

Speech Synthesis ◽

Text To Speech ◽

Text To Speech Synthesis

Comparison of Urdu text to speech synthesis using unit selection and HMM based techniques

2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) ◽

10.1109/icsda.2016.7918988 ◽

2016 ◽

Cited By ~ 1

Author(s):

Farah Adeeba ◽

Tania Habib ◽

Sarmad Hussain ◽

Ehsan-ul-haq ◽

Kh. Shahzada Shahid

Keyword(s):

Speech Synthesis ◽

Text To Speech ◽

Unit Selection ◽

Text To Speech Synthesis

Comparative study of text-to-speech synthesis techniques for mobile linguistic translation process

2014 IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2014) ◽

10.1109/iccsce.2014.7072761 ◽

2014 ◽

Author(s):

Phanchita Chomwihoke ◽

Manop Phankokkruad

Keyword(s):

Comparative Study ◽

Speech Synthesis ◽

Text To Speech ◽

Translation Process ◽

Synthesis Techniques ◽

Text To Speech Synthesis
