speech synthesizer
Recently Published Documents


TOTAL DOCUMENTS: 277 (FIVE YEARS: 17)

H-INDEX: 14 (FIVE YEARS: 1)

2021 ◽ Vol 20 (No. 4) ◽ pp. 489-510
Author(s): Izzad Ramli, Nursuriati Jamil, Noraini Seman

Intonation generation in expressive speech, such as storytelling, is essential for producing a high-quality Malay expressive speech synthesizer. Among intonation-generation approaches, explicit control has shown good intelligibility with reasonably natural speech, and was therefore selected in this research. This approach modifies prosodic features such as pitch contour, intensity, and duration to generate the intonation. However, modifying the pitch contour remains a problem because the desired contour is often not achieved. This paper formulates an improved pitch contour algorithm that produces a modified pitch contour resembling the natural one. In this work, the syllable pitch contours of nine storytellers were extracted from their storytelling speech to create an expressive speech syllable dataset called STORY_DATA. All pitch contour shapes in STORY_DATA were analyzed and grouped into the six standard pitch contour clusters for storytelling, using one minus the Pearson product-moment correlation as the distance measure. An improved iterative two-step sinusoidal pitch contour formulation was then introduced to modify the pitch contours of neutral speech into the expressive pitch contours of natural speech. Overall, the improved formulation achieved 93 percent highly correlated matches, indicating close resemblance to natural contours, compared to 15 percent for the previous formulation. The improved formula can therefore be used in a text-to-speech (TTS) synthesizer to produce more natural expressive speech. The paper also identifies unique expressive pitch contours in the Malay language that warrant further investigation.
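The abstract's clustering criterion, one minus the Pearson product-moment correlation, compares the *shape* of two pitch contours while ignoring their absolute pitch level. A minimal sketch of that distance (the contour values and any grouping routine around it are illustrative, not the paper's data):

```python
import numpy as np

def pearson_distance(a, b):
    # 1 - Pearson product-moment correlation:
    # 0 for identically shaped contours, up to 2 for mirror-image shapes.
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return 1.0 - float(np.mean(za * zb))

# Toy syllable pitch contours in Hz (invented for illustration)
rising = np.array([100.0, 110.0, 120.0, 130.0, 140.0])
rising_high = np.array([200.0, 220.0, 240.0, 260.0, 280.0])  # same shape, higher register
falling = rising[::-1]

# Shape-based distance ignores absolute pitch level ...
assert pearson_distance(rising, rising_high) < 1e-9
# ... but maximally separates opposite contour shapes.
assert abs(pearson_distance(rising, falling) - 2.0) < 1e-9
```

Because the z-normalization removes level and range, a rising contour spoken by a low-pitched and a high-pitched storyteller lands in the same cluster, which is what a shape taxonomy of six contour classes requires.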


Bangla is a useful language for studying nasal vowels because every vowel has a corresponding nasal counterpart. Vowel nasality generation is an important task for artificial nasality production in a speech synthesizer, and researchers have employed various methods for it. Vowel nasality generation for a rule-based speech synthesizer has not yet been studied for Bangla. This study discusses several methods, using the full spectrum and the partial spectrum, for generating vowel nasality in a rule-based Bangla text-to-speech (TTS) system built on demisyllables. A demisyllable-based Bangla TTS needs 1400 demisyllables stored in its database; transforming the vowel part of a demisyllable into its nasal counterpart reduces the database to 700 demisyllables. Comparative study of the e
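The abstract names full-spectrum and partial-spectrum transformations but gives no formulas. The following is a hedged sketch of one plausible reading of a "partial spectrum" approach: replace only the low-frequency magnitude spectrum of an oral vowel with that of a nasal template, keeping the oral vowel's phase. The cutoff frequency, the magnitude-only swap, and the synthetic signals are assumptions for illustration, not the paper's method:

```python
import numpy as np

def partial_spectrum_transform(oral, nasal_template, sr, cutoff_hz=1000.0):
    """Illustrative partial-spectrum nasalization (assumed method):
    copy the nasal template's magnitude spectrum below cutoff_hz into
    the oral vowel's spectrum, preserving the oral vowel's phase."""
    n = len(oral)
    oral_spec = np.fft.rfft(oral)
    nasal_spec = np.fft.rfft(nasal_template[:n])
    k = int(cutoff_hz * n / sr)          # number of FFT bins below the cutoff
    mag = np.abs(oral_spec).copy()
    mag[:k] = np.abs(nasal_spec)[:k]     # partial-spectrum replacement
    phase = np.angle(oral_spec)
    return np.fft.irfft(mag * np.exp(1j * phase), n=n)

sr = 16000
t = np.arange(sr // 10) / sr                   # 100 ms frame
oral = np.sin(2 * np.pi * 700 * t)             # stand-in oral-vowel formant
nasal = 0.5 * np.sin(2 * np.pi * 250 * t)      # stand-in low nasal murmur
out = partial_spectrum_transform(oral, nasal, sr)
assert out.shape == oral.shape
```

A full-spectrum variant would replace the magnitude at every bin rather than only below the cutoff; the partial variant preserves the oral vowel's higher formants, which is why one would compare the two.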


2021 ◽ Vol 11 (3) ◽ pp. 1144
Author(s): Sung-Woo Byun, Seok-Pil Lee

Recently, researchers have developed text-to-speech models based on deep learning that produce results superior to those of previous approaches. However, because those systems only mimic the generic speaking style of the reference audio, it is difficult to assign user-defined emotional types to the synthesized speech. This paper proposes an emotional speech synthesizer built by embedding not only speaking styles but also emotional styles. We extend the speaker embedding to a multi-condition embedding by adding an emotional embedding in Tacotron, so that the synthesizer can generate emotional speech. An evaluation showed the superiority of the proposed model over a previous model in terms of emotional expressiveness.
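A common way to realize such a multi-condition embedding is to concatenate a speaker vector and an emotion vector and broadcast the result over the encoder's time axis before the decoder attends to it. The dimensions, lookup tables, and injection point below are assumptions for illustration; the paper does not specify where in Tacotron the conditioning is applied:

```python
import numpy as np

rng = np.random.default_rng(0)

num_speakers, num_emotions = 4, 3
spk_dim, emo_dim = 8, 4
T, enc_dim = 10, 16  # encoder time steps and feature size (illustrative)

# Lookup tables standing in for trained embedding layers.
speaker_table = rng.normal(size=(num_speakers, spk_dim))
emotion_table = rng.normal(size=(num_emotions, emo_dim))

def condition_encoder_outputs(enc_out, speaker_id, emotion_id):
    """Concatenate speaker and emotion embeddings onto every encoder frame,
    so the decoder attends over features carrying both speaking style and
    emotional style (the 'multi-condition embedding' idea)."""
    cond = np.concatenate([speaker_table[speaker_id],
                           emotion_table[emotion_id]])  # (spk_dim + emo_dim,)
    tiled = np.tile(cond, (enc_out.shape[0], 1))        # repeat over time axis
    return np.concatenate([enc_out, tiled], axis=1)

enc_out = rng.normal(size=(T, enc_dim))
conditioned = condition_encoder_outputs(enc_out, speaker_id=1, emotion_id=2)
assert conditioned.shape == (T, enc_dim + spk_dim + emo_dim)
```

Concatenation keeps the two conditions independently controllable at synthesis time, which is what allows a user-chosen emotion to be paired with any speaker identity.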


Electronics ◽ 2020 ◽ Vol 9 (2) ◽ pp. 267
Author(s): Fernando Alonso Martin, María Malfaz, Álvaro Castro-González, José Carlos Castillo, Miguel Ángel Salichs

The success of social robots is directly linked to their ability to interact with people. Humans possess both verbal and non-verbal communication skills, so both are essential for social robots to achieve natural human–robot interaction. This work focuses on the verbal channel, since the majority of social robots implement an interaction system endowed with verbal capacities. To do so, a social robot must be equipped with an artificial voice system; in robotics, a text-to-speech (TTS) system is the most common speech synthesis technique. The performance of a speech synthesizer is mainly evaluated by its similarity to the human voice in terms of intelligibility and expressiveness. In this paper, we present a comparative study of eight off-the-shelf TTS systems used in social robots. To carry out the study, 125 participants evaluated the performance of the following TTS systems: Google, Microsoft, Ivona, Loquendo, Espeak, Pico, AT&T, and Nuance. The evaluation was performed after watching videos in which a social robot communicates verbally using one TTS system. The participants then completed a questionnaire rating each TTS system on four features: intelligibility, expressiveness, artificiality, and suitability. Four research questions were posed to determine whether the TTS systems can be ranked on each evaluated feature or whether there are no significant differences between them. Our study shows that participants found differences between the evaluated TTS systems in terms of intelligibility, expressiveness, and artificiality. The experiments also indicated a relationship between the physical appearance of the robots (embodiment) and the suitability of the TTS systems.
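Producing a per-feature ranking from such questionnaire data amounts to aggregating participant ratings per system and sorting. A minimal sketch with invented ratings (the scores below are illustrative, not the study's results):

```python
import numpy as np

# Rows = participants, columns = TTS systems, values = 1-5 ratings on one
# feature (e.g., intelligibility). All numbers here are invented.
systems = ["Google", "Microsoft", "Ivona", "Loquendo"]
ratings = np.array([
    [5, 4, 4, 3],
    [4, 4, 5, 2],
    [5, 3, 4, 3],
])

means = ratings.mean(axis=0)                      # per-system mean rating
order = np.argsort(-means)                        # descending by mean
ranking = [systems[i] for i in order]
assert ranking[0] == "Google"
```

A ranking by mean only makes sense once a significance test (the study's "research questions") confirms the systems actually differ on that feature; otherwise the order is noise.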

