Thai speech synthesis with emotional tone: Based on Formant synthesis for Home Robot

Author(s):  
Chaiyong Khorinphan ◽  
Sukanya Phansamdaeng ◽  
Saiyan Saiyod

Author(s):  
Thierry Dutoit ◽  
Yannis Stylianou

Text-to-speech (TTS) synthesis is the art of designing talking machines. Seen from this functional perspective, the task looks simple, but this chapter shows that delivering intelligible, natural-sounding, and expressive speech, while also taking engineering costs into account, is a real challenge. Speech synthesis has come a long way since the great controversy of the 1980s between MIT’s formant synthesis and Bell Labs’ diphone-based concatenative synthesis. While unit-selection technology, which appeared in the mid-1990s, can be seen as an extension of diphone-based approaches, the appearance of Hidden Markov Model (HMM) synthesis around 2005 marked a major shift back to models. More recently, statistical approaches supported by advanced deep learning architectures have been shown to advance both text analysis and normalization and the generation of waveforms. Important recent milestones have been Google’s WaveNet (September 2016) and the sequence-to-sequence models referred to as Tacotron (I and II).


2013 ◽  
Vol 303-306 ◽  
pp. 1334-1337
Author(s):  
Zhi Ping Zhang ◽  
Xi Hong Wu

The authors proposed a trainable formant synthesis method based on the multi-channel Hidden Trajectory Model (HTM). In this method, phonetic targets, formant trajectories, and spectral states from the oral, nasal, voiceless, and background channels were designed to form hierarchical hidden layers, from which spectra were generated as observable features. In model training, the phonetic targets were learned from one hour of training speech, and the phoneme boundaries were aligned as well. The experimental results showed that the speech could be reconstructed from the trained formant model with a source-filter synthesizer.
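The source-filter reconstruction mentioned above can be illustrated with a minimal cascade formant synthesizer. This is a hedged sketch only: the formant frequencies, bandwidths, and Klatt-style DC normalisation below are illustrative textbook values, not the paper's trained HTM parameters.

```python
import numpy as np

fs = 16000                      # sample rate (Hz)
f0 = 120                        # fundamental frequency (Hz)
n = int(fs * 0.5)               # 0.5 s of audio

# Source: a glottal impulse train (flat-spectrum excitation)
source = np.zeros(n)
source[:: fs // f0] = 1.0

def resonator(x, f, bw, fs):
    """Second-order IIR resonator at formant f (Hz) with bandwidth bw (Hz)."""
    r = np.exp(-np.pi * bw / fs)
    c = 2.0 * r * np.cos(2.0 * np.pi * f / fs)
    b0 = 1.0 - c + r * r        # unity gain at 0 Hz (Klatt-style normalisation)
    y = np.zeros_like(x)
    for k in range(len(x)):
        y[k] = b0 * x[k]
        if k >= 1:
            y[k] += c * y[k - 1]
        if k >= 2:
            y[k] -= r * r * y[k - 2]
    return y

# Filter: cascade the first three formants of an /a/-like vowel
y = source
for f, bw in [(700, 130), (1220, 70), (2600, 160)]:
    y = resonator(y, f, bw, fs)
y /= np.max(np.abs(y))          # normalise amplitude
```

Writing `y` to a WAV file yields a buzzy but recognisably vowel-like sound; a full synthesizer would additionally vary the targets over time along the formant trajectories.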


2020 ◽  
Author(s):  
Zofia Malisz ◽  
Gustav Eje Henter ◽  
Cassia Valentini-Botinhao ◽  
Oliver Watts ◽  
Jonas Beskow ◽  
...  

Decades of gradual advances in speech synthesis have recently culminated in exponential improvements fuelled by deep learning. This quantum leap has the potential to finally deliver realistic, controllable, and robust synthetic stimuli for speech experiments. In this article, we discuss these and other implications for phonetic sciences. We substantiate our argument by evaluating classic rule-based formant synthesis against state-of-the-art synthesisers on a) subjective naturalness ratings and b) a behavioural measure (reaction times in a lexical decision task). We also differentiate between text-to-speech and speech-to-speech methods. Naturalness ratings indicate that all modern systems are substantially closer to natural speech than formant synthesis. Reaction times for several modern systems do not differ substantially from natural speech, meaning that the processing gap observed in older systems, and reproduced with our formant synthesiser, is no longer evident. Importantly, some speech-to-speech methods are nearly indistinguishable from natural speech on both measures.


2007 ◽  
Vol 19 (6) ◽  
pp. 646-655 ◽  
Author(s):  
Seiji Aoyagi ◽  
Takahiro Yamaguchi ◽  
Kazuo Tsunemine ◽  
Hiroshi Kinomoto ◽  
...  

A multipurpose robot that performs domestic tasks would meet a clear social need, but this type of robot requires sophisticated technologies. A humanoid robot is not yet practical for actual home or hospital use, considering its reliability and cost. To develop a practical multipurpose robot, we previously proposed the robot-environment compromise system (RECS) concept, in which the robot’s environment is modified to increase robot performance. The concept shares the technical difficulties between the robot and the environment so that robot tasks become possible and are facilitated. The present paper reports the development of an indoor mobile robot system based on the RECS concept that has a wheel mechanism for traversing steps. We propose a navigation system based on image recognition of landmarks on the ceiling and evaluate its effectiveness in experiments. We also propose a positioning system using a docking mechanism. We demonstrate the feasibility of our proposal with the domestic tasks of setting a meal on a table and clearing away the dishes. We also developed a human interface system based on speech synthesis and recognition.


2019 ◽  
Vol 28 (3) ◽  
pp. 660-672
Author(s):  
Suzanne H. Kimball ◽  
Toby Hamilton ◽  
Erin Benear ◽  
Jonathan Baldwin

Purpose The purpose of this study was to evaluate the emotional tone and verbal behavior of social media users who self-identified as having tinnitus and/or hyperacusis that caused self-described negative consequences on daily life or health. Research Design and Method An explanatory mixed-methods design was utilized. Two hundred “initial” and 200 “reply” Facebook posts were collected from members of a tinnitus group and a hyperacusis group. Data were analyzed via the LIWC 2015 software program and compared to typical bloggers. As this was an explanatory mixed-methods study, we used qualitative thematic analyses to explain, interpret, and illustrate the quantitative results. Results Overall, quantitative results indicated lower overall emotional tone for all categories (tinnitus and hyperacusis, initial and reply), which was mostly influenced by higher negative emotion. Higher levels of authenticity or truth were found in the hyperacusis sample but not in the tinnitus sample. Lower levels of clout (social standing) were indicated in all groups, and a lower level of analytical thinking style (concepts and complex categories rather than narratives) was found in the hyperacusis sample. Additional analysis of the language indicated higher levels of sadness and anxiety in all groups and lower levels of anger, particularly for initial replies. These data support prior findings indicating higher levels of anxiety and depression in this patient population based on the actual words in blog posts and not from self-report questionnaires. Qualitative results identified 3 major themes from both the tinnitus and hyperacusis texts: suffering, negative emotional tone, and coping strategies. Conclusions Results from this study suggest support for the predominant clinical view that patients with tinnitus and hyperacusis have higher levels of anxiety and depression than the general population. 
The extent of the suffering described and the patterns of coping strategies reported have implications for clinical practice and point to the need for research on implementing improved practice plans.


2009 ◽  
Author(s):  
Robert E. Remez ◽  
Kathryn R. Dubowski ◽  
Morgana L. Davids ◽  
Emily F. Thomas ◽  
Nina Paddu ◽  
...  

2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty lies in the large errors that arise during text-to-speech feature recognition, which make it hard to integrate an English text-to-speech conversion algorithm into a working system. To improve the efficiency of English text-to-speech conversion, this article builds on a machine learning approach: after the original speech waveform is labeled with pitch marks, prosody is modified with PSOLA (pitch-synchronous overlap-add), and the C4.5 algorithm is used to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of pronunciation discrimination based on part-of-speech rules and HMM-based prosodic hierarchy prediction in speech synthesis systems, this study constructed a system model. In addition, waveform concatenation and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, labels can be learned by machine learning methods. Finally, this study evaluates and analyzes the performance of the algorithm through controlled experiments. The results show that the proposed algorithm performs well and has practical value.
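The PSOLA step mentioned above can be sketched in its simplest time-domain form (TD-PSOLA): pitch-synchronous, Hann-windowed segments are extracted around analysis pitch marks and overlap-added at rescaled synthesis positions. This is a minimal illustration that assumes pitch marks are already available; the paper's actual pitch-marking, C4.5, and HMM components are not shown.

```python
import numpy as np

def td_psola(x, marks, factor):
    """Minimal TD-PSOLA sketch: scale F0 by `factor` (>1 raises pitch).
    `marks` are analysis pitch-mark sample positions, assumed already detected."""
    marks = np.asarray(marks)
    y = np.zeros_like(x, dtype=float)
    t = float(marks[0])                          # next synthesis mark position
    while t < marks[-1]:
        i = int(np.argmin(np.abs(marks - t)))    # nearest analysis mark
        # local pitch period from neighbouring marks
        if i < len(marks) - 1:
            T = int(marks[i + 1] - marks[i])
        else:
            T = int(marks[i] - marks[i - 1])
        # extract a two-period Hann-windowed segment centred on the mark
        c = int(marks[i])
        lo, hi = c - T, c + T
        seg = np.zeros(2 * T)
        a, b = max(lo, 0), min(hi, len(x))
        seg[a - lo:b - lo] = x[a:b]
        seg *= np.hanning(2 * T)
        # overlap-add the segment at the synthesis mark
        p = int(round(t))
        lo2, hi2 = p - T, p + T
        a2, b2 = max(lo2, 0), min(hi2, len(y))
        y[a2:b2] += seg[a2 - lo2:b2 - lo2]
        t += T / factor                          # compressed spacing raises F0
    return y

# usage: raise a 100 Hz sine to roughly 125 Hz
fs = 8000
x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
marks = np.arange(0, fs, 80)                     # exact pitch marks for this sine
y = td_psola(x, marks, 1.25)
```

Placing the synthesis marks closer together (factor > 1) raises the fundamental while the windowed segments preserve the spectral envelope, which is why PSOLA can modify prosody without re-estimating formants.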

