Generating the Voice of the Interactive Virtual Assistant

Mapping Intimacies ◽

10.5772/intechopen.95510 ◽

2021 ◽

Author(s):

Adriana Stan ◽

Beáta Lőrincz

Keyword(s):

Speech Synthesis ◽

Text Processing ◽

Research Field ◽

Text To Speech ◽

Rule Based ◽

Acoustic Modelling ◽

Research Problems ◽

Text To Speech Synthesis ◽

Main Components ◽

The Voice

This chapter gives an overview of current approaches to generating spoken content with text-to-speech synthesis (TTS) systems, and thus the voice of an Interactive Virtual Assistant (IVA). The overview starts from the issues that make spoken content generation a non-trivial task, and introduces the two main components of a TTS system: text processing and acoustic modelling. It then gives the reader the minimum scientific detail on the terminology and methods of speech synthesis, yet enough to make initial decisions about the choice of technology for the vocal identity of the IVA. The description of speech synthesis methodologies begins with basic, easy-to-run, low-requirement rule-based synthesis and ends with the state-of-the-art deep learning landscape. To bring this complex and extensive research field closer to commercial deployment, the chapter provides an extensive index of the readily and freely available resources and tools required to build a TTS system. Quality evaluation methods and open research problems are also highlighted at the end of the chapter.


A rule-based phrase parser for real-time text-to-speech synthesis

Natural Language Engineering ◽

10.1017/s1351324900000140 ◽

1995 ◽

Vol 1 (2) ◽

pp. 191-212 ◽

Cited By ~ 1

Author(s):

Joan Bachenko ◽

Eileen Fitzpatrick ◽

Jeffrey Daugherty

Keyword(s):

Real Time ◽

Hard Of Hearing ◽

Speech Synthesis ◽

Break Point ◽

Linguistic Context ◽

Text To Speech ◽

Rule Based ◽

Front End ◽

Text To Speech Synthesis ◽

Break Points

Text-to-speech systems are currently designed to work on complete sentences and paragraphs, thereby allowing front end processors access to large amounts of linguistic context. Problems with this design arise when applications require text to be synthesized in near real time, as it is being typed. How does the system decide which incoming words should be collected and synthesized as a group when prior and subsequent word groups are unknown? We describe a rule-based parser that uses a three cell buffer and phrasing rules to identify break points for incoming text. Words up to the break point are synthesized as new text is moved into the buffer; no hierarchical structure is built beyond the lexical level. The parser was developed for use in a system that synthesizes written telecommunications by Deaf and hard of hearing people. These are texts written entirely in upper case, with little or no punctuation, and using a nonstandard variety of English (e.g. WHEN DO I WILL CALL BACK YOU). The parser performed well in a three month field trial utilizing tens of thousands of texts. Laboratory tests indicate that the parser exhibited a low error rate when compared with a human reader.
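The buffering idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' parser: the abstract does not list the phrasing rules, so the single rule used here (break before a word in the hypothetical `BREAK_BEFORE` set), the function name `chunk_stream`, and the example sentence are all assumptions made for illustration. Only the three-cell buffer and the release of words up to a break point come from the abstract.

```python
from collections import deque

# Hypothetical phrasing rule: start a new group before any of these words.
# The real parser's rules are not given in the abstract.
BREAK_BEFORE = {"AND", "BUT", "WHEN", "BECAUSE", "IF"}


def chunk_stream(words, buffer_size=3):
    """Yield word groups from an incoming word stream.

    Words pass through a small fixed-size buffer. Each time the buffer is
    full, its oldest word joins the current group; if the word now at the
    front of the buffer triggers the break rule, the group is released.
    """
    buffer = deque()
    group = []
    for word in words:
        buffer.append(word)
        if len(buffer) < buffer_size:
            continue  # wait until the buffer is full
        group.append(buffer.popleft())
        if buffer[0] in BREAK_BEFORE:
            yield group  # break point found: release words collected so far
            group = []
    # End of input: drain the buffer with the same rule.
    while buffer:
        group.append(buffer.popleft())
        if buffer and buffer[0] in BREAK_BEFORE:
            yield group
            group = []
    if group:
        yield group


if __name__ == "__main__":
    text = "I WILL CALL YOU WHEN I GET HOME BUT NOT BEFORE"
    for phrase in chunk_stream(text.split()):
        print(" ".join(phrase))
```

Because each decision looks only at the buffer, a group can be handed to the synthesizer before the rest of the message has been typed, which is the property the abstract emphasizes.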


Pre-Trained Text Representations for Improving Front-End Text Processing in Mandarin Text-to-Speech Synthesis

10.21437/interspeech.2019-1418 ◽

2019 ◽

Author(s):

Bing Yang ◽

Jiaqi Zhong ◽

Shan Liu

Keyword(s):

Speech Synthesis ◽

Text Processing ◽

Text To Speech ◽

Front End ◽

Text To Speech Synthesis


Deep Syntactic Analysis and Rule Based Accentuation in Text-to-Speech Synthesis

Text, Speech and Dialogue - Lecture Notes in Computer Science ◽

10.1007/978-3-540-87391-4_68 ◽

2008 ◽

pp. 535-542 ◽

Cited By ~ 1

Author(s):

Antti Suni ◽

Martti Vainio

Keyword(s):

Speech Synthesis ◽

Syntactic Analysis ◽

Text To Speech ◽

Rule Based ◽

Text To Speech Synthesis


Text Normalization for Telugu Text-to-Speech Synthesis

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v11i2.1176 ◽

2013 ◽

Vol 11 (2) ◽

pp. 2241-2249

Author(s):

Dr. K.V.N. Sunitha ◽

P. Sunitha Devi

Keyword(s):

Speech Synthesis ◽

Text Processing ◽

Text To Speech ◽

Speech Technology ◽

Rule Based System ◽

Input Text ◽

Novel Approach ◽

Text To Speech Synthesis ◽

Processing Component ◽

Text Normalization

Most areas of language and speech technology require, directly or indirectly, the handling of unrestricted text, and text-to-speech systems in particular must work on real text. To build a natural-sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units for an arbitrary input text. A novel approach is used in which the input text is tokenized and each token is classified by type. Token sense disambiguation is achieved using the semantic nature of the language, and expansion rules are then applied to obtain the normalized text. However, little work has been done on text normalization for Telugu. In this paper we discuss our efforts to design a rule-based system for text normalization in the context of building a Telugu text-to-speech system.
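The tokenize, classify, expand sequence described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses English tokens rather than Telugu, the token types, the `ABBREVIATIONS` table and the digit-by-digit number reading are hypothetical, and the token sense disambiguation step mentioned in the abstract is omitted.

```python
import re

# Hypothetical expansion tables; a real system would use language-specific rules.
ABBREVIATIONS = {"dr.": "doctor", "km": "kilometres"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def classify(token):
    """Assign a coarse token type used to pick an expansion rule."""
    if re.fullmatch(r"[0-9]+", token):
        return "NUMBER"
    if token.lower() in ABBREVIATIONS:
        return "ABBREVIATION"
    return "WORD"


def expand(token, kind):
    """Apply the expansion rule for the given token type."""
    if kind == "NUMBER":
        # Simplest possible rule: read the number digit by digit.
        return " ".join(DIGITS[int(d)] for d in token)
    if kind == "ABBREVIATION":
        return ABBREVIATIONS[token.lower()]
    return token  # ordinary words pass through unchanged


def normalize(text):
    """Tokenize on whitespace, classify each token, then expand it."""
    return " ".join(expand(tok, classify(tok)) for tok in text.split())


if __name__ == "__main__":
    print(normalize("Dr. Rao walked 42 km"))
```

Keeping classification separate from expansion means a new token type (dates, currency, and so on) only needs one new branch in each function.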


Towards designing a high intelligibility rule based standard Malay text-to-speech synthesis system

2008 International Conference on Computer and Communication Engineering ◽

10.1109/iccce.2008.4580574 ◽

2008 ◽

Cited By ~ 3

Author(s):

Zakiah Hanim Ahmad ◽

Othman Khalifa

Keyword(s):

Speech Synthesis ◽

Text To Speech ◽

Synthesis System ◽

Rule Based ◽

Text To Speech Synthesis


A rule based perceptual intonation model for Turkish text-to-speech synthesis

2012 20th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu.2012.6204475 ◽

2012 ◽

Cited By ~ 1

Author(s):

Ibrahim Baran Uslu ◽

Hakki Gokhan Ilk

Keyword(s):

Speech Synthesis ◽

Text To Speech ◽

Rule Based ◽

Text To Speech Synthesis ◽

Turkish Text


Text-To-Speech Synthesis Using Transfer Learning

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-956 ◽

2021 ◽

pp. 139-144

Author(s):

Ishita Satija ◽

Vina Lomte ◽

Yash Wani ◽

Digisha Kaneria ◽

Shubham Yadav

Keyword(s):

Transfer Learning ◽

Speech Synthesis ◽

Text To Speech ◽

Neural Organization ◽

Proposed Model ◽

Backward Wave ◽

Text To Speech Synthesis ◽

The Voice

We describe a neural network-based system for text-to-speech (TTS) synthesis that can generate speech audio in the voice of different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network; (2) a sequence-to-sequence synthesis network based on Tacotron 2; (3) an autoregressive WaveNet-based vocoder network. We demonstrate that the proposed model can transfer the knowledge of speaker variability learned by the discriminatively trained speaker encoder to the multispeaker TTS task, and can synthesize natural speech from speakers unseen during training. We quantify the importance of training the speaker encoder on a large and diverse speaker set to obtain the best generalization performance. Finally, we show that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high-quality speaker representation.
