Nepali Text to Speech Synthesis System using FreeTTS

SCITECH Nepal, 2018, Vol 13 (1), pp. 24-31
Author(s): Krishna Bikram Shah, Kiran Kumar Chaudhary, Ashmita Ghimire

This paper discusses the tools and methodology used in developing a Nepali Text to Speech Synthesis System. The system is developed entirely in Java and uses the FreeTTS synthesizer. Speech is the vocalized form of human communication. Here the Nepali language is synthesized using a formant approach and FreeTTS, a popular generic framework available in the public domain for developing TTS systems. The text-to-speech architecture places more emphasis on the Natural Language Processing (NLP) component than on the Digital Signal Processing (DSP) component. Because Nepali is the most widely used language in Nepal and is also used in parts of India and abroad, a text-to-speech (TTS) synthesizer for this language can serve as a convenient tool and an information and communication technology (ICT) based system to aid the many people who are illiterate, as well as those with physical impairments such as visual or vocal disabilities. The ability to convert text to voice may reduce the dependency, frustration, and sense of helplessness of these people. The system can be extended to include more features such as emotions, improved tokenization, interactive options, and the use of a minimal database.
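To make the formant approach mentioned in this abstract concrete, here is a minimal sketch of formant synthesis in general: an impulse train at the pitch frequency is passed through a cascade of second-order resonators, one per formant. This is not the authors' implementation and it does not use FreeTTS. The function names, the formant frequencies and bandwidths in `VOWEL_A`, and the default pitch and sample rate are all illustrative assumptions.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter `signal` through one second-order digital resonator (one formant)."""
    t = 1.0 / sample_rate
    # Standard two-pole resonator coefficients; gain is normalized to 1 at 0 Hz.
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voiced excitation: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    n = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3, sample_rate=16000):
    """Cascade the resonators over the excitation and normalize the peak to 1."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]


# Assumed (frequency, bandwidth) pairs in Hz, roughly in the range of an open "a" vowel.
VOWEL_A = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
samples = synthesize_vowel(VOWEL_A)
```

The result is a list of floats in the range -1 to 1. Scaling them to 16-bit integers and writing them with Python's `wave` module would make the vowel audible.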

Author(s): Thierry Dutoit, Yannis Stylianou

This article gives an introduction to state-of-the-art text-to-speech (TTS) synthesis systems, showing both the natural language processing and the digital signal processing problems involved. Text-to-speech (TTS) synthesis is the art of designing talking machines. The article begins with a brief user-oriented description of a general TTS system and comments on its commercial applications. It then gives a functional diagram of a modern TTS system, highlighting its components. It describes its morphosyntactic module. Furthermore, it examines why sentence-level phonetization cannot be achieved by a sequence of dictionary look-ups, and describes possible implementations of the phonetizer. Finally, the article describes prosody generation, outlining how intonation and duration can approximately be computed from text. Prosody refers to certain properties of the speech signal, which are related to audible changes in pitch, loudness, and syllable length. This article also introduces the two main existing categories of techniques for waveform generation: synthesis by rule and concatenative synthesis.
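The point that sentence-level phonetization cannot be reduced to dictionary look-ups can be illustrated with a small sketch. It is not taken from the article. The toy lexicon, the part-of-speech labels, and the one-rule tagger are hypothetical, and the phoneme strings are ARPAbet-style transcriptions chosen for illustration. The English word "read" has two pronunciations, and only the surrounding words decide which one applies.

```python
# Hypothetical toy lexicon: (word, part of speech) -> ARPAbet-style phonemes.
LEXICON = {
    ("i", "PRON"): "AY",
    ("it", "PRON"): "IH T",
    ("will", "AUX"): "W IH L",
    ("have", "AUX"): "HH AE V",
    ("read", "VERB_BASE"): "R IY D",  # as in "I will read it"
    ("read", "VERB_PAST"): "R EH D",  # as in "I have read it"
}


def naive_phonetize(words):
    """Context-free look-up: always use the first pronunciation listed for a word."""
    first = {}
    for (word, _pos), phones in LEXICON.items():
        first.setdefault(word, phones)
    return [first[w] for w in words]


def tag(words):
    """Hypothetical one-rule tagger: 'read' after have/has/had is a past form."""
    tags = []
    for i, word in enumerate(words):
        if word == "read":
            previous = words[i - 1] if i > 0 else ""
            tags.append("VERB_PAST" if previous in {"have", "has", "had"} else "VERB_BASE")
        elif word in {"will", "have"}:
            tags.append("AUX")
        else:
            tags.append("PRON")
    return tags


def contextual_phonetize(words):
    """Look up each word together with its tag, so context selects the pronunciation."""
    return [LEXICON[(w, t)] for w, t in zip(words, tag(words))]


sentence = "i have read it".split()
naive = naive_phonetize(sentence)            # picks "R IY D" for "read" (wrong here)
contextual = contextual_phonetize(sentence)  # picks "R EH D" for "read"
```

A real phonetizer would rely on a proper morphosyntactic analysis rather than a single hand-written rule. The sketch only shows why the look-up key has to include more than the spelling of the word.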


Author(s): Jesin James, Isabella Shields, Rebekah Berriman, Peter J. Keegan, Catherine I. Watson

Author(s): Mahbubur R. Syed, Shuvro Chakrobartty, Robert J. Bignall

Speech synthesis is the process of producing natural-sounding, highly intelligible synthetic speech simulated by a machine in such a way that it sounds as if it was produced by a human vocal system. A text-to-speech (TTS) synthesis system is a computer-based system where the input is text and the output is a simulated vocalization of that text. Before the 1970s, most speech synthesis was achieved with hardware, but this was costly and it proved impossible to properly simulate natural speech production. Since the 1970s, the use of computers has made the practical application of speech synthesis more feasible.

