Korean text-to-speech system using a formant synthesis method.

1992 ◽  
Vol 13 (3) ◽  
pp. 151-160
Author(s):  
Seung-Kwon Ahn ◽  
Koeng-Mo Sung

2018 ◽  
Vol 10 (1) ◽  
pp. 39-48 ◽  
Author(s):  
Yeunju Choi ◽  
Youngmoon Jung ◽  
Younggwan Kim ◽  
Youngjoo Suh ◽  
Hoirin Kim

Gipan ◽  
2019 ◽  
Vol 4 ◽  
pp. 106-116
Author(s):  
Roop Shree Ratna Bajracharya ◽  
Santosh Regmi ◽  
Bal Krishna Bal ◽  
Balaram Prasain

Text-to-Speech (TTS) synthesis has come far from its primitive synthetic monotone voices to more natural and intelligible sounding voices. One of the direct applications of a natural-sounding TTS system is screen-reader software for the visually impaired and blind community. The Festival Speech Synthesis System uses a concatenative speech synthesis method together with a unit selection process to generate a natural-sounding voice. This work primarily gives an account of the efforts put towards developing a natural-sounding TTS system for Nepali using the Festival system. We also shed light on the issues faced and the solutions derived, which may overlap considerably with those of other similar under-resourced languages in the region.
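The unit-selection process the abstract refers to can be illustrated with a toy dynamic-programming search: each target phone has several candidate units from the database, and the synthesizer picks the sequence minimising the sum of target costs (how well a unit matches the desired phone context) and join costs (how smoothly adjacent units concatenate). This is a generic sketch of the idea, not Festival's actual implementation; the unit representation and cost functions below are invented for illustration.

```python
def select_units(candidates, target_cost, join_cost):
    """Viterbi-style unit selection.

    candidates: one list of candidate units per target position.
    Returns the unit sequence minimising total target + join cost.
    """
    n = len(candidates)
    # best[i][j] = (cumulative cost, backpointer) for unit j at position i
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, n):
        row = []
        for u in candidates[i]:
            # cheapest predecessor, including the cost of joining it to u
            prev = min(
                range(len(candidates[i - 1])),
                key=lambda k: best[i - 1][k][0]
                + join_cost(candidates[i - 1][k], u),
            )
            cost = (best[i - 1][prev][0]
                    + join_cost(candidates[i - 1][prev], u)
                    + target_cost(i, u))
            row.append((cost, prev))
        best.append(row)
    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]

# Toy example: units are just pitch values (Hz); the target cost is the
# deviation from a desired pitch, the join cost the pitch discontinuity.
targets = [120.0, 125.0, 130.0]
cands = [[110.0, 122.0], [118.0, 140.0], [128.0, 150.0]]
tc = lambda i, u: abs(u - targets[i])
jc = lambda a, b: abs(a - b)
best_seq = select_units(cands, tc, jc)
```

In a real system the units carry full acoustic context, and the costs weigh spectral, pitch, and duration mismatches rather than a single number.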


Author(s):  
Thierry Dutoit ◽  
Yannis Stylianou

Text-to-speech (TTS) synthesis is the art of designing talking machines. Seen from this functional perspective, the task looks simple, but this chapter shows that delivering intelligible, natural-sounding, and expressive speech, while also taking engineering costs into account, is a real challenge. Speech synthesis has come a long way since the big controversy of the 1980s between MIT's formant synthesis and Bell Labs' diphone-based concatenative synthesis. While unit-selection technology, which appeared in the mid-1990s, can be seen as an extension of diphone-based approaches, the appearance of hidden Markov model (HMM) synthesis around 2005 marked a major shift back to models. More recently, statistical approaches, supported by advanced deep learning architectures, have been shown to advance text analysis and normalization as well as waveform generation. Important recent milestones have been Google's WaveNet (September 2016) and the sequence-to-sequence models referred to as Tacotron (I and II).


2013 ◽  
Vol 303-306 ◽  
pp. 1334-1337
Author(s):  
Zhi Ping Zhang ◽  
Xi Hong Wu

The authors proposed a trainable formant synthesis method based on a multi-channel Hidden Trajectory Model (HTM). In the method, the phonetic targets, formant trajectories, and spectrum states from the oral, nasal, voiceless, and background channels were designed to construct hierarchical hidden layers, and spectra were then generated as observable features. In model training, the phonetic targets were learned from one hour of training speech, and phoneme boundaries were also aligned. The experimental results showed that speech could be reconstructed from the trainable formant model with a source-filter synthesizer.
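The source-filter synthesizer mentioned in the abstract can be sketched in a few lines: an impulse-train glottal source is passed through a cascade of two-pole resonators, one per formant. This is a generic illustration of formant synthesis, not the authors' HTM-based model; the formant frequencies and bandwidths below are rough textbook values for the vowel /a/.

```python
import numpy as np

FS = 16000  # sampling rate (Hz)

def resonator(x, freq, bw, fs=FS):
    """Two-pole resonator modelling one formant (centre freq, bandwidth)."""
    r = np.exp(-np.pi * bw / fs)        # pole radius from bandwidth
    theta = 2.0 * np.pi * freq / fs     # pole angle from centre frequency
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    # Gain chosen so |H| = 1 at the formant frequency.
    b0 = (1.0 - r) * np.sqrt(1.0 - 2.0 * r * np.cos(2.0 * theta) + r * r)
    y = np.zeros_like(x)
    for n in range(len(x)):
        # y starts as zeros, so y[-1]/y[-2] read 0 at the first samples
        y[n] = b0 * x[n] - a1 * y[n - 1] - a2 * y[n - 2]
    return y

def synth_vowel(formants, bws, f0=120.0, dur=0.3, fs=FS):
    """Impulse-train source filtered through cascaded formant resonators."""
    n = int(dur * fs)
    src = np.zeros(n)
    src[::int(fs / f0)] = 1.0           # glottal impulse train at F0
    out = src
    for f, b in zip(formants, bws):
        out = resonator(out, f, b, fs)
    return out / np.max(np.abs(out))    # normalise peak amplitude to 1

# Illustrative formant frequencies/bandwidths (Hz) for the vowel /a/
wave = synth_vowel([730, 1090, 2440], [90, 110, 170])
```

A trainable system like the one described would replace the fixed formant values with trajectories predicted by the hidden model, per channel, rather than hard-coding them.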

