Syntactic analysis and letter-to-phoneme conversion using neural networks — an application of neural networks to an English text-to-speech system

1993 ◽  
Vol 24 (8) ◽  
pp. 71-81 ◽  
Author(s):  
Yukiko Yamaguchi ◽  
Tatsuro Matsumoto


2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its main difficulty is that text-to-speech feature recognition introduces large errors during conversion, which makes English text-to-speech algorithms hard to deploy in practical systems. To improve the efficiency of English text-to-speech conversion, this article builds on machine learning: after the original speech waveform is labeled with pitch marks, the prosody is modified with PSOLA, and the C4.5 algorithm is used to train a decision tree that disambiguates the pronunciation of polyphones. To evaluate pronunciation discrimination based on part-of-speech rules and HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructs a system model. In addition, waveform concatenation and PSOLA are used to synthesize the output speech. For words whose main stress cannot be determined from morphological structure, stress labels can be learned by machine learning methods. Finally, this study evaluates and analyzes the performance of the algorithm through controlled experiments. The results show that the proposed algorithm performs well and has practical value.
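The polyphone decision tree mentioned above hinges on C4.5's split criterion, the gain ratio. As a minimal sketch (the context features, pronunciation labels, and training rows below are hypothetical illustrations, and a full C4.5 implementation would also handle continuous features and pruning), selecting the root split looks like:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(rows, labels, feat):
    """C4.5 gain ratio: information gain normalised by split entropy."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[feat], []).append(lab)
    n = len(labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g)
                                 for g in groups.values())
    split_info = entropy([row[feat] for row in rows])
    return gain / split_info if split_info else 0.0

# Hypothetical training set: part-of-speech context around a polyphone,
# labelled with the pronunciation it takes in that context.
rows = [
    {"pos_prev": "DET",  "pos_next": "NOUN"},
    {"pos_prev": "VERB", "pos_next": "PREP"},
    {"pos_prev": "DET",  "pos_next": "NOUN"},
    {"pos_prev": "VERB", "pos_next": "NOUN"},
]
labels = ["pron_a", "pron_b", "pron_a", "pron_b"]

# The root of the tree splits on the feature with the highest gain ratio.
best = max(rows[0], key=lambda f: gain_ratio(rows, labels, f))
```

In this toy data the preceding part of speech perfectly separates the two pronunciations, so it wins the root split; C4.5 then recurses on each branch.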


2020 ◽  
Vol 9 (6) ◽  
pp. 2388-2395 ◽  
Author(s):  
Duc Chung Tran ◽  
Duc Long Nguyen ◽  
Mohd. Fadzil Hassan

In recent years, voicebots have become a popular communication tool between humans and machines. In this paper, we introduce our voicebot, which integrates the text-to-speech (TTS) and speech-to-text (STT) modules provided by FPT.AI. The voicebot can be considered a significant improvement over a typical chatbot because it can respond to human queries with both text and speech. The FPT Open Speech and LibriSpeech datasets, along with music files, were used to test the accuracy and performance of the STT module. The TTS module was tested with text from news pages in both Vietnamese and English. To test the voicebot itself, Homestay Service topic questions and off-topic messages were input to the system. The TTS module achieved 100% accuracy in the Vietnamese text test and 72.66% accuracy in the English text test. In the STT module test, accuracy on the FPT Open Speech dataset (Vietnamese) was 90.51%, while accuracy on the LibriSpeech dataset (English) and on the music files was 0%. The voicebot achieved 100% accuracy in its test. Since the FPT.AI STT and TTS modules were developed to support only Vietnamese, targeting the Vietnam market, the 0% result on the LibriSpeech dataset is expected.
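Accuracy figures like those above can be computed by comparing a recognised transcript against a reference. A minimal sketch of a positional word-match accuracy (the paper does not specify its exact metric; a full evaluation would typically use an edit-distance word-error-rate alignment instead):

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    # Positional word-match accuracy: the fraction of reference words that
    # the hypothesis reproduces at the same position. A crude stand-in for
    # a full edit-distance (WER) alignment.
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0
    matches = sum(r == h for r, h in zip(ref, hyp))
    return matches / len(ref)
```

For example, `word_accuracy("the cat sat", "the cat sat")` is 1.0, while a transcript that gets only the first half of the words right scores 0.5.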


2019 ◽  
Vol 9 (16) ◽  
pp. 3391 ◽  
Author(s):  
Santiago Pascual ◽  
Joan Serrà ◽  
Antonio Bonafonte

Conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models such as recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure with intermediate affine transformations tends to make them slow to train and to sample from. In this work, we explore two different mechanisms that enhance the operational efficiency of recurrent neural networks, and study their performance–speed trade-off. The first mechanism is based on the quasi-recurrent neural network, where expensive affine transformations are removed from temporal connections and placed only on feed-forward computational directions. The second mechanism includes a module based on the transformer decoder network, designed without recurrent connections but emulating them with attention and positioning codes. Our results show that the proposed decoder networks are competitive in terms of distortion when compared to a recurrent baseline, whilst being significantly faster in terms of CPU and GPU inference time. The best performing model is the one based on the quasi-recurrent mechanism, reaching the same level of naturalness as the recurrent neural network based model with a speedup of 11.2× on CPU and 3.3× on GPU.
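The first mechanism can be made concrete: in a quasi-recurrent network, the step-to-step recurrence reduces to an elementwise pooling operation. A minimal sketch of that pooling, scalar-valued for clarity (real layers operate on vectors and obtain the candidate values `z` and forget gates `f` from 1-D convolutions over the input, which is the part that parallelises):

```python
def fo_pool(z, f, h0=0.0):
    # Quasi-recurrent "f-pooling": the only computation carried across
    # time steps is elementwise, so the expensive affine work (producing
    # candidate values z and forget gates f, e.g. via convolutions) can
    # be done for all time steps in parallel beforehand.
    h, out = h0, []
    for z_t, f_t in zip(z, f):
        h = f_t * h + (1.0 - f_t) * z_t
        out.append(h)
    return out
```

Because the loop body contains no matrix multiply, sampling avoids the per-step affine transformations that make a conventional RNN slow at inference time.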


1998 ◽  
Vol 3 (2) ◽  
pp. 229-249
Author(s):  
David Coniam

This paper describes a computer program which performs a particular type of grammatical/syntactic analysis—the assigning of structural boundaries between orthographic words in written English text. The Boundary Marker has been designed, in principle, as an analyser of unrestricted text and has been developed by using, as far as possible, authentic text as data for analysis. This paper first presents a brief overview of boundary marking as a method of syntactic analysis. It then describes how the program processes text and reports on the analysis of 10 000 words of text from the media. The paper concludes with a discussion of the advantages of a tightly focused analytic tool such as the Boundary Marker.
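A boundary marker of this kind can be sketched as a rule that inserts a structural boundary before words whose part of speech typically opens a new unit. The tag set and rule below are hypothetical illustrations, not the Boundary Marker's actual rule base:

```python
def mark_boundaries(tagged, opening_tags=("DET", "PREP", "CONJ")):
    # Insert a "|" boundary before words whose tag typically opens a new
    # structural unit. Both the rule and the tag names are illustrative.
    out = []
    for i, (word, tag) in enumerate(tagged):
        if i > 0 and tag in opening_tags:
            out.append("|")
        out.append(word)
    return " ".join(out)

sentence = [("the", "DET"), ("dog", "NOUN"), ("ran", "VERB"),
            ("to", "PREP"), ("the", "DET"), ("park", "NOUN")]
```

Here `mark_boundaries(sentence)` yields `the dog ran | to | the park`, segmenting the sentence into phrase-like units without building a full parse tree, which is what makes such a tightly focused analyser cheap to run on unrestricted text.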

