English text to speech conversion with delta

Author(s):  
S. Hertz
2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty is that feature recognition during text-to-speech conversion produces large errors, which makes English text-to-speech conversion algorithms hard to apply in a working system. To improve the efficiency of English text-to-speech conversion, this article takes a machine learning approach: after the original voice waveform is labeled with the pitch, it modifies the rhythm through PSOLA and uses the C4.5 algorithm to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of the pronunciation discrimination method based on part-of-speech rules and of HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructs a system model. In addition, the waveform stitching method and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, the stress labels can be learned by machine learning methods. Finally, this study evaluates and analyzes the performance of the algorithm through control experiments. The results show that the proposed algorithm performs well and has some practical effect.


2020 ◽  
Vol 9 (6) ◽  
pp. 2388-2395 ◽  
Author(s):  
Duc Chung Tran ◽  
Duc Long Nguyen ◽  
Mohd. Fadzil Hassan

In recent years, voicebots have become a popular communication tool between humans and machines. In this paper, we introduce our voicebot, which integrates the text-to-speech (TTS) and speech-to-text (STT) modules provided by FPT.AI. This voicebot can be considered a critical improvement over a typical chatbot because it can respond to human queries in both text and speech. The FPT Open Speech and LibriSpeech datasets, together with music files, were used to test the accuracy and performance of the STT module. The TTS module was tested using text from news pages in both Vietnamese and English. To test the voicebot, Homestay Service topic questions and off-topic messages were input to the system. The TTS module achieved 100% accuracy in the Vietnamese text test and 72.66% accuracy in the English text test. In the STT module test, the accuracy was 90.51% for the FPT Open Speech dataset (Vietnamese) and 0% for the LibriSpeech dataset (English), while the accuracy in the music files test was 0% for both. The voicebot achieved 100% accuracy in its test. Since the FPT.AI STT and TTS modules were developed to support only Vietnamese, with the aim of dominating the Vietnamese market, it is reasonable that the test with the LibriSpeech dataset resulted in 0% accuracy.


Author(s):  
Chai Wutiwiwatchai ◽  
Ausdang Thangthai ◽  
Ananlada Chotimongkol ◽  
Chatchawarn Hansakunbuntheung ◽  
Nattanun Thatphithakkul

1995 ◽  
Vol 18 (1) ◽  
pp. 51-80 ◽  
Author(s):  
Thomas G. Dietterich ◽  
Hermann Hild ◽  
Ghulum Bakiri
