scholarly journals Lombard Speech Synthesis Using Transfer Learning in a Tacotron Text-to-Speech System

Author(s):  
Bajibabu Bollepalli ◽  
Lauri Juvela ◽  
Paavo Alku
Author(s):  
Ishita Satija ◽  
Vina Lomte ◽  
Yash Wani ◽  
Digisha Kaneria ◽  
Shubham Yadav

We portray a neural organization based framework for text-to-speech (TTS) combination that can create discourse sound in the voice of various speakers, including those concealed during preparation. Our framework comprises of three autonomously prepared parts: (1) a speaker encoder network; (2) a grouping to-succession union organization based on Tacotron 2; (3) an auto-backward Wave Net-based vocoder network. We illustrate that the proposed model can move the information on speaker fluctuation learned by the discriminatively-prepared speaker encoder to the multi speaker TTS task, and can incorporate normal discourse from speakers concealed during preparation. We measure the significance of preparing the speaker encoder on a huge and different speaker set to acquire the best speculation execution. At last, we show that haphazardly inspected speaker embeddings can be utilized to integrate discourse in the voice of novel speakers divergent from those utilized in preparing, showing that the model has taken in a top-notch speaker portrayal.


2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is the key content of modern computer technology research. Its difficulty is that there are large errors in the conversion process of text-to-speech feature recognition, and it is difficult to apply the English text-to-speech conversion algorithm to the system. In order to improve the efficiency of the English text-to-speech conversion, based on the machine learning algorithm, after the original voice waveform is labeled with the pitch, this article modifies the rhythm through PSOLA, and uses the C4.5 algorithm to train a decision tree for judging pronunciation of polyphones. In order to evaluate the performance of pronunciation discrimination method based on part-of-speech rules and HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructed a system model. In addition, the waveform stitching method and PSOLA are used to synthesize the sound. For words whose main stress cannot be discriminated by morphological structure, label learning can be done by machine learning methods. Finally, this study evaluates and analyzes the performance of the algorithm through control experiments. The results show that the algorithm proposed in this paper has good performance and has a certain practical effect.


Author(s):  
Beiming Cao ◽  
Myungjong Kim ◽  
Jan van Santen ◽  
Ted Mau ◽  
Jun Wang

2019 ◽  
Author(s):  
Elshadai Tesfaye Biru ◽  
Yishak Tofik Mohammed ◽  
David Tofu ◽  
Erica Cooper ◽  
Julia Hirschberg

Sign in / Sign up

Export Citation Format

Share Document