A Fast and Lightweight Text-To-Speech Model with Spectrum and Waveform Alignment Algorithms

Author(s): Kihyuk Jeong, Huu-Kim Nguyen, Hong-Goo Kang
Author(s): Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frederico Santos de Oliveira, ...

2021
Author(s): Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, ...

2020, Vol 34 (05), pp. 8228-8235
Author(s): Naihan Li, Yanqing Liu, Yu Wu, Shujie Liu, Sheng Zhao, ...

Recently, neural network based speech synthesis has achieved outstanding results: the synthesized audio is of excellent quality and naturalness. However, current neural TTS models suffer from a robustness issue that produces abnormal audio (bad cases), especially for unusual text (unseen contexts). To build a neural model that synthesizes both natural and stable audio, in this paper we analyze in depth why previous neural TTS models are not robust, and based on this analysis we propose RobuTrans (Robust Transformer), a robust neural TTS model based on Transformer. Compared to TransformerTTS, our model first converts input text to linguistic features, including phonemic and prosodic features, and then feeds them to the encoder. In the decoder, the encoder-decoder attention is replaced with a duration-based hard attention mechanism, and the causal self-attention is replaced with a "pseudo non-causal attention" mechanism to model the holistic information of the input. In addition, the position embedding is replaced with a 1-D CNN, since the former constrains the maximum length of synthesized audio. With these modifications, our model not only fixes the robustness problem but also achieves a MOS (4.36) on par with TransformerTTS (4.37) and Tacotron2 (4.37) on our general test set.
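The duration-based hard attention mentioned in the abstract can be illustrated with a length-regulator-style expansion, where each phoneme's encoder state is repeated for its predicted number of mel frames. This is a minimal numpy sketch of that idea; the function name and shapes are assumptions for illustration, not RobuTrans's actual code.

```python
import numpy as np

def expand_by_duration(encoder_out, durations):
    """Duration-based hard alignment: repeat each phoneme's encoder
    vector durations[i] times so the expanded sequence has exactly
    sum(durations) frames (illustrative sketch, not the paper's code)."""
    assert len(encoder_out) == len(durations)
    return np.repeat(encoder_out, durations, axis=0)

# 3 phonemes with 4-dim encoder states; durations are in mel frames.
enc = np.arange(12, dtype=np.float32).reshape(3, 4)
dur = np.array([2, 1, 3])
frames = expand_by_duration(enc, dur)
print(frames.shape)  # (6, 4): one row per mel frame, sum(dur) frames
```

Because the alignment is a deterministic expansion rather than a learned soft attention, failure modes such as word skipping and repetition cannot occur, which is the robustness argument the abstract makes.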


Author(s): Dhruva Mahajan, Ashish Gapat, Lalita Moharkar, Prathamesh Sawant, ...

In this paper, we propose an end-to-end text-to-speech system in which a user feeds in input text that is synthesized, varied, and altered into an artificial voice at the output. The goal is a text-to-speech model, that is, a model capable of generating speech with the help of trained datasets. The pipeline organizes the entire function into three parts: a Speaker Encoder, a Synthesizer, and a Vocoder. Using these datasets, the model generates voice after prior training and maintains the naturalness of the speech throughout; for naturalness of speech we implement a zero-shot adaptation technique. The primary capability of the model is voice regeneration, which has a variety of applications in advancing the domain of speech synthesis. With the help of the speaker encoder, our model synthesizes a user's own voice if the user wants the output trained on his/her voice, which is fed in through the microphone in the GUI. Regeneration capabilities lie within the Voice Regeneration domain, which generates similar voice waveforms for any text.
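The three-stage flow described above (speaker encoder → synthesizer → vocoder) can be sketched as follows. These are dummy numpy stand-ins showing only the data shapes passed between stages; in the real system each stage is a trained neural network, and all names, dimensions, and the hop size here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_encoder(reference_wav):
    """Map a reference utterance to a fixed-size speaker embedding
    (256-dim here is an illustrative choice)."""
    return np.tanh(rng.standard_normal(256))

def synthesizer(text, speaker_embedding):
    """Produce a mel spectrogram conditioned on text and speaker.
    The 10-frames-per-character length rule is a crude placeholder."""
    n_frames = 10 * len(text)
    return rng.standard_normal((n_frames, 80)) + speaker_embedding[:80]

def vocoder(mel):
    """Convert the mel spectrogram to a waveform (hop size 256 assumed)."""
    return rng.standard_normal(mel.shape[0] * 256)

# One pass through the pipeline: reference audio fixes the voice,
# arbitrary text fixes the content.
wav = vocoder(synthesizer("hello", speaker_encoder(np.zeros(16000))))
print(wav.shape)  # (12800,): 50 mel frames * 256 samples per frame
```

The key design point is that the speaker embedding is computed once from reference audio and then conditions the synthesizer, which is what enables zero-shot adaptation to a new voice without retraining.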


2019
Author(s): Nishant Prateek, Mateusz Łajszczak, Roberto Barra-Chicote, Thomas Drugman, Jaime Lorenzo-Trueba, ...

1982, Vol 13 (2), pp. 129-133
Author(s): A. D. Pellegrini

The paper explores the processes by which children use private speech to regulate their behaviors. The first part of the paper explores the ontological development of self-regulating private speech. The theories of Vygotsky and Luria are used to explain this development. The second part of the paper applies these theories to pedagogical settings. The process by which children are exposed to dialogue strategies that help them solve problems is outlined. The strategy has children posing and answering four questions: What is the problem? How will I solve it? Am I using the plan? How did it work? It is argued that this model helps children systematically mediate their problem solving processes.

