Unsupervised Learning for Sequence-to-Sequence Text-to-Speech for Low-Resource Languages

10.21437/interspeech.2020-1403 ◽

2020 ◽

Author(s):

Haitong Zhang ◽

Yue Lin

Keyword(s):

Unsupervised Learning ◽

Text To Speech ◽

Implementation of Concatenation Technique for Low Resource Text-To-Speech System Based on Marathi Talking Calculator

10.21437/sltu.2018-16 ◽

2018 ◽

Author(s):

Monica Mundada ◽

Sangramsing Kayte ◽

Pradip Das

Keyword(s):

Text To Speech ◽

Post-Processing Using Speech Enhancement Techniques for Unit Selection and Hidden Markov Model Based Low Resource Language Marathi Text-to-Speech System

10.21437/sltu.2018-20 ◽

2018 ◽

Author(s):

Sangramsing Kayte ◽

Monica Mundada

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Speech Enhancement ◽

Hidden Markov ◽

Post Processing ◽

Text To Speech ◽

Unit Selection ◽

Low Resource ◽

A Hybrid HMM-Waveglow Based Text-to-Speech Synthesizer Using Histogram Equalization for Low Resource Indian Languages

10.21437/interspeech.2020-3180 ◽

2020 ◽

Author(s):

Mano Ranjith Kumar M. ◽

Sudhanshu Srivastava ◽

Anusha Prakash ◽

Hema A. Murthy

Keyword(s):

Histogram Equalization ◽

Indian Languages ◽

Text To Speech ◽

Speech Synthesizer ◽

Text-to-speech for low-resource systems

2002 IEEE Workshop on Multimedia Signal Processing. ◽

10.1109/mmsp.2002.1203295 ◽

2004 ◽

Author(s):

M. Schnell ◽

M. Kustner ◽

O. Jokisch ◽

R. Hoffmann

Keyword(s):

Text To Speech ◽

Low Resource ◽

Resource Systems

Spanish-Turkish Low-Resource Machine Translation: Unsupervised Learning vs Round-Tripping

American Journal of Artificial Intelligence ◽

10.11648/j.ajai.20200402.11 ◽

2020 ◽

Vol 4 (2) ◽

pp. 42

Author(s):

Tianyi Xu ◽

Ozge Ilkim Ozbek ◽

Shannon Marks ◽

Sri Korrapati ◽

Benyamin Ahmadnia

Keyword(s):

Unsupervised Learning ◽

Machine Translation ◽

Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00225-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Zolzaya Byambadorj ◽

Ryota Nishimura ◽

Altangerel Ayush ◽

Kengo Ohta ◽

Norihide Kitaoka

Keyword(s):

Transfer Learning ◽

Data Augmentation ◽

Prediction Models ◽

Target Language ◽

Text To Speech ◽

Paired Data ◽

Low Resource ◽

Language Data ◽

Single Speaker ◽

Abstract: Deep learning techniques are currently being applied in automated text-to-speech (TTS) systems, resulting in significant improvements in performance. However, these methods require large amounts of text-speech paired data for model training, and collecting this data is costly. Therefore, in this paper, we propose a single-speaker TTS system containing both a spectrogram prediction network and a neural vocoder for the target language, using only 30 min of target language text-speech paired data for training. We evaluate three approaches for training the spectrogram prediction models of our TTS system, which produce mel-spectrograms from the input phoneme sequence: (1) cross-lingual transfer learning, (2) data augmentation, and (3) a combination of the previous two methods. In the cross-lingual transfer learning method, we used two high-resource language datasets, English (24 h) and Japanese (10 h). We also used 30 min of target language data for training in all three approaches, and for generating the augmented data used for training in methods 2 and 3. We found that using both cross-lingual transfer learning and augmented data during training resulted in the most natural synthesized target speech output. We also compare single-speaker and multi-speaker training methods, using sequential and simultaneous training, respectively. The multi-speaker models were found to be more effective for constructing a single-speaker, low-resource TTS model. In addition, we trained two Parallel WaveGAN (PWG) neural vocoders, one using 13 h of our augmented data with 30 min of target language data and one using the entire 12 h of the original target language dataset. Our subjective AB preference test indicated that the neural vocoder trained with augmented data achieved almost the same perceived speech quality as the vocoder trained with the entire target language dataset.
Overall, we found that our proposed TTS system consisting of a spectrogram prediction network and a PWG neural vocoder was able to achieve reasonable performance using only 30 min of target language training data. We also found that by using 3 h of target language data, for training the model and for generating augmented data, our proposed TTS model was able to achieve performance very similar to that of the baseline model, which was trained with 12 h of target language data.

Low-Resource Expressive Text-To-Speech Using Data Augmentation

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413466 ◽

2021 ◽

Author(s):

Goeric Huybrechts ◽

Thomas Merritt ◽

Giulia Comini ◽

Bartek Perz ◽

Raahil Shah ◽

...

Keyword(s):

Data Augmentation ◽

Text To Speech ◽

Low Resource ◽

A Systematic Review and Analysis of Multilingual Data Strategies in Text-to-Speech for Low-Resource Languages

10.21437/interspeech.2021-1565 ◽

2021 ◽

Author(s):

Phat Do ◽

Matt Coler ◽

Jelske Dijkstra ◽

Esther Klabbers

Keyword(s):

Systematic Review ◽

Text To Speech ◽

Low Resource ◽

Multilingual Data

Transfer Learning, Style Control, and Speaker Reconstruction Loss for Zero-Shot Multilingual Multi-Speaker Text-to-Speech on Low-Resource Languages

IEEE Access ◽

10.1109/access.2022.3141200 ◽

2022 ◽

pp. 1-1

Author(s):

Kurniawati Azizah ◽

Wisnu Jatmiko

Keyword(s):

Learning Style ◽

Transfer Learning ◽

Text To Speech ◽

End-to-End Text-to-Speech for Low-Resource Languages by Cross-Lingual Transfer Learning

10.21437/interspeech.2019-2730 ◽

2019 ◽

Author(s):

Yuan-Jui Chen ◽

Tao Tu ◽

Cheng-chieh Yeh ◽

Hung-Yi Lee

Keyword(s):

Transfer Learning ◽

Text To Speech ◽

Low Resource ◽
