The first FOSD-Tacotron-2-based text-to-speech application for Vietnamese
Recently, with the development and deployment of voicebots that help reduce staffing at call centers, text-to-speech (TTS) systems supporting English and Chinese have attracted the attention of researchers and corporations worldwide. However, very few published works address TTS for Vietnamese. Thus, this paper presents in detail the development of the first Tacotron-2-based TTS application for Vietnamese, which uses the publicly available FPT Open Speech Dataset (FOSD), containing approximately 30 hours of labeled audio files together with their transcripts. The dataset was made available by FPT Corporation under an open-access license. A new text cleaner was developed to support Vietnamese, replacing the English cleaner provided by default in the Mozilla TTS source code. After 225,000 training steps, the generated speech has mean opinion scores (MOS) well above the average value of 2.50, centered around 3.00 for both clearness and naturalness in a crowdsourced survey.
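To illustrate what a Vietnamese text cleaner might involve, the following is a minimal sketch and not the cleaner developed in this work. The function name `vietnamese_cleaner` and the punctuation set `_KEEP_PUNCT` are hypothetical, and the steps shown (Unicode NFC normalization, lowercasing, removal of unsupported symbols, whitespace collapsing) are assumptions about what such a cleaner would plausibly do.

```python
import re
import unicodedata

# Hypothetical punctuation inventory; the actual symbol set is an assumption.
_KEEP_PUNCT = set(".,!?;:-'\"")
_WHITESPACE_RE = re.compile(r"\s+")


def vietnamese_cleaner(text: str) -> str:
    """Illustrative cleaner for Vietnamese transcripts (assumed steps only)."""
    # NFC composes a base letter and its combining tone/vowel marks into a
    # single code point, so the same word always maps to the same symbols.
    text = unicodedata.normalize("NFC", text)
    # Lowercasing keeps diacritics intact (e.g. "Đ" becomes "đ").
    text = text.lower()
    # Keep letters, digits, and the assumed punctuation; replace anything
    # else (including existing whitespace) with a single space.
    text = "".join(
        ch if (ch.isalnum() or ch in _KEEP_PUNCT) else " " for ch in text
    )
    # Collapse runs of whitespace and trim both ends.
    return _WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(vietnamese_cleaner("  Xin   CHÀO,  Việt Nam!  "))
```

Unlike a typical English cleaner, this sketch does not transliterate to ASCII, since doing so would discard the tone and vowel marks that distinguish Vietnamese words.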