On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition

Author(s):  
Seyedmahdad Mirsamadi ◽  
John H.L. Hansen
Author(s):  
Danny Henry Galatang ◽  
◽  
Suyanto Suyanto ◽  

The syllable-based automatic speech recognition (ASR) systems commonly perform better than the phoneme-based ones. This paper focuses on developing an Indonesian monosyllable-based ASR (MSASR) system using an ASR engine called SPRAAK and comparing it to a phoneme-based one. The Mozilla DeepSpeech-based end-to-end ASR (MDSE2EASR), one of the state-of-the-art models based on character (similar to the phoneme-based model), is also investigated to confirm the result. Besides, a novel Kaituoxu SpeechTransformer (KST) E2EASR is also examined. Testing on the Indonesian speech corpus of 5,439 words shows that the proposed MSASR produces much higher word accuracy (76.57%) than the monophone-based one (63.36%). Its performance is comparable to the character-based MDS-E2EASR, which produces 76.90%, and the character-based KST-E2EASR (78.00%). In the future, this monosyllable-based ASR is possible to be improved to the bisyllable-based one to give higher word accuracy. Nevertheless, extensive bisyllable acoustic models must be handled using an advanced method.


2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

2020 ◽  
Author(s):  
Jeremy H.M. Wong ◽  
Yashesh Gaur ◽  
Rui Zhao ◽  
Liang Lu ◽  
Eric Sun ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document