A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition

Author(s): Erik McDermott, Hasim Sak, Ehsan Variani

2021
Author(s): Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, ...

Author(s): Zhijie Lin, Kaiyang Lin, Shiling Chen, Linlin Li, Zhou Zhao

End-to-end deep learning approaches to Automatic Speech Recognition (ASR) have become a new trend. In these approaches, the language model (LM), already active in many areas, can be considered an important and effective tool for semantic error correction. Many existing systems use a single LM. In this paper, however, multiple LMs are applied during decoding: one LM selects appropriate candidate answers, and the others, which consider both context and grammar, make the further decision. Experiments on a general location-based dataset show the effectiveness of our method.
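The abstract does not say how the LMs are combined. As a minimal sketch, assuming a two-stage scheme in which a first LM plus the acoustic score shortlists hypotheses and the remaining LMs then rescore only that shortlist with weighted log-probabilities, the idea might look like this. The function names, weights, toy unigram LMs and example hypotheses are all hypothetical stand-ins, not the paper's models or data.

```python
import math
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
# Hypothetical LM interface: maps a token sequence to a log-probability.
LogProbFn = Callable[[Tokens], float]


def unigram_lm(probs: Dict[str, float], floor: float = 1e-6) -> LogProbFn:
    """Toy unigram LM for illustration; unseen tokens get `floor` probability."""
    def score(tokens: Tokens) -> float:
        return sum(math.log(probs.get(t, floor)) for t in tokens)
    return score


def multi_lm_decode(
    hypotheses: List[Tuple[Tokens, float]],
    selector: LogProbFn,
    refiners: List[Tuple[LogProbFn, float]],
    selector_weight: float = 0.5,
    shortlist_size: int = 2,
) -> Tokens:
    """Two-stage decoding with several LMs (hypothetical scheme).

    hypotheses: (tokens, acoustic log-prob) pairs, e.g. from a beam search.
    selector: LM that shortlists candidates together with the acoustic score.
    refiners: (LM, weight) pairs that rescore only the shortlisted candidates.
    """
    # Stage 1: acoustic score + weighted selector-LM score picks a shortlist.
    shortlist = sorted(
        hypotheses,
        key=lambda h: h[1] + selector_weight * selector(h[0]),
        reverse=True,
    )[:shortlist_size]

    # Stage 2: the other LMs (e.g. context- or grammar-oriented) make the final choice.
    def final_score(h: Tuple[Tokens, float]) -> float:
        return h[1] + sum(w * lm(h[0]) for lm, w in refiners)

    return max(shortlist, key=final_score)[0]


if __name__ == "__main__":
    hyps = [
        (("turn", "left", "at", "main", "street"), -10.0),
        (("turn", "lift", "at", "main", "street"), -9.5),
        (("churn", "left", "at", "main", "street"), -9.8),
    ]
    selector = unigram_lm({"turn": 0.1, "left": 0.1, "lift": 0.01, "at": 0.2,
                           "main": 0.05, "street": 0.05, "churn": 0.001})
    refiner = unigram_lm({"left": 0.2, "lift": 0.001}, floor=0.1)
    print(multi_lm_decode(hyps, selector, [(refiner, 1.0)]))
    # ('turn', 'left', 'at', 'main', 'street')
```

Splitting the work this way means the more expensive LMs only score a handful of candidates, but it also means a hypothesis dropped in stage 1 can never be recovered in stage 2; `shortlist_size` controls that trade-off.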


2020
Author(s): Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, ...

Author(s): Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, ...

2021, Vol. 336, pp. 06016
Author(s): Taiben Suan, Rangzhuoma Cai, Zhijie Cai, Ba Zu, Baojia Gong

We built a language model based on the Transformer network architecture, which uses attention mechanisms and dispenses with recurrence and convolutions entirely. By transliterating Tibetan into the International Phonetic Alphabet (IPA), we trained the language model with the syllables and phonemes of Tibetan words as modeling units, so that it predicts the corresponding Tibetan sentences from the contextual semantics of the IPA sequence. The language model was then combined with an acoustic model, and the resulting Tibetan speech recognition system was compared with end-to-end Tibetan speech recognition.
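The abstract only names the mechanism ("attention mechanisms ... dispensing with recurrence and convolutions"). As a minimal sketch of that mechanism, here is standard scaled dot-product self-attention, softmax(QK^T / sqrt(d))V, in plain Python. The causal mask is an assumption, typical of left-to-right language models; the sketch does not model the paper's IPA syllable/phoneme units, multiple heads, or learned projections.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = positions, columns = feature dimensions


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention, softmax(q k^T / sqrt(d)) v, with a causal mask.

    Position i attends only to positions 0..i, so a left-to-right LM cannot
    see future units. No recurrence or convolution is involved.
    """
    d = len(k[0])
    out: Matrix = []
    for i, qi in enumerate(q):
        # Similarity of query i with every visible key (j <= i), scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output i is the attention-weighted average of the visible value vectors.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    zeros = [[0.0, 0.0]] * 3  # identical queries/keys give uniform weights
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # With uniform weights, each output row is the running mean of the value rows.
    for row in causal_self_attention(zeros, zeros, values):
        print(row)
```

Because every position's output is computed from all visible positions at once rather than from a hidden state passed step by step, the per-position computations are independent, which is what lets Transformers drop recurrence.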


Author(s): Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, words may be pruned on acoustic grounds using short-term context before rescoring with long-term linguistic context. In this work, we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert them. The proposed system can exploit long-term context using a neural network language model, can better choose among existing ASR output possibilities, and can re-introduce previously pruned or unseen (Out-Of-Vocabulary) phrases. It provides corrections under poorly performing ASR conditions without degrading accurate transcriptions, and these corrections are larger for ASR on out-of-domain and mismatched data. Our system consistently improves over the baseline ASR, even when the baseline is further optimized through Recurrent Neural Network (RNN) language model rescoring. This demonstrates that any ASR improvements can be exploited independently and that our proposed system can potentially still provide benefits on highly optimized ASR. Finally, we present an extensive analysis of the types of errors corrected by our system.
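To illustrate the noisy-channel framing, a corrector picks the intended phrase sequence W that maximizes log P(ASR output | W) + λ log P(W). The sketch below is a minimal version under stated assumptions: a hypothetical hand-written phrase table stands in for the learned channel model, a toy bigram LM replaces the paper's neural network LM, and the search is exhaustive rather than a real decoder. The "new work" to "newark" example is invented to show how a phrase absent from the ASR output can be re-introduced.

```python
import math
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Phrase = Tuple[str, ...]
# Hypothetical phrase table: ASR phrase -> [(intended phrase, log P(ASR phrase | intended))].
PhraseTable = Dict[Phrase, List[Tuple[Phrase, float]]]
Bigrams = Dict[Tuple[str, str], float]


def segmentations(words: Sequence[str], max_len: int) -> Iterator[List[Phrase]]:
    """Yield every split of `words` into consecutive phrases of at most max_len words."""
    if not words:
        yield []
        return
    for n in range(1, min(max_len, len(words)) + 1):
        for rest in segmentations(words[n:], max_len):
            yield [tuple(words[:n])] + rest


def bigram_logprob(tokens: Phrase, bigrams: Bigrams, floor: float = 1e-4) -> float:
    """Toy bigram LM with sentence markers; unseen bigrams get `floor` probability."""
    seq = ("<s>",) + tokens + ("</s>",)
    return sum(math.log(bigrams.get(pair, floor)) for pair in zip(seq, seq[1:]))


def correct(asr_words: Sequence[str], table: PhraseTable, bigrams: Bigrams,
            lm_weight: float = 1.0, copy_logprob: float = math.log(0.9),
            max_len: int = 2) -> Phrase:
    """Noisy-channel correction: argmax_W log P(ASR | W) + lm_weight * log P(W).

    Exhaustive search over segmentations and phrase options, so it is only
    suitable for very short inputs; a practical decoder would use beam search.
    """
    words = list(asr_words)
    best, best_score = tuple(words), -math.inf
    for seg in segmentations(words, max_len):
        # Each phrase may be kept (cost copy_logprob per word) or replaced by a table entry.
        options = [[(p, copy_logprob * len(p))] + table.get(p, []) for p in seg]
        for choice in product(*options):
            out = tuple(w for phrase, _ in choice for w in phrase)
            channel = sum(lp for _, lp in choice)
            score = channel + lm_weight * bigram_logprob(out, bigrams)
            if score > best_score:
                best, best_score = out, score
    return best


if __name__ == "__main__":
    table = {("new", "work"): [(("newark",), math.log(0.3))]}
    bigrams = {("<s>", "fly"): 0.5, ("fly", "to"): 0.5, ("to", "new"): 0.1,
               ("new", "work"): 0.01, ("work", "</s>"): 0.1,
               ("to", "newark"): 0.2, ("newark", "</s>"): 0.5}
    print(correct(["fly", "to", "new", "work"], table, bigrams))
    # ('fly', 'to', 'newark')
```

The copy option keeps accurate transcriptions unchanged unless the LM gain outweighs the channel cost of a replacement; setting `lm_weight=0` in the example leaves the input as-is.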

