A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition

End-to-end deep learning approaches to Automatic Speech Recognition (ASR) have become a new trend. In these approaches, which are becoming active in many areas, a language model can be considered an important and effective means of semantic error correction. Many existing systems use a single language model. In this paper, however, multiple language models (LMs) are applied during decoding. One LM is used to select appropriate candidates, and the others, which consider both context and grammar, are used for the further decision. Experiments on a general location-based dataset show the effectiveness of our method.

Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition

10.21437/interspeech.2020-1930 ◽ 2020

Author(s): Ryo Masumura ◽ Naoki Makishima ◽ Mana Ihori ◽ Akihiko Takashima ◽ Tomohiro Tanaka ◽ ...

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ Large Scale ◽ End To End

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

2021 IEEE Spoken Language Technology Workshop (SLT) ◽ 10.1109/slt48900.2021.9383515 ◽ 2021

Author(s): Zhong Meng ◽ Sarangarajan Parthasarathy ◽ Eric Sun ◽ Yashesh Gaur ◽ Naoyuki Kanda ◽ ...

Keyword(s): Speech Recognition ◽ Language Model ◽ Model Estimation ◽ End To End

Low-Complexity DNN-Based End-to-End Automatic Speech Recognition using Low-Rank Approximation

2020 International SoC Design Conference (ISOCC) ◽ 10.1109/isocc50952.2020.9332970 ◽ 2020

Author(s): Jongmin Park ◽ Youngjoo Lee

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ Low Complexity ◽ Low Rank ◽ Low Rank Approximation ◽ Rank Approximation ◽ End To End

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽ 10.1109/icassp.2019.8683602 ◽ 2019 ◽ Cited By ~ 5

Author(s): Alexander H. Liu ◽ Hung-yi Lee ◽ Lin-shan Lee

Keyword(s): Speech Recognition ◽ Language Model ◽ Adversarial Training ◽ End To End

Bridging automatic speech recognition and psycholinguistics: Extending Shortlist to an end-to-end model of human speech recognition (L)

The Journal of the Acoustical Society of America ◽ 10.1121/1.1624065 ◽ 2003 ◽ Vol 114 (6) ◽ pp. 3032-3035 ◽ Cited By ~ 9

Author(s): Odette Scharenborg ◽ Louis ten Bosch ◽ Lou Boves ◽ Dennis Norris

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ Human Speech ◽ End To End

A language model for Amdo Tibetan speech recognition

MATEC Web of Conferences ◽ 10.1051/matecconf/202133606016 ◽ 2021 ◽ Vol 336 ◽ pp. 06016

Author(s): Taiben Suan ◽ Rangzhuoma Cai ◽ Zhijie Cai ◽ Ba Zu ◽ Baojia Gong

Keyword(s): Speech Recognition ◽ Network Architecture ◽ Language Model ◽ Acoustic Model ◽ End To End

We built a language model based on the Transformer network architecture, which uses attention mechanisms and dispenses with recurrence and convolutions entirely. Through the transliteration of Tibetan into the International Phonetic Alphabet (IPA), the language model was trained using the syllables and phonemes of Tibetan words as modeling units to predict the corresponding Tibetan sentences according to the contextual semantics of the IPA. Combined with an acoustic model, it formed a Tibetan speech recognition system that was compared with end-to-end Tibetan speech recognition.

Combining De-noising Auto-encoder and Recurrent Neural Networks in End-to-End Automatic Speech Recognition for Noise Robustness

2018 IEEE Spoken Language Technology Workshop (SLT) ◽ 10.1109/slt.2018.8639597 ◽ 2018

Author(s): Tzu-Hsuan Ting ◽ Chia-Ping Chen

Keyword(s): Neural Networks ◽ Speech Recognition ◽ Automatic Speech Recognition ◽ Recurrent Neural Networks ◽ Noise Robustness ◽ End To End

Learning from past mistakes: improving automatic speech recognition output via noisy-clean phrase context modeling

APSIPA Transactions on Signal and Information Processing ◽ 10.1017/atsip.2018.31 ◽ 2019 ◽ Vol 8 ◽ Cited By ~ 5

Author(s): Prashanth Gurunath Shivakumar ◽ Haoqi Li ◽ Kevin Knight ◽ Panayiotis Georgiou

Keyword(s): Neural Network ◽ Speech Recognition ◽ Automatic Speech Recognition ◽ Language Model ◽ Context Modeling ◽ Short Term ◽ Extensive Analysis ◽ Network Language ◽ Correction System

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, pruning words due to acoustics using short-term context, prior to rescoring with long-term context based on linguistics. In this work, we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert those. The proposed system can exploit long-term context using a neural network language model and can better choose between existing ASR output possibilities as well as re-introduce previously pruned or unseen (Out-Of-Vocabulary) phrases. It provides corrections under poorly performing ASR conditions without degrading any accurate transcriptions; such corrections are greater on top of out-of-domain and mismatched data ASR. Our system consistently provides improvements over the baseline ASR, even when the baseline is further optimized through Recurrent Neural Network (RNN) language model rescoring. This demonstrates that any ASR improvements can be exploited independently and that our proposed system can potentially still provide benefits on highly optimized ASR. Finally, we present an extensive analysis of the type of errors corrected by our system.
