End-to-End Speech Recognition with Word-Based RNN Language Models

Author(s):  
Takaaki Hori ◽  
Jaejin Cho ◽  
Shinji Watanabe
Author(s):  
Zhijie Lin ◽  
Kaiyang Lin ◽  
Shiling Chen ◽  
Linlin Li ◽  
Zhou Zhao

End-to-end deep learning approaches to Automatic Speech Recognition (ASR) have become a new trend. In these approaches, which are now active in many areas, the language model can be regarded as an important and effective means of semantic error correction. Most existing systems use a single language model. In this paper, however, multiple language models (LMs) are applied during decoding: one LM is used to select candidate answers, and the others, which consider both context and grammar, make the final decision. Experiments on a general location-based dataset show the effectiveness of our method.
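The abstract does not specify how the multiple LM scores are combined, but a common realization of this idea is n-best rescoring with a weighted sum of LM scores. The sketch below is purely illustrative: the two toy scoring functions and the interpolation weights are assumptions, not the paper's actual models.

```python
# Minimal sketch of multi-LM rescoring of ASR n-best hypotheses.
# Each LM scorer maps a hypothesis string to a log-score; the final
# hypothesis maximizes the acoustic score plus weighted LM scores.

def rescore(nbest, lm_scorers, weights):
    """Pick the hypothesis with the best combined score.

    nbest      : list of (hypothesis, acoustic_score) pairs
    lm_scorers : list of functions mapping a hypothesis to a log-score
    weights    : interpolation weight for each LM
    """
    def total(hyp, ac_score):
        return ac_score + sum(w * lm(hyp) for w, lm in zip(weights, lm_scorers))
    return max(nbest, key=lambda pair: total(pair[0], pair[1]))[0]

# Toy stand-ins: a "context" LM rewarding longer well-formed phrases and
# a "grammar" LM penalizing a known-bad bigram (both purely illustrative).
context_lm = lambda h: 0.1 * len(h.split())
grammar_lm = lambda h: -5.0 if "go store" in h else 0.0

nbest = [("go to the store", -10.0), ("go store", -9.5)]
best = rescore(nbest, [context_lm, grammar_lm], [1.0, 1.0])
print(best)  # "go to the store"
```

Here the grammar LM overrides the slightly better acoustic score of the ungrammatical hypothesis, which is the kind of semantic error correction the abstract describes.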


Symmetry ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 644 ◽  
Author(s):  
Dong Wang ◽  
Xiaodong Wang ◽  
Shaohe Lv

Since conventional Automatic Speech Recognition (ASR) systems often contain many modules and rely on varied expertise, such models are hard to build and train. Recent research shows that end-to-end ASR systems can significantly simplify the speech recognition pipeline while achieving performance competitive with conventional systems. However, most end-to-end ASR systems are neither reproducible nor comparable because they use specific language models and in-house training databases that are not freely available. This is especially common in Mandarin speech recognition. In this paper, we propose a CNN+BLSTM+CTC end-to-end Mandarin ASR system. It uses a Convolutional Neural Network (CNN) to learn local speech features, a Bidirectional Long Short-Term Memory (BLSTM) network to learn past and future contextual information, and Connectionist Temporal Classification (CTC) for decoding. Our model is trained entirely on the largest open-source Mandarin speech corpus to date, AISHELL-1, using neither in-house databases nor external language models. Experiments show that our CNN+BLSTM+CTC model achieves a WER of 19.2%, outperforming the existing best result. Because all the corpora we used are freely available, our model is reproducible and comparable, providing a new baseline for further Mandarin ASR research.
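The CTC decoding step mentioned in the abstract can be illustrated with its simplest form, greedy (best-path) decoding: take the argmax label at each frame, collapse consecutive repeats, then drop blanks. The frame labels below are hard-coded stand-ins; in the paper's system they would come from the CNN+BLSTM network's per-frame posteriors.

```python
# Minimal sketch of CTC greedy (best-path) decoding.
# Label 0 is the blank symbol by convention here (an assumption).

BLANK = 0

def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Frames [a, a, blank, a, b, b] collapse to [a, blank, a, b],
# and removing blanks yields [a, a, b].
frames = [1, 1, 0, 1, 2, 2]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

Note that the blank between the two occurrences of label 1 is what allows CTC to emit a genuinely repeated character, which matters for Mandarin transcripts with repeated syllables.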


2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

2020 ◽  
Author(s):  
Jeremy H.M. Wong ◽  
Yashesh Gaur ◽  
Rui Zhao ◽  
Liang Lu ◽  
Eric Sun ◽  
...  

2019 ◽  
Author(s):  
Peidong Wang ◽  
Jia Cui ◽  
Chao Weng ◽  
Dong Yu
