An Analysis of Decoding for Attention-Based End-to-End Mandarin Speech Recognition

Author(s):  
Dongwei Jiang ◽  
Wei Zou ◽  
Shuaijiang Zhao ◽  
Guilin Yang ◽  
Xiangang Li
2021 ◽  
Author(s):  
Jianwei Sun ◽  
Zhiyuan Tang ◽  
Hengxin Yin ◽  
Wei Wang ◽  
Xi Zhao ◽  
...  

Symmetry ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 644 ◽  
Author(s):  
Dong Wang ◽  
Xiaodong Wang ◽  
Shaohe Lv

Conventional Automatic Speech Recognition (ASR) systems contain many modules and draw on several kinds of expertise, which makes them hard to build and train. Recent research shows that end-to-end ASR systems can significantly simplify the speech recognition pipeline while achieving performance competitive with conventional systems. However, most end-to-end ASR systems are neither reproducible nor comparable, because they rely on specific language models and in-house training databases that are not freely available. This is especially common for Mandarin speech recognition. In this paper, we propose a CNN+BLSTM+CTC end-to-end Mandarin ASR system. It uses a Convolutional Neural Network (CNN) to learn local speech features, a Bidirectional Long Short-Term Memory (BLSTM) network to learn past and future contextual information, and Connectionist Temporal Classification (CTC) for decoding. Our model is trained entirely on AISHELL-1, by far the largest open-source Mandarin speech corpus, using neither in-house databases nor external language models. Experiments show that our CNN+BLSTM+CTC model achieves a WER of 19.2%, outperforming the existing best work. Because all the data corpora we used are freely available, our model is reproducible and comparable, providing a new baseline for further Mandarin ASR research.
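To make the CTC decoding step mentioned in the abstract more concrete, the following is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `ctc_greedy_decode`, the toy vocabulary, the choice of index 0 as the blank label, and the per-frame scores are all hypothetical, and the abstract does not say whether the paper uses greedy or beam-search decoding.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary; index 0 is assumed to be the CTC blank label.
VOCAB = ["<blank>", "你", "好"]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = 0) -> List[int]:
    """Best-path CTC decoding.

    For every frame take the highest-scoring label, merge consecutive
    repeats of the same label, then drop blank labels.
    """
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        # Arg-max over the label dimension for this frame.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit only when the label is not blank and differs from the
        # label chosen for the previous frame.
        if best != blank and best != previous:
            decoded.append(best)
        previous = best
    return decoded


# Made-up per-frame scores standing in for the network's output.
frames = [
    [0.10, 0.80, 0.10],  # "你"
    [0.20, 0.70, 0.10],  # "你" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # "好"
    [0.10, 0.10, 0.80],  # "好" again, merged
]
label_ids = ctc_greedy_decode(frames)
text = "".join(VOCAB[i] for i in label_ids)
```

Because the blank label separates repeated emissions, a character that genuinely occurs twice in a row is recovered only when at least one blank frame falls between its two runs; this is the standard CTC collapsing rule rather than anything specific to this paper.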


2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

2020 ◽  
Author(s):  
Jeremy H.M. Wong ◽  
Yashesh Gaur ◽  
Rui Zhao ◽  
Liang Lu ◽  
Eric Sun ◽  
...  
