An Analysis of Decoding for Attention-Based End-to-End Mandarin Speech Recognition

2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 10.1109/iscslp.2018.8706686, 2018

Author(s): Dongwei Jiang, Wei Zou, Shuaijiang Zhao, Guilin Yang, Xiangang Li

Keyword(s): Speech Recognition, Mandarin Speech Recognition


Semantic Data Augmentation for End-to-End Mandarin Speech Recognition

10.21437/interspeech.2021-1162, 2021

Author(s): Jianwei Sun, Zhiyuan Tang, Hengxin Yin, Wei Wang, Xi Zhao, ...

Keyword(s): Speech Recognition, Data Augmentation, Semantic Data, Mandarin Speech Recognition


End-to-End Mandarin Speech Recognition Using Bidirectional Long Short-Term Memory Network

Advances in Intelligent Systems and Computing - Recent Developments in Mechatronics and Intelligent Robotics, 10.1007/978-3-030-00214-5_91, 2018, pp. 726-735

Author(s): Yu Yao, Ryad Chellali

Keyword(s): Speech Recognition, Short Term Memory, Term Memory, Memory Network, Long Short Term Memory, Mandarin Speech Recognition


End-to-End Mandarin Speech Recognition Combining CNN and BLSTM

Symmetry, 10.3390/sym11050644, 2019, Vol 11 (5), pp. 644

Author(s): Dong Wang, Xiaodong Wang, Shaohe Lv

Keyword(s): Speech Recognition, Contextual Information, Language Models, Speech Corpus, Speech Features, Mandarin Speech Recognition, Conventional Systems

Conventional Automatic Speech Recognition (ASR) systems often contain many modules and draw on varied expertise, which makes them hard to build and train. Recent research shows that end-to-end ASR systems can significantly simplify the speech recognition pipeline and achieve performance competitive with conventional systems. However, most end-to-end ASR systems are neither reproducible nor comparable, because they rely on specific language models and in-house training databases that are not freely available. This is especially common for Mandarin speech recognition. In this paper, we propose a CNN+BLSTM+CTC end-to-end Mandarin ASR system. It uses a Convolutional Neural Network (CNN) to learn local speech features, a Bidirectional Long Short-Term Memory (BLSTM) network to learn past and future contextual information, and Connectionist Temporal Classification (CTC) for decoding. Our model is trained entirely on AISHELL-1, by far the largest open-source Mandarin speech corpus, using neither in-house databases nor external language models. Experiments show that our CNN+BLSTM+CTC model achieves a WER of 19.2%, outperforming the best existing work. Because all the data corpora we used are freely available, our model is reproducible and comparable, providing a new baseline for further Mandarin ASR research.
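The abstract names CTC as the decoding stage but gives no details of the decoder. As a minimal sketch, assuming the simplest variant (greedy, or best-path, decoding with no language model), the following Python shows how per-frame scores could be turned into a label sequence. The function name `ctc_greedy_decode`, the blank index `0`, and the toy score matrix are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (a convention, not from the paper)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the argmax label of each frame,
    merge consecutive repeats, then drop blanks."""
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        # Index of the highest-scoring label for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


# Toy scores for 7 frames over 3 labels (0 = blank); values are made up.
frames = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.60, 0.30],  # label 1 (kept: separated by a blank)
    [0.10, 0.20, 0.70],  # label 2
    [0.20, 0.10, 0.70],  # label 2 (repeat, merged)
    [0.80, 0.10, 0.10],  # blank
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

In a full system the score matrix would come from the network's output layer, and the integer labels would map to output units such as Chinese characters. Beam search is a common alternative to the greedy rule shown here.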


Comparable Study Of Modeling Units For End-To-End Mandarin Speech Recognition

2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 10.1109/iscslp.2018.8706661, 2018

Author(s): Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Guilin Yang, Xiangang Li

Keyword(s): Speech Recognition, Mandarin Speech Recognition


Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition

10.21437/interspeech.2019-1290, 2019

Author(s): Shiliang Zhang, Ming Lei, Zhijie Yan

Keyword(s): Speech Recognition, Spelling Correction, Correction Model, Mandarin Speech Recognition


Measuring Mandarin Speech Recognition Thresholds Using the Method of Adaptive Tracking

Journal of Speech Language and Hearing Research, 10.1044/2019_jslhr-h-18-0162, 2019, Vol 62 (6), pp. 2009-2017

Author(s): Yuxia Wang, Zhaoyu Lu, Xiaohu Yang, Chang Liu

Keyword(s): Speech Recognition, Adaptive Tracking, Mandarin Speech Recognition


Selective Adaptation of End-to-End Speech Recognition using Hybrid CTC/Attention Architecture for Noise Robustness

2020 28th European Signal Processing Conference (EUSIPCO), 10.23919/eusipco47968.2020.9287836, 2021

Author(s): Cong-Thanh Do, Shucong Zhang, Thomas Hain

Keyword(s): Speech Recognition, Selective Adaptation, Noise Robustness


Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

10.21437/interspeech.2018-1030, 2018

Author(s): Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, ...

Keyword(s): Speech Recognition, Conversational Speech


Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition

10.21437/interspeech.2020-1930, 2020

Author(s): Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, ...

Keyword(s): Speech Recognition, Automatic Speech Recognition, Large Scale


Combination of End-to-End and Hybrid Models for Speech Recognition

10.21437/interspeech.2020-2141, 2020

Author(s): Jeremy H.M. Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, ...

Keyword(s): Speech Recognition, Hybrid Models
