End-to-End Speech Recognition with Word-Based RNN Language Models

Author(s):  
Takaaki Hori ◽  
Jaejin Cho ◽  
Shinji Watanabe
Author(s):  
Zhijie Lin ◽  
Kaiyang Lin ◽  
Shiling Chen ◽  
Linlin Li ◽  
Zhou Zhao

End-to-end deep learning approaches to Automatic Speech Recognition (ASR) have become a new trend. In these approaches, which are now active in many areas, the language model can be regarded as an important and effective means of semantic error correction. Most existing systems use a single language model. In this paper, however, multiple language models (LMs) are applied during decoding: one LM is used to select candidate answers, and the others, which consider both context and grammar, make the final decision. Experiments on a general location-based dataset show the effectiveness of our method.
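The abstract does not specify how the multiple LM scores are combined, but a common realization of this idea is n-best rescoring with a weighted sum of LM scores. The sketch below is purely illustrative: the two toy scoring functions and the interpolation weights are assumptions, not the paper's actual models.

```python
# Minimal sketch of multi-LM rescoring of ASR n-best hypotheses.
# Each LM scorer maps a hypothesis string to a log-score; the final
# hypothesis maximizes the acoustic score plus weighted LM scores.

def rescore(nbest, lm_scorers, weights):
    """Pick the hypothesis with the best combined score.

    nbest      : list of (hypothesis, acoustic_score) pairs
    lm_scorers : list of functions mapping a hypothesis to a log-score
    weights    : interpolation weight for each LM
    """
    def total(hyp, ac_score):
        return ac_score + sum(w * lm(hyp) for w, lm in zip(weights, lm_scorers))
    return max(nbest, key=lambda pair: total(pair[0], pair[1]))[0]

# Toy stand-ins: a "context" LM rewarding longer well-formed phrases and
# a "grammar" LM penalizing a known-bad bigram (both purely illustrative).
context_lm = lambda h: 0.1 * len(h.split())
grammar_lm = lambda h: -5.0 if "go store" in h else 0.0

nbest = [("go to the store", -10.0), ("go store", -9.5)]
best = rescore(nbest, [context_lm, grammar_lm], [1.0, 1.0])
print(best)  # "go to the store"
```

Here the grammar LM overrides the slightly better acoustic score of the ungrammatical hypothesis, which is the kind of semantic error correction the abstract describes.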


Symmetry ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 644 ◽  
Author(s):  
Dong Wang ◽  
Xiaodong Wang ◽  
Shaohe Lv

Since conventional Automatic Speech Recognition (ASR) systems often contain many modules and rely on varied expertise, such models are hard to build and train. Recent research shows that end-to-end ASR systems can significantly simplify the speech recognition pipeline while achieving performance competitive with conventional systems. However, most end-to-end ASR systems are neither reproducible nor comparable because they use specific language models and in-house training databases that are not freely available. This is especially common in Mandarin speech recognition. In this paper, we propose a CNN+BLSTM+CTC end-to-end Mandarin ASR system. It uses a Convolutional Neural Network (CNN) to learn local speech features, a Bidirectional Long Short-Term Memory (BLSTM) network to learn past and future contextual information, and Connectionist Temporal Classification (CTC) for decoding. Our model is trained entirely on the largest open-source Mandarin speech corpus to date, AISHELL-1, using neither in-house databases nor external language models. Experiments show that our CNN+BLSTM+CTC model achieves a WER of 19.2%, outperforming the existing best result. Because all the corpora we used are freely available, our model is reproducible and comparable, providing a new baseline for further Mandarin ASR research.
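The CTC decoding step mentioned in the abstract can be illustrated with its simplest form, greedy (best-path) decoding: take the argmax label at each frame, collapse consecutive repeats, then drop blanks. The frame labels below are hard-coded stand-ins; in the paper's system they would come from the CNN+BLSTM network's per-frame posteriors.

```python
# Minimal sketch of CTC greedy (best-path) decoding.
# Label 0 is the blank symbol by convention here (an assumption).

BLANK = 0

def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Frames [a, a, blank, a, b, b] collapse to [a, blank, a, b],
# and removing blanks yields [a, a, b].
frames = [1, 1, 0, 1, 2, 2]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

Note that the blank between the two occurrences of label 1 is what allows CTC to emit a genuinely repeated character, which matters for Mandarin transcripts with repeated syllables.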


2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

2020 ◽  
Author(s):  
Jeremy H.M. Wong ◽  
Yashesh Gaur ◽  
Rui Zhao ◽  
Liang Lu ◽  
Eric Sun ◽  
...  

2019 ◽  
Author(s):  
Peidong Wang ◽  
Jia Cui ◽  
Chao Weng ◽  
Dong Yu
