On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition

10.21437/interspeech.2017-398 ◽

2017 ◽

Author(s):

Seyedmahdad Mirsamadi ◽

John H.L. Hansen

Keyword(s):

Speech Recognition ◽

Acoustic Models ◽

End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow

10.21437/interspeech.2017-1284 ◽

2017 ◽

Author(s):

Ehsan Variani ◽

Tom Bagby ◽

Erik McDermott ◽

Michiel Bacchiani

Keyword(s):

Speech Recognition ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Large Vocabulary ◽

Acoustic Models ◽

Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2019.8682972 ◽

2019 ◽

Author(s):

Jiangyan Yi ◽

Jianhua Tao ◽

Ye Bai

Keyword(s):

Speech Recognition ◽

Acoustic Models ◽

Low Resource ◽

Syllable-Based Indonesian Automatic Speech Recognition

International Journal on Electrical Engineering and Informatics ◽

10.15676/ijeei.2020.12.4.2 ◽

2020 ◽

Vol 12 (4) ◽

pp. 720-728

Author(s):

Danny Henry Galatang ◽

Suyanto Suyanto ◽

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

State Of The Art ◽

The State ◽

Speech Corpus ◽

Advanced Method ◽

Acoustic Models ◽

Syllable-based automatic speech recognition (ASR) systems commonly perform better than phoneme-based ones. This paper develops an Indonesian monosyllable-based ASR (MSASR) system using the SPRAAK ASR engine and compares it with a phoneme-based system. The Mozilla DeepSpeech-based end-to-end ASR (MDS-E2EASR), a state-of-the-art character-based model (similar to a phoneme-based model), is also investigated to confirm the result. In addition, a novel Kaituoxu SpeechTransformer (KST) E2EASR is examined. Testing on an Indonesian speech corpus of 5,439 words shows that the proposed MSASR achieves much higher word accuracy (76.57%) than the monophone-based system (63.36%). Its performance is comparable to that of the character-based MDS-E2EASR (76.90%) and the character-based KST-E2EASR (78.00%). In future work, the monosyllable-based ASR could be extended to a bisyllable-based one to achieve higher word accuracy. However, the extensive set of bisyllable acoustic models would need to be handled with an advanced method.
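The abstract reports its results as word accuracy but does not state how the metric is computed. As a minimal sketch, assuming the common definition (one minus the word error rate, where errors are the substitutions, deletions, and insertions found by a word-level edit distance), the calculation could look like the following. The function name `word_accuracy` and the example sentences are illustrative and are not taken from the paper.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER for a single utterance (assumed definition, see lead-in).

    The error count is the word-level Levenshtein distance between the
    reference transcript and the recognizer's hypothesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]
    return 1.0 - errors / len(ref)


# Made-up example: one of four reference words is dropped by the recognizer.
score = word_accuracy("saya makan nasi goreng", "saya makan nasi")
print(f"{score:.2%}")
```

Under this definition the score can drop below zero when the hypothesis contains many inserted words, so a corpus-level figure is usually obtained by summing errors and reference words over all utterances before dividing.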

Selective Adaptation of End-to-End Speech Recognition using Hybrid CTC/Attention Architecture for Noise Robustness

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287836 ◽

2021 ◽

Author(s):

Cong-Thanh Do ◽

Shucong Zhang ◽

Thomas Hain

Keyword(s):

Speech Recognition ◽

Selective Adaptation ◽

Noise Robustness ◽

Impact of Aliasing on Deep CNN-Based End-to-End Acoustic Models

10.21437/interspeech.2018-1371 ◽

2018 ◽

Author(s):

Yuan Gong ◽

Christian Poellabauer

Keyword(s):

Acoustic Models ◽

Deep CNN ◽

CTC Training of Multi-Phone Acoustic Models for Speech Recognition

10.21437/interspeech.2017-505 ◽

2017 ◽

Author(s):

Olivier Siohan

Keyword(s):

Speech Recognition ◽

Acoustic Models

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

10.21437/interspeech.2018-1030 ◽

2018 ◽

Author(s):

Chao Weng ◽

Jia Cui ◽

Guangsen Wang ◽

Jun Wang ◽

Chengzhu Yu ◽

...

Keyword(s):

Speech Recognition ◽

Conversational Speech ◽

Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition

10.21437/interspeech.2020-1930 ◽

2020 ◽

Author(s):

Ryo Masumura ◽

Naoki Makishima ◽

Mana Ihori ◽

Akihiko Takashima ◽

Tomohiro Tanaka ◽

...

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Large Scale ◽

Combination of End-to-End and Hybrid Models for Speech Recognition

10.21437/interspeech.2020-2141 ◽

2020 ◽

Author(s):

Jeremy H.M. Wong ◽

Yashesh Gaur ◽

Rui Zhao ◽

Liang Lu ◽

Eric Sun ◽

...

Keyword(s):

Speech Recognition ◽

Hybrid Models ◽

Active Learning Methods for Low Resource End-to-End Speech Recognition

10.21437/interspeech.2019-2316 ◽

2019 ◽

Author(s):

Karan Malhotra ◽

Shubham Bansal ◽

Sriram Ganapathy

Keyword(s):

Speech Recognition ◽

Active Learning ◽

Learning Methods ◽

Low Resource ◽
