Improving Transformer Based End-to-End Code-Switching Speech Recognition Using Language Identification

Zheying Huang; Pei Wang; Jian Wang; Haoran Miao; Ji Xu; Pengyuan Zhang

doi:10.3390/app11199106

Applied Sciences ◽

10.3390/app11199106 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9106

Author(s):

Zheying Huang ◽

Pei Wang ◽

Jian Wang ◽

Haoran Miao ◽

Ji Xu ◽

...

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recurrent Neural Networks ◽

Short Range ◽

Language Identification ◽

Code Switching ◽

Attention Model ◽

Sequential Computation ◽

Attention System ◽

Connectionist Temporal Classification

A recurrent neural network (RNN) based attention model has been used in code-switching speech recognition (CSSR). However, because of the sequential computation constraint of RNNs, such models capture short-range dependencies more strongly than long-range ones, which makes it hard to switch languages immediately in CSSR. First, to deal with this problem, we introduce the CTC-Transformer, which relies entirely on a self-attention mechanism to draw global dependencies and adopts connectionist temporal classification (CTC) as an auxiliary task for better convergence. Second, we propose two multi-task learning recipes, in which a language identification (LID) auxiliary task is learned in addition to the CTC-Transformer automatic speech recognition (ASR) task. Third, we study a decoding strategy that incorporates LID into the ASR task. Experiments on the SEAME corpus demonstrate the effectiveness of the proposed methods, which achieve a mixed error rate (MER) of 30.95%. This corresponds to up to a 19.35% relative MER reduction compared to the baseline RNN-based CTC-Attention system and an 8.86% relative MER reduction compared to the baseline CTC-Transformer system.
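The abstract reports results as a mixed error rate (MER) but does not define it. The sketch below shows one common way such a metric is computed for Mandarin-English speech, assuming that each Mandarin character and each English word counts as one token and that errors are measured by edit distance against the reference. The function names, the tokenization rule, and the example sentences are illustrative assumptions, not the authors' scoring script.

```python
import re


def tokenize_mixed(text):
    # Assumption: one token per CJK character, one token per run of
    # Latin letters, digits, or apostrophes (an English word).
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9']+", text)


def edit_distance(ref, hyp):
    # Standard Levenshtein distance over token lists, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(ref_text, hyp_text):
    # Token-level errors divided by the number of reference tokens.
    ref = tokenize_mixed(ref_text)
    hyp = tokenize_mixed(hyp_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one deleted character and one substituted word
# out of four reference tokens.
print(mixed_error_rate("我想要 coffee", "我要 tea"))
```

Scoring Mandarin at the character level avoids depending on a word segmenter, while English is still scored on whole words.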

Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings

Information ◽

10.3390/info12020062 ◽

2021 ◽

Vol 12 (2) ◽

pp. 62 ◽

Cited By ~ 1

Author(s):

Eshete Derb Emiru ◽

Shengwu Xiong ◽

Yaxing Li ◽

Awet Fesseha ◽

Moussa Diallo

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Data Augmentation ◽

Recognition System ◽

Speech Recognition System ◽

Automatic Speech Recognition System ◽

Attention Model ◽

Recognition Systems ◽

End To End ◽

Connectionist Temporal Classification

Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word and character levels of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification with attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition system (AASR) using its phoneme-based subword units. This algorithm helps to insert the epenthetic vowel እ[ɨ], which is not included in our Grapheme-to-Phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on various Amharic subwords, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to result in more accurate speech recognition systems than their character-based, phoneme-based, and character-based subword counterparts. Further improvement was also obtained for the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to character-based acoustic modeling with the word-based recurrent neural network language modeling (RNNLM) baseline. These phoneme-based subword models are also useful for improving machine and speech translation tasks.
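As a rough illustration of how BPE can build subword units from phoneme sequences, the sketch below learns merges by repeatedly fusing the most frequent adjacent pair of symbols. It is a minimal, generic BPE learner, not the paper's implementation. The toy symbol sequences, the function names, and the tie-breaking rule are assumptions made for the example, and the symbols are placeholders rather than real Amharic phonemes.

```python
from collections import Counter


def merge_pair(seq, pair):
    # Replace every non-overlapping occurrence of `pair` with one fused symbol.
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges):
    # `sequences` is a list of symbol lists, e.g. one phoneme list per word.
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))  # count adjacent symbol pairs
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself so the
        # result is deterministic (an assumption of this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [merge_pair(seq, best) for seq in corpus]
    return merges, corpus


# Toy placeholder sequences (hypothetical, not real Amharic phonemes).
toy = [["s", "e", "l", "a", "m"], ["s", "e", "l", "e"]]
merges, segmented = learn_bpe(toy, num_merges=2)
print(merges)
print(segmented)
```

Running the same learner on characters instead of phonemes gives character-based subwords, so the two subword types compared in the abstract differ only in the input symbols.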

Advanced Recurrent Neural Networks for Automatic Speech Recognition

New Era for Robust Speech Recognition ◽

10.1007/978-3-319-64680-0_11 ◽

2017 ◽

pp. 261-279

Author(s):

Yu Zhang ◽

Dong Yu ◽

Guoguo Chen

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Recurrent Neural Networks

Combining De-noising Auto-encoder and Recurrent Neural Networks in End-to-End Automatic Speech Recognition for Noise Robustness

2018 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt.2018.8639597 ◽

2018 ◽

Author(s):

Tzu-Hsuan Ting ◽

Chia-Ping Chen

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Recurrent Neural Networks ◽

Noise Robustness ◽

End To End

Text Corpus and Acoustic Model Addition for Indonesian-Arabic Code-switching in Automatic Speech Recognition System

2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA) ◽

10.1109/icaicta.2019.8904183 ◽

2019 ◽

Author(s):

Rizky Elzandi Barik ◽

Dessi Puji Lestari

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition System ◽

Code Switching ◽

Speech Recognition System ◽

Acoustic Model ◽

Automatic Speech Recognition System ◽

Text Corpus

Language identification of individual words in a multilingual automatic speech recognition system

2009 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2009.4960594 ◽

2009 ◽

Cited By ~ 1

Author(s):

Andrea Hategan ◽

Bogdan Barliga ◽

Ioan Tabus

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition System ◽

Language Identification ◽

Speech Recognition System ◽

Automatic Speech Recognition System

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp40776.2020.9053264 ◽

2020 ◽

Author(s):

A. Kastanos ◽

A. Ragni ◽

M. J. F. Gales

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Recurrent Neural Networks ◽

Black Box ◽

Confidence Estimation ◽

Recognition Systems

Automatic Speech Recognition of Code Switching Speech Using 1-Best Rescoring

2012 International Conference on Asian Language Processing ◽

10.1109/ialp.2012.28 ◽

2012 ◽

Cited By ~ 6

Author(s):

Basem H.A. Ahmed ◽

Tien-Ping Tan

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Code Switching

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414428 ◽

2021 ◽

Author(s):

Shuai Zhang ◽

Jiangyan Yi ◽

Zhengkun Tian ◽

Ye Bai ◽

Jianhua Tao ◽

...

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Code Switching ◽

End To End

Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching

2019 International Conference on Asian Language Processing (IALP) ◽

10.1109/ialp48816.2019.9037688 ◽

2019 ◽

Author(s):

Chia-Yu Li ◽

Ngoc Thang Vu

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Code Switching ◽

End To End

IITG-HingCoS corpus: A Hinglish code-switching database for automatic speech recognition

Speech Communication ◽

10.1016/j.specom.2019.04.007 ◽

2019 ◽

Vol 110 ◽

pp. 76-89 ◽

Cited By ~ 3

Author(s):

Sreeram Ganji ◽

Kunal Dhawan ◽

Rohit Sinha

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Code Switching
