Cross-lingual language modeling for low-resource speech recognition

10.14711/thesis-b1180211 ◽

2012 ◽

Author(s):

Ping Xu

Keyword(s):

Speech Recognition ◽

Language Modeling ◽

Low Resource ◽

Cross-Lingual Language Modeling for Low-Resource Speech Recognition

IEEE Transactions on Audio, Speech, and Language Processing ◽

10.1109/tasl.2013.2244088 ◽

2013 ◽

Vol 21 (6) ◽

pp. 1134-1144 ◽

Author(s):

Ping Xu ◽

P. Fung

Keyword(s):

Speech Recognition ◽

Language Modeling ◽

Low Resource ◽

Cross-lingual and ensemble MLPs strategies for low-resource speech recognition

10.21437/interspeech.2012-11 ◽

2012 ◽

Author(s):

Yanmin Qian ◽

Jia Liu

Keyword(s):

Speech Recognition ◽

Low Resource ◽

Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing ◽

10.1109/tasl.2013.2281575 ◽

2014 ◽

Vol 22 (1) ◽

pp. 17-27 ◽

Author(s):

Liang Lu ◽

Arnab Ghoshal ◽

Steve Renals

Keyword(s):

Speech Recognition ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Low Resource ◽

Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2017.7953256 ◽

2017 ◽

Author(s):

Kazuki Irie ◽

Pavel Golik ◽

Ralf Schlüter ◽

Hermann Ney

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Convolutional Neural Networks ◽

Language Modeling ◽

Cross-Lingual Word Embeddings for Low-Resource Language Modeling

10.18653/v1/e17-1088 ◽

2017 ◽

Author(s):

Oliver Adams ◽

Adam Makarucha ◽

Graham Neubig ◽

Steven Bird ◽

Trevor Cohn

Keyword(s):

Language Modeling ◽

Word Embeddings ◽

Low Resource ◽

A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition

10.21437/interspeech.2015-481 ◽

2015 ◽

Author(s):

Haihua Xu ◽

Van Hai Do ◽

Xiong Xiao ◽

Eng Siong Chng

Keyword(s):

Speech Recognition ◽

Comparative Study ◽

Low Resource ◽

Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models

Applied Sciences ◽

10.3390/app11051974 ◽

2021 ◽

Vol 11 (5) ◽

pp. 1974 ◽

Author(s):

Chanhee Lee ◽

Kisu Yang ◽

Taesun Whang ◽

Chanjun Park ◽

Andrew Matteson ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Language Model ◽

Language Modeling ◽

Language Models ◽

Low Resource ◽

High Resource ◽

Cross Lingual ◽

Data Efficiency

Language model pretraining is an effective method for improving performance on downstream natural language processing tasks. Although language modeling is unsupervised, which makes data collection relatively inexpensive, gathering enough data remains challenging for languages with limited resources. This results in a large technological disparity between high- and low-resource languages across numerous downstream natural language processing tasks. In this paper, we aim to make this technology more accessible by enabling data-efficient training of pretrained language models. We achieve this by formulating language modeling of low-resource languages as a domain adaptation task, using transformer-based language models pretrained on corpora of high-resource languages. Our novel cross-lingual post-training approach selectively reuses parameters of the language model trained on a high-resource language and post-trains them while learning language-specific parameters in the low-resource language. We also propose implicit translation layers that can learn linguistic differences between languages at the sequence level. To evaluate our method, we post-train a RoBERTa model pretrained on English and conduct a case study for the Korean language. Quantitative results from intrinsic and extrinsic evaluations show that our method outperforms several massively multilingual and monolingual pretrained language models in most settings and improves data efficiency by a factor of up to 32 compared to monolingual training.

Subspace mixture model for low-resource speech recognition in cross-lingual settings

2013 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2013.6639088 ◽

2013 ◽

Author(s):

Yajie Miao ◽

Florian Metze ◽

Alex Waibel

Keyword(s):

Speech Recognition ◽

Mixture Model ◽

Low Resource ◽

Exploiting Adapters for Cross-lingual Low-resource Speech Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing ◽

10.1109/taslp.2021.3138674 ◽

2021 ◽

pp. 1-1

Author(s):

Wenxin Hou ◽

Han Zhu ◽

Yidong Wang ◽

Jindong Wang ◽

Tao Qin ◽

...

Keyword(s):

Speech Recognition ◽

Low Resource ◽

Active learning for low-resource speech recognition: Impact of selection size and language modeling data

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2017.7953171 ◽

2017 ◽

Author(s):

Ali Raza Syed ◽

Andrew Rosenberg ◽

Michael Mandel

Keyword(s):

Speech Recognition ◽

Active Learning ◽

Language Modeling ◽

Low Resource ◽
