Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning

Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition

10.21437/interspeech.2017-788 ◽

2017 ◽

Cited By ~ 2

Author(s):

Van Hai Do ◽

Nancy F. Chen ◽

Boon Pang Lim ◽

Mark Hasegawa-Johnson

Keyword(s):

Speech Recognition ◽

Task Learning

Improved BLSTM RNN Based Accent Speech Recognition Using Multi-task Learning and Accent Embeddings

Proceedings of the 2020 2nd International Conference on Image, Video and Signal Processing ◽

10.1145/3388818.3389159 ◽

2020 ◽

Author(s):

Wenbi Rao ◽

Ji Zhang ◽

Jianwei Wu

Keyword(s):

Speech Recognition ◽

Task Learning

Domain Adversarial Training for Accented Speech Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8462663 ◽

2018 ◽

Cited By ~ 16

Author(s):

Sining Sun ◽

Ching-Feng Yeh ◽

Mei-Yuh Hwang ◽

Mari Ostendorf ◽

Lei Xie

Keyword(s):

Speech Recognition ◽

Accented Speech ◽

Adversarial Training

To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2019.8682468 ◽

2019 ◽

Cited By ~ 3

Author(s):

Yossi Adi ◽

Neil Zeghidour ◽

Ronan Collobert ◽

Nicolas Usunier ◽

Vitaliy Liptchinsky ◽

...

Keyword(s):

Speech Recognition ◽

Empirical Comparison ◽

Task Learning

Speaker-aware long short-term memory multi-task learning for speech recognition

2016 24th European Signal Processing Conference (EUSIPCO) ◽

10.1109/eusipco.2016.7760581 ◽

2016 ◽

Cited By ~ 7

Author(s):

Gueorgui Pironkov ◽

Stephane Dupont ◽

Thierry Dutoit

Keyword(s):

Speech Recognition ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Task Learning ◽

Long Short Term Memory

Learning Fast Adaptation on Cross-Accented Speech Recognition

10.21437/interspeech.2020-45 ◽

2020 ◽

Author(s):

Genta Indra Winata ◽

Samuel Cahyawijaya ◽

Zihan Liu ◽

Zhaojiang Lin ◽

Andrea Madotto ◽

...

Keyword(s):

Speech Recognition ◽

Accented Speech ◽

Fast Adaptation

Leveraging Native Language Information for Improved Accented Speech Recognition

10.21437/interspeech.2018-1378 ◽

2018 ◽

Cited By ~ 4

Author(s):

Shahram Ghorbani ◽

John H.L. Hansen

Keyword(s):

Speech Recognition ◽

Native Language ◽

Accented Speech

Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning

10.21437/interspeech.2021-1888 ◽

2021 ◽

Author(s):

Nilaksh Das ◽

Sravan Bodapati ◽

Monica Sunkara ◽

Sundararajan Srinivasan ◽

Duen Horng Chau

Keyword(s):

Speech Recognition ◽

Transfer Learning ◽

Accented Speech

Accented Speech Recognition Based on End-to-End Domain Adversarial Training of Neural Networks

Applied Sciences ◽

10.3390/app11188412 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8412

Author(s):

Hyeong-Ju Na ◽

Jeong-Sik Park

Keyword(s):

Speech Recognition ◽

Domain Adaptation ◽

Training Data ◽

Baseline Model ◽

Linguistic Differences ◽

Computational Costs ◽

Accented Speech ◽

Feature Extractor ◽

Adversarial Training ◽

End To End

Automatic speech recognition (ASR) performance can degrade on accented speech because it differs linguistically from standard speech. Conventional accented speech recognition studies have used the accent embedding method, in which accent embedding features are fed directly into the ASR network. Although this method improves accented speech recognition, it has some restrictions, such as increased computational cost. This study proposes an efficient method of training the ASR model for accented speech in a domain adversarial way, based on the Domain Adversarial Neural Network (DANN). The DANN performs domain adaptation for settings in which the training data and test data have different distributions. Our approach is therefore expected to yield a reliable ASR model for accented speech by reducing the distribution differences between accented speech and standard speech. The DANN has three sub-networks: the feature extractor, the domain classifier, and the label predictor. To adapt the DANN to accented speech recognition, we constructed these three sub-networks independently, considering the characteristics of accented speech. In particular, we used an end-to-end framework based on Connectionist Temporal Classification (CTC) to develop the label predictor, an important module that directly affects ASR results. To verify the proposed approach, we conducted accented speech recognition experiments on four English accents: Australian, Canadian, British (England), and Indian. The experimental results showed that the proposed DANN-based model outperformed the baseline model for all accents, indicating that end-to-end domain adversarial training effectively reduced the distribution differences between accented speech and standard speech.
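
The abstract names the three DANN sub-networks but not how the domain classifier influences the feature extractor. In the standard DANN formulation this is done with a gradient reversal step, and the sketch below illustrates that step only, under that assumption; the abstract does not confirm the exact mechanism used by the authors. The names `GradientReversal`, `feature_extractor_gradient`, and `lam`, and all gradient values, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""
    lam: float = 1.0  # assumed trade-off weight between the two objectives

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # Flip the sign so the feature extractor is pushed to increase the
        # domain classifier's loss, i.e. to produce accent-invariant features.
        return [-self.lam * g for g in grad_from_domain_classifier]


def feature_extractor_gradient(grad_label, grad_domain, lam):
    """Combine the gradients that reach the feature extractor from both heads."""
    reversed_grad = GradientReversal(lam).backward(grad_domain)
    return [gl + gd for gl, gd in zip(grad_label, reversed_grad)]


# Hypothetical gradients with respect to a two-dimensional feature vector.
grad_label = [0.5, -0.25]  # from the label predictor (e.g. a CTC loss)
grad_domain = [0.5, 1.0]   # from the domain (accent) classifier
combined = feature_extractor_gradient(grad_label, grad_domain, lam=0.5)
print(combined)
```

The domain classifier itself would still be updated with the unreversed gradient, so it keeps trying to tell accents apart while the feature extractor learns to make that harder.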

End-to-End Accented Speech Recognition

10.21437/interspeech.2019-2122 ◽

2019 ◽

Cited By ~ 2

Author(s):

Thibault Viglino ◽

Petr Motlicek ◽

Milos Cernak

Keyword(s):

Speech Recognition ◽

Accented Speech ◽

End To End
