ConWST: Non-native Multi-source Knowledge Distillation for Low Resource Speech Translation

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Wenbo Zhu ◽  
Hao Jin ◽  
WeiChang Yeh ◽  
Jianwen Chen ◽  
Lufeng Luo ◽  
...  

Speech translation (ST) is a bimodal conversion task from source speech to target text. Deep learning-based ST systems generally require sufficient training data to achieve competitive results, even with a state-of-the-art model. In low-resource settings, however, the available training data rarely satisfies this completeness condition because of small sample sizes. Most low-resource ST approaches improve data integrity with a single model, but such optimization acts along only one dimension and has limited effectiveness. In contrast, multimodality leverages different dimensions of the data features for modeling from multiple perspectives: the modalities compensate for each other's gaps, enriching the representation of the data and improving the utilization of the training samples. Leveraging the enormous amount of multimodal out-of-domain information to improve low-resource tasks is therefore a new challenge. This paper describes how to use multimodal out-of-domain information to improve low-resource models. First, we propose a low-resource ST framework that reconstructs large-scale unlabeled audio through self-supervised learning. We also introduce a machine translation (MT) pretraining model to complement the text embeddings and to fine-tune decoding. In addition, we analyze layer similarity on the decoder side: we reduce invalid multimodal pseudolabels by applying random depth pruning to the similarity layers, minimizing error propagation, and we apply an additional CTC loss to the non-similarity layers to optimize the ensemble loss. Finally, we study the weighting ratio of the fusion technique in the multimodal decoder. Our experimental results show that the proposed method is promising for low-resource ST, with improvements of up to +3.6 BLEU points over baseline low-resource ST models.
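To make the ensemble objective concrete, below is a minimal PyTorch sketch, not the authors' implementation: it assumes a primary cross-entropy term on the decoder output, an auxiliary CTC term computed on a non-similarity encoder layer, and a hypothetical interpolation weight `lam`. The function name `ensemble_loss` and all tensor shapes are illustrative assumptions.

```python
# Minimal sketch of an ensemble loss mixing decoder cross-entropy with an
# auxiliary CTC loss, as described in the abstract. Names and shapes are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def ensemble_loss(decoder_logits,   # (batch, tgt_len, vocab) raw decoder scores
                  target_ids,       # (batch, tgt_len) target token ids
                  ctc_log_probs,    # (src_len, batch, vocab) log-softmax output
                                    # of a non-similarity encoder layer
                  src_lengths,      # (batch,) valid encoder frame counts
                  tgt_lengths,      # (batch,) valid target token counts
                  lam=0.3,          # assumed weighting ratio for the CTC term
                  pad_id=0,
                  blank_id=0):      # targets must not contain the blank index
    # Primary translation objective: token-level cross-entropy, ignoring padding.
    ce = F.cross_entropy(decoder_logits.transpose(1, 2), target_ids,
                         ignore_index=pad_id)
    # Auxiliary CTC objective on the intermediate layer's frame-level outputs.
    ctc = F.ctc_loss(ctc_log_probs, target_ids, src_lengths, tgt_lengths,
                     blank=blank_id, zero_infinity=True)
    # Interpolate the two objectives with the weighting ratio.
    return (1.0 - lam) * ce + lam * ctc
```

In this reading, sweeping `lam` controls how strongly the auxiliary CTC term regularizes the non-similarity layers relative to the primary translation loss; the fusion weighting studied in the multimodal decoder plays an analogous balancing role.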

