Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

Aditya Siddhant; Melvin Johnson; Henry Tsai; Naveen Ari; Jason Riesa; Ankur Bapna; Orhan Firat; Karthik Raman

doi:10.1609/aaai.v34i05.6414

Proceedings of the AAAI Conference on Artificial Intelligence ◽

2020 ◽

Vol 34 (05) ◽

pp. 8854-8861 ◽

Cited By ~ 1

Author(s):

Aditya Siddhant ◽

Melvin Johnson ◽

Henry Tsai ◽

Naveen Ari ◽

Jason Riesa ◽

...

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Single Model ◽

Neural Machine Translation ◽

Low Resource ◽

Sequence Labeling ◽

Transfer Capability ◽

Learning Scenarios ◽

Cross Lingual

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating between English and over 100 languages within a single model (Aharoni, Johnson, and Firat 2019). Its improved translation performance on low-resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT) (Devlin et al. 2018), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks.

Machine Translation in Low-Resource Languages by an Adversarial Neural Network

Applied Sciences ◽

10.3390/app112210860 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10860

Author(s):

Mengtao Sun ◽

Hao Wang ◽

Mark Pasquine ◽

Ibrahim A. Hameed

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Grammatical Structure ◽

Neural Machine Translation ◽

Low Resource ◽

High Resource ◽

Learning Techniques ◽

Good Potential ◽

Target Languages ◽

Cross Lingual

Existing Sequence-to-Sequence (Seq2Seq) Neural Machine Translation (NMT) shows strong capability with High-Resource Languages (HRLs). However, this approach struggles with Low-Resource Languages (LRLs), because the model's expressiveness is limited by the number of parallel sentence pairs available for training. This study uses adversarial and transfer learning techniques to mitigate the lack of sentence pairs in LRL corpora. We propose a new Low resource, Adversarial, Cross-lingual (LAC) model for NMT. On the adversarial side, the LAC model consists of a generator and a discriminator. The generator is a Seq2Seq model that produces translations from the source to the target language, while the discriminator measures the gap between machine and human translations. In addition, we apply transfer learning to the LAC model to help capture features from scarce data, since some languages share the same subject-verb-object grammatical structure. Rather than using the entire pretrained LAC model, we use the pretrained generator and the pretrained discriminator separately. The pretrained discriminator exhibited better performance in all experiments. Experimental results demonstrate that the LAC model achieves higher Bilingual Evaluation Understudy (BLEU) scores and has good potential to augment LRL translations.

A Joint Back-Translation and Transfer Learning Method for Low-Resource Neural Machine Translation

Mathematical Problems in Engineering ◽

10.1155/2020/6140153 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Gong-Xu Luo ◽

Ya-Ting Yang ◽

Rui Dong ◽

Yan-Hong Chen ◽

Wen-Bo Zhang

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Large Scale ◽

Data Augmentation ◽

Training Methods ◽

Learning Method ◽

Neural Machine Translation ◽

Low Resource ◽

Parallel Data ◽

Back Translation

Neural machine translation (NMT) for low-resource languages has drawn great attention in recent years. In this paper, we propose a joint back-translation and transfer learning method for low-resource languages. It is widely recognized that data augmentation and transfer learning are both straightforward and effective approaches to low-resource problems. However, existing approaches, which use only one of these methods, limit the capacity of NMT models for low-resource problems. To make full use of the advantages of existing methods and further improve translation performance for low-resource languages, we propose a new method that integrates back-translation with mainstream transfer learning architectures. It not only initializes the NMT model by transferring the parameters of pretrained models, but also generates synthetic parallel data by translating large-scale monolingual target-side data to boost the fluency of translations. We conduct experiments to explore the effectiveness of the joint method by incorporating back-translation into the parent-child and the hierarchical transfer learning architectures. In addition, different preprocessing and training methods are explored to obtain better performance. Experimental results on Uygur-Chinese and Turkish-English translation demonstrate the superiority of the proposed method over baselines that use a single method.
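
The two ingredients named in this abstract, parameter transfer from a pretrained parent model and synthetic parallel data from back-translation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy "model" that simply reverses word order, and the dictionary-of-lists parameter format are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: List[str],
    target_to_source: Callable[[str], str],
) -> List[Pair]:
    """Pair each real target sentence with a machine-generated source sentence."""
    return [(target_to_source(t), t) for t in target_monolingual]


def init_child_from_parent(
    parent: Dict[str, List[float]],
    child: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Copy parent parameters into the child where the name and size match."""
    merged = {}
    for name, value in child.items():
        source = parent.get(name)
        if source is not None and len(source) == len(value):
            merged[name] = list(source)  # transferred from the parent model
        else:
            merged[name] = list(value)   # kept from the child's own initialization
    return merged


# Hypothetical stand-in for a trained target-to-source translation model.
def toy_reverse_model(sentence: str) -> str:
    return " ".join(reversed(sentence.split()))


# Step 1: augment the small real corpus with back-translated pairs.
real_pairs = [("a b", "b a")]
synthetic_pairs = back_translate(["d c", "f e"], toy_reverse_model)
training_pairs = real_pairs + synthetic_pairs

# Step 2: initialize the child model from the parent before training on the pairs.
parent_params = {"encoder.w": [0.1, 0.2], "decoder.w": [0.3, 0.4]}
child_params = {"encoder.w": [0.0, 0.0], "decoder.w": [0.0, 0.0, 0.0]}
child_params = init_child_from_parent(parent_params, child_params)
```

In this sketch the synthetic source side is machine-generated while the target side stays human-written, which is how back-translation is usually set up to favor target-side fluency. Parameters whose sizes differ between parent and child are left at the child's initialization; how the paper handles such mismatches is not stated in the abstract.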

Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5341 ◽

2020 ◽

Vol 34 (01) ◽

pp. 115-122 ◽

Cited By ~ 3

Author(s):

Baijun Ji ◽

Zhirui Zhang ◽

Xiangyu Duan ◽

Min Zhang ◽

Boxing Chen ◽

...

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Large Scale ◽

Feature Space ◽

Target Language ◽

Smooth Transition ◽

Training Methods ◽

Neural Machine Translation ◽

Cross Lingual ◽

Effective Transfer

Transfer learning between different language pairs has proven effective for Neural Machine Translation (NMT) in low-resource scenarios. However, existing transfer methods involving a common target language are far from successful in the extreme scenario of zero-shot translation, due to the language space mismatch between the transferor (the parent model) and the transferee (the child model) on the source side. To address this challenge, we propose an effective transfer learning approach based on cross-lingual pre-training. Our key idea is to make all source languages share the same feature space and thus enable a smooth transition for zero-shot translation. To this end, we introduce one monolingual pre-training method and two bilingual pre-training methods to obtain a universal encoder for different languages. Once the universal encoder is constructed, the parent model built on this encoder is trained with large-scale annotated data and then directly applied in the zero-shot translation scenario. Experiments on two public datasets show that our approach significantly outperforms a strong pivot-based baseline and various multilingual NMT approaches.

Transfer Learning for Low-Resource Neural Machine Translation

10.18653/v1/d16-1163 ◽

2016 ◽

Cited By ~ 63

Author(s):

Barret Zoph ◽

Deniz Yuret ◽

Jonathan May ◽

Kevin Knight

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Neural Machine Translation ◽

Low Resource

Hierarchical Transfer Learning Architecture for Low-Resource Neural Machine Translation

IEEE Access ◽

10.1109/access.2019.2936002 ◽

2019 ◽

Vol 7 ◽

pp. 154157-154166 ◽

Cited By ~ 4

Author(s):

Gongxu Luo ◽

Yating Yang ◽

Yang Yuan ◽

Zhanheng Chen ◽

Aizimaiti Ainiwaer

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Neural Machine Translation ◽

Low Resource

Trivial Transfer Learning for Low-Resource Neural Machine Translation

10.18653/v1/w18-6325 ◽

2018 ◽

Cited By ~ 3

Author(s):

Tom Kocmi ◽

Ondřej Bojar

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Neural Machine Translation ◽

Low Resource

Enriching the transfer learning with pre-trained lexicon embedding for low-resource neural machine translation

Tsinghua Science & Technology ◽

10.26599/tst.2020.9010029 ◽

2022 ◽

Vol 27 (1) ◽

pp. 150-163 ◽

Cited By ~ 1

Author(s):

Mieradilijiang Maimaiti ◽

Yang Liu ◽

Huanbo Luan ◽

Maosong Sun

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

Neural Machine Translation ◽

Low Resource

Keeping Models Consistent between Pretraining and Translation for Low-Resource Neural Machine Translation

Future Internet ◽

10.3390/fi12120215 ◽

2020 ◽

Vol 12 (12) ◽

pp. 215

Author(s):

Wenbo Zhang ◽

Xiao Li ◽

Yating Yang ◽

Rui Dong ◽

Gongxu Luo

Keyword(s):

Machine Translation ◽

Language Model ◽

Neural Machine Translation ◽

Translation Model ◽

Parallel Corpus ◽

Model Experiments ◽

Low Resource ◽

Translation Quality ◽

Number Of Layers ◽

Cross Lingual

Recently, the pretraining of models has been successfully applied to unsupervised and semi-supervised neural machine translation. A cross-lingual language model uses a pretrained masked language model to initialize the encoder and decoder of the translation model, which greatly improves the translation quality. However, because of a mismatch in the number of layers, the pretrained model can only initialize part of the decoder’s parameters. In this paper, we use a layer-wise coordination transformer and a consistent pretraining translation transformer instead of a vanilla transformer as the translation model. The former has only an encoder, and the latter has an encoder and a decoder, but the encoder and decoder have exactly the same parameters. Both models can guarantee that all parameters in the translation model can be initialized by the pretrained model. Experiments on the Chinese–English and English–German datasets show that compared with the vanilla transformer baseline, our models achieve better performance with fewer parameters when the parallel corpus is small.
