A Joint Back-Translation and Transfer Learning Method for Low-Resource Neural Machine Translation

Neural machine translation (NMT) for low-resource languages has drawn great attention in recent years. In this paper, we propose a joint back-translation and transfer learning method for low-resource languages. Data augmentation and transfer learning are both widely recognized as straightforward and effective approaches to low-resource problems. However, existing methods apply only one of them, which limits the capacity of NMT models in low-resource settings. To combine the advantages of both and further improve translation performance for low-resource languages, we propose a new method that integrates back-translation with mainstream transfer learning architectures. The method not only initializes the NMT model by transferring parameters from pretrained models, but also generates synthetic parallel data by translating large-scale target-side monolingual data, which boosts the fluency of translations. We evaluate the joint method by incorporating back-translation into the parent-child and hierarchical transfer learning architectures. In addition, we explore different preprocessing and training methods to improve performance further. Experimental results on Uygur-Chinese and Turkish-English translation demonstrate the superiority of the proposed method over baselines that use either method alone.
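
The two ingredients named in this abstract, parameter transfer and back-translation, can be illustrated with a minimal sketch. It is not the authors' implementation: parameters are modelled as plain lists of floats, the target-to-source model is any callable, and every name below (`init_child_from_parent`, `back_translate`, `build_training_set`) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, List[float]]   # parameter name -> flattened weights (toy stand-in)
Pair = Tuple[str, str]            # (source sentence, target sentence)


def init_child_from_parent(child: Params, parent: Params) -> Params:
    """Copy each parent parameter whose name and size match the child's.

    Parameters that do not match (for example embeddings over a different
    vocabulary) keep the child's own initial values.
    """
    merged = {name: list(value) for name, value in child.items()}
    for name, value in parent.items():
        if name in merged and len(merged[name]) == len(value):
            merged[name] = list(value)
    return merged


def back_translate(target_mono: List[str],
                   tgt_to_src: Callable[[str], str]) -> List[Pair]:
    """Pair every monolingual target sentence with a machine-made source."""
    return [(tgt_to_src(sentence), sentence) for sentence in target_mono]


def build_training_set(authentic: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Concatenate authentic and synthetic pairs (mixing ratio is an assumption)."""
    return list(authentic) + list(synthetic)
```

In such a setup the child model would start from the merged parameters and train on the combined set; how the paper orders or weights these steps is not stated in the abstract.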


Improved neural machine translation for low-resource English–Assamese pair

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-219260 ◽ 2021 ◽ pp. 1-12

Author(s): Sahinur Rahman Laskar ◽ Abdullah Faiz Ur Rahman Khilji ◽ Partha Pakray ◽ Sivaji Bandyopadhyay

Keyword(s): Machine Translation ◽ Data Augmentation ◽ Language Translation ◽ Linguistically Diverse ◽ Neural Machine Translation ◽ Low Resource ◽ Parallel Data ◽ The World ◽ Translation Accuracy ◽ Vocabulary Problems

Language translation is essential to bringing the world closer together and plays a significant part in building a community among people of different linguistic backgrounds. Machine translation greatly helps to remove the language barrier and allows easier communication among linguistically diverse communities. Because resources are unavailable, major languages of the world are counted as low-resource languages, which makes automating translation among such languages for the benefit of indigenous speakers a challenging task. This article investigates neural machine translation for the resource-poor English–Assamese language pair by tackling insufficient data and out-of-vocabulary problems. We also propose a data augmentation-based NMT approach that exploits synthetic parallel data; it significantly improves translation accuracy for both English-to-Assamese and Assamese-to-English translation and obtains state-of-the-art results.


Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i01.5341 ◽ 2020 ◽ Vol 34 (01) ◽ pp. 115-122 ◽ Cited By ~ 3

Author(s): Baijun Ji ◽ Zhirui Zhang ◽ Xiangyu Duan ◽ Min Zhang ◽ Boxing Chen ◽ ...

Keyword(s): Machine Translation ◽ Transfer Learning ◽ Large Scale ◽ Feature Space ◽ Target Language ◽ Smooth Transition ◽ Training Methods ◽ Neural Machine Translation ◽ Cross Lingual ◽ Effective Transfer

Transfer learning between different language pairs has shown its effectiveness for neural machine translation (NMT) in low-resource scenarios. However, existing transfer methods involving a common target language are far from successful in the extreme scenario of zero-shot translation, because of the language space mismatch between the transferor (the parent model) and the transferee (the child model) on the source side. To address this challenge, we propose an effective transfer learning approach based on cross-lingual pre-training. Our key idea is to make all source languages share the same feature space and thus enable a smooth transition to zero-shot translation. To this end, we introduce one monolingual pre-training method and two bilingual pre-training methods to obtain a universal encoder for different languages. Once the universal encoder is constructed, the parent model built on it is trained with large-scale annotated data and then applied directly in the zero-shot translation scenario. Experiments on two public datasets show that our approach significantly outperforms a strong pivot-based baseline and various multilingual NMT approaches.


Polygon-Net: A General Framework for Jointly Boosting Multiple Unsupervised Neural Machine Translation Models

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/739 ◽ 2019 ◽ Cited By ~ 1

Author(s): Chang Xu ◽ Tao Qin ◽ Gang Wang ◽ Tie-Yan Liu

Keyword(s): Machine Translation ◽ Loss Function ◽ General Framework ◽ Large Scale ◽ Great Success ◽ Neural Machine Translation ◽ Low Resource ◽ Parallel Data ◽ Benchmark Datasets ◽ First Time

Neural machine translation (NMT) has achieved great success. However, collecting large-scale parallel data for training is costly and laborious. Recently, unsupervised neural machine translation has attracted increasing attention because it requires only monolingual corpora, which are common and easy to obtain, and because of its great potential for low-resource or even zero-resource machine translation. In this work, we propose a general framework called Polygon-Net, which leverages multiple auxiliary languages to jointly boost unsupervised neural machine translation models. Specifically, we design a novel loss function for multi-language unsupervised neural machine translation. In addition, unlike prior work, which updates only one or two models individually, Polygon-Net for the first time enables multiple unsupervised models in the framework to update in turn and enhance each other. In this way, multiple unsupervised translation models are trained in association with each other to achieve better performance. Experiments on benchmark datasets including UN Corpus and WMT show that our approach significantly improves over two-language-based methods and achieves better performance as more languages are introduced to the framework.


A Survey on Low-Resource Neural Machine Translation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2021/629 ◽ 2021

Author(s): Rui Wang ◽ Xu Tan ◽ Renqian Luo ◽ Tao Qin ◽ Tie-Yan Liu

Keyword(s): Machine Translation ◽ Large Scale ◽ State Of The Art ◽ Neural Machine Translation ◽ Modal Data ◽ Low Resource ◽ Resource Setting ◽ Low Resource Setting ◽ Parallel Data ◽ Target Languages

Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large-scale parallel data. Thus, much research has been conducted on neural machine translation (NMT) with very limited parallel data, i.e., the low-resource setting. In this paper, we provide a survey of low-resource NMT and classify related work into three categories according to the auxiliary data used: (1) exploiting monolingual data of source and/or target languages, (2) exploiting data from auxiliary languages, and (3) exploiting multi-modal data. We hope that our survey can help researchers better understand this field and inspire them to design better algorithms, and help industry practitioners choose appropriate algorithms for their applications.


A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation

Information ◽ 10.3390/info11050255 ◽ 2020 ◽ Vol 11 (5) ◽ pp. 255

Author(s): Yu Li ◽ Xiao Li ◽ Yating Yang ◽ Rui Dong

Keyword(s): Machine Translation ◽ English Translation ◽ Data Augmentation ◽ Sampling Strategy ◽ Training Data ◽ Neural Machine Translation ◽ Low Resource ◽ Parallel Data ◽ Augmentation Strategy ◽ Diverse Data

One important issue that affects the performance of neural machine translation is the scale of available parallel data. For low-resource languages, the amount of parallel data is not sufficient, which results in poor translation quality. In this paper, we propose a diverse data augmentation method that does not use extra monolingual data. We expand the training data by generating diverse pseudo-parallel data on the source and target sides. To generate diverse data, a restricted sampling strategy is employed at the decoding steps. Finally, we filter the synthetic parallel corpus and merge it with the original data to train the final model. In the experiments, the proposed approach achieved an improvement of 1.96 BLEU points on the IWSLT2014 German–English translation task, which was used to simulate a low-resource language. Our approach also consistently and substantially obtained improvements of 1.0 to 2.0 BLEU on three other low-resource translation tasks: English–Turkish, Nepali–English, and Sinhala–English.
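
The abstract does not define the restricted sampling strategy. One plausible reading, sketched here purely as an assumption, is to sample the next token only from the k most probable candidates instead of always taking the single best one. The function name `restricted_sample` and the toy distribution are hypothetical.

```python
import random
from typing import Dict, Optional


def restricted_sample(probs: Dict[str, float], k: int,
                      rng: Optional[random.Random] = None) -> str:
    """Sample one token from the k most probable entries of `probs`.

    Assumption: "restricted" means restricting the candidate set to the
    top-k tokens; the paper's exact rule may differ.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if rng is None:
        rng = random.Random()
    # Keep only the k highest-probability tokens.
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [p for _, p in top]
    # random.choices accepts unnormalized weights, so no renormalization is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


next_token_probs = {"haus": 0.6, "heim": 0.3, "gebäude": 0.1}
token = restricted_sample(next_token_probs, k=2, rng=random.Random(0))
```

With k = 1 this reduces to greedy decoding; larger k trades some likelihood for more varied synthetic sentences.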


Low-Resource Neural Machine Translation Using Fast Meta-learning Method

10.1007/978-3-030-92273-3_16 ◽ 2021 ◽ pp. 188-199

Author(s): Nier Wu ◽ Hongxu Hou ◽ Wei Zheng ◽ Shuo Sun

Keyword(s): Machine Translation ◽ Learning Method ◽ Neural Machine Translation ◽ Low Resource ◽ Meta Learning


Improving Neural Machine Translation by Filtering Synthetic Parallel Data

Entropy ◽ 10.3390/e21121213 ◽ 2019 ◽ Vol 21 (12) ◽ pp. 1213

Author(s): Guanghao Xu ◽ Youngjoong Ko ◽ Jungyun Seo

Keyword(s): Machine Translation ◽ Synthetic Data ◽ Similarity Score ◽ Target Language ◽ Neural Machine Translation ◽ Novel Approach ◽ Parallel Data ◽ Back Translation ◽ Translation Errors ◽ Training State

Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because the synthetic data is often generated by back-translating monolingual data from the target language into the source language, it potentially contains a lot of noise, such as weakly paired sentences or translation errors. In this paper, we propose a novel approach to filter this noise from synthetic data. For each sentence pair of the synthetic data, we compute a semantic similarity score using bilingual word embeddings. By selecting sentence pairs according to these scores, we obtain better synthetic parallel data. Experimental results on the IWSLT 2017 Korean→English translation task show that, despite using much less data, our method outperforms the baseline NMT system with back-translation by up to 0.72 and 0.62 BLEU points on tst2016 and tst2017, respectively.
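
The filtering step can be sketched as follows. This is a minimal illustration rather than the paper's scoring function: it assumes a sentence vector is the average of its word vectors, that similarity is cosine similarity, and that words of both languages live in one toy embedding table. The names `sentence_vector`, `cosine`, and `filter_pairs`, and the threshold value, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[str, str]  # (back-translated source, authentic target)


def sentence_vector(sentence: str, emb: Dict[str, Vector], dim: int) -> List[float]:
    """Average the embeddings of the known words; zero vector if none are known."""
    vectors = [emb[word] for word in sentence.split() if word in emb]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], emb: Dict[str, Vector],
                 dim: int, threshold: float) -> List[Pair]:
    """Keep the synthetic pairs whose two sides are similar enough."""
    kept = []
    for src, tgt in pairs:
        score = cosine(sentence_vector(src, emb, dim), sentence_vector(tgt, emb, dim))
        if score >= threshold:
            kept.append((src, tgt))
    return kept
```

A real system would load pretrained bilingual embeddings and tune the threshold on held-out data; both choices are outside what the abstract specifies.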


Enhanced Back-Translation for Low Resource Neural Machine Translation Using Self-training

Communications in Computer and Information Science - Information and Communication Technology and Applications ◽ 10.1007/978-3-030-69143-1_28 ◽ 2021 ◽ pp. 355-371

Author(s): Idris Abdulmumin ◽ Bashir Shehu Galadanci ◽ Abubakar Isa

Keyword(s): Machine Translation ◽ Neural Machine Translation ◽ Low Resource ◽ Back Translation


Neural machine translation of low-resource languages using SMT phrase pair injection

Natural Language Engineering ◽ 10.1017/s1351324920000303 ◽ 2020 ◽ pp. 1-22

Author(s): Sukanta Sen ◽ Mohammed Hasanuzzaman ◽ Asif Ekbal ◽ Pushpak Bhattacharyya ◽ Andy Way

Keyword(s): Machine Translation ◽ Large Scale ◽ Production Systems ◽ Statistical Machine Translation ◽ Training Data ◽ Original Training ◽ Neural Machine Translation ◽ Parallel Corpus ◽ Low Resource ◽ Better Than

Neural machine translation (NMT) has recently shown promising results on publicly available benchmark datasets and is being rapidly adopted in various production systems. However, it requires a high-quality, large-scale parallel corpus, and a sufficiently large corpus is not always available because building one requires time, money, and professionals. Hence, many existing large-scale parallel corpora are limited to specific languages and domains. In this paper, we propose an effective approach to improving an NMT system in a low-resource scenario without using any additional data. Our approach augments the original training data with parallel phrases extracted from that same data using a statistical machine translation (SMT) system. Our proposed approach is based on gated recurrent unit (GRU) and transformer networks. We choose the Hindi–English and Hindi–Bengali datasets for the Health, Tourism, and Judicial (Hindi–English only) domains. We train our NMT models for 10 translation directions, each using only 5–23k parallel sentences. Experiments show improvements in the range of 1.38–15.36 BLEU (BiLingual Evaluation Understudy) points over the baseline systems, and show that transformer models perform better than GRU models in low-resource scenarios. We also find that our proposed method outperforms SMT, which is known to work better than neural models in low-resource scenarios, for some translation directions. To further show the effectiveness of our proposed model, we also apply our approach to another interesting NMT task, old-to-modern English translation, using a tiny parallel corpus of only 2.7K sentences. For this task, we use publicly available old-modern English text that is approximately 1000 years old. Evaluation on this task shows significant improvement over the baseline NMT.


Tag-less Back-Translation

10.21203/rs.3.rs-465941/v1 ◽ 2021

Author(s): Idris Abdulmumin ◽ Bashir Shehu Galadanci ◽ Aliyu Garba

Keyword(s): Machine Translation ◽ Domain Adaptation ◽ Fine Tuning ◽ Huge Amount ◽ Neural Machine Translation ◽ Translation Model ◽ Parallel Data ◽ Back Translation ◽ Authentic Data ◽ Target Side

An effective way to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is to back-translate target-side monolingual data. The standard back-translation method has been shown to be unable to use the huge amount of available monolingual data efficiently, because translation models cannot differentiate between authentic and synthetic parallel data during training. Tagging, or using gates, has been used to enable translation models to distinguish between synthetic and authentic data, improving standard back-translation and also enabling iterative back-translation on language pairs that underperformed with standard back-translation. In this work, we approach back-translation as a domain adaptation problem, eliminating the need for explicit tagging. In this approach, called tag-less back-translation, the synthetic and authentic parallel data are treated as out-of-domain and in-domain data, respectively, and, through pre-training and fine-tuning, the translation model is shown to learn more efficiently from them during training. Experimental results show that the approach outperforms the standard and tagged back-translation approaches on low-resource English-Vietnamese and English-German neural machine translation.
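
The contrast between tagged and tag-less back-translation can be sketched at the data-pipeline level. This is an illustration under stated assumptions, not the authors' code: the `<BT>` marker is an illustrative tag string, the training routine is any callable taking a stage name and a list of pairs, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]              # (source sentence, target sentence)
Stage = Tuple[str, List[Pair]]      # (stage name, data used in that stage)


def tagged_training_set(authentic: List[Pair], synthetic: List[Pair],
                        tag: str = "<BT>") -> List[Pair]:
    """Tagged variant: mark synthetic sources, then train on everything at once."""
    marked = [(f"{tag} {src}", tgt) for src, tgt in synthetic]
    return list(authentic) + marked


def tagless_schedule(authentic: List[Pair], synthetic: List[Pair]) -> List[Stage]:
    """Tag-less variant: pre-train on synthetic (out-of-domain) data,
    then fine-tune on authentic (in-domain) data, with no marker tokens."""
    return [("pretrain", list(synthetic)), ("finetune", list(authentic))]


def run_schedule(schedule: List[Stage],
                 train: Callable[[str, List[Pair]], None]) -> None:
    """Run each stage in order with a caller-supplied training routine."""
    for stage_name, data in schedule:
        train(stage_name, data)
```

The schedule keeps the two data sources separate in time rather than in the input text, which is what removes the need for an explicit tag.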
