SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

Data augmentation has recently become an important method for improving performance in deep learning. It is also an important topic in machine translation, where techniques such as back-translation and noising have been introduced. In particular, recent state-of-the-art approaches, such as the BERT-fused architecture and efficient data generation with GPT models, offer useful inspiration for improving translation performance. In this study, we propose generating additional training data for neural machine translation (NMT) with a GPT-2 sentence generator that produces sentences with characteristics similar to the original data. The translation system combines the BERT-fused architecture with back-translation. In our experiments, the model achieved BLEU scores of 27.50 on tatoebaEn-Ja, 30.14 on WMT14En-De, and 24.12 on WMT18En-Ch.

Improved neural machine translation for low-resource English–Assamese pair

Journal of Intelligent & Fuzzy Systems, 10.3233/jifs-219260, 2021, pp. 1-12

Author(s): Sahinur Rahman Laskar, Abdullah Faiz Ur Rahman Khilji, Partha Pakray, Sivaji Bandyopadhyay

Keyword(s): Machine Translation, Data Augmentation, Language Translation, Linguistically Diverse, Neural Machine Translation, Low Resource, Parallel Data, The World, Translation Accuracy, Vocabulary Problems

Language translation is essential for bringing the world closer together and plays a significant part in building community among people of different linguistic backgrounds. Machine translation greatly helps to remove language barriers and eases communication among linguistically diverse communities. Because of the unavailability of resources, major languages of the world are regarded as low-resource languages. This makes automating translation among such languages, to the benefit of indigenous speakers, a challenging task. This article investigates neural machine translation for the resource-poor English–Assamese language pair by tackling insufficient data and out-of-vocabulary problems. We also propose a data augmentation-based NMT approach that exploits synthetic parallel data; it significantly improves translation accuracy for both English-to-Assamese and Assamese-to-English translation and obtains state-of-the-art results.

A Joint Back-Translation and Transfer Learning Method for Low-Resource Neural Machine Translation

Mathematical Problems in Engineering, 10.1155/2020/6140153, 2020, Vol 2020, pp. 1-11

Author(s): Gong-Xu Luo, Ya-Ting Yang, Rui Dong, Yan-Hong Chen, Wen-Bo Zhang

Keyword(s): Machine Translation, Transfer Learning, Large Scale, Data Augmentation, Training Methods, Learning Method, Neural Machine Translation, Low Resource, Parallel Data, Back Translation

Neural machine translation (NMT) for low-resource languages has drawn great attention in recent years. In this paper, we propose a joint back-translation and transfer learning method for low-resource languages. It is widely recognized that data augmentation and transfer learning are both straightforward and effective approaches to low-resource problems. However, existing methods use only one of the two, which limits the capacity of NMT models in low-resource settings. To combine the advantages of existing methods and further improve translation performance for low-resource languages, we propose a new method that integrates back-translation with mainstream transfer learning architectures. It not only initializes the NMT model by transferring the parameters of pretrained models, but also generates synthetic parallel data by translating large-scale target-side monolingual data, which boosts the fluency of translations. We conduct experiments to explore the effectiveness of the joint method by incorporating back-translation into the parent-child and the hierarchical transfer learning architectures. In addition, different preprocessing and training methods are explored to obtain better performance. Experimental results on Uygur-Chinese and Turkish-English translation demonstrate the superiority of the proposed method over baselines that use a single method.
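
The back-translation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `back_translate`, `build_training_set`, and `toy_reverse_model` are hypothetical, and the toy function merely stands in for a trained target-to-source NMT model. The point is only that each real target-side sentence is paired with a machine-generated source sentence, and the synthetic pairs are then added to the real parallel data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(target_monolingual: List[str],
                   target_to_source: Callable[[str], str]) -> List[Pair]:
    """Pair each real target sentence with a machine-generated source sentence."""
    return [(target_to_source(t), t) for t in target_monolingual]


def build_training_set(parallel: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Concatenate real and synthetic pairs, real data first."""
    return list(parallel) + list(synthetic)


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical stand-in for a trained target-to-source NMT model."""
    return "<src> " + sentence.lower()


parallel = [("hello world", "HELLO WORLD")]
mono = ["GOOD MORNING", "THANK YOU"]

synthetic = back_translate(mono, toy_reverse_model)
train = build_training_set(parallel, synthetic)
```

In a real pipeline the target side of every synthetic pair is human-written text, which is why back-translation is usually credited with improving target-side fluency; the transfer learning step (initializing from a pretrained parent model) is separate and not shown here.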

Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation

10.18653/v1/2021.naacl-srw.18, 2021

Author(s): Seiichiro Kondo, Kengo Hotate, Tosho Hirasawa, Masahiro Kaneko, Mamoru Komachi

Keyword(s): Machine Translation, Data Augmentation, Neural Machine Translation

AUG-BERT: An Efficient Data Augmentation Algorithm for Text Classification

Lecture Notes in Electrical Engineering - Communications, Signal Processing, and Systems, 10.1007/978-981-13-9409-6_266, 2020, pp. 2191-2198

Author(s): Linqing Shi, Danyang Liu, Gongshen Liu, Kui Meng

Keyword(s): Text Classification, Data Augmentation, Augmentation Algorithm, Efficient Data

Supervised neural machine translation based on data augmentation and improved training & inference process

10.18653/v1/d19-5218, 2019

Author(s): Yixuan Tong, Liang Liang, Boyan Liu, Shanshan Jiang, Bin Dong

Keyword(s): Machine Translation, Data Augmentation, Inference Process, Neural Machine Translation

Data augmentation using back-translation for context-aware neural machine translation

10.18653/v1/d19-6504, 2019, Cited By ~ 2

Author(s): Amane Sugiyama, Naoki Yoshinaga

Keyword(s): Machine Translation, Data Augmentation, Context Aware, Neural Machine Translation, Back Translation

Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation

Informatics, 10.3390/informatics8010007, 2021, Vol 8 (1), pp. 7

Author(s): Arda Tezcan, Bram Bulté, Bram Vanroy

Keyword(s): Machine Translation, Data Augmentation, Sentence Length, Added Value, Neural Machine Translation, Combination Technique, Translation Quality, Fuzzy Match, The Impact, Matching Techniques

We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.
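
As a rough illustration of how a translation memory match can be integrated into the NMT input, the sketch below retrieves the most similar memory entry and appends its target side to the source sentence. Everything here is an assumption for illustration: the token-level `difflib` ratio is only a stand-in for the fuzzy matching techniques the paper compares, and the `@@@` separator, the 0.5 threshold, and the function names are hypothetical choices rather than the authors' configuration.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # (memory source sentence, memory target sentence)


def fuzzy_score(a: str, b: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in for a real fuzzy match metric."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_match(source: str, memory: List[Entry],
               threshold: float = 0.5) -> Optional[Entry]:
    """Return the memory entry most similar to `source`, if it reaches the threshold."""
    best, best_score = None, threshold
    for entry in memory:
        score = fuzzy_score(source, entry[0])
        if score >= best_score:
            best, best_score = entry, score
    return best


def augment(source: str, memory: List[Entry],
            sep: str = "@@@", threshold: float = 0.5) -> str:
    """Append the target side of the best fuzzy match to the source sentence."""
    match = best_match(source, memory, threshold)
    if match is None:
        return source  # no usable match: leave the input unchanged
    return f"{source} {sep} {match[1]}"


memory = [
    ("the cat sat on the mat", "de kat zat op de mat"),
    ("stock prices fell", "aandelenkoersen daalden"),
]
augmented = augment("the cat sat on a mat", memory)
unchanged = augment("completely unrelated input", memory)
```

The NMT system is then trained and queried on such augmented inputs, so it can learn to copy or adapt material from the retrieved translation.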

Tackling neural machine translation in low-resource settings: a Portuguese case study

10.5753/stil.2021.17807, 2021

Author(s): Arthur T. Estrella, João B. O. Souza Filho

Keyword(s): Machine Translation, Data Augmentation, Word Embeddings, Effective Solution, Computational Power, Limited Data, Neural Machine Translation, Low Resource

Neural machine translation (NMT) nowadays requires an increasing amount of data and computational power, so succeeding in this task with limited data and a single GPU can be challenging. Strategies such as pre-trained word embeddings, subword embeddings, and data augmentation can potentially address some of the issues faced in low-resource experimental settings, but their impact on translation quality is unclear. This work evaluates some of these strategies in two low-resource experiments and goes beyond reporting BLEU: errors on the Portuguese-English pair are categorized with the help of a translator, considering semantic and syntactic aspects. The BPE subword approach proved to be the most effective solution, allowing a BLEU increase of 59% p.p. compared to the standard Transformer.
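
For readers unfamiliar with the BPE subword approach mentioned in this abstract, the following is a simplified sketch of how byte-pair-encoding merges are learned from word frequencies. It is a generic textbook-style illustration, not the setup used in the paper; the function name `learn_bpe`, the `</w>` end-of-word marker, and the toy word counts are assumptions chosen for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def learn_bpe(words: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` BPE merge operations from word frequencies."""
    # Represent each word as a tuple of symbols ending in an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): c for w, c in words.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first seen on ties)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair fused into one symbol.
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges


word_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(word_counts, 3)
```

With these toy counts the frequent suffix "est" is assembled into a single subword within three merges, which shows how BPE lets a model share statistics across rare and unseen word forms.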

Counterfactual Data Augmentation for Neural Machine Translation

10.18653/v1/2021.naacl-main.18, 2021

Author(s): Qi Liu, Matt Kusner, Phil Blunsom

Keyword(s): Machine Translation, Data Augmentation, Neural Machine Translation
