Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Junliang Guo; Xu Tan; Linli Xu; Tao Qin; Enhong Chen; Tie-Yan Liu

doi:10.1609/aaai.v34i05.6289


Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6289 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7839-7846

Author(s):

Junliang Guo ◽

Xu Tan ◽

Linli Xu ◽

Tao Qin ◽

Enhong Chen ◽

...

Keyword(s):

Machine Translation ◽

Fine Tuning ◽

Inference Process ◽

Neural Machine Translation ◽

Training Strategy ◽

Speed Up ◽

Good Improvement ◽

Tuning Process ◽

Translation Accuracy ◽

The Cost

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in a significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and that the two share the same model configuration, a natural idea for improving the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves a good improvement (more than 1 BLEU point) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (by more than 10 times) the inference process over AT baselines.
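The progressive switch from autoregressive to non-autoregressive training described above can be pictured with a simple pacing function. The sketch below is only an illustration: the abstract does not specify the schedule, so the linear pacing and the names `nat_ratio` and `choose_mode` are assumptions, not the authors' method.

```python
import random


def nat_ratio(step: int, total_steps: int) -> float:
    """Assumed linear pacing: share of NAT-style training at a given step.

    Returns 0.0 at the start of fine-tuning (pure AT) and 1.0 at the end
    (pure NAT). Values outside [0, total_steps] are clamped.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, step / total_steps))


def choose_mode(step: int, total_steps: int, rng: random.Random) -> str:
    """Sample the training mode for one batch according to the pacing."""
    # rng.random() lies in [0, 1), so the ratio 0.0 always yields "AT"
    # and the ratio 1.0 always yields "NAT".
    return "NAT" if rng.random() < nat_ratio(step, total_steps) else "AT"


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 2500, 5000, 7500, 10000):
        print(step, nat_ratio(step, 10000), choose_mode(step, 10000, rng))
```

Under this assumed schedule, early batches are almost always trained autoregressively and late batches almost always non-autoregressively; other pacing shapes could be substituted by changing `nat_ratio` alone.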


Improved neural machine translation for low-resource English–Assamese pair

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219260 ◽

2021 ◽

pp. 1-12

Author(s):

Sahinur Rahman Laskar ◽

Abdullah Faiz Ur Rahman Khilji ◽

Partha Pakray ◽

Sivaji Bandyopadhyay

Keyword(s):

Machine Translation ◽

Data Augmentation ◽

Language Translation ◽

Linguistically Diverse ◽

Neural Machine Translation ◽

Low Resource ◽

Parallel Data ◽

The World ◽

Translation Accuracy ◽

Vocabulary Problems

Language translation is essential to bringing the world closer and plays a significant part in building community among people of different linguistic backgrounds. Machine translation greatly helps to remove the language barrier and allows easier communication among linguistically diverse communities. Owing to the unavailability of resources, major languages of the world are counted as low-resource languages. This makes automating translation among such languages, to the benefit of indigenous speakers, a challenging task. This article investigates neural machine translation for the resource-poor English–Assamese language pair by tackling the insufficient-data and out-of-vocabulary problems. We also propose a data augmentation-based NMT approach that exploits synthetic parallel data; it shows significantly improved translation accuracy for English-to-Assamese and Assamese-to-English translation and obtains state-of-the-art results.


Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

10.18653/v1/d19-1146 ◽

2019 ◽

Cited By ~ 2

Author(s):

Raj Dabre ◽

Atsushi Fujita ◽

Chenhui Chu

Keyword(s):

Machine Translation ◽

Fine Tuning ◽

Neural Machine Translation ◽

Low Resource


BERTTune: Fine-Tuning Neural Machine Translation with BERTScore

10.18653/v1/2021.acl-short.115 ◽

2021 ◽

Author(s):

Inigo Jauregi Unanue ◽

Jacob Parnell ◽

Massimo Piccardi

Keyword(s):

Machine Translation ◽

Fine Tuning ◽

Neural Machine Translation


Speed Up the Training of Neural Machine Translation

Neural Processing Letters ◽

10.1007/s11063-019-10084-y ◽

2019 ◽

Vol 51 (1) ◽

pp. 231-249 ◽

Cited By ~ 2

Author(s):

Xinyue Liu ◽

Weixuan Wang ◽

Wenxin Liang ◽

Yuangang Li

Keyword(s):

Machine Translation ◽

Neural Machine Translation ◽

Speed Up


Supervised neural machine translation based on data augmentation and improved training & inference process

10.18653/v1/d19-5218 ◽

2019 ◽

Author(s):

Yixuan Tong ◽

Liang Liang ◽

Boyan Liu ◽

Shanshan Jiang ◽

Bin Dong

Keyword(s):

Machine Translation ◽

Data Augmentation ◽

Inference Process ◽

Neural Machine Translation


Tag-less Back-Translation

10.21203/rs.3.rs-465941/v1 ◽

2021 ◽

Author(s):

Idris Abdulmumin ◽

Bashir Shehu Galadanci ◽

Aliyu Garba

Keyword(s):

Machine Translation ◽

Domain Adaptation ◽

Fine Tuning ◽

Huge Amount ◽

Neural Machine Translation ◽

Translation Model ◽

Parallel Data ◽

Back Translation ◽

Authentic Data ◽

Target Side

An effective method for generating a large number of parallel sentences to train improved neural machine translation (NMT) systems is to use back-translations of target-side monolingual data. The standard back-translation method has been shown to be unable to efficiently utilize the huge amount of existing monolingual data, because translation models cannot differentiate between authentic and synthetic parallel data during training. Tagging, or using gates, has been used to enable translation models to distinguish between synthetic and authentic data, improving standard back-translation and also enabling the use of iterative back-translation on language pairs that underperformed with standard back-translation. In this work, we approach back-translation as a domain adaptation problem, eliminating the need for explicit tagging. In this approach, tag-less back-translation, the synthetic and authentic parallel data are treated as out-of-domain and in-domain data respectively and, through pre-training and fine-tuning, the translation model is shown to learn more efficiently from them during training. Experimental results show that the approach outperforms the standard and tagged back-translation approaches on low-resource English-Vietnamese and English-German neural machine translation.
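The pre-train then fine-tune ordering described above can be sketched as a two-stage data schedule. This is a minimal illustration under assumptions: the function name `tagless_schedule`, the epoch counts, and the toy sentence pairs are hypothetical, and the actual model training step is left out.

```python
from typing import Iterator, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tagless_schedule(
    synthetic: List[Pair],
    authentic: List[Pair],
    pretrain_epochs: int = 1,
    finetune_epochs: int = 1,
) -> Iterator[Tuple[str, int, List[Pair]]]:
    """Yield (stage, epoch, data) in the order the model would see it.

    Synthetic (back-translated) pairs play the role of out-of-domain data
    and are used first; authentic pairs play the role of in-domain data and
    are used afterwards. No tag is attached to either kind of pair.
    """
    for epoch in range(pretrain_epochs):
        yield ("pretrain", epoch, synthetic)
    for epoch in range(finetune_epochs):
        yield ("finetune", epoch, authentic)


if __name__ == "__main__":
    synthetic = [("back-translated source", "monolingual target")]
    authentic = [("human source", "human target")]
    for stage, epoch, data in tagless_schedule(synthetic, authentic, 2, 1):
        print(stage, epoch, len(data))
```

The point of the sketch is only the ordering: the two data sources are separated by training stage rather than marked with a tag inside a mixed corpus.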


Sequence-Level Training for Non-Autoregressive Neural Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00421 ◽

2021 ◽

pp. 1-36

Author(s):

Chenze Shao ◽

Yang Feng ◽

Jinchao Zhang ◽

Fandong Meng ◽

Jie Zhou

Keyword(s):

Machine Translation ◽

Cross Entropy ◽

Gradient Estimation ◽

Neural Machine Translation ◽

Training Strategy ◽

Word Generation ◽

Translation Quality ◽

Word Level ◽

Training Objective ◽

Level Training

In recent years, Neural Machine Translation (NMT) has achieved notable results in various translation tasks. However, the word-by-word generation imposed by the autoregressive mechanism leads to high translation latency and restricts the use of NMT in low-latency applications. Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves a significant decoding speedup by generating target words independently and simultaneously. Nevertheless, NAT still takes the word-level cross-entropy loss as the training objective, which is not optimal because the output of NAT cannot be properly evaluated due to the multimodality problem. In this article, we propose using sequence-level training objectives to train NAT models; these objectives evaluate the NAT outputs as a whole and correlate well with the real translation quality. First, we propose training NAT models to optimize sequence-level evaluation metrics (e.g., BLEU) based on several novel reinforcement algorithms customized for NAT, which outperform the conventional method by reducing the variance of gradient estimation. Second, we introduce a novel training objective for NAT models, which aims to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The BoN training objective is differentiable and can be calculated efficiently without any approximations. Finally, we apply a three-stage training strategy to combine these two methods to train the NAT model. We validate our approach on four translation tasks (WMT14 En↔De, WMT16 En↔Ro), which shows that our approach largely outperforms NAT baselines and achieves remarkable performance on all translation tasks. The source code is available at https://github.com/ictnlp/Seq-NAT.
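To make the Bag-of-Ngrams idea concrete, the sketch below compares the n-gram counts of a decoded hypothesis with those of a reference. It is a simplified, count-based illustration only: the objective in the abstract is differentiable and defined on the model output, whereas this version works on discrete tokens, and the choice of an L1 distance and the names `bag_of_ngrams` and `bon_l1_distance` are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def bag_of_ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in the token list."""
    # For sequences shorter than n the range is empty and so is the bag.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bon_l1_distance(hyp: List[str], ref: List[str], n: int = 2) -> int:
    """Assumed L1 distance between the two bags of n-grams."""
    hyp_bag, ref_bag = bag_of_ngrams(hyp, n), bag_of_ngrams(ref, n)
    # Counter returns 0 for n-grams that are missing from one side.
    return sum(abs(hyp_bag[g] - ref_bag[g]) for g in set(hyp_bag) | set(ref_bag))


if __name__ == "__main__":
    reference = "the cat sat".split()
    repeated = "the cat cat sat".split()  # typical NAT repetition error
    print(bon_l1_distance(reference, reference))
    print(bon_l1_distance(repeated, reference))
```

A repeated token adds an n-gram that the reference does not contain, so the distance grows even though every individual word is correct, which is the kind of sequence-level error a word-level loss does not capture.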


Non-autoregressive neural machine translation with auxiliary representation fusion

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211105 ◽

2021 ◽

pp. 1-11

Author(s):

Quan Du ◽

Kai Feng ◽

Chen Xu ◽

Tong Xiao ◽

Jingbo Zhu

Keyword(s):

Machine Translation ◽

Experimental Results ◽

The Other ◽

Generation Process ◽

Neural Machine Translation ◽

Trade Off ◽

Translation Quality ◽

Other Hand ◽

Entire Sequence ◽

Translation Accuracy

Recently, much effort has been devoted to speeding up neural machine translation models. Among them, the non-autoregressive translation (NAT) model is promising because it removes the sequential dependence on previously generated tokens and parallelizes the generation of the entire sequence. On the other hand, the autoregressive translation (AT) model in general achieves higher translation accuracy than its NAT counterpart. Therefore, a natural idea is to fuse the AT and NAT models to seek a trade-off between inference speed and translation quality. This paper proposes an ARF-NAT model (NAT with auxiliary representation fusion) to bring the merit of a shallow AT model to an NAT model. Three functions are designed to fuse the auxiliary representation into the decoder of the NAT model. Experimental results show that ARF-NAT outperforms the NAT baseline by 5.26 BLEU points on the WMT’14 German-English task with a significant speedup (7.58 times) over several strong AT baselines.


Acquiring Knowledge from Pre-Trained Model to Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6465 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9266-9273

Author(s):

Rongxiang Weng ◽

Heng Yu ◽

Shujian Huang ◽

Shanbo Cheng ◽

Weihua Luo

Keyword(s):

Machine Translation ◽

Large Scale ◽

Fine Tuning ◽

Great Success ◽

Training Process ◽

Neural Machine Translation ◽

Language Knowledge ◽

Knowledge Distillation ◽

Training Objective ◽

Natural Language Process

Pre-training and fine-tuning have achieved great success in the natural language processing field. The standard paradigm for exploiting them includes two steps: first, pre-training a model, e.g. BERT, on large-scale unlabeled monolingual data; then, fine-tuning the pre-trained model on labeled data from downstream tasks. However, in neural machine translation (NMT), we address the problem that the training objective of the bilingual task is far different from that of the monolingual pre-trained model. Because of this gap, using fine-tuning alone in NMT cannot fully utilize prior language knowledge. In this paper, we propose an Apt framework for acquiring knowledge from a pre-trained model for NMT. The proposed approach includes two modules: 1) a dynamic fusion mechanism to fuse task-specific features adapted from general knowledge into the NMT network, and 2) a knowledge distillation paradigm to learn language knowledge continuously during the NMT training process. The proposed approach can integrate suitable knowledge from pre-trained models to improve NMT. Experimental results on WMT English to German, German to English and Chinese to English machine translation tasks show that our model outperforms strong baselines and the fine-tuning counterparts.


Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0031 ◽

2017 ◽

Vol 108 (1) ◽

pp. 331-342 ◽

Cited By ~ 8

Author(s):

Duygu Ataman ◽

Matteo Negri ◽

Marco Turchi ◽

Marcello Federico

Keyword(s):

Machine Translation ◽

Model Complexity ◽

Morphological Properties ◽

Input Language ◽

Neural Machine Translation ◽

Word Structure ◽

Morphologically Rich Languages ◽

Morphology Learning ◽

Translation Accuracy ◽

Syntactic Properties

The necessity of using a fixed-size word vocabulary in order to control the model complexity in state-of-the-art neural machine translation (NMT) systems is an important bottleneck on performance, especially for morphologically rich languages. Conventional methods that aim to overcome this problem by using sub-word or character-level representations rely solely on statistics and disregard the linguistic properties of words, which leads to interruptions in the word structure and causes semantic and syntactic losses. In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language. Our method is based on unsupervised morphology learning and can, in principle, be used for pre-processing any language pair. We also present an alternative word segmentation method based on supervised morphological analysis, which aids us in measuring the accuracy of our model. We evaluate our method on a Turkish-to-English NMT task, where the input language is morphologically rich and agglutinative. We analyze different representation methods in terms of translation accuracy as well as the semantic and syntactic properties of the generated output. Our method obtains a significant improvement of 2.3 BLEU points over the conventional vocabulary reduction technique, showing that it can provide better accuracy in open-vocabulary translation of morphologically rich languages.
