Analysing terminology translation errors in statistical and neural machine translation

Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because the synthetic data is often generated by back-translating monolingual data from the target language into the source language, it potentially contains a lot of noise, such as weakly paired sentences or translation errors. In this paper, we propose a novel approach to filter this noise from synthetic data. For each sentence pair in the synthetic data, we compute a semantic similarity score using bilingual word embeddings. By selecting sentence pairs according to these scores, we obtain better synthetic parallel data. Experimental results on the IWSLT 2017 Korean→English translation task show that, despite using much less data, our method outperforms the baseline NMT system with back-translation by up to 0.72 and 0.62 BLEU points on tst2016 and tst2017, respectively.
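The filtering idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the similarity score is computed, so everything here is an assumption made for illustration: sentences are represented by averaging word vectors, the score is cosine similarity, and a fixed fraction of top-scoring pairs is kept. The names `sentence_vector`, `similarity`, `filter_pairs`, and `keep_ratio`, as well as the toy embeddings, are hypothetical.

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the vectors of known tokens; return None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity(src_tokens, tgt_tokens, src_emb, tgt_emb):
    """Score a sentence pair; assumes both embedding tables share one space."""
    u = sentence_vector(src_tokens, src_emb)
    v = sentence_vector(tgt_tokens, tgt_emb)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)

def filter_pairs(pairs, src_emb, tgt_emb, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of pairs, ranked by similarity."""
    ranked = sorted(
        pairs,
        key=lambda p: similarity(p[0], p[1], src_emb, tgt_emb),
        reverse=True,
    )
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]

# Toy 2-d bilingual embeddings (hypothetical, romanized source tokens).
SRC_EMB = {"goyangi": [1.0, 0.0], "gae": [0.0, 1.0]}
TGT_EMB = {"cat": [0.9, 0.1], "dog": [0.1, 0.9]}

# Two well-paired and two mismatched synthetic sentence pairs.
PAIRS = [
    (["goyangi"], ["cat"]),
    (["goyangi"], ["dog"]),
    (["gae"], ["dog"]),
    (["gae"], ["cat"]),
]
KEPT = filter_pairs(PAIRS, SRC_EMB, TGT_EMB, keep_ratio=0.5)
```

With these toy vectors the two mismatched pairs score far lower than the matched ones, so they are the ones dropped. A real setup would load pretrained bilingual embeddings and tune the selection threshold on held-out data.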


Correct-and-Memorize: Learning to Translate from Interactive Revisions

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/730 ◽ 2019 ◽ Cited By ~ 1

Author(s): Rongxiang Weng ◽ Hao Zhou ◽ Shujian Huang ◽ Lei Li ◽ Yifan Xia ◽ ...

Keyword(s): Machine Translation ◽ State Of The Art ◽ Translation Process ◽ Neural Machine Translation ◽ Human Interactions ◽ Interactive Environment ◽ Novel Method ◽ Critical Revision ◽ Target Languages ◽ Translation Errors

State-of-the-art machine translation models are still not on a par with human translators. Previous work incorporates human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model translation errors are equal: some are critical while others are minor. Meanwhile, the same translation mistakes occur repeatedly in similar contexts. To solve both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our proposed method works with critical revision instructions and therefore allows humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeated mistakes. Experiments in both ideal and real interactive translation settings demonstrate that CAMIT improves machine translation results significantly while requiring fewer revision instructions from humans than previous methods.


Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i01.5351 ◽ 2020 ◽ Vol 34 (01) ◽ pp. 198-205

Author(s): Chenze Shao ◽ Jinchao Zhang ◽ Yang Feng ◽ Fandong Meng ◽ Jie Zhou

Keyword(s): Machine Translation ◽ Cross Entropy ◽ Sequential Dependency ◽ Weak Correlation ◽ Neural Machine Translation ◽ Translation Quality ◽ Word Level ◽ Translation Errors ◽ Training Objective ◽ Target Side

Non-Autoregressive Neural Machine Translation (NAT) achieves a significant decoding speedup by generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot properly model the target-side sequential dependency, so it correlates only weakly with translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be calculated efficiently; it encourages NAT to capture the target-side sequential dependency and correlates well with translation quality. We validate our approach on three translation tasks and show that it outperforms the NAT baseline by about 5.0 BLEU points on WMT14 En↔De and about 2.5 BLEU points on WMT16 En↔Ro.
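To make the bag-of-ngrams difference concrete, here is a small sketch under stated assumptions. It uses the L1 distance between n-gram count vectors, and it computes expected n-gram counts from per-position token distributions that are treated as independent, which matches the abstract's description of NAT generating words independently. It enumerates every candidate n-gram explicitly, so it only suits toy vocabularies; the efficient computation the abstract mentions is not reproduced here. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

def bag_of_ngrams(tokens, n):
    """Count every n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_l1(hyp, ref, n):
    """L1 distance between the n-gram count vectors of two sequences."""
    hyp_bag, ref_bag = bag_of_ngrams(hyp, n), bag_of_ngrams(ref, n)
    return sum(abs(hyp_bag[g] - ref_bag[g]) for g in set(hyp_bag) | set(ref_bag))

def expected_bag_of_ngrams(position_probs, n):
    """Expected n-gram counts when each position is sampled independently.

    position_probs is a list with one {token: probability} dict per position.
    The expected count of an n-gram is the sum, over start positions, of the
    product of its token probabilities at the consecutive positions.
    """
    bag = Counter()
    for start in range(len(position_probs) - n + 1):
        window = position_probs[start:start + n]
        for gram in product(*(dist.keys() for dist in window)):
            prob = 1.0
            for token, dist in zip(gram, window):
                prob *= dist[token]
            bag[gram] += prob
    return bag

def expected_bon_l1(position_probs, ref, n):
    """L1 distance between expected model n-gram counts and reference counts."""
    model_bag = expected_bag_of_ngrams(position_probs, n)
    ref_bag = bag_of_ngrams(ref, n)
    return sum(abs(model_bag[g] - ref_bag[g]) for g in set(model_bag) | set(ref_bag))

REF = ["a", "b", "c"]

# A discrete hypothesis that repeats "b" instead of producing "c".
DISCRETE = bon_l1(["a", "b", "b"], REF, n=2)

# A model that is unsure between "b" and "c" at the last position.
UNSURE = [{"a": 1.0}, {"b": 1.0}, {"b": 0.5, "c": 0.5}]
SOFT = expected_bon_l1(UNSURE, REF, n=2)

# A model that puts all probability mass on the reference tokens.
EXACT = [{"a": 1.0}, {"b": 1.0}, {"c": 1.0}]
ZERO = expected_bon_l1(EXACT, REF, n=2)
```

In this toy case the distance shrinks as probability mass moves from the repeated token to the reference token, which is the kind of sequence-level signal a word-level cross-entropy loss does not provide directly.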
