A Grammatical Analysis on Machine Translation Errors

Author(s):  
Shili Ge ◽  
Susu Wu ◽  
Xiaoxiao Chen ◽  
Rou Song

2020 ◽  
Vol 34 (2-3) ◽  
pp. 149-195
Author(s):  
Rejwanul Haque ◽  
Mohammed Hasanuzzaman ◽  
Andy Way

Babel ◽  
2020 ◽  
Vol 66 (4-5) ◽  
pp. 867-881
Author(s):  
Yanlin Guo

Abstract: In the new era, the translation model has gradually changed with the widespread application of machine translation technology and the rapid development of the translation industry. The mismatch between the demands of employers and the talent trained by universities has become a major problem facing the translation major today. To this end, more importance should be attached to readjusting the existing curriculum; strengthening students’ practical translation ability; teaching the skill of detecting and correcting machine translation errors; and combining translation with relevant professional knowledge.


Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1213
Author(s):  
Guanghao Xu ◽  
Youngjoong Ko ◽  
Jungyun Seo

Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because synthetic data is often generated by back-translating monolingual data from the target language into the source language, it potentially contains a lot of noise: weakly paired sentences or translation errors. In this paper, we propose a novel approach to filtering this noise from synthetic data. For each sentence pair in the synthetic data, we compute a semantic similarity score using bilingual word embeddings. By selecting sentence pairs according to these scores, we obtain better synthetic parallel data. Experimental results on the IWSLT 2017 Korean→English translation task show that, despite using much less data, our method outperforms the baseline NMT system with back-translation by up to 0.72 and 0.62 BLEU points on tst2016 and tst2017, respectively.


2021 ◽  
Vol 11 (2) ◽  
pp. 489-501
Author(s):  
Trond Trosterud ◽  
Lene Antonsen

The article presents a rule-based machine translation system from Northern Sami to Norwegian. The grammatical analysis is done with Giellatekno and Divvun's North Sami program for analysis and translation. We have written the transfer component (transfer lexicon and grammatical rules) within the framework of the open machine translation system Apertium. The article contains an evaluation of translated text for two different domains. The translated texts score better on conveying content than on fluent language. By classifying the errors as lexical, grammatical or pragmatic, we show that lexical errors are the most harmful for text comprehension. The other two error types degrade language quality but have little effect on comprehension. The easiest type of error to correct is the lexical one, which is a promising conclusion for the development of a machine translation system aimed at text comprehension.


Author(s):  
Nora Aranberri

Machine translation post-editing is becoming commonplace, and professional translators are often faced with this unfamiliar task with little training and support. Given the different translation processes involved during post-editing, research suggests that untrained translators do not necessarily make good post-editors. Moreover, the post-editing activity is largely influenced by numerous aspects of the technology and texts used. Training material, therefore, will need to be tailored to the particular conditions under which post-editing is bound to happen. In this work, we provide a first attempt to uncover what activity professional translators carry out when working from Spanish into Basque. Our initial analysis reveals that, when working with moderate machine translation output, post-editing shifts from the task of identifying and fixing errors to one of “patchwork”, where post-editors identify the machine-translated elements to reuse and connect them using their own contributions. The data also reveal that they primarily focus on correcting machine translation errors but often fail to refrain from editing correct structures. Both findings have clear implications for training and are a step forward in tailoring sessions specifically for language combinations of moderate quality.


Author(s):  
Rongxiang Weng ◽  
Hao Zhou ◽  
Shujian Huang ◽  
Lei Li ◽  
Yifan Xia ◽  
...  

State-of-the-art machine translation models are still not on a par with human translators. Previous work incorporates human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model translation errors are equal: some are critical while others are minor. Meanwhile, the same translation mistakes occur repeatedly in similar contexts. To address both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our method works with critical revision instructions and therefore allows humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeated mistakes. Experiments in both ideal and real interactive translation settings demonstrate that CAMIT enhances machine translation results significantly while requiring fewer revision instructions from humans than previous methods.
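The idea of remembering revision actions so that the same mistake is not corrected twice can be illustrated with a toy cache keyed on local context. This is a deliberately simplified sketch of the general principle only, not CAMIT's actual (neural, soft) mechanism; the class, the one-word-of-context matching rule, and the example words are all invented for illustration:

```python
class RevisionMemory:
    """Remember human corrections keyed by (previous word, wrong word),
    so the same mistake in a similar context is fixed automatically."""

    def __init__(self):
        self.memory = {}  # (prev_word, wrong_word) -> corrected_word

    def record(self, prev_word, wrong_word, corrected_word):
        """Store one human revision action with its local context."""
        self.memory[(prev_word, wrong_word)] = corrected_word

    def apply(self, tokens):
        """Rewrite a token list using remembered corrections."""
        out = list(tokens)
        for i in range(1, len(out)):
            key = (out[i - 1], out[i])
            if key in self.memory:
                out[i] = self.memory[key]
        return out

mem = RevisionMemory()
# A human corrects "bank" to "shore" after "river" once...
mem.record("river", "bank", "shore")
# ...and the memory reapplies the fix when the context recurs.
print(mem.apply(["the", "river", "bank", "was", "muddy"]))
# → ['the', 'river', 'shore', 'was', 'muddy']
```

A hard lookup like this only fires on exact context matches; CAMIT's contribution is precisely that it memorizes revisions *softly*, inside the model, so similar but non-identical contexts also benefit.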


2020 ◽  
Vol 34 (01) ◽  
pp. 198-205
Author(s):  
Chenze Shao ◽  
Jinchao Zhang ◽  
Yang Feng ◽  
Fandong Meng ◽  
Jie Zhou

Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup by generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot model the target-side sequential dependency properly, leading to its weak correlation with translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be efficiently calculated, encourages NAT to capture the target-side sequential dependency, and correlates well with translation quality. We validate our approach on three translation tasks and show that it outperforms the NAT baseline by about 5.0 BLEU points on WMT14 En↔De and about 2.5 BLEU points on WMT16 En↔Ro.
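The training signal described above rests on comparing the n-gram multisets of a hypothesis and a reference. Below is a minimal sketch of the hard (non-differentiable) bag-of-n-grams L1 distance over token bigrams; the paper's actual objective is a differentiable soft version computed over model probabilities, and the example sentences here are invented:

```python
from collections import Counter

def bag_of_ngrams(tokens, n):
    """Multiset of all n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_l1_distance(hyp, ref, n=2):
    """L1 distance between the two bags: counts both over-translated
    n-grams (extra in hyp) and under-translated ones (missing from hyp)."""
    h, r = bag_of_ngrams(hyp, n), bag_of_ngrams(ref, n)
    keys = set(h) | set(r)
    return sum(abs(h[k] - r[k]) for k in keys)

ref = "the cat sat on the mat".split()
hyp = "the cat cat sat on mat".split()  # over- and under-translation errors
print(bon_l1_distance(hyp, ref, n=2))   # → 4
```

Unlike word-level cross-entropy, which scores each position independently, this distance is zero only when the hypothesis reproduces the reference's n-gram multiset, so it directly penalizes the repeated and dropped n-grams typical of NAT output.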

