The Impact of Machine Translation Quality on Human Post-Editing

Assessing the Impact of Translation Errors on Machine Translation Quality with Mixed-effects Models

10.3115/v1/d14-1172 ◽

2014 ◽

Cited By ~ 6

Author(s):

Marcello Federico ◽

Matteo Negri ◽

Luisa Bentivogli ◽

Marco Turchi

Keyword(s):

Machine Translation ◽

Mixed Effects ◽

Mixed Effects Models ◽

Translation Quality ◽

Translation Errors ◽

The Impact

Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation

Informatics ◽

10.3390/informatics8010007 ◽

2021 ◽

Vol 8 (1) ◽

pp. 7

Author(s):

Arda Tezcan ◽

Bram Bulté ◽

Bram Vanroy

Keyword(s):

Machine Translation ◽

Data Augmentation ◽

Sentence Length ◽

Added Value ◽

Neural Machine Translation ◽

Combination Technique ◽

Translation Quality ◽

Fuzzy Match ◽

The Impact ◽

Matching Techniques

We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.
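
To make the notion of a fuzzy match score concrete, here is a minimal sketch that assumes one common definition: one minus the token-level edit distance divided by the length of the longer sentence. The abstract does not say which matching techniques were tested, so this is an illustration rather than the authors' method, and the names `edit_distance`, `fuzzy_match_score` and `best_match` are hypothetical.

```python
def edit_distance(a, b):
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(source, candidate):
    """Similarity in [0, 1]; 1.0 means the whitespace tokens are identical.

    Assumption: score = 1 - distance / length of the longer sentence.
    """
    src, cand = source.split(), candidate.split()
    longest = max(len(src), len(cand))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(src, cand) / longest


def best_match(source, memory):
    """Return (score, tm_source, tm_target) for the highest-scoring entry.

    memory: non-empty list of (tm_source, tm_target) pairs.
    """
    return max((fuzzy_match_score(source, s), s, t) for s, t in memory)
```

In an NFR-style setup, the target side of the retrieved match would then be attached to the input sentence before translation; how that is done is outside the scope of this sketch.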

Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0018 ◽

2017 ◽

Vol 108 (1) ◽

pp. 171-182 ◽

Cited By ~ 5

Author(s):

Jinhua Du ◽

Andy Way

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Class ◽

Word Embeddings ◽

Neural Machine Translation ◽

Parts Of Speech ◽

Translation Quality ◽

The Impact ◽

Japanese English ◽

Target Side

Pre-reordering, a preprocessing step that brings the source-side word order closer to that of the target side, has proven very helpful for improving translation quality in statistical machine translation (SMT). However, is the same true for neural machine translation (NMT)? In this paper, we first investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate features for the pre-reordering model in SMT as input factors into NMT (factored NMT). The features, namely parts-of-speech (POS), word class and reordered index, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge for NMT. Pre-reordering experiments conducted on Japanese↔English and Chinese↔English show that pre-reordering the source-side data for NMT is redundant and that NMT models trained on pre-reordered data deteriorate translation performance. However, factored NMT using SMT-based pre-reordering features on Japanese→English and Chinese→English is beneficial, improving over the baseline NMT system by 4.48 and 5.89 relative BLEU points, respectively.
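
The concatenation of factor features to word embeddings can be sketched as follows. This is a toy illustration under stated assumptions: the embedding sizes, the random initialisation, the example sentence with its tags, and the helper names `build_table` and `embed_factored` are all hypothetical and not taken from the paper.

```python
import random


def build_table(symbols, dim, seed):
    """Toy embedding table: one random vector of length `dim` per symbol."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for s in symbols}


def embed_factored(tokens, tables):
    """Concatenate the embeddings of all factors of each token.

    tokens: list of tuples (word, pos, word_class, reordered_index)
    tables: one embedding table per factor, in the same order
    """
    rows = []
    for factors in tokens:
        vec = []
        for factor, table in zip(factors, tables):
            vec.extend(table[factor])  # append this factor's vector
        rows.append(vec)
    return rows


# Hypothetical toy sentence; tags, classes and indices are made up.
sentence = [
    ("watashi", "PRON", "c1", 0),
    ("ringo", "NOUN", "c2", 2),
    ("taberu", "VERB", "c3", 1),
]

# Arbitrary sizes for word, POS, word class and reordered index.
DIMS = (8, 2, 2, 2)
tables = [
    build_table(sorted({tok[i] for tok in sentence}), dim, seed=i)
    for i, dim in enumerate(DIMS)
]

# Each token becomes one vector of length sum(DIMS) fed to the encoder.
inputs = embed_factored(sentence, tables)
```

In a real system the tables would be trainable parameters learned jointly with the rest of the network; fixed random vectors are used here only to show the shape of the input.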

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00377 ◽

2020 ◽

Vol 46 (2) ◽

pp. 387-424 ◽

Cited By ~ 1

Author(s):

Raúl Vázquez ◽

Alessandro Raganato ◽

Mathias Creutz ◽

Jörg Tiedemann

Keyword(s):

Machine Translation ◽

Improve Performance ◽

Neural Machine Translation ◽

Translation Quality ◽

Intermediate Layers ◽

Depth Analysis ◽

Classification Tasks ◽

The Impact ◽

Meaning Representation

Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.
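
The idea of a fixed-size sentence representation produced by inner attention can be sketched in a few lines. The sketch assumes the structured self-attention formulation with a tanh hidden layer, in which k attention heads each compute a weighted average of the n encoder states; the article's exact nonlinearity, dimensions and training setup may differ, and all names below are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_bridge(H, W1, W2):
    """Map n x d encoder states H to a fixed k x d matrix.

    W1 is d x h and W2 is h x k, so k (the number of heads) sets the
    output size regardless of the sentence length n.
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(H, W1)]  # n x h
    scores = matmul(hidden, W2)                                      # n x k
    # One attention distribution over the n positions per head.
    A = [softmax(list(col)) for col in zip(*scores)]                 # k x n
    return matmul(A, H)                                              # k x d


def random_matrix(rows, cols, rng):
    """Toy parameter or state matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
```

Because the output always has k rows, sentences of different lengths map to representations of the same size, which is the property that lets the representation be reused in downstream tasks.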

Evaluating the Impact of Integrating Similar Translations into Neural Machine Translation

Information ◽

10.3390/info13010019 ◽

2022 ◽

Vol 13 (1) ◽

pp. 19

Author(s):

Arda Tezcan ◽

Bram Bulté

Keyword(s):

Error Analysis ◽

Machine Translation ◽

Data Augmentation ◽

Training Data ◽

Quality Improvements ◽

Translation Quality ◽

Automated Evaluation ◽

Translation Errors ◽

Different Characteristics ◽

The Impact

Previous research has shown that simple methods of augmenting machine translation training data and input sentences with translations of similar sentences (or fuzzy matches), retrieved from a translation memory or bilingual corpus, lead to considerable improvements in translation quality, as assessed by a limited set of automatic evaluation metrics. In this study, we extend this evaluation by calculating a wider range of automated quality metrics that tap into different aspects of translation quality and by performing manual MT error analysis. Moreover, we investigate in more detail how fuzzy matches influence translations and where potential quality improvements could still be made by carrying out a series of quantitative analyses that focus on different characteristics of the retrieved fuzzy matches. The automated evaluation shows that the quality of NFR translations is higher than the NMT baseline in terms of all metrics. However, the manual error analysis did not reveal a difference between the two systems in terms of total number of translation errors; yet, different profiles emerged when considering the types of errors made. Finally, in our analysis of how fuzzy matches influence NFR translations, we identified a number of features that could be used to improve the selection of fuzzy matches for NFR data augmentation.

Context-Aware Neural Machine Translation for Korean Honorific Expressions

Electronics ◽

10.3390/electronics10131589 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1589

Author(s):

Yongkeun Hwang ◽

Yanghoon Kim ◽

Kyomin Jung

Keyword(s):

Machine Translation ◽

Deep Neural Networks ◽

Contextual Information ◽

Context Aware ◽

Neural Machine Translation ◽

Translation Quality ◽

Sentence Level ◽

Proposed Model ◽

The Given ◽

The Relationship

Neural machine translation (NMT) is one of the text generation tasks that has improved significantly with the rise of deep neural networks. However, language-specific problems such as the translation of honorifics have received little attention. In this paper, we propose a context-aware NMT to promote translation improvements of Korean honorifics. By exploiting information from the surrounding sentences, such as the relationship between speakers, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data is required. Thus, we also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and honorific translations.

A Survey on Document-level Neural Machine Translation

ACM Computing Surveys ◽

10.1145/3441691 ◽

2021 ◽

Vol 54 (2) ◽

pp. 1-36

Author(s):

Sameen Maruf ◽

Fahimeh Saleh ◽

Gholamreza Haffari

Keyword(s):

Machine Translation ◽

Language Processing ◽

Research Field ◽

Translation Process ◽

Future Directions ◽

Translation Quality ◽

Current State ◽

Evaluation Strategies ◽

Almost All ◽

Document Level

Machine translation (MT) is an important task in natural language processing (NLP), as it automates the translation process and reduces the reliance on human translators. With the resurgence of neural networks, the translation quality surpasses that of the translations obtained using statistical techniques for most language pairs. Up until a few years ago, almost all of the neural translation models translated sentences independently, without incorporating the wider document context and inter-dependencies among the sentences. The aim of this survey article is to highlight the major works that have been undertaken in the space of document-level machine translation after the neural revolution, so researchers can recognize the current state and future directions of this field. We provide an organization of the literature based on novelties in modelling and architectures as well as training and decoding strategies. In addition, we cover evaluation strategies that have been introduced to account for the improvements in document MT, including automatic metrics and discourse-targeted test sets. We conclude by presenting possible avenues for future exploration in this research field.

Dimensionality reduction methods for machine translation quality estimation

Machine Translation ◽

10.1007/s10590-013-9139-3 ◽

2013 ◽

Vol 27 (3-4) ◽

pp. 281-301 ◽

Cited By ~ 5

Author(s):

Jesús González-Rubio ◽

J. Ramón Navarro-Cerdán ◽

Francisco Casacuberta

Keyword(s):

Dimensionality Reduction ◽

Machine Translation ◽

Quality Estimation ◽

Translation Quality ◽

Reduction Methods

Improving Thai-Lao neural machine translation with similarity lexicon

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-212236 ◽

2021 ◽

pp. 1-10

Author(s):

Zhiqiang Yu ◽

Yuxin Huang ◽

Junjun Guo

Keyword(s):

Machine Translation ◽

Semantic Information ◽

Neural Machine Translation ◽

Low Resource ◽

Translation Quality ◽

Decoder Architecture ◽

Baseline System ◽

Input Sentence ◽

Resource Conditions ◽

Language Pair

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions. Thai-Lao is a typical low-resource language pair with only a tiny parallel corpus, which leads to suboptimal NMT performance. However, Thai and Lao have considerable similarities in linguistic morphology, and a bilingual lexicon for them is relatively easy to obtain. To use this feature, we first build a bilingual similarity lexicon composed of pairs of similar words. Then we propose a novel NMT architecture to leverage the similarity between Thai and Lao. Specifically, besides the prevailing sentence encoder, we introduce an extra similarity lexicon encoder into the conventional encoder-decoder architecture, by which the semantic information carried by the similarity lexicon can be represented. We further provide a simple mechanism in the decoder to balance the information representations delivered from the input sentence and the similarity lexicon. Our approach can fully exploit the linguistic similarity carried by the similarity lexicon to improve translation quality. Experimental results demonstrate that our approach achieves significant improvements over the state-of-the-art Transformer baseline system and previous similar works.

Metric for Evaluation of Machine Translation Quality on the bases of Edit Distances and Reverse Translation

10.1109/aict52784.2021.9620304 ◽

2021 ◽

Author(s):

V.S. Kornilov ◽

V.M. Glushan ◽

A.Yu. Lozovoy

Keyword(s):

Machine Translation ◽

Translation Quality ◽

Reverse Translation
