COSTA MT Evaluation Tool: An Open Toolkit for Human Machine Translation Evaluation

2013
Vol 100 (1)
pp. 83-89
Author(s):  
Konstantinos Chatzitheodorou

Abstract: A hotly debated topic in machine translation is human evaluation. On the one hand, it is extremely costly and time-consuming; on the other, it is an important and unfortunately inevitable part of any system. This paper describes the COSTA MT Evaluation Tool, an open, stand-alone tool for human evaluation of machine translation. It is a Java program that can be used to manually evaluate the quality of machine translation output. It is simple to use and designed to let potential machine translation users and developers analyze their systems in a friendly environment. It enables segment-by-segment ranking of the quality of machine translation output for a particular language pair. The benefits of this tool are multiple. Firstly, it is a rich repository of commonly used industry criteria (fluency, adequacy and translation error classification). Secondly, it is freely available to anyone and provides results that can be further analyzed. Thirdly, it estimates the time needed to evaluate each sentence. Finally, it gives suggestions based on fuzzy matching of the candidate translations.
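
The fuzzy matching mentioned in the abstract can be pictured with a short sketch. The snippet below is a hypothetical, simplified word-level fuzzy match score; it is not COSTA's actual Java implementation, whose internals the abstract does not describe.

```python
# A minimal, hypothetical sketch of a fuzzy match score between a candidate
# translation and a stored segment, in the spirit of the fuzzy matching
# suggestions COSTA reports. NOT the tool's actual Java implementation.
from difflib import SequenceMatcher

def fuzzy_match(candidate: str, segment: str) -> float:
    """Word-level similarity ratio in [0, 1] between two segments."""
    return SequenceMatcher(None, candidate.split(), segment.split()).ratio()

# Example: a near-match scores high, an unrelated segment scores low.
print(fuzzy_match("the cat sat on the mat", "a cat sat on the mat"))   # ~0.83
print(fuzzy_match("the cat sat on the mat", "stocks fell on Monday"))  # low
```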

2019
Vol 26 (2)
pp. 137-161
Author(s):  
Eirini Chatzikoumi

Abstract: This article presents the most up-to-date and influential automated, semi-automated and human metrics used to evaluate the quality of machine translation (MT) output, and provides the necessary background for MT evaluation projects. Evaluation is, as is widely acknowledged, highly relevant for the improvement of MT. This article is divided into three parts: the first is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges posed by neural machine translation (NMT) regarding its evaluation. The first part covers reference translation-based metrics; confidence or quality estimation (QE) metrics, which serve as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to whether human judges directly express a subjective evaluation judgment, such as ‘good’ or ‘better than’, or not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ) and are therefore called ‘DEJ-based evaluation methods’, while the latter are called ‘non-DEJ-based evaluation methods’. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and postediting are detailed, with definitions and guidelines, thus rendering this article a useful guide for evaluation projects. Following the detailed presentation of these metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective, to provide useful methodological tools for interdisciplinary research groups that evaluate MT systems.
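
As an illustration of the direct assessment (DA) task mentioned above, the sketch below standardizes raw 0-100 DA judgments per annotator, a common practice in WMT-style campaigns, before averaging them per segment. The annotators and scores are invented for demonstration.

```python
# Illustrative sketch of per-annotator standardization of direct assessment
# (DA) scores; the annotators and raw 0-100 scores below are invented.
from statistics import mean, stdev

raw = {  # annotator -> list of (segment_id, raw adequacy score out of 100)
    "annotator_1": [("s1", 70), ("s2", 90), ("s3", 80)],
    "annotator_2": [("s1", 40), ("s2", 60), ("s3", 50)],
}

# Z-normalize each annotator's scores to remove individual scoring bias.
per_segment = {}
for annotator, judgments in raw.items():
    scores = [score for _, score in judgments]
    mu, sigma = mean(scores), stdev(scores)
    for segment_id, score in judgments:
        per_segment.setdefault(segment_id, []).append((score - mu) / sigma)

# A segment's DA score is the mean of its standardized judgments.
for segment_id, z_scores in sorted(per_segment.items()):
    print(segment_id, round(mean(z_scores), 2))
```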


2018
Vol 11 (1)
pp. 82-102
Author(s):  
Thiago Blanch Pires

ABSTRACT: This article proposes an interdisciplinary approach involving the areas of Multimodality and Evaluation of Machine Translation to explore new configurations of text-image semantic relations generated by machine translation output. The methodology consists of a brief contextualization of the research problem, followed by the presentation and study of concepts and possibilities of Multimodality and Evaluation of Machine Translation, with an emphasis on the notion of intersemiotic texture proposed by Liu and O'Halloran (2009) and the machine translation error classification proposed by Vilar et al. (2006). Finally, the article suggests some potentialities and limitations of combining both areas of investigation.

KEYWORDS: multimodality; machine translation; evaluation of machine translation; intersemiotic mismatches.


2020
Vol 30 (01)
pp. 2050002
Author(s):  
Taichi Aida
Kazuhide Yamamoto

Current neural machine translation methods may generate sentences of varying quality. Methods for automatically evaluating machine translation output can be broadly classified into two types: those that train an evaluation model on human post-edited translations, and those that use a correct reference translation during evaluation. On the one hand, post-edited translations are difficult to prepare because each word must be tagged against the original translated sentence. On the other hand, users who actually employ a machine translation system do not have a correct reference translation. We therefore propose a method that trains the evaluation model without human post-edited sentences and, at test time, estimates the quality of output sentences without reference translations. We define several indices and predict translation quality with a regression model. As the quality signal for translated sentences, we employ the BLEU score calculated from the number of word n-gram matches between the translated sentence and the reference translation. We then compute the correlation between the quality scores predicted by our method and the BLEU scores actually computed from references. According to the experimental results, the correlation with BLEU is highest when XGBoost uses all the indices. Moreover, examining each index, we find that sentence log-likelihood and model uncertainty, which are based on the joint probability of generating the translated sentence, are important for BLEU estimation.
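
The reference-free estimation idea can be sketched as follows: a regression model (XGBoost, as named in the abstract) maps decoder-side indices such as sentence log-likelihood and model uncertainty to sentence-level BLEU. All feature values and BLEU targets below are invented placeholders; in the paper they come from the NMT system and from references available only at training time.

```python
# Invented-data sketch of the reference-free quality estimation idea above.
import numpy as np
from xgboost import XGBRegressor

# Indices per translated sentence: [log-likelihood, model uncertainty, length].
X_train = np.array([[-5.2, 0.10, 12], [-9.8, 0.35, 20], [-3.1, 0.05,  8],
                    [-7.4, 0.22, 15], [-11.0, 0.40, 25], [-4.0, 0.08, 10]])
y_train = np.array([0.62, 0.28, 0.75, 0.41, 0.19, 0.70])  # sentence BLEU

model = XGBRegressor(n_estimators=100, max_depth=3)
model.fit(X_train, y_train)

# At test time, quality is estimated from the indices alone: no references.
X_test = np.array([[-6.0, 0.15, 13], [-10.2, 0.38, 22]])
print(model.predict(X_test))
```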


2016
Vol 6 (1)
pp. 30-45
Author(s):  
Pankaj K. Goswami
Sanjay K. Dwivedi
C. K. Jha

English-to-Hindi translation of computer-science e-content generated through freely available online machine translation engines may not be technically correct. The target translation should be as fluent as intended for native learners, and the meaning of the source e-content should be conveyed properly. The authors have designed and integrated a Multi-Engine Machine Translation for English to Hindi Language (MEMTEHiL) framework as a translation solution for computer-science domain e-content, made possible by combining well-tested machine translation approaches. The human-judged and widely accepted metrics of fluency and adequacy (F&A) were used to assess the best translation quality for the English-Hindi language pair. Besides these human-judged metrics, the well-tested interactive version of the Bilingual Evaluation Understudy metric (iBLEU) was used for evaluation. The authors incorporated both parameters (F&A and iBLEU) for assessing the quality of translations regenerated by the designed MEMTEHiL.
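
One way to picture the multi-engine selection problem is sketched below: each engine's candidate is scored against a reference with sentence-level BLEU (via sacrebleu here) and the best is kept. The engine names, outputs and reference are placeholders; MEMTEHiL's actual selection logic, and its use of human F&A judgments, are not reproduced.

```python
# Simplified sketch of one multi-engine idea: keep the candidate with the
# highest sentence-level BLEU against a reference. Placeholder data only.
import sacrebleu

candidates = {  # hypothetical outputs from different MT engines
    "engine_a": "संगणक विज्ञान एक व्यापक क्षेत्र है",
    "engine_b": "कंप्यूटर विज्ञान एक व्यापक क्षेत्र है",
}
reference = "कंप्यूटर विज्ञान एक व्यापक क्षेत्र है"

scores = {name: sacrebleu.sentence_bleu(hyp, [reference]).score
          for name, hyp in candidates.items()}
best_engine = max(scores, key=scores.get)
print(scores, "->", best_engine)
```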


Author(s):  
Candy Lalrempuii
Badal Soni
Partha Pakray

Machine translation is an effort to bridge language barriers and avoid misinterpretation, making communication more convenient through the automatic translation of languages. The quality of translations produced by corpus-based approaches predominantly depends on the availability of a large parallel corpus. Although machine translation of many Indian languages has progressively gained attention, there is very limited research on machine translation and the challenges of applying various machine translation techniques to a low-resource language such as Mizo. In this article, we have implemented and compared statistical approaches with modern neural approaches for the English–Mizo language pair. We have experimented with different tokenization methods, architectures, and configurations. The performance of translations predicted by the trained models has been evaluated using automatic and human evaluation measures. Furthermore, we have analyzed the prediction errors of the models and the quality of predictions with respect to variations in sentence length, and compared the model performance with existing baselines.
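
The sentence-length analysis mentioned above can be illustrated with a minimal sketch: test triples are bucketed by source length and each bucket is scored separately with corpus-level BLEU. The data, the bucket threshold and the use of sacrebleu are assumptions for demonstration only.

```python
# Invented-data sketch of evaluating translation quality by sentence length.
import sacrebleu

data = [  # (source, hypothesis, reference) placeholder triples
    ("he reads a book", "he reads a book", "he is reading a book"),
    ("the weather is nice", "the weather is good today", "the weather is nice today"),
    ("a considerably longer source sentence that lands in the long bucket",
     "a considerably longer output sentence that lands in the long bucket",
     "a considerably longer reference sentence that lands in the long bucket"),
]

buckets = {}
for source, hypothesis, reference in data:
    key = "short" if len(source.split()) <= 10 else "long"
    hyps, refs = buckets.setdefault(key, ([], []))
    hyps.append(hypothesis)
    refs.append(reference)

for key, (hyps, refs) in buckets.items():
    print(key, round(sacrebleu.corpus_bleu(hyps, [refs]).score, 1))
```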


2021
Vol 11 (7)
pp. 2948
Author(s):  
Lucia Benkova
Dasa Munkova
Ľubomír Benko
Michal Munk

This study focuses on the comparison of phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems, using automatic metrics of translation quality for the English-Slovak language pair. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural approach would generate results of better quality. An experiment was performed using residuals to compare the automatic accuracy scores (BLEU_n) of the statistical machine translation with those of the neural machine translation. The results confirmed the assumption of better neural machine translation quality regardless of the system used: there were statistically significant differences between SMT and NMT, in favor of NMT, on all BLEU_n scores. Neural machine translation achieved better quality in translating journalistic texts from English into Slovak, regardless of whether the system was trained on general texts, such as Google Translate, or on a specific domain, such as the European Commission's (EC's) tool.
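
The paired SMT-versus-NMT comparison can be illustrated with a minimal significance-test sketch. The per-segment BLEU scores below are invented, and the Wilcoxon signed-rank test stands in for the study's richer residual-based analysis.

```python
# Minimal sketch of a paired SMT-versus-NMT comparison on invented
# per-segment BLEU scores; a stand-in for the study's residual analysis.
from scipy.stats import wilcoxon

smt_bleu = [0.31, 0.28, 0.45, 0.22, 0.38, 0.30, 0.27, 0.41]
nmt_bleu = [0.39, 0.33, 0.47, 0.30, 0.42, 0.36, 0.29, 0.48]

stat, p_value = wilcoxon(smt_bleu, nmt_bleu)
print(f"W={stat:.1f}, p={p_value:.4f}")  # a small p suggests a real difference
```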


2021
Vol 2 (1)
pp. 1-9
Author(s):  
Sajad Hussain Wani

Machine translation (MT), as a sub-field of computational linguistics, represents one of the most advanced and applied dimensions of translation research. Translation divergence occurs when "structurally similar sentences of the source language do not translate into sentences that are similar in structure in the target language" (Dorr, 1993). Sophistication in the MT domain depends mainly on the identification of divergence patterns in a language pair. Many researchers in the MT field, including Dorr (1990, 1994), have emphasized that the best quality in MT is achieved when an individual language pair in a particular context is described in detail. This paper explores the divergence patterns that characterize the translation of Kashmiri pronouns into English. The analysis is restricted to the class of personal and possessive pronouns. Kashmiri is richly inflected, and its pronouns are marked for case, number, tense and gender and show complex agreement patterns. The paper identifies and outlines a wide variety of divergence patterns that characterize the Kashmiri-English language pair. These divergence patterns are identified and summarized in order to improve the quality of any MT system that may be developed for the Kashmiri-English language pair in the near future, and can also be utilized for other language pairs with similar structure and typological features.

