EXPERT EVALUATION OF MACHINE TRANSLATION: ERROR CLASSIFICATION

2013 ◽  
Vol 100 (1) ◽  
pp. 83-89 ◽  
Author(s):  
Konstantinos Chatzitheodorou

Abstract: A hotly debated topic in machine translation is human evaluation. On the one hand, it is extremely costly and time consuming; on the other, it is an important and unfortunately inevitable part of any system. This paper describes the COSTA MT Evaluation Tool, an open stand-alone tool for human evaluation of machine translation. It is a Java program that can be used to manually evaluate the quality of machine translation output. It is simple to use and designed to allow potential users and developers of machine translation to analyze their systems in a friendly environment. It enables ranking the quality of machine translation output segment by segment for a particular language pair. The benefits of this tool are multiple. Firstly, it is a rich repository of commonly used industry criteria (fluency, adequacy and translation error classification). Secondly, it is freely available to anyone and provides results that can be further analyzed. Thirdly, it estimates the time needed for each evaluated sentence. Finally, it gives suggestions about the fuzzy matching of the candidate translations.
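The abstract mentions fuzzy matching of candidate translations but does not describe how COSTA MT computes it. The sketch below shows one common way to score segment similarity, using Python's standard difflib; it is an illustration, not the tool's actual implementation.

```python
# Minimal sketch of segment-level fuzzy matching between an MT candidate
# and a reference translation. COSTA MT's internal algorithm is not
# described in the abstract; this illustrates one common approach
# (difflib similarity over word sequences), not the tool's own code.
from difflib import SequenceMatcher

def fuzzy_match(candidate: str, reference: str) -> float:
    """Return a similarity score in [0, 1] between two segments."""
    return SequenceMatcher(None, candidate.split(), reference.split()).ratio()

if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "a cat sat on the mat"
    print(f"fuzzy match: {fuzzy_match(cand, ref):.2f}")  # ~0.83 for this pair
```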


2018 ◽  
Vol 11 (1) ◽  
pp. 82-102
Author(s):  
Thiago Blanch Pires

ABSTRACT: This article proposes an interdisciplinary approach involving the areas of Multimodality and Evaluation of Machine Translation to explore new configurations of text-image semantic relations generated by machine translation output. The methodology consists of a brief contextualization of the research problem, followed by the presentation and study of concepts and possibilities of Multimodality and Evaluation of Machine Translation, with an emphasis on the notion of intersemiotic texture proposed by Liu and O'Halloran (2009) and the machine translation error classification proposed by Vilar et al. (2006). Finally, the article suggests some potentialities and limitations of combining both areas of investigation.

KEYWORDS: multimodality; machine translation; evaluation of machine translation; intersemiotic mismatches.
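Since the article builds on the Vilar et al. (2006) error classification, a small data-model sketch may help readers picture how such annotations are recorded. The top-level categories below follow the commonly cited version of that taxonomy; the dataclass, the sample annotations, and the counting step are illustrative assumptions, not the article's tooling.

```python
# Sketch of recording MT errors per segment using the top-level categories
# of the Vilar et al. (2006) taxonomy. Data model and examples are
# illustrative only.
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class ErrorType(Enum):
    MISSING_WORDS = "missing words"
    WORD_ORDER = "word order"
    INCORRECT_WORDS = "incorrect words"
    UNKNOWN_WORDS = "unknown words"
    PUNCTUATION = "punctuation"

@dataclass
class Annotation:
    segment_id: int
    error: ErrorType
    note: str = ""

def error_profile(annotations: list[Annotation]) -> Counter:
    """Count how often each error type occurs across annotated segments."""
    return Counter(a.error for a in annotations)

annotations = [
    Annotation(1, ErrorType.WORD_ORDER, "verb placed clause-finally"),
    Annotation(1, ErrorType.INCORRECT_WORDS, "wrong sense of 'bank'"),
    Annotation(2, ErrorType.MISSING_WORDS),
]
print(error_profile(annotations))
```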


2020 ◽  
Vol 10 (11) ◽  
pp. 3904
Author(s):  
Van-Hai Vu ◽  
Quang-Phuoc Nguyen ◽  
Joon-Choul Shin ◽  
Cheol-Young Ock

Machine translation (MT) has recently attracted much research on various advanced techniques (i.e., statistical-based and deep learning-based) and achieved great results for popular languages. However, the research on it involving low-resource languages such as Korean often suffer from the lack of openly available bilingual language resources. In this research, we built the open extensive parallel corpora for training MT models, named Ulsan parallel corpora (UPC). Currently, UPC contains two parallel corpora consisting of Korean-English and Korean-Vietnamese datasets. The Korean-English dataset has over 969 thousand sentence pairs, and the Korean-Vietnamese parallel corpus consists of over 412 thousand sentence pairs. Furthermore, the high rate of homographs of Korean causes an ambiguous word issue in MT. To address this problem, we developed a powerful word-sense annotation system based on a combination of sub-word conditional probability and knowledge-based methods, named UTagger. We applied UTagger to UPC and used these corpora to train both statistical-based and deep learning-based neural MT systems. The experimental results demonstrated that using UPC, high-quality MT systems (in terms of the Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) score) can be built. Both UPC and UTagger are available for free download and usage.
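The abstract reports system quality in BLEU. As a reference point, the sketch below computes a simplified corpus-level BLEU (single reference, up to 4-grams, with brevity penalty); real evaluations typically use a standard implementation such as sacreBLEU rather than code like this, and the example sentences are invented.

```python
# Simplified corpus-level BLEU (single reference, up to 4-grams) as a
# sketch of the metric named in the abstract; not the paper's evaluation code.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # total hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum((h_ngrams & r_ngrams).values())
            totals[n - 1] += sum(h_ngrams.values())
    precisions = [m / t if t > 0 else 0.0 for m, t in zip(matches, totals)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)  # brevity penalty
    return bp * math.exp(log_avg)

hyps = ["the cat sat on the mat", "there is a book on the desk"]
refs = ["the cat is on the mat", "there is a book on the table"]
print(f"BLEU = {100 * corpus_bleu(hyps, refs):.1f}")
```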


2014 ◽  
Vol 29 (1) ◽  
pp. 1-24 ◽  
Author(s):  
Débora Beatriz de Jesus Martins ◽  
Helena de Medeiros Caseli

2019 ◽  
Vol 26 (2) ◽  
pp. 137-161
Author(s):  
Eirini Chatzikoumi

Abstract: This article presents the most up-to-date, influential automated, semi-automated and human metrics used to evaluate the quality of machine translation (MT) output and provides the necessary background for MT evaluation projects. Evaluation is, as repeatedly admitted, highly relevant for the improvement of MT. This article is divided into three parts: the first one is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges posed by neural machine translation (NMT) regarding its evaluation. The first part includes reference translation–based metrics; confidence or quality estimation (QE) metrics, which are used as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to the criterion of whether human judges directly express a so-called subjective evaluation judgment, such as ‘good’ or ‘better than’, or not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ); therefore, they are called ‘DEJ-based evaluation methods’, while the latter are called ‘non-DEJ-based evaluation methods’. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and postediting are detailed, with definitions and guidelines, thus rendering this article a useful guide for evaluation projects. Following the detailed presentation of the previously mentioned metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective to provide useful methodology tools for interdisciplinary research groups that evaluate MT systems.
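Among the DEJ-based methods the abstract lists is direct assessment (DA). In shared-task practice, raw DA scores are typically standardized per annotator before being averaged per system; the sketch below illustrates that aggregation step with invented ratings, and is not a procedure prescribed by the article itself.

```python
# Minimal sketch of direct assessment (DA) aggregation: raw 0-100 adequacy
# scores are standardized per annotator (z-scores) and then averaged per
# MT system. Annotators, systems and scores below are illustrative only.
from collections import defaultdict
from statistics import mean, stdev

# (annotator, system, raw score on a 0-100 scale)
ratings = [
    ("a1", "sysA", 78), ("a1", "sysB", 64), ("a1", "sysA", 90),
    ("a2", "sysA", 55), ("a2", "sysB", 40), ("a2", "sysB", 70),
]

# 1. Each annotator's mean and standard deviation.
by_annotator = defaultdict(list)
for annotator, _, score in ratings:
    by_annotator[annotator].append(score)
stats = {a: (mean(s), stdev(s)) for a, s in by_annotator.items()}

# 2. Standardize every rating, then average per system.
z_by_system = defaultdict(list)
for annotator, system, score in ratings:
    mu, sigma = stats[annotator]
    z_by_system[system].append((score - mu) / sigma if sigma else 0.0)

for system, zs in sorted(z_by_system.items()):
    print(f"{system}: mean z = {mean(zs):+.2f}")
```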


2015 ◽  
Vol 29 (2) ◽  
pp. 127-161 ◽  
Author(s):  
Ângela Costa ◽  
Wang Ling ◽  
Tiago Luís ◽  
Rui Correia ◽  
Luísa Coheur

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2090
Author(s):  
Itamar Elmakias ◽  
Dan Vilenchik

Machine translation (MT) is being used by millions of people daily, and therefore evaluating the quality of such systems is an important task. While human expert evaluation of MT output remains the most accurate method, it is not scalable by any means. Automatic procedures that perform the task of Machine Translation Quality Estimation (MT-QE) are typically trained on a large corpus of source–target sentence pairs, which are labeled with human judgment scores. Furthermore, the test set is typically drawn from the same distribution as the training set. Recently, however, interest in low-resource and unsupervised MT-QE has gained momentum. In this paper, we define and study a further restriction of the unsupervised MT-QE setting that we call oblivious MT-QE. Besides having no access to human judgment scores, the algorithm has no access to the test text's distribution. We propose an oblivious MT-QE system based on a new notion of sentence cohesiveness that we introduce. We tested our system on standard competition datasets for various language pairs. In all cases, the performance of our system was comparable to the performance of the non-oblivious baseline system provided by the competition organizers. Our results suggest that reasonable MT-QE can be carried out even in the restrictive oblivious setting.
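The abstract introduces a notion of sentence cohesiveness without defining it. The sketch below shows one plausible embedding-based proxy (average pairwise cosine similarity of a sentence's word vectors); it is only an illustration of the kind of unsupervised signal such a system might use, not the measure actually proposed by Elmakias and Vilenchik, and the toy embeddings are random placeholders.

```python
# Hedged sketch of an embedding-based "cohesiveness" proxy for a translated
# sentence: average pairwise cosine similarity of its word vectors. Not the
# paper's actual measure; toy embeddings stand in for pretrained vectors.
import numpy as np

def cohesiveness(sentence: list[str], embeddings: dict[str, np.ndarray]) -> float:
    vecs = [embeddings[w] for w in sentence if w in embeddings]
    if len(vecs) < 2:
        return 0.0
    sims = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            a, b = vecs[i], vecs[j]
            sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return float(np.mean(sims))

# Toy embeddings for demonstration; a real system would load pretrained vectors.
rng = np.random.default_rng(0)
toy = {w: rng.normal(size=50) for w in "the cat sat on mat".split()}
print(f"cohesiveness = {cohesiveness('the cat sat on the mat'.split(), toy):.3f}")
```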


2016 ◽  
Vol 23 (1) ◽  
pp. 87-117
Author(s):  
Koichi Akabe ◽  
Graham Neubig ◽  
Sakriani Sakti ◽  
Tomoki Toda ◽  
Satoshi Nakamura

Machine translation systems are still far from perfect, and the concept of Interactive Machine Translation (IMT) was introduced to improve their performance. This paper proposes an IMT system that uses Statistical Machine Translation and a bilingual corpus, on which several evaluation measures (Word Error Rate, Position-Independent Error Rate, Translation Error Rate, n-gram matching) are implemented, to translate text from English to Indian languages. Experiments show that the proposed system improves both the speed and productivity of human translators.
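Of the measures named above, Word Error Rate is the simplest: word-level edit distance between hypothesis and reference, normalized by reference length. The sketch below illustrates that definition; it is not the paper's implementation, and the example sentences are invented.

```python
# Minimal sketch of Word Error Rate (WER): word-level edit distance between
# hypothesis and reference, divided by reference length. Illustration only.
def word_error_rate(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on mat", "the cat sat on the mat"))  # 1/6 ~ 0.167
```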

