EXPERT EVALUATION OF MACHINE TRANSLATION: ERROR CLASSIFICATION

2013 ◽  
Vol 100 (1) ◽  
pp. 83-89 ◽  
Author(s):  
Konstantinos Chatzitheodorou

Abstract: A hotly debated topic in machine translation is human evaluation. On the one hand, it is extremely costly and time consuming; on the other, it is an important and unfortunately inevitable part of any system. This paper describes the COSTA MT Evaluation Tool, an open stand-alone tool for human evaluation of machine translation. It is a Java program that can be used to manually evaluate the quality of machine translation output. It is simple to use and designed to allow potential users and developers of machine translation to analyze their systems in a friendly environment. It enables ranking the quality of machine translation output segment by segment for a particular language pair. The benefits of this tool are multiple. Firstly, it is a rich repository of commonly used industry criteria (fluency, adequacy and translation error classification). Secondly, it is freely available to anyone and provides results that can be further analyzed. Thirdly, it estimates the time needed for each evaluated sentence. Finally, it gives suggestions about the fuzzy matching of the candidate translations.
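The abstract mentions fuzzy matching of candidate translations but does not describe how COSTA MT computes it. The sketch below shows one common way to score segment similarity, using Python's standard difflib; it is an illustration, not the tool's actual implementation.

```python
# Minimal sketch of segment-level fuzzy matching between an MT candidate
# and a reference translation. COSTA MT's internal algorithm is not
# described in the abstract; this illustrates one common approach
# (difflib similarity over word sequences), not the tool's own code.
from difflib import SequenceMatcher

def fuzzy_match(candidate: str, reference: str) -> float:
    """Return a similarity score in [0, 1] between two segments."""
    return SequenceMatcher(None, candidate.split(), reference.split()).ratio()

if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "a cat sat on the mat"
    print(f"fuzzy match: {fuzzy_match(cand, ref):.2f}")  # ~0.83 for this pair
```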


2018 ◽  
Vol 11 (1) ◽  
pp. 82-102
Author(s):  
Thiago Blanch Pires

ABSTRACT: This article proposes an interdisciplinary approach involving the areas of Multimodality and Evaluation of Machine Translation to explore new configurations of text-image semantic relations generated by machine translation output. The methodology consists of a brief contextualization of the research problem, followed by the presentation and study of concepts and possibilities of Multimodality and Evaluation of Machine Translation, with an emphasis on the notion of intersemiotic texture proposed by Liu and O'Halloran (2009) and the machine translation error classification proposed by Vilar et al. (2006). Finally, the article suggests some potentialities and limitations of combining both areas of investigation.

KEYWORDS: multimodality; machine translation; evaluation of machine translation; intersemiotic mismatches.
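Since the article builds on the Vilar et al. (2006) error classification, a small data-model sketch may help readers picture how such annotations are recorded. The top-level categories below follow the commonly cited version of that taxonomy; the dataclass, the sample annotations, and the counting step are illustrative assumptions, not the article's tooling.

```python
# Sketch of recording MT errors per segment using the top-level categories
# of the Vilar et al. (2006) taxonomy. Data model and examples are
# illustrative only.
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class ErrorType(Enum):
    MISSING_WORDS = "missing words"
    WORD_ORDER = "word order"
    INCORRECT_WORDS = "incorrect words"
    UNKNOWN_WORDS = "unknown words"
    PUNCTUATION = "punctuation"

@dataclass
class Annotation:
    segment_id: int
    error: ErrorType
    note: str = ""

def error_profile(annotations: list[Annotation]) -> Counter:
    """Count how often each error type occurs across annotated segments."""
    return Counter(a.error for a in annotations)

annotations = [
    Annotation(1, ErrorType.WORD_ORDER, "verb placed clause-finally"),
    Annotation(1, ErrorType.INCORRECT_WORDS, "wrong sense of 'bank'"),
    Annotation(2, ErrorType.MISSING_WORDS),
]
print(error_profile(annotations))
```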


2020 ◽  
Vol 10 (11) ◽  
pp. 3904
Author(s):  
Van-Hai Vu ◽  
Quang-Phuoc Nguyen ◽  
Joon-Choul Shin ◽  
Cheol-Young Ock

Machine translation (MT) has recently attracted much research on various advanced techniques (i.e., statistical-based and deep learning-based) and achieved great results for popular languages. However, the research on it involving low-resource languages such as Korean often suffer from the lack of openly available bilingual language resources. In this research, we built the open extensive parallel corpora for training MT models, named Ulsan parallel corpora (UPC). Currently, UPC contains two parallel corpora consisting of Korean-English and Korean-Vietnamese datasets. The Korean-English dataset has over 969 thousand sentence pairs, and the Korean-Vietnamese parallel corpus consists of over 412 thousand sentence pairs. Furthermore, the high rate of homographs of Korean causes an ambiguous word issue in MT. To address this problem, we developed a powerful word-sense annotation system based on a combination of sub-word conditional probability and knowledge-based methods, named UTagger. We applied UTagger to UPC and used these corpora to train both statistical-based and deep learning-based neural MT systems. The experimental results demonstrated that using UPC, high-quality MT systems (in terms of the Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) score) can be built. Both UPC and UTagger are available for free download and usage.
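The abstract reports system quality in BLEU. As a reference point, the sketch below computes a simplified corpus-level BLEU (single reference, up to 4-grams, with brevity penalty); real evaluations typically use a standard implementation such as sacreBLEU rather than code like this, and the example sentences are invented.

```python
# Simplified corpus-level BLEU (single reference, up to 4-grams) as a
# sketch of the metric named in the abstract; not the paper's evaluation code.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # total hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum((h_ngrams & r_ngrams).values())
            totals[n - 1] += sum(h_ngrams.values())
    precisions = [m / t if t > 0 else 0.0 for m, t in zip(matches, totals)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)  # brevity penalty
    return bp * math.exp(log_avg)

hyps = ["the cat sat on the mat", "there is a book on the desk"]
refs = ["the cat is on the mat", "there is a book on the table"]
print(f"BLEU = {100 * corpus_bleu(hyps, refs):.1f}")
```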


2014 ◽  
Vol 29 (1) ◽  
pp. 1-24 ◽  
Author(s):  
Débora Beatriz de Jesus Martins ◽  
Helena de Medeiros Caseli

2019 ◽  
Vol 26 (2) ◽  
pp. 137-161
Author(s):  
Eirini Chatzikoumi

Abstract: This article presents the most up-to-date, influential automated, semi-automated and human metrics used to evaluate the quality of machine translation (MT) output and provides the necessary background for MT evaluation projects. Evaluation is, as repeatedly admitted, highly relevant for the improvement of MT. This article is divided into three parts: the first one is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges posed by neural machine translation (NMT) regarding its evaluation. The first part includes reference translation–based metrics; confidence or quality estimation (QE) metrics, which are used as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to the criterion of whether human judges directly express a so-called subjective evaluation judgment, such as ‘good’ or ‘better than’, or not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ); therefore, they are called ‘DEJ-based evaluation methods’, while the latter are called ‘non-DEJ-based evaluation methods’. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and postediting are detailed, with definitions and guidelines, thus rendering this article a useful guide for evaluation projects. Following the detailed presentation of the previously mentioned metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective to provide useful methodology tools for interdisciplinary research groups that evaluate MT systems.
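Among the DEJ-based methods the abstract lists is direct assessment (DA). In shared-task practice, raw DA scores are typically standardized per annotator before being averaged per system; the sketch below illustrates that aggregation step with invented ratings, and is not a procedure prescribed by the article itself.

```python
# Minimal sketch of direct assessment (DA) aggregation: raw 0-100 adequacy
# scores are standardized per annotator (z-scores) and then averaged per
# MT system. Annotators, systems and scores below are illustrative only.
from collections import defaultdict
from statistics import mean, stdev

# (annotator, system, raw score on a 0-100 scale)
ratings = [
    ("a1", "sysA", 78), ("a1", "sysB", 64), ("a1", "sysA", 90),
    ("a2", "sysA", 55), ("a2", "sysB", 40), ("a2", "sysB", 70),
]

# 1. Each annotator's mean and standard deviation.
by_annotator = defaultdict(list)
for annotator, _, score in ratings:
    by_annotator[annotator].append(score)
stats = {a: (mean(s), stdev(s)) for a, s in by_annotator.items()}

# 2. Standardize every rating, then average per system.
z_by_system = defaultdict(list)
for annotator, system, score in ratings:
    mu, sigma = stats[annotator]
    z_by_system[system].append((score - mu) / sigma if sigma else 0.0)

for system, zs in sorted(z_by_system.items()):
    print(f"{system}: mean z = {mean(zs):+.2f}")
```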


2015 ◽  
Vol 29 (2) ◽  
pp. 127-161 ◽  
Author(s):  
Ângela Costa ◽  
Wang Ling ◽  
Tiago Luís ◽  
Rui Correia ◽  
Luísa Coheur

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2090
Author(s):  
Itamar Elmakias ◽  
Dan Vilenchik

Machine translation (MT) is being used by millions of people daily, and therefore evaluating the quality of such systems is an important task. While human expert evaluation of MT output remains the most accurate method, it is not scalable by any means. Automatic procedures that perform the task of Machine Translation Quality Estimation (MT-QE) are typically trained on a large corpus of source–target sentence pairs, which are labeled with human judgment scores. Furthermore, the test set is typically drawn from the same distribution as the training set. Recently, however, interest in low-resource and unsupervised MT-QE has gained momentum. In this paper, we define and study a further restriction of the unsupervised MT-QE setting that we call oblivious MT-QE. Besides having no access to human judgment scores, the algorithm has no access to the test text's distribution. We propose an oblivious MT-QE system based on a new notion of sentence cohesiveness that we introduce. We tested our system on standard competition datasets for various language pairs. In all cases, the performance of our system was comparable to the performance of the non-oblivious baseline system provided by the competition organizers. Our results suggest that reasonable MT-QE can be carried out even in the restrictive oblivious setting.
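The abstract introduces a notion of sentence cohesiveness without defining it. The sketch below shows one plausible embedding-based proxy (average pairwise cosine similarity of a sentence's word vectors); it is only an illustration of the kind of unsupervised signal such a system might use, not the measure actually proposed by Elmakias and Vilenchik, and the toy embeddings are random placeholders.

```python
# Hedged sketch of an embedding-based "cohesiveness" proxy for a translated
# sentence: average pairwise cosine similarity of its word vectors. Not the
# paper's actual measure; toy embeddings stand in for pretrained vectors.
import numpy as np

def cohesiveness(sentence: list[str], embeddings: dict[str, np.ndarray]) -> float:
    vecs = [embeddings[w] for w in sentence if w in embeddings]
    if len(vecs) < 2:
        return 0.0
    sims = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            a, b = vecs[i], vecs[j]
            sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return float(np.mean(sims))

# Toy embeddings for demonstration; a real system would load pretrained vectors.
rng = np.random.default_rng(0)
toy = {w: rng.normal(size=50) for w in "the cat sat on mat".split()}
print(f"cohesiveness = {cohesiveness('the cat sat on the mat'.split(), toy):.3f}")
```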


2016 ◽  
Vol 23 (1) ◽  
pp. 87-117
Author(s):  
Koichi Akabe ◽  
Graham Neubig ◽  
Sakriani Sakti ◽  
Tomoki Toda ◽  
Satoshi Nakamura

Machine translation systems are still far from perfect, and the concept of Interactive Machine Translation (IMT) was introduced to improve their performance. This paper proposes an IMT system that uses Statistical Machine Translation and a bilingual corpus, on which several evaluation measures (Word Error Rate, Position-Independent Error Rate, Translation Error Rate, n-gram matching) are implemented, to translate text from English to Indian languages. Experiments show that the proposed system improves both the speed and productivity of human translators.
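Of the measures named above, Word Error Rate is the simplest: word-level edit distance between hypothesis and reference, normalized by reference length. The sketch below illustrates that definition; it is not the paper's implementation, and the example sentences are invented.

```python
# Minimal sketch of Word Error Rate (WER): word-level edit distance between
# hypothesis and reference, divided by reference length. Illustration only.
def word_error_rate(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on mat", "the cat sat on the mat"))  # 1/6 ~ 0.167
```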

