Statistical Analysis of Machine Translation Evaluation Systems for English-Hindi Language Pair

2020 · Vol 13 (5) · pp. 864-870
Author(s): Pooja Malik, Y. Mrudula, Anurag S. Baghel

Background: Automatic Machine Translation (AMT) evaluation metrics have become popular in the machine translation community in recent times, owing to the popularity of machine translation engines and of machine translation as a field itself. Translation is a very important tool for breaking barriers between communities, especially in countries like India, where people speak 22 different languages and many variations of them. With the rise of machine translation engines, there is a need for systems that evaluate how well these engines perform; this is where machine translation evaluation comes in. Objective: This paper discusses the importance of automatic machine translation evaluation and compares several machine translation evaluation metrics by performing statistical analysis on the metric scores and human evaluations to find out which metric has the highest correlation with human scores. Methods: The correlation between the automatic and human evaluation scores, as well as the correlations among the five automatic evaluation scores, are examined at the sentence level. Moreover, a hypothesis is set up and p-values are calculated to determine how significant these correlations are. Results: The results of the statistical analysis of the metric scores and human scores are presented as graphs showing the trend of the correlation between the scores of the automatic machine translation evaluation metrics and the human scores. Conclusion: Out of the five metrics considered in the study, METEOR shows the highest correlation with human scores.
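For readers who want to reproduce this kind of analysis, the sketch below (not the authors' code) uses hypothetical sentence-level scores to compute a Pearson correlation and its p-value for each metric against human scores; scipy's pearsonr is assumed to be available, and all score values are made up for illustration.

```python
from scipy.stats import pearsonr

# Hypothetical sentence-level scores for five translations.
human_scores  = [4.0, 2.5, 3.5, 1.0, 4.5]
metric_scores = {
    "BLEU":   [0.42, 0.18, 0.35, 0.05, 0.51],
    "METEOR": [0.55, 0.30, 0.48, 0.12, 0.60],
}

for name, scores in metric_scores.items():
    r, p = pearsonr(scores, human_scores)
    # A small p-value means the observed correlation is unlikely under
    # the null hypothesis of no correlation between metric and human scores.
    print(f"{name}: Pearson r = {r:.3f}, p-value = {p:.4f}")
```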

2017 · Vol 43 (4) · pp. 683-722
Author(s): Shafiq Joty, Francisco Guzmán, Lluís Màrquez, Preslav Nakov

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics in terms of correlation with human judgments, both at the segment level and at the system level. This suggests that discourse information is complementary to the information used by many existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular, we show that (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference RST tree is positively correlated with translation quality.
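As an illustration of the "simple linear combination" idea, the hypothetical sketch below interpolates a base metric score with a discourse-tree similarity score. The interpolation weight and both score values are assumptions; in practice the discourse similarity would come from a tree kernel over RST parses, which is not implemented here.

```python
def combined_score(base_metric_score: float,
                   discourse_similarity: float,
                   weight: float = 0.3) -> float:
    """Linearly interpolate an existing MT metric with an RST-tree similarity.

    `weight` stands in for a tuning parameter fit against human judgments;
    the value 0.3 is illustrative, not taken from the article.
    """
    return (1.0 - weight) * base_metric_score + weight * discourse_similarity

# Example: a segment-level metric score of 0.42 and a discourse similarity of 0.70.
print(combined_score(0.42, 0.70))  # 0.504
```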


2017 · Vol 108 (1) · pp. 85-96
Author(s): Eva Martínez Garcia, Carles Creus, Cristina España-Bonet, Lluís Màrquez

Abstract We integrate new mechanisms in a document-level machine translation decoder to improve the lexical consistency of document translations. First, we develop a document-level feature designed to score the lexical consistency of a translation. This feature, which applies to words that have been translated into different forms within the document, uses word embeddings to measure the adequacy of each word translation given its context. Second, we extend the decoder with a new stochastic mechanism that, at translation time, allows it to introduce changes in the translation aimed at improving its lexical consistency. We evaluate our system on English–Spanish document translation and conduct automatic and manual assessments of its quality. The automatic evaluation metrics, applied mainly at the sentence level, do not reflect significant variations. In contrast, the manual evaluation shows that the system that handles lexical consistency is preferred over both a standard sentence-level and a standard document-level phrase-based MT system.
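The sketch below is a rough, hypothetical illustration of how such a consistency feature could be computed with word embeddings: a candidate translation of a repeated source word is scored by the cosine similarity between its embedding and the average embedding of its context words. The `emb` lookup table and all names are assumptions, not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def consistency_score(candidate, context_words, emb):
    """Adequacy of `candidate` given its context, via embedding similarity.

    `emb` is assumed to map words to numpy vectors of equal dimension.
    """
    ctx_vecs = [emb[w] for w in context_words if w in emb]
    if candidate not in emb or not ctx_vecs:
        return 0.0
    return cosine(emb[candidate], np.mean(ctx_vecs, axis=0))

# Hypothetical usage: compare two different translations of the same source
# word against the context of the sentence in which each appears.
emb = {"river": np.array([1.0, 0.0]), "bank": np.array([0.9, 0.1]),
       "shore": np.array([0.8, 0.2]), "money": np.array([0.0, 1.0])}
print(consistency_score("bank", ["river"], emb))
print(consistency_score("shore", ["river"], emb))
```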


2013 · Vol 100 (1) · pp. 83-89
Author(s): Konstantinos Chatzitheodorou

Abstract A hotly debated topic in machine translation is human evaluation. On the one hand, it is extremely costly and time-consuming; on the other, it is an important and unfortunately inevitable part of developing any system. This paper describes the COSTA MT Evaluation Tool, an open stand-alone tool for human evaluation of machine translation. It is a Java program that can be used to manually evaluate the quality of machine translation output. It is simple to use and designed to allow potential machine translation users and developers to analyze their systems in a user-friendly environment. It enables ranking the quality of machine translation output segment-by-segment for a particular language pair. The benefits of this tool are multiple. Firstly, it is a rich repository of commonly used industry criteria (fluency, adequacy, and translation error classification). Secondly, it is freely available to anyone and produces results that can be further analyzed. Thirdly, it estimates the time needed for each evaluated sentence. Finally, it offers suggestions based on fuzzy matching of the candidate translations.
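The sketch below illustrates one way such fuzzy-match suggestions could be computed; it is a hypothetical Python stand-in, not COSTA's Java implementation, and uses difflib's SequenceMatcher ratio as the similarity measure.

```python
from difflib import SequenceMatcher

def fuzzy_match(candidate, references):
    """Return the reference most similar to `candidate` and the similarity ratio."""
    best = max(references, key=lambda r: SequenceMatcher(None, candidate, r).ratio())
    return best, SequenceMatcher(None, candidate, best).ratio()

ref, score = fuzzy_match("the cat sat on the mat",
                         ["a cat sits on the mat", "dogs run in the park"])
print(ref, round(score, 2))
```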

