A comprehensive understanding of popular machine translation evaluation metrics

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment level and at the system level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTK party. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular, we show that (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference RST tree is positively correlated with translation quality.

Download Full-text

Automatic Meta-evaluation of Low-Resource Machine Translation Evaluation Metrics

2019 International Conference on Asian Language Processing (IALP) ◽

10.1109/ialp48816.2019.9037658 ◽

2019 ◽

Author(s):

Junting Yu ◽

Wuying Liu ◽

Hongye He ◽

Lin Wang

Keyword(s):

Machine Translation ◽

Evaluation Metrics ◽

Machine Translation Evaluation ◽

Low Resource

Download Full-text

Statistical Analysis of Machine Translation Evaluation Systems for English- Hindi Language Pair

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190716100145 ◽

2020 ◽

Vol 13 (5) ◽

pp. 864-870

Author(s):

Pooja Malik ◽

Y. Mrudula ◽

Anurag S. Baghel

Keyword(s):

Statistical Analysis ◽

Machine Translation ◽

Automatic Machine ◽

Evaluation Metrics ◽

Machine Translation Evaluation ◽

Evaluation Systems ◽

Evaluation Scores ◽

Sentence Level ◽

Set Up ◽

Language Pair

Background: Automatic Machine Translation (AMT) Evaluation Metrics have become popular in the Machine Translation Community in recent times. This is because of the popularity of Machine Translation engines and Machine Translation as a field itself. Translator is a very important tool to break barriers between communities especially in countries like India, where people speak 22 different languages and their many variations. With the onset of Machine Translation engines, there is a need for a system that evaluates how well these are performing. This is where machine translation evaluation enters. Objective: This paper discusses the importance of Automatic Machine Translation Evaluation and compares various Machine Translation Evaluation metrics by performing Statistical Analysis on various metrics and human evaluations to find out which metric has the highest correlation with human scores. Methods: The correlation between the Automatic and Human Evaluation Scores and the correlation between the five Automatic evaluation scores are examined at the sentence level. Moreover, a hypothesis is set up and p-values are calculated to find out how significant these correlations are. Results: The results of the statistical analysis of the scores of various metrics and human scores are shown in the form of graphs to see the trend of the correlation between the scores of Automatic Machine Translation Evaluation metrics and human scores. Conclusion: Out of the five metrics considered for the study, METEOR shows the highest correlation with human scores as compared to the other metrics.

Download Full-text