MinKSR: A Novel MT Evaluation Metric for Coordinating Human Translators with the CAT-Oriented Input Method

Author(s):  
Guoping Huang ◽  
Chunlu Zhao ◽  
Hongyuan Ma ◽  
Yu Zhou ◽  
Jiajun Zhang
Author(s):  
Samiksha Tripathi ◽  
Vineet Kansal

Machine Translation (MT) evaluation metrics such as the BiLingual Evaluation Understudy (BLEU) and the Metric for Evaluation of Translation with Explicit Ordering (METEOR) are known to perform poorly for languages with flexible word order and rich morphology. Applying linguistic knowledge to evaluate MT output into a morphologically rich language such as Hindi has been shown to be more effective and accurate [S. Tripathi and V. Kansal, Using linguistic knowledge for machine translation evaluation with Hindi as a target language, Comput. Sist. 21(4) (2017) 717–724]. Leveraging recent progress in word and sentence vector embeddings [T. Mikolov and J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 2 (2013) 3111–3119], the authors have trained word and sentence vector embeddings for Hindi on a large corpus of pre-processed Hindi text ([Formula: see text] million tokens). The training was performed on a high-end system configuration using Google Cloud Platform resources. The sentence vector embeddings are then used to corroborate the findings obtained through linguistic knowledge in the evaluation metric. For a morphologically rich target language, such an evaluation metric for MT systems is considered an optimal solution. In this paper, the authors demonstrate that MT evaluation using a sentence-embedding-based approach closely mirrors the linguistic evaluation technique. The code used to generate the vector embeddings for Hindi has been uploaded to the code-sharing platform GitHub.
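The abstract does not spell out how the sentence vectors are built or compared, so the following is only a minimal sketch of a sentence-embedding-based evaluation score: gensim Word2Vec word vectors averaged into sentence vectors and compared by cosine similarity between hypothesis and reference. The file name hindi_corpus.txt, the hyperparameters, and the averaging scheme are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: sentence-embedding-based MT evaluation via cosine similarity.
# Assumes word vectors trained with gensim Word2Vec on a tokenized Hindi corpus
# (one sentence per line); all names and hyperparameters are illustrative.
import numpy as np
from gensim.models import Word2Vec

# Train word vectors on a pre-processed, tokenized Hindi corpus.
with open("hindi_corpus.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]
model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)

def sentence_vector(tokens, model):
    """Average the word vectors of in-vocabulary tokens."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def embedding_score(hypothesis, reference, model):
    """Cosine similarity between hypothesis and reference sentence vectors."""
    h = sentence_vector(hypothesis.split(), model)
    r = sentence_vector(reference.split(), model)
    denom = np.linalg.norm(h) * np.linalg.norm(r)
    return float(np.dot(h, r) / denom) if denom else 0.0
```

A higher cosine score indicates that the MT hypothesis is semantically closer to the reference, which is the property the abstract reports as correlating with the linguistic evaluation technique.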


Author(s):  
Petr Homola ◽  
Vladislav Kuboň ◽  
Pavel Pecina

2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Aaron L.-F. Han ◽  
Derek F. Wong ◽  
Lidia S. Chao ◽  
Liangye He ◽  
Yi Lu

With the rapid development of machine translation (MT), MT evaluation has become very important for telling us in a timely manner whether an MT system is making progress. Conventional MT evaluation methods calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. First, incomplete design factors lead to a language-bias problem: the metrics perform well on some language pairs but poorly on others. Second, they tend to use either no linguistic features or too many; using none draws criticism from linguists, while using too many makes the model hard to reproduce. Third, the reference translations they rely on are expensive to produce and sometimes unavailable in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech tagset and does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation with human judgments.
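The abstract gives no formula for the reference-free metric. The sketch below only illustrates the general idea of scoring a hypothesis against the source sentence via universal part-of-speech (POS) n-gram overlap, assuming POS tagging with a universal tagset has already been done; the function names and the F-score formulation are assumptions for illustration, not the authors' published metric.

```python
# Illustrative sketch of reference-free, POS-based scoring: compare the universal
# POS n-grams of the source sentence and the MT hypothesis instead of comparing
# the hypothesis with a reference translation.
from collections import Counter

def pos_ngrams(tags, n):
    """Count the POS n-grams in a tag sequence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def pos_fscore(source_tags, hypothesis_tags, n=2, beta=1.0):
    """F-measure over overlapping POS n-grams of source and hypothesis."""
    src, hyp = pos_ngrams(source_tags, n), pos_ngrams(hypothesis_tags, n)
    overlap = sum((src & hyp).values())
    if not overlap:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(src.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Example with hypothetical universal POS tags for a source/hypothesis pair.
score = pos_fscore(["PRON", "VERB", "DET", "NOUN"], ["PRON", "VERB", "NOUN"])
```

Because only POS tags are compared, the score needs no reference translation and no language-specific lexical resources, which is the property the abstract emphasizes.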


Author(s):  
Ahrii Kim ◽  
Jinhyun Kim

SacreBLEU, which incorporates a text-normalizing step in its pipeline, has been well received as an automatic evaluation metric in recent years. For agglutinative languages such as Korean, however, the metric cannot provide a meaningful result without customized pre-tokenization. This paper therefore examines the influence of diversified pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates consistently with the token type. The reliability of the metric even deteriorates under some tokenizations, and MeCab is no exception. Offering guidance on the proper use of tokenizers for the metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
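As a rough illustration of such a meta-evaluation, the sketch below scores the same segments with SacreBLEU under a few of its built-in tokenize options and correlates the segment scores with human judgments. The data and the chosen tokenizers are placeholders; the paper's word, morpheme, and subword schemes would require external tokenizers such as MeCab applied before scoring.

```python
# Minimal sketch: segment-level SacreBLEU under different tokenization settings,
# correlated with human judgments. All data below are illustrative placeholders.
from sacrebleu.metrics import BLEU
from scipy.stats import pearsonr

hypotheses = ["시스템 번역 1", "시스템 번역 2", "시스템 번역 3"]   # MT outputs
references = ["참조 번역 1", "참조 번역 2", "참조 번역 3"]          # human references
human_scores = [85.0, 42.0, 60.0]                                    # e.g. direct assessment

for tok in ["13a", "char", "none"]:
    metric = BLEU(tokenize=tok, effective_order=True)
    seg_scores = [metric.sentence_score(h, [r]).score
                  for h, r in zip(hypotheses, references)]
    corr, _ = pearsonr(seg_scores, human_scores)
    print(f"tokenize={tok:>4}  Pearson r = {corr:.3f}")
```

Comparing the correlation values across tokenize settings is the meta-evaluation step: the setting whose scores track the human judgments most closely is the more reliable configuration of the metric for that language.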

