A simple automatic MT evaluation metric

Machine Translation (MT) evaluation metrics such as BiLingual Evaluation Understudy (BLEU) and Metric for Evaluation of Translation with Explicit Ordering (METEOR) are known to perform poorly on word-order and morphologically rich languages. Applying linguistic knowledge to evaluate MT output for a morphologically rich target language such as Hindi has been shown to be more effective and accurate [S. Tripathi and V. Kansal, Using linguistic knowledge for machine translation evaluation with Hindi as a target language, Comput. Sist. 21(4) (2017) 717–724]. Leveraging recent progress in word vector and sentence vector embedding [T. Mikolov and J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 2 (2013) 3111–3119], the authors trained word vectors and sentence vector embeddings for Hindi on a large corpus of pre-processed Hindi text ([Formula: see text] million tokens). Training was performed on a high-end system configuration using Google Cloud Platform resources. The sentence vector embedding is then used to corroborate the findings obtained by applying linguistic knowledge in the evaluation metric. For a morphologically rich target language, the evaluation metric for MT systems is considered an optimal solution. In this paper, the authors demonstrate that MT evaluation using a sentence embedding-based approach closely mirrors the linguistic evaluation technique. The code used to generate the vector embeddings for Hindi has been uploaded to the code sharing platform GitHub.
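To make the embedding-based idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a sentence vector is the plain average of its word vectors and that a hypothesis is scored by cosine similarity against the reference; the abstract does not state how the sentence vectors are built or compared. The function names, the romanized tokens, and the three-dimensional `TOY_VECTORS` are all hypothetical.

```python
import math

# Hypothetical toy vectors for illustration only; real word vectors
# would be trained on a large corpus and have far more dimensions.
TOY_VECTORS = {
    "ladka": [0.9, 0.1, 0.0],
    "baalak": [0.8, 0.2, 0.0],
    "khelta": [0.1, 0.9, 0.1],
    "hai": [0.0, 0.1, 0.9],
    "kitab": [0.0, 0.0, 1.0],
}


def sentence_vector(tokens, word_vectors):
    """Average the vectors of the tokens that have an embedding.

    Averaging is an assumption of this sketch; it ignores word order.
    Returns None when no token is in the vocabulary.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_score(hypothesis, reference, word_vectors):
    """Score a hypothesis against a reference by sentence-vector cosine."""
    h = sentence_vector(hypothesis.split(), word_vectors)
    r = sentence_vector(reference.split(), word_vectors)
    if h is None or r is None:
        return 0.0
    return cosine(h, r)
```

With these toy vectors, a hypothesis that swaps in a near-synonym ("baalak khelta hai" against "ladka khelta hai") scores higher than an unrelated one ("kitab"), whereas an exact-match n-gram metric would penalize the synonym. Because averaging discards order, a reordered hypothesis scores the same as the reference, which is one reason a real system might prefer a trained sentence encoder.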

BiMEANT: Integrating Cross-Lingual and Monolingual Semantic Frame Similarities in the MEANT Semantic MT Evaluation Metric

Statistical Language and Speech Processing - Lecture Notes in Computer Science, 2014, pp. 82-93. DOI: 10.1007/978-3-319-11397-5_6

Author(s): Chi-kiu Lo, Dekai Wu

Keyword(s): MT Evaluation, Evaluation Metric, Cross Lingual

Unsupervised Quality Estimation Model for English to German Translation and Its Application in Extensive Supervised Evaluation

The Scientific World Journal, 2014, Vol 2014, pp. 1-12. DOI: 10.1155/2014/760301. Cited By ~ 1

Author(s): Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Yi Lu

Keyword(s): Rapid Development, Linguistic Features, Estimation Model, Linguistic Feature, Language Bias, Automatic Translation, Part Of Speech, MT Evaluation, Evaluation Metric, Translation Systems

With the rapid development of machine translation (MT), MT evaluation has become very important for telling us, in a timely manner, whether an MT system is making progress. Conventional MT evaluation methods calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. Firstly, the incomplete set of factors they are designed around results in a language-bias problem: they perform well on some language pairs but poorly on others. Secondly, they tend to use either no linguistic features or too many; using none draws a lot of criticism from linguists, while using too many makes the model hard to reproduce. Thirdly, reference translations are very expensive and sometimes unavailable in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech tagset and does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.
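"Correlation with human judgments" is the usual way such metrics are compared. As a rough illustration, the sketch below computes a Pearson correlation between metric scores and human scores. The abstract does not say which correlation coefficient the authors report, so Pearson is an assumption here, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Raises ValueError when the inputs differ in length, have fewer than
    two items, or when either list is constant (correlation undefined).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A metric whose scores rise and fall with the human scores across systems or segments gets a value near 1; a metric that ranks translations in the opposite order gets a value near -1.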

ENTF: An Entropy-Based MT Evaluation Metric

Communications in Computer and Information Science - Machine Translation, 2017, pp. 68-77. DOI: 10.1007/978-981-10-7134-8_7

Author(s): Hui Yu, Weizhi Xu, Shouxun Lin, Qun Liu

Keyword(s): MT Evaluation, Evaluation Metric

Guidance to Pre-tokenization for SacreBLEU: Meta-Evaluation in Korean

2022. DOI: 10.20944/preprints202201.0018.v1

Author(s): Ahrii Kim, Jinhyun Kim

Keyword(s): Empirical Study, Automatic Evaluation, Human Judgment, Evaluation Data, Human Evaluation, MT Evaluation, Evaluation Metric, Agglutinative Languages

SacreBLEU, which incorporates a text normalization step in its pipeline, has been well received as an automatic evaluation metric in recent years. For agglutinative languages such as Korean, however, the metric cannot provide a sensible result without customized pre-tokenization. This paper examines the influence of different pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates consistently by token type. Some tokenization schemes even reduce the reliability of the metric, and MeCab is no exception. Offering guidance on the proper use of a tokenizer for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
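The effect of the token type can be seen on a single sentence pair. The sketch below is not SacreBLEU and not the paper's setup: it computes only a clipped n-gram precision (one ingredient of BLEU) under two of the four schemes, word and character. Morpheme and subword tokenization are left out because they need an external analyzer or a trained model. The function names and the example sentences are hypothetical.

```python
import unicodedata
from collections import Counter


def tokenize(text, scheme):
    """Split text by whitespace ("word") or into syllable blocks ("char")."""
    if scheme == "word":
        return text.split()
    if scheme == "char":
        # NFC keeps each Hangul syllable as one code point, so this is a
        # character-level split rather than a Jamo-level one.
        composed = unicodedata.normalize("NFC", text)
        return [ch for ch in composed if not ch.isspace()]
    raise ValueError(f"unsupported scheme: {scheme}")


def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, scheme, n=1):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it occurs in the
    reference (the clipping used in BLEU's modified precision).
    """
    hyp = ngram_counts(tokenize(hypothesis, scheme), n)
    ref = ngram_counts(tokenize(reference, scheme), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


reference = "나는 학교에 갔다"
hypothesis = "나는 학교로 갔다"
word_p = clipped_precision(hypothesis, reference, "word")
char_p = clipped_precision(hypothesis, reference, "char")
```

The two sentences differ only in one particle. At the word level the whole word 학교로 counts as a miss, giving 2 of 3 matching unigrams; at the character level only the syllable 로 is a miss, giving 6 of 7. The same translation thus receives a noticeably different score depending purely on the pre-tokenization.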

Using Concept Mapping for Theory and Evaluation Metric Development

PsycEXTRA Dataset, 2013. DOI: 10.1037/e620792013-001

Author(s): Amanda J. Visek

Keyword(s): Concept Mapping, Evaluation Metric

Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm

2005. DOI: 10.21236/ada456393. Cited By ~ 5

Author(s): Keith J. Miller, Michelle Vanni

Keyword(s): Rater Agreement, MT Evaluation

Mining the plasma-proteome associated genes in patients with gastro-esophageal cancers for biomarker discovery

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-87037-w

Author(s): Frederick S. Vizeacoumar, Hongyu Guo, Lynn Dwernychuk, Adnan Zaidi, Andrew Freywald, ...

Keyword(s): Biomarker Discovery, Predictive Accuracy, Area Under The Curve, Cancer Recurrence, The Cancer Genome Atlas, SVM Classifier, Plasma Proteome, Cancer Genome Atlas, Novel Biomarkers, Evaluation Metric

Gastro-esophageal (GE) cancers are one of the major causes of cancer-related death in the world. There is a need for novel biomarkers in the management of GE cancers, to predict response to the available therapies. Our study aims to identify leading genes that are differentially regulated in patients with these cancers. Using the Cancer Genome Atlas, we explored the expression data for those genes whose protein products can be detected in the plasma. Our work predicted several candidates as potential biomarkers for distinct stages of GE cancers, including the previously identified CST1, INHBA, and STMN1, whose expression correlated with cancer recurrence or with resistance to adjuvant therapies or surgery. To define the predictive accuracy of these genes as possible biomarkers, we constructed a co-expression network and performed complex network analysis to measure the importance of the genes in terms of a ratio of closeness centrality (RCC). Furthermore, to measure the significance of these differentially regulated genes, we constructed an SVM classifier using a machine learning approach and verified these genes using the receiver operating characteristic (ROC) curve as an evaluation metric. The area under the curve was > 0.9 for both the overexpressed and downregulated genes, suggesting the potential use and reliability of these candidates as biomarkers. In summary, we identified leading differentially expressed genes in GE cancers that can be detected in the plasma proteome. These genes have the potential to become diagnostic and therapeutic biomarkers for early detection of cancer, for detection of recurrence following surgery, and for the development of targeted treatment.
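For readers unfamiliar with the area under the ROC curve, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. This is a generic illustration, not the study's pipeline; the function name `roc_auc` is hypothetical and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so an area above 0.9, as reported in the abstract, means that in more than 90% of positive/negative pairs the classifier scores the positive example higher.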

A simple automatic MT evaluation metric

Insight into Multiple References in an MT Evaluation Metric

MinKSR: A Novel MT Evaluation Metric for Coordinating Human Translators with the CAT-Oriented Input Method

Machine Translation Evaluation: Unveiling the Role of Dense Sentence Vector Embedding for Morphologically Rich Language
