A simple automatic MT evaluation metric

Author(s):  
Petr Homola ◽  
Vladislav Kuboň ◽  
Pavel Pecina
Author(s):  
Samiksha Tripathi ◽  
Vineet Kansal

Machine Translation (MT) evaluation metrics such as BiLingual Evaluation Understudy (BLEU) and Metric for Evaluation of Translation with Explicit Ordering (METEOR) are known to perform poorly for free-word-order and morphologically rich languages. Applying linguistic knowledge to evaluate MT output for a morphologically rich target language such as Hindi has been shown to be more effective and accurate [S. Tripathi and V. Kansal, Using linguistic knowledge for machine translation evaluation with Hindi as a target language, Comput. Sist. 21(4) (2017) 717–724]. Leveraging recent progress in word vector and sentence vector embeddings [T. Mikolov and J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 2 (2013) 3111–3119], the authors have trained on a large corpus of pre-processed Hindi text ([Formula: see text] million tokens) to obtain word vectors and sentence vector embeddings for Hindi. The training was performed on a high-end system configuration using Google Cloud Platform resources. The sentence vector embedding is then used to corroborate the findings obtained through linguistic knowledge in the evaluation metric. For a morphologically rich target language, the evaluation metric of MT systems is considered an optimal solution. In this paper, the authors demonstrate that MT evaluation using a sentence embedding-based approach closely mirrors the linguistic evaluation technique. The code used to generate the vector embeddings for Hindi has been uploaded to the code-sharing platform GitHub.
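The idea of scoring a translation by comparing sentence embeddings can be illustrated with a minimal sketch. It assumes that a sentence vector is the average of its word vectors and that similarity is measured with the cosine; the abstract does not say how the authors build or compare their sentence vectors, so both choices are assumptions. The three-dimensional `toy_vectors` are hypothetical values made up for illustration, not trained embeddings.

```python
import math

def sentence_vector(tokens, word_vectors):
    """Average the vectors of known tokens (assumed pooling; unknown tokens are skipped)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional word vectors, for illustration only.
toy_vectors = {
    "राम": [0.9, 0.1, 0.0],
    "घर": [0.1, 0.8, 0.1],
    "गया": [0.0, 0.2, 0.9],
    "जाता": [0.1, 0.3, 0.8],
}

reference = ["राम", "घर", "गया"]
hypothesis = ["राम", "घर", "जाता"]

score = cosine_similarity(
    sentence_vector(reference, toy_vectors),
    sentence_vector(hypothesis, toy_vectors),
)
print(round(score, 4))
```

Because the two verb forms have nearby vectors in this toy setup, the hypothesis scores close to 1 even though it shares only two of three surface tokens with the reference, which is the behaviour an exact-match metric would not capture.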


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Aaron L.-F. Han ◽  
Derek F. Wong ◽  
Lidia S. Chao ◽  
Liangye He ◽  
Yi Lu

With the rapid development of machine translation (MT), MT evaluation has become very important for telling us in a timely manner whether an MT system is making progress. Conventional MT evaluation methods tend to calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. First, the incomplete set of factors they are designed around leads to a language-bias problem: they perform well on some specific language pairs but poorly on others. Second, they tend to use either no linguistic features or too many: using none draws criticism from linguists, while using too many makes the model hard to reproduce. Third, the reference translations they rely on are expensive and sometimes unavailable in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech tagset and does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.


Author(s):  
Ahrii Kim ◽  
Jinhyun Kim

SacreBLEU, which incorporates a text normalization step in its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot provide a sensible result without customized pre-tokenization. This paper therefore examines the influence of different pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates consistently by token type. The reliability of the metric even deteriorates with some tokenization schemes, and MeCab is no exception. Providing guidance on the proper use of a tokenizer for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
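Why the token type matters for an n-gram metric can be seen in a small sketch. It computes only the clipped n-gram precision that BLEU-style metrics build on, not SacreBLEU itself, and compares whitespace (word) tokens with character tokens. The Korean sentence pair and the helper names are hypothetical and chosen for illustration; morpheme and subword tokenization are left out because they need external tools.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(hyp_tokens, ref_tokens, n):
    """Clipped n-gram precision of the hypothesis against a single reference."""
    hyp_counts = Counter(ngrams(hyp_tokens, n))
    ref_counts = Counter(ngrams(ref_tokens, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total

def word_tokens(sentence):
    """Whitespace tokenization."""
    return sentence.split()

def char_tokens(sentence):
    """Character tokenization, ignoring spaces."""
    return [ch for ch in sentence if not ch.isspace()]

# Hypothetical sentence pair: the two differ in one syllable of the final verb.
reference = "나는 학교에 갔다"
hypothesis = "나는 학교에 간다"

word_p1 = modified_precision(word_tokens(hypothesis), word_tokens(reference), 1)
char_p1 = modified_precision(char_tokens(hypothesis), char_tokens(reference), 1)
word_p2 = modified_precision(word_tokens(hypothesis), word_tokens(reference), 2)
char_p2 = modified_precision(char_tokens(hypothesis), char_tokens(reference), 2)

print(f"word 1-gram: {word_p1:.3f}, char 1-gram: {char_p1:.3f}")
print(f"word 2-gram: {word_p2:.3f}, char 2-gram: {char_p2:.3f}")
```

At the word level the whole inflected verb counts as a miss, whereas at the character level only the differing syllable does, so the same hypothesis receives a noticeably different precision depending on the tokenization.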


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Frederick S. Vizeacoumar ◽  
Hongyu Guo ◽  
Lynn Dwernychuk ◽  
Adnan Zaidi ◽  
Andrew Freywald ◽  
...  

Gastro-esophageal (GE) cancers are one of the major causes of cancer-related death in the world. There is a need for novel biomarkers in the management of GE cancers to predict response to the available therapies. Our study aims to identify leading genes that are differentially regulated in patients with these cancers. Using the Cancer Genome Atlas, we explored the expression data for those genes whose protein products can be detected in the plasma. Our work predicted several candidates as potential biomarkers for distinct stages of GE cancers, including the previously identified CST1, INHBA, and STMN1, whose expression correlated with cancer recurrence or with resistance to adjuvant therapies or surgery. To define the predictive accuracy of these genes as possible biomarkers, we constructed a co-expression network and performed complex network analysis to measure the importance of the genes in terms of a ratio of closeness centrality (RCC). Furthermore, to measure the significance of these differentially regulated genes, we constructed an SVM classifier using a machine learning approach and verified these genes using the receiver operating characteristic (ROC) curve as an evaluation metric. The area under the curve was > 0.9 for both the overexpressed and downregulated genes, suggesting the potential use and reliability of these candidates as biomarkers. In summary, we identified leading differentially expressed genes in GE cancers that can be detected in the plasma proteome. These genes have the potential to become diagnostic and therapeutic biomarkers for early detection of cancer, for recurrence following surgery, and for the development of targeted treatment.
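The area under the ROC curve used as the evaluation metric above can be computed directly from classifier scores. The sketch below uses the pairwise-ranking form of AUC: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores are hypothetical toy values, not outputs of the authors' SVM.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: 1 = tumour sample, 0 = normal sample.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 3))
```

Here eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9; a value above 0.9, as reported in the abstract, means that fewer than one pair in ten is misordered.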

