MEANT: a highly accurate semantic frame based evaluation metric for improving machine translation utility

Author(s):
Chi-Kiu Lo
2019, Vol 27 (10), pp. 1497-1506
Author(s):  
Pairui Li
Chuan Chen
Wujie Zheng
Yuetang Deng
Fanghua Ye
...  

2015, Vol 104 (1), pp. 17-26
Author(s):  
Miloš Stanojević
Khalil Sima’an

Abstract: We present BEER, an open source implementation of a machine translation evaluation metric. BEER is a metric trained for high correlation with human rankings using learning-to-rank training methods. For evaluation of lexical accuracy it uses sub-word units (character n-grams), while for measuring word order it uses hierarchical representations based on PETs (permutation trees). In recent WMT metrics tasks, BEER has shown high correlation with human judgments at both the sentence and corpus levels. In this paper we show how BEER can be used for (i) full evaluation of MT output, (ii) isolated evaluation of word order and (iii) tuning MT systems.
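The core idea of combining character n-gram features under a learned linear weighting can be illustrated with a small sketch. The feature set, the weights, and the `toy_beer_like_score` function below are invented for illustration; BEER's actual trained model uses a richer feature set (including PET-based ordering features) and learning-to-rank training, not hand-picked weights.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return a multiset of character n-grams for a sentence."""
    s = text.replace(" ", "_")  # keep word boundaries visible as a character
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def fscore(hyp, ref, n):
    """Character n-gram F1 between hypothesis and reference."""
    h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

# Hypothetical weights standing in for a model trained with learning-to-rank.
WEIGHTS = {1: 0.15, 2: 0.25, 3: 0.30, 4: 0.30}

def toy_beer_like_score(hyp, ref):
    """Linear combination of character n-gram F-scores (illustrative only)."""
    return sum(w * fscore(hyp, ref, n) for n, w in WEIGHTS.items())

if __name__ == "__main__":
    print(toy_beer_like_score("the cat sat on the mat", "a cat sat on the mat"))
```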


2009, Vol 91 (1), pp. 79-88
Author(s):  
Omar Zaidan

Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems
We introduce Z-MERT, a software tool for minimum error rate training of machine translation systems (Och, 2003). In addition to being an open source tool that is extremely easy to compile and run, Z-MERT is also agnostic regarding the evaluation metric, fully configurable, and requires no modification to work with any decoder. We describe Z-MERT and review its features, and report the results of a series of experiments that examine the tool's runtime. We establish that Z-MERT is extremely efficient, making it well-suited for time-sensitive pipelines. The experiments also provide insight into the tool's runtime in terms of several variables (size of the development set, size of the produced N-best lists, etc.).
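The inner loop of minimum error rate training can be sketched as follows: given fixed N-best lists with feature vectors, choose feature weights that minimize a corpus-level error of the rescored 1-best outputs. This is only a toy sketch under invented data; real MERT (Och, 2003) performs an exact line search along each coordinate rather than the random restarts used here, and Z-MERT would plug in an arbitrary evaluation metric where `sentence_error` stands in.

```python
import random

def sentence_error(hyp, ref):
    """Crude per-sentence error (1 - unigram precision), a stand-in for any metric."""
    hyp_toks, ref_toks = hyp.split(), set(ref.split())
    if not hyp_toks:
        return 1.0
    return 1.0 - sum(t in ref_toks for t in hyp_toks) / len(hyp_toks)

def corpus_error(weights, nbest_lists, refs):
    """Rescore each N-best list with the weights and sum the errors of the 1-best."""
    total = 0.0
    for candidates, ref in zip(nbest_lists, refs):
        best = max(candidates,
                   key=lambda c: sum(w * f for w, f in zip(weights, c["features"])))
        total += sentence_error(best["text"], ref)
    return total

def toy_mert(nbest_lists, refs, dim, trials=200, seed=0):
    """Pick the weight vector (by random restarts) that minimizes corpus error."""
    rng = random.Random(seed)
    best_w, best_err = None, float("inf")
    for _ in range(trials):
        w = [rng.uniform(-1, 1) for _ in range(dim)]
        err = corpus_error(w, nbest_lists, refs)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

if __name__ == "__main__":
    nbest = [[{"text": "the cat sat", "features": [0.2, -1.0]},
              {"text": "cat the sat", "features": [0.4, -0.5]}]]
    refs = ["the cat sat"]
    print(toy_mert(nbest, refs, dim=2, trials=50))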


Author(s):  
Samiksha Tripathi
Vineet Kansal

Machine Translation (MT) evaluation metrics like BiLingual Evaluation Understudy (BLEU) and Metric for Evaluation of Translation with Explicit Ordering (METEOR) are known to perform poorly for free word-order and morphologically rich languages. Applying linguistic knowledge to evaluate MT output for a morphologically rich target language like Hindi has been shown to be more effective and accurate [S. Tripathi and V. Kansal, Using linguistic knowledge for machine translation evaluation with Hindi as a target language, Comput. Sist. 21(4) (2017) 717–724]. Leveraging recent progress in word and sentence vector embeddings [T. Mikolov and J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 2 (2013) 3111–3119], the authors have trained word and sentence embeddings on a large corpus of pre-processed Hindi text ([Formula: see text] million tokens). Training was performed on a high-end system configuration using Google Cloud Platform resources. The sentence embeddings are then used to corroborate the findings obtained through linguistic knowledge in the evaluation metric. For a morphologically rich target language, such an evaluation metric for MT systems is considered an optimal solution. In this paper, the authors demonstrate that MT evaluation using a sentence embedding-based approach closely mirrors the linguistic evaluation technique. The code used to generate the vector embeddings for Hindi has been uploaded to the code-sharing platform GitHub.
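The general shape of sentence embedding-based MT evaluation can be sketched as follows: score a hypothesis by the cosine similarity between its sentence vector and the reference's sentence vector. This is a minimal sketch, not the authors' released code; the tiny hand-written vectors below are placeholders for embeddings trained on a large Hindi corpus, and averaging word vectors is only one simple way to form a sentence vector.

```python
import numpy as np

# Placeholder word vectors; in the paper's setting these would come from
# embeddings trained on a large pre-processed Hindi corpus.
TOY_VECTORS = {
    "घर": np.array([0.9, 0.1, 0.0]),
    "बड़ा": np.array([0.1, 0.8, 0.1]),
    "है": np.array([0.0, 0.2, 0.9]),
}

def sentence_vector(sentence, vectors, dim=3):
    """Average the word vectors of in-vocabulary tokens."""
    words = [vectors[w] for w in sentence.split() if w in vectors]
    return np.mean(words, axis=0) if words else np.zeros(dim)

def embedding_score(hypothesis, reference, vectors=TOY_VECTORS):
    """Cosine similarity between hypothesis and reference sentence vectors."""
    h, r = sentence_vector(hypothesis, vectors), sentence_vector(reference, vectors)
    denom = np.linalg.norm(h) * np.linalg.norm(r)
    return float(h @ r / denom) if denom else 0.0

print(embedding_score("घर बड़ा है", "घर है"))
```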


Author(s):  
Oliver Czulo
Tiago Timponi Torrent
Ely Edison da Silva Matos
Alexandre Diniz da Costa
...  

2019, Vol 12 (2), pp. 134-158
Author(s):  
Achraf Othman
Mohamed Jemni

In this article, the authors deal with the machine translation of written English text into sign language. They study existing systems and issues in order to propose an implementation of statistical machine translation from written English text to American Sign Language (English/ASL) that takes into account several features of sign language. The work proposes a novel approach to building an artificial corpus using grammatical dependency rules, owing to the lack of resources for sign language. The parallel corpus was the input to the statistical machine translation system, which was used to create a statistical translation memory based on the IBM alignment algorithms. These algorithms were enhanced and optimized by integrating Jaro–Winkler distances in order to shorten the training process. Subsequently, based on the constructed translation memory, a decoder was implemented for translating English text into ASL using a novel proposed transcription system based on gloss annotation. The results were evaluated using the BLEU evaluation metric.
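The Jaro–Winkler component can be illustrated with a small sketch. The similarity functions below follow the standard Jaro and Jaro–Winkler definitions; how exactly the authors fold the distances into IBM alignment training is their own design, so the final snippet only shows one plausible use under assumed data: biasing initial word-to-gloss translation scores toward lexically similar pairs.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions among the matched characters.
    k, transpositions = 0, 0
    for i, flag in enumerate(m1):
        if flag:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

# Hypothetical use: bias initial English-word / ASL-gloss translation scores
# so that lexically similar pairs start with more probability mass before EM.
english, glosses = ["book", "books"], ["BOOK", "READ"]
init = {(e, g): 0.5 + 0.5 * jaro_winkler(e.lower(), g.lower())
        for e in english for g in glosses}
print(init)
```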


2018, Vol 35 (3), pp. 575-599
Author(s):  
Jon Sprouse
Beracah Yankama
Sagar Indurkhya
Sandiway Fong
Robert C. Berwick

Abstract: In their recent paper, Lau, Clark, and Lappin explore the idea that the probability of the occurrence of word strings can form the basis of an adequate theory of grammar (Lau, Jey H., Alexander Clark & Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41(5):1201–1241). To make their case, they present the results of correlating the output of several probabilistic models trained solely on naturally occurring sentences with the gradient acceptability judgments that humans report for ungrammatical sentences derived from roundtrip machine translation errors. In this paper, we first explore the logic of the Lau et al. argument, both in the choice of evaluation metric (gradient acceptability) and in the choice of test data set (machine translation errors on random sentences from a corpus). We then present our own series of studies intended to allow for a better comparison between LCL's models and existing grammatical theories. We evaluate two of LCL's probabilistic models (trigrams and recurrent neural network) against three data sets (taken from journal articles, a textbook, and Chomsky's famous colorless-green-ideas sentence), using three evaluation metrics (LCL's gradience metric, a categorical version of the metric, and the experimental-logic metric used in the syntax literature). Our results suggest there are very real, measurable cost-benefit tradeoffs inherent in LCL's models across the three evaluation metrics. The gain in explanation of gradience (between 13% and 31% of gradience) is offset by losses in the other two metrics: a 43%-49% loss in coverage based on a categorical metric of explaining acceptability, and a loss of 12%-35% in explaining experimentally-defined phenomena. This suggests that anyone wishing to pursue LCL's models as competitors with existing syntactic theories must either be satisfied with this tradeoff, or modify the models to capture the phenomena that are not currently captured.
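At its core, the gradience-style comparison amounts to correlating a model-derived sentence score with mean human acceptability ratings, while a categorical variant thresholds both sides and measures agreement. The sketch below illustrates only that shape under invented numbers: the per-word log probabilities are hand-specified stand-ins for a language model, and the length normalization shown is just one of the several normalizations explored in this literature, not LCL's exact measure.

```python
import numpy as np

def mean_logprob(word_logprobs):
    """Length-normalized log probability (one possible normalization)."""
    return sum(word_logprobs) / len(word_logprobs)

# Each item: (per-word log probabilities from some model, mean human rating).
# All numbers here are invented for illustration.
sentences = [
    ([-2.1, -1.5, -3.0, -2.2], 6.1),
    ([-4.0, -5.5, -6.1, -3.9], 2.3),
    ([-3.2, -2.8, -4.0, -3.5], 4.7),
]

scores = np.array([mean_logprob(lp) for lp, _ in sentences])
ratings = np.array([r for _, r in sentences])
gradient_r = np.corrcoef(scores, ratings)[0, 1]  # Pearson correlation

# Rough categorical analogue: threshold both sides and measure agreement.
agree = np.mean((scores > scores.mean()) == (ratings > ratings.mean()))
print(f"gradient correlation: {gradient_r:.2f}, categorical agreement: {agree:.2f}")
```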

