MT Evaluation
Recently Published Documents

Total documents: 66 (five years: 17)
H-index: 7 (five years: 1)

Author(s): Ahrii Kim, Jinhyun Kim

SacreBLEU, by incorporating a text-normalizing step in its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot produce a meaningful result without customized pre-tokenization. This paper therefore examines the influence of diversified pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates consistently with the token type. Some tokenizations even degrade the reliability of the metric, and MeCab is no exception. Offering guidance on the proper tokenizer usage for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
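As a minimal sketch of this kind of meta-evaluation (the Korean data and human scores below are invented; the ko-mecab tokenizer needs a recent sacrebleu with the optional mecab-ko dependency), one can score the same outputs under several tokenizers and correlate each with human judgments:

```python
# Sketch: score the same outputs under several SacreBLEU tokenizers and
# correlate each with human judgments. All data here is illustrative.
import sacrebleu
from scipy.stats import pearsonr

# Three hypothetical Korean MT systems with human adequacy scores.
systems = {
    "sys_a": (["고양이가 매트 위에 앉아 있다 ."], 4.2),
    "sys_b": (["고양이가 매트에 앉았다 ."], 3.4),
    "sys_c": (["고양이 매트 앉다"], 2.1),
}
references = [["고양이가 매트 위에 앉아 있다 ."]]

for tok in ("13a", "char", "ko-mecab"):  # word-, character-, morpheme-level
    bleus, humans = [], []
    for hyp, human in systems.values():
        bleus.append(sacrebleu.corpus_bleu(hyp, references, tokenize=tok).score)
        humans.append(human)
    r, _ = pearsonr(bleus, humans)  # correlation with human judgment
    print(f"tokenize={tok}: r={r:.2f}")
```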


Complexity, 2021, Vol. 2021, pp. 1-15
Author(s): Dasa Munkova, Michal Munk, Ľubomír Benko, Jiri Stastny

The paper investigates the impact of an artificial agent (a machine translator) on a human agent (a post-editor) using a proposed methodology based on language complexity measures, POS tags, frequent tagsets, association rules, and their summarization. We examine this impact from the point of view of language complexity in terms of word and sentence structure. Using the proposed methodology, we analyzed 24,733 tags of English-to-Slovak translations of technical texts, corresponding to the output of two MT systems (Google Translate and the European Commission's MT tool). We used both manual (adequacy and fluency) and semi-automatic (the HTER metric) MT evaluation measures as the criteria for validity. We show that the proposed methodology is valid based on the evaluation, with baseline methods, of frequent tagsets and rules in the MT outputs produced by Google Translate and by the European Commission's MT tool, and in both post-edited MT (PEMT) outputs. Our results also show that the PEMT output produced from Google Translate is characterized by more frequent tagsets, such as verbs in the infinitive with modal verbs, compared with its MT output, which is characterized by masculine, inanimate nouns in the locative singular. In the MT output produced by the European Commission's MT tool, the most frequent tagset was verbs in the infinitive, compared with its post-edited output, where verbs in the imperative and the second person plural occurred. These findings likewise follow from applying the proposed methodology to MT evaluation. The contribution of the methodology is the identification of systematic, rather than random, errors. The study can also inform the optimization of the translation process using post-editing.
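One ingredient of the methodology, frequent-tagset mining over POS-tagged output, can be sketched as follows (tag names and sentences are purely illustrative; the actual pipeline also involves association rules and HTER):

```python
# Sketch: mine frequent POS "tagsets" (here: tag bigrams) from tagged
# MT vs. post-edited output. Tags and sentences are illustrative only.
from collections import Counter

def tag_ngrams(tagged_sentences, n=2):
    """Count n-grams of POS tags across sentences."""
    counts = Counter()
    for tags in tagged_sentences:
        counts.update(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
    return counts

mt_tags = [["NOUN", "VERB_INF", "MODAL"], ["NOUN_LOC", "NOUN", "VERB"]]
pe_tags = [["NOUN", "VERB_INF"], ["VERB_IMP", "PRON_2PL", "NOUN"]]

mt_freq, pe_freq = tag_ngrams(mt_tags), tag_ngrams(pe_tags)
# Tagsets whose frequency shifts between MT and PEMT output point to
# systematic, not random, differences introduced by post-editing.
for tagset in (mt_freq.keys() | pe_freq.keys()):
    print(tagset, mt_freq[tagset], pe_freq[tagset])
```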


Author(s): Nora Aranberri-Monasterio, Sharon O'Brien

-ing forms in English are reported to be problematic for Machine Translation and are often the focus of rules in Controlled Language rule sets. We investigated how problematic -ing forms are for an RBMT system translating into four target languages in the IT domain. Constituent-based human evaluation was used, and the results showed that, in general, -ing forms do not deserve their bad reputation. A comparison with the results of five automated MT evaluation metrics showed promising correlations. Some issues prevail, however, and can vary from target language to target language. We propose different strategies for dealing with these problems, such as Controlled Language rules, semi-automatic post-editing, source-text tagging and "post-editing" the source text.
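The first step of such a study, flagging -ing forms in source text, can be approximated naively (a real system would use a POS tagger to separate gerunds and participles from nouns like "string"; this regex sketch is ours):

```python
# Sketch: naive flagging of candidate -ing forms for controlled-language
# checking. Purely illustrative; real systems would POS-tag first.
import re

ING = re.compile(r"\b[A-Za-z]+ing\b")
FALSE_FRIENDS = {"string", "thing", "king", "ring", "spring", "nothing"}

def flag_ing_forms(sentence: str):
    return [m.group(0) for m in ING.finditer(sentence)
            if m.group(0).lower() not in FALSE_FRIENDS]

print(flag_ing_forms("Before installing the driver, check the string settings."))
# -> ['installing']  ("string" is filtered as a lexical false friend)
```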


Author(s): Bogdan Babych, Anthony Hartley

We describe the results of a research project aimed at the automatic detection of MT errors using state-of-the-art MT evaluation metrics, such as BLEU. Currently, these automated metrics give only a general indication of translation quality at the corpus level and cannot be used directly to identify gaps in the coverage of MT systems. Our methodology automatically detects frequent multiword expressions (MWEs) in sentence-aligned parallel corpora and computes, for the concordance generated for each MWE, an automated evaluation score that indicates whether the expression is systematically mistranslated in the corpus. The method can be applied to both source and target MWEs to indicate, respectively, whether MT can successfully deal with source expressions and whether certain frequent target expressions can be successfully generated. The results can be useful for systematically checking the coverage of MT systems in order to speed up the development cycle of rule-based MT. This approach can also enhance current techniques for finding translation equivalents by distributional similarity and for automatically identifying features of MT-tractable language.
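The core of the method can be sketched as follows (function names and thresholds are ours): extract frequent MWEs from the source side, pull out the concordance of aligned sentence pairs containing each MWE, and score that concordance with a corpus-level metric; persistently low scores flag systematically mistranslated expressions.

```python
# Sketch: score, per multiword expression (MWE), the concordance of
# aligned sentences containing it. All names and data are illustrative.
from collections import Counter
import sacrebleu

def frequent_mwes(sentences, n=2, min_count=2):
    """Collect word n-grams occurring at least min_count times."""
    counts = Counter()
    for s in sentences:
        toks = s.split()
        counts.update(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return [m for m, c in counts.items() if c >= min_count]

def mwe_scores(src, mt, ref):
    """BLEU over the concordance of each frequent source-side MWE."""
    scores = {}
    for mwe in frequent_mwes(src):
        idx = [i for i, s in enumerate(src) if mwe in s]
        hyp = [mt[i] for i in idx]
        refs = [[ref[i] for i in idx]]
        scores[mwe] = sacrebleu.corpus_bleu(hyp, refs).score
    return scores  # low scores flag systematically mistranslated MWEs
```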


Author(s): Paula Estrella, Andrei Popescu-Belis, Maghi King

A large number of evaluation metrics exist for machine translation (MT) systems, but depending on the intended context of use of such a system, not all metrics are equally relevant. Based on the ISO/IEC 9126 and 14598 standards for software evaluation, the Framework for the Evaluation of Machine Translation in ISLE (FEMTI) provides guidelines for the selection of quality characteristics to be evaluated depending on the expected task, users, and input characteristics of an MT system. This approach to contextual evaluation was implemented as a web-based application which helps its users design evaluation plans. In addition, FEMTI offers experts in evaluation the possibility to enter and share their knowledge using a dedicated web-based tool, tested in several evaluation exercises.
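As a toy illustration of contextual selection (the mapping below is invented for this sketch and is not FEMTI's actual taxonomy), one can think of the framework as a function from context of use to quality characteristics:

```python
# Toy sketch of context-driven selection of evaluation characteristics,
# in the spirit of FEMTI. The mapping below is invented for illustration.
CONTEXT_TO_CHARACTERISTICS = {
    "gisting":       ["adequacy", "speed"],
    "dissemination": ["fluency", "adequacy", "terminology"],
    "post-editing":  ["adequacy", "post-editing effort"],
}

def evaluation_plan(intended_use: str):
    """Return the characteristics worth evaluating for a context of use."""
    return CONTEXT_TO_CHARACTERISTICS.get(intended_use, ["adequacy"])

print(evaluation_plan("post-editing"))  # ['adequacy', 'post-editing effort']
```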


Author(s): Andy Way

Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless (and this may come as some surprise to the PB-SMT community), most translators and, somewhat more surprisingly perhaps, many experienced MT protagonists find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was conciliatory and inclusive. Over the next five years, things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring the rest of the MT community along with it, and in our view this situation has largely persisted to this day. Having discussed these issues, we turn to three further topics: the role of automatic MT evaluation metrics in describing PB-SMT systems; the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers with prior experience in fields other than statistical models of translation; and the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.
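For readers outside the SMT community, the "basic model" in question can be stated compactly; these are the standard textbook formulations (not taken from this paper): the original noisy-channel objective and its phrase-based, log-linear generalization.

```latex
% Noisy-channel SMT: choose the target sentence e that maximizes the
% language-model probability p(e) times the translation-model p(f|e).
\hat{e} = \arg\max_{e} \, p(e)\, p(f \mid e)

% Phrase-based, log-linear generalization: feature functions h_k
% (phrase translation, language model, reordering, ...) with weights
% \lambda_k tuned on held-out data.
\hat{e} = \arg\max_{e} \, \sum_{k} \lambda_k \, h_k(e, f)
```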


2021, Vol. 11 (1), pp. 54
Author(s): Hapni Nurliana H.D Hasibuan

Machine Translation (MT) is one of the most advanced and elaborate research fields within translation technology; the quality of MT output has always been a great concern, and MT evaluation is a popular research topic. This research aims to assess the translation quality of the gender-marker lingual units of the Arabic short story "عَبْدُ اللهِ وَالْعُصْفُوْرُ", translated into English and Indonesian using machine translation. The research used a qualitative method. The subjects are the gender-marker lingual units taken from the Arabic short story. The key instrument of this research is the human instrument; additional instruments consisted of tables of the gender-marker lingual units and a table of rating scales based on Nababan's theory (2012). The findings showed that the analysis discovered 72 gender-marker lingual units in the short story, and that the dominant type was the personal pronoun. Based on the results, it can be concluded that the Google Translate renderings of the gender-marker lingual units are of high quality at the accuracy, acceptability, and readability levels.
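For concreteness, Nababan-style assessment rates each unit on 1-3 scales for accuracy, acceptability, and readability, which are then summarized; a minimal sketch follows, where the 3:2:1 weighting is our assumption for illustration, not a detail given in the abstract.

```python
# Sketch: weighted translation-quality score over Nababan-style 1-3
# ratings. The 3:2:1 weighting is an assumption for illustration.
WEIGHTS = {"accuracy": 3, "acceptability": 2, "readability": 1}

def quality_score(ratings: dict) -> float:
    """Weighted average of the three 1-3 ratings."""
    total = sum(WEIGHTS[k] * ratings[k] for k in WEIGHTS)
    return total / sum(WEIGHTS.values())

# Hypothetical ratings for one gender-marker lingual unit:
print(round(quality_score(
    {"accuracy": 3, "acceptability": 3, "readability": 2}), 2))  # 2.83
```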


Stroke, 2021, Vol. 52 (Suppl_1)
Author(s): Sujan T Reddy, Tzu-Ching Wu, Jing Zhang, Mohammad H Rahbar, Christy Ankrom, ...

Introduction: Little is known about the impact of telestroke in addressing disparities in acute ischemic stroke care.
Methods: We conducted a retrospective review of acute ischemic stroke patients evaluated over our 17-hospital telestroke network in Texas from 2015-2018. Patients were described as Non-Hispanic White (NHW) male or female, Non-Hispanic Black (NHB) male or female, or Hispanic (HIS) male or female. Single imputation using fully conditional specification was conducted to impute missing values in NIHSS (N=103). We compared the frequency of tPA and mechanical thrombectomy (MT) utilization, door-to-consultation times, door-to-tPA times, and time-to-transfer for patients who went on to MT evaluation at the hub after having been screened for suspected large vessel occlusion at the spoke.
Results: Among 3873 patients (including 1146 NHW male (30%) and 1134 NHW female (29%), 405 NHB male (10%) and 491 NHB female (13%), and 358 HIS male (9%) and 339 HIS female (9%) patients) (Table 1), we did not find any differences in door-to-consultation time, door-to-tPA time, time-to-transfer, frequency of tPA administration, or incidence of MT utilization (Tables 1 and 2).
Conclusion: There was a lack of racial, ethnic, and sex disparities in ischemic stroke care metrics within our telestroke network. To fully understand how telestroke alleviates disparities in stroke care beyond our single-network review, collaboration among networks is needed to formulate a multicenter telestroke database similar to Get With The Guidelines.


2021, Vol. 11 (2), pp. 639
Author(s): Despoina Mouratidis, Katia Lida Kermanidis, Vilelmini Sosoni

Evaluation of machine translation (MT) into morphologically rich languages has not been well studied despite its importance. This paper proposes a classifier, namely a deep learning (DL) schema for MT evaluation, based on different categories of information (linguistic features, natural language processing (NLP) metrics, and embeddings), using a machine learning model suited to noisy and small datasets. The linguistic features are string-based for the language pairs English (EN)–Greek (EL) and EN–Italian (IT). The paper also explores the linguistic differences between different kinds of corpora that affect evaluation accuracy. A comparative study between a simple (mathematically calculated) embedding layer and pre-trained embeddings is conducted. Moreover, the impact of feature selection and dimensionality reduction on classification accuracy is analyzed. Results show that a neural network (NN) model with different input representations produces results that clearly outperform the state of the art for MT evaluation for EN–EL and EN–IT, with an increase of almost 0.40 points in correlation with human judgments on pairwise MT evaluation. The proposed algorithm achieved better results on noisy and small datasets. In addition, for a more integrated analysis of the accuracy results, a qualitative linguistic analysis was carried out to address complex linguistic phenomena.
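A compressed sketch of this kind of hybrid schema (dimensions, names, and the pooling choice are ours, not the paper's): an embedding branch, either a trainable layer or frozen pre-trained vectors, pooled and concatenated with hand-crafted features before a small classification head.

```python
# Sketch of a hybrid MT-evaluation classifier: token embeddings (either a
# trainable layer or frozen pre-trained vectors) pooled and concatenated
# with hand-crafted features. Dimensions are illustrative, not the paper's.
import torch
import torch.nn as nn

class MTEvalClassifier(nn.Module):
    def __init__(self, vocab=5000, emb_dim=64, feat_dim=12, n_classes=2,
                 pretrained=None):
        super().__init__()
        self.emb = (nn.Embedding.from_pretrained(pretrained, freeze=True)
                    if pretrained is not None
                    else nn.Embedding(vocab, emb_dim))
        self.head = nn.Sequential(
            nn.Linear(self.emb.embedding_dim + feat_dim, 32),
            nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, token_ids, features):
        pooled = self.emb(token_ids).mean(dim=1)  # average-pool token vectors
        return self.head(torch.cat([pooled, features], dim=1))

# Smoke test on random data (batch of 4 sentences, 10 tokens, 12 features).
model = MTEvalClassifier()
logits = model(torch.randint(0, 5000, (4, 10)), torch.randn(4, 12))
print(logits.shape)  # torch.Size([4, 2])
```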


2021, Vol. 15 (1)
Author(s): Ch Ram Anirudh, Kavi Narayana Murthy

Machine-translated texts are often far from perfect, and post-editing is essential to reach publishable quality. Post-editing may not always be a pleasant task. However, modern machine translation (MT) approaches like statistical MT (SMT) and neural MT (NMT) seem to hold greater promise. In this work, we present a quantitative method for scoring translations and computing the post-editability of MT system outputs. We show that the scores we obtain correlate well with MT evaluation metrics as well as with the actual time and effort required for post-editing. We compare the outputs of three modern MT systems, namely phrase-based SMT (PBMT), NMT, and Google Translate, for their post-editability for English-to-Hindi translation. Further, we explore the effect of various kinds of errors in MT outputs on post-editing time and effort. Including an Indian language in this kind of post-editability study and analyzing the influence of errors on post-editing time and effort for NMT are highlights of this work.
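The semi-automatic quantity such studies typically lean on, HTER, is simple to state: the word-level edit distance between the MT output and its human post-edit, normalized by the length of the post-edit. A self-contained sketch (full TER also counts phrase shifts as single edits, omitted here):

```python
# Sketch: HTER = word-level edit distance between MT output and its
# human post-edit, divided by the post-edit length.
def edit_distance(a, b):
    """Word-level Levenshtein distance via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (wa != wb)))  # substitution
        prev = curr
    return prev[-1]

def hter(mt_output: str, post_edit: str) -> float:
    mt, pe = mt_output.split(), post_edit.split()
    return edit_distance(mt, pe) / len(pe)

print(hter("he go to school yesterday",
           "he went to school yesterday"))  # 0.2 (1 edit / 5 words)
```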

