MT Evaluation
Recently Published Documents

Total documents: 66 (five years: 17)
H-index: 7 (five years: 1)

Author(s): Ahrii Kim, Jinhyun Kim

SacreBLEU, by incorporating a text-normalizing step in its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot produce a meaningful result without customized pre-tokenization. This paper therefore examines the influence of diversified pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates consistently with the token type. Some tokenizations even degrade the reliability of the metric, and MeCab is no exception. Offering guidance on the proper tokenizer usage for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
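As a minimal sketch of this kind of meta-evaluation (the Korean data and human scores below are invented; the ko-mecab tokenizer needs a recent sacrebleu with the optional mecab-ko dependency), one can score the same outputs under several tokenizers and correlate each with human judgments:

```python
# Sketch: score the same outputs under several SacreBLEU tokenizers and
# correlate each with human judgments. All data here is illustrative.
import sacrebleu
from scipy.stats import pearsonr

# Three hypothetical Korean MT systems with human adequacy scores.
systems = {
    "sys_a": (["고양이가 매트 위에 앉아 있다 ."], 4.2),
    "sys_b": (["고양이가 매트에 앉았다 ."], 3.4),
    "sys_c": (["고양이 매트 앉다"], 2.1),
}
references = [["고양이가 매트 위에 앉아 있다 ."]]

for tok in ("13a", "char", "ko-mecab"):  # word-, character-, morpheme-level
    bleus, humans = [], []
    for hyp, human in systems.values():
        bleus.append(sacrebleu.corpus_bleu(hyp, references, tokenize=tok).score)
        humans.append(human)
    r, _ = pearsonr(bleus, humans)  # correlation with human judgment
    print(f"tokenize={tok}: r={r:.2f}")
```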


Complexity, 2021, Vol. 2021, pp. 1-15
Author(s): Dasa Munkova, Michal Munk, Ľubomír Benko, Jiri Stastny

The paper investigates the impact of an artificial agent (a machine translator) on a human agent (a post-editor) using a proposed methodology based on language complexity measures, POS tags, frequent tagsets, association rules, and their summarization. We examine this impact from the point of view of language complexity in terms of word and sentence structure. Using the proposed methodology, we analyzed 24,733 tags of English-to-Slovak translations of technical texts, corresponding to the output of two MT systems (Google Translate and the European Commission's MT tool). We used both manual (adequacy and fluency) and semi-automatic (the HTER metric) MT evaluation measures as the criteria for validity. We show that the proposed methodology is valid based on the evaluation, with baseline methods, of frequent tagsets and rules in the MT outputs produced by Google Translate and by the European Commission's MT tool, and in both post-edited MT (PEMT) outputs. Our results also show that the PEMT output produced from Google Translate is characterized by more frequent tagsets, such as verbs in the infinitive with modal verbs, compared with its MT output, which is characterized by masculine, inanimate nouns in the locative singular. In the MT output produced by the European Commission's MT tool, the most frequent tagset was verbs in the infinitive, compared with its post-edited output, where verbs in the imperative and the second person plural occurred. These findings likewise follow from applying the proposed methodology to MT evaluation. The contribution of the methodology is the identification of systematic, rather than random, errors. The study can also inform the optimization of the translation process using post-editing.
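One ingredient of the methodology, frequent-tagset mining over POS-tagged output, can be sketched as follows (tag names and sentences are purely illustrative; the actual pipeline also involves association rules and HTER):

```python
# Sketch: mine frequent POS "tagsets" (here: tag bigrams) from tagged
# MT vs. post-edited output. Tags and sentences are illustrative only.
from collections import Counter

def tag_ngrams(tagged_sentences, n=2):
    """Count n-grams of POS tags across sentences."""
    counts = Counter()
    for tags in tagged_sentences:
        counts.update(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
    return counts

mt_tags = [["NOUN", "VERB_INF", "MODAL"], ["NOUN_LOC", "NOUN", "VERB"]]
pe_tags = [["NOUN", "VERB_INF"], ["VERB_IMP", "PRON_2PL", "NOUN"]]

mt_freq, pe_freq = tag_ngrams(mt_tags), tag_ngrams(pe_tags)
# Tagsets whose frequency shifts between MT and PEMT output point to
# systematic, not random, differences introduced by post-editing.
for tagset in (mt_freq.keys() | pe_freq.keys()):
    print(tagset, mt_freq[tagset], pe_freq[tagset])
```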


Author(s): Nora Aranberri-Monasterio, Sharon O'Brien

-ing forms in English are reported to be problematic for Machine Translation and are often the focus of rules in Controlled Language rule sets. We investigated how problematic -ing forms are for an RBMT system translating into four target languages in the IT domain. Constituent-based human evaluation was used, and the results showed that, in general, -ing forms do not deserve their bad reputation. A comparison with the results of five automated MT evaluation metrics showed promising correlations. Some issues prevail, however, and can vary from target language to target language. We propose different strategies for dealing with these problems, such as Controlled Language rules, semi-automatic post-editing, source-text tagging and "post-editing" the source text.
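The first step of such a study, flagging -ing forms in source text, can be approximated naively (a real system would use a POS tagger to separate gerunds and participles from nouns like "string"; this regex sketch is ours):

```python
# Sketch: naive flagging of candidate -ing forms for controlled-language
# checking. Purely illustrative; real systems would POS-tag first.
import re

ING = re.compile(r"\b[A-Za-z]+ing\b")
FALSE_FRIENDS = {"string", "thing", "king", "ring", "spring", "nothing"}

def flag_ing_forms(sentence: str):
    return [m.group(0) for m in ING.finditer(sentence)
            if m.group(0).lower() not in FALSE_FRIENDS]

print(flag_ing_forms("Before installing the driver, check the string settings."))
# -> ['installing']  ("string" is filtered as a lexical false friend)
```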


Author(s): Bogdan Babych, Anthony Hartley

We describe the results of a research project aimed at the automatic detection of MT errors using state-of-the-art MT evaluation metrics, such as BLEU. Currently, these automated metrics give only a general indication of translation quality at the corpus level and cannot be used directly to identify gaps in the coverage of MT systems. Our methodology automatically detects frequent multiword expressions (MWEs) in sentence-aligned parallel corpora and computes, for the concordance generated for each MWE, an automated evaluation score that indicates whether the expression is systematically mistranslated in the corpus. The method can be applied to both source and target MWEs to indicate, respectively, whether MT can successfully deal with source expressions and whether certain frequent target expressions can be successfully generated. The results can be useful for systematically checking the coverage of MT systems in order to speed up the development cycle of rule-based MT. This approach can also enhance current techniques for finding translation equivalents by distributional similarity and for automatically identifying features of MT-tractable language.
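The core of the method can be sketched as follows (function names and thresholds are ours): extract frequent MWEs from the source side, pull out the concordance of aligned sentence pairs containing each MWE, and score that concordance with a corpus-level metric; persistently low scores flag systematically mistranslated expressions.

```python
# Sketch: score, per multiword expression (MWE), the concordance of
# aligned sentences containing it. All names and data are illustrative.
from collections import Counter
import sacrebleu

def frequent_mwes(sentences, n=2, min_count=2):
    """Collect word n-grams occurring at least min_count times."""
    counts = Counter()
    for s in sentences:
        toks = s.split()
        counts.update(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return [m for m, c in counts.items() if c >= min_count]

def mwe_scores(src, mt, ref):
    """BLEU over the concordance of each frequent source-side MWE."""
    scores = {}
    for mwe in frequent_mwes(src):
        idx = [i for i, s in enumerate(src) if mwe in s]
        hyp = [mt[i] for i in idx]
        refs = [[ref[i] for i in idx]]
        scores[mwe] = sacrebleu.corpus_bleu(hyp, refs).score
    return scores  # low scores flag systematically mistranslated MWEs
```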


Author(s): Paula Estrella, Andrei Popescu-Belis, Maghi King

A large number of evaluation metrics exist for machine translation (MT) systems, but depending on the intended context of use of such a system, not all metrics are equally relevant. Based on the ISO/IEC 9126 and 14598 standards for software evaluation, the Framework for the Evaluation of Machine Translation in ISLE (FEMTI) provides guidelines for the selection of quality characteristics to be evaluated depending on the expected task, users, and input characteristics of an MT system. This approach to contextual evaluation was implemented as a web-based application which helps its users design evaluation plans. In addition, FEMTI offers experts in evaluation the possibility to enter and share their knowledge using a dedicated web-based tool, tested in several evaluation exercises.
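As a toy illustration of contextual selection (the mapping below is invented for this sketch and is not FEMTI's actual taxonomy), one can think of the framework as a function from context of use to quality characteristics:

```python
# Toy sketch of context-driven selection of evaluation characteristics,
# in the spirit of FEMTI. The mapping below is invented for illustration.
CONTEXT_TO_CHARACTERISTICS = {
    "gisting":       ["adequacy", "speed"],
    "dissemination": ["fluency", "adequacy", "terminology"],
    "post-editing":  ["adequacy", "post-editing effort"],
}

def evaluation_plan(intended_use: str):
    """Return the characteristics worth evaluating for a context of use."""
    return CONTEXT_TO_CHARACTERISTICS.get(intended_use, ["adequacy"])

print(evaluation_plan("post-editing"))  # ['adequacy', 'post-editing effort']
```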


Author(s): Andy Way

Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless (and this may come as some surprise to the PB-SMT community), most translators and, somewhat more surprisingly perhaps, many experienced MT protagonists find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was conciliatory and inclusive. Over the next five years, things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring the rest of the MT community along with it, and in our view this situation has largely persisted to this day. Having discussed these issues, we turn to three further topics: the role of automatic MT evaluation metrics in describing PB-SMT systems; the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers with prior experience in fields other than statistical models of translation; and the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.
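For readers outside the SMT community, the "basic model" in question can be stated compactly; these are the standard textbook formulations (not taken from this paper): the original noisy-channel objective and its phrase-based, log-linear generalization.

```latex
% Noisy-channel SMT: choose the target sentence e that maximizes the
% language-model probability p(e) times the translation-model p(f|e).
\hat{e} = \arg\max_{e} \, p(e)\, p(f \mid e)

% Phrase-based, log-linear generalization: feature functions h_k
% (phrase translation, language model, reordering, ...) with weights
% \lambda_k tuned on held-out data.
\hat{e} = \arg\max_{e} \, \sum_{k} \lambda_k \, h_k(e, f)
```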


2021, Vol. 11 (1), pp. 54
Author(s): Hapni Nurliana H.D Hasibuan

Machine Translation (MT) is one of the most advanced and elaborate research fields within translation technology; the quality of MT output has always been a great concern, and MT evaluation is a popular research topic. This research aims to assess the translation quality of the gender-marker lingual units of the Arabic short story "عَبْدُ اللهِ وَالْعُصْفُوْرُ", translated into English and Indonesian using machine translation. The research used a qualitative method. The subjects are the gender-marker lingual units taken from the Arabic short story. The key instrument of this research is the human instrument; additional instruments consisted of tables of the gender-marker lingual units and a table of rating scales based on Nababan's theory (2012). The findings showed that the analysis discovered 72 gender-marker lingual units in the short story, and that the dominant type was the personal pronoun. Based on the results, it can be concluded that the Google Translate renderings of the gender-marker lingual units are of high quality at the accuracy, acceptability, and readability levels.
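For concreteness, Nababan-style assessment rates each unit on 1-3 scales for accuracy, acceptability, and readability, which are then summarized; a minimal sketch follows, where the 3:2:1 weighting is our assumption for illustration, not a detail given in the abstract.

```python
# Sketch: weighted translation-quality score over Nababan-style 1-3
# ratings. The 3:2:1 weighting is an assumption for illustration.
WEIGHTS = {"accuracy": 3, "acceptability": 2, "readability": 1}

def quality_score(ratings: dict) -> float:
    """Weighted average of the three 1-3 ratings."""
    total = sum(WEIGHTS[k] * ratings[k] for k in WEIGHTS)
    return total / sum(WEIGHTS.values())

# Hypothetical ratings for one gender-marker lingual unit:
print(round(quality_score(
    {"accuracy": 3, "acceptability": 3, "readability": 2}), 2))  # 2.83
```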


Stroke, 2021, Vol. 52 (Suppl_1)
Author(s): Sujan T Reddy, Tzu-Ching Wu, Jing Zhang, Mohammad H Rahbar, Christy Ankrom, ...

Introduction: Little is known about the impact of telestroke in addressing disparities in acute ischemic stroke care.
Methods: We conducted a retrospective review of acute ischemic stroke patients evaluated over our 17-hospital telestroke network in Texas from 2015-2018. Patients were described as Non-Hispanic White (NHW) male or female, Non-Hispanic Black (NHB) male or female, or Hispanic (HIS) male or female. Single imputation using fully conditional specification was conducted to impute missing values in NIHSS (N=103). We compared the frequency of tPA and mechanical thrombectomy (MT) utilization, door-to-consultation times, door-to-tPA times, and time-to-transfer for patients who went on to MT evaluation at the hub after having been screened for suspected large vessel occlusion at the spoke.
Results: Among 3873 patients (including 1146 NHW male (30%) and 1134 NHW female (29%), 405 NHB male (10%) and 491 NHB female (13%), and 358 HIS male (9%) and 339 HIS female (9%) patients) (Table 1), we did not find any differences in door-to-consultation time, door-to-tPA time, time-to-transfer, frequency of tPA administration, or incidence of MT utilization (Tables 1 and 2).
Conclusion: There was a lack of racial, ethnic, and sex disparities in ischemic stroke care metrics within our telestroke network. To fully understand how telestroke alleviates disparities in stroke care beyond our single-network review, collaboration among networks is needed to formulate a multicenter telestroke database similar to Get With The Guidelines.


2021, Vol. 11 (2), pp. 639
Author(s): Despoina Mouratidis, Katia Lida Kermanidis, Vilelmini Sosoni

Evaluation of machine translation (MT) into morphologically rich languages has not been well studied despite its importance. This paper proposes a classifier, namely a deep learning (DL) schema for MT evaluation, based on different categories of information (linguistic features, natural language processing (NLP) metrics, and embeddings), using a machine learning model suited to noisy and small datasets. The linguistic features are string-based for the language pairs English (EN)–Greek (EL) and EN–Italian (IT). The paper also explores the linguistic differences between different kinds of corpora that affect evaluation accuracy. A comparative study between a simple (mathematically calculated) embedding layer and pre-trained embeddings is conducted. Moreover, the impact of feature selection and dimensionality reduction on classification accuracy is analyzed. Results show that a neural network (NN) model with different input representations produces results that clearly outperform the state of the art for MT evaluation for EN–EL and EN–IT, with an increase of almost 0.40 points in correlation with human judgments on pairwise MT evaluation. The proposed algorithm achieved better results on noisy and small datasets. In addition, for a more integrated analysis of the accuracy results, a qualitative linguistic analysis was carried out to address complex linguistic phenomena.
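A compressed sketch of this kind of hybrid schema (dimensions, names, and the pooling choice are ours, not the paper's): an embedding branch, either a trainable layer or frozen pre-trained vectors, pooled and concatenated with hand-crafted features before a small classification head.

```python
# Sketch of a hybrid MT-evaluation classifier: token embeddings (either a
# trainable layer or frozen pre-trained vectors) pooled and concatenated
# with hand-crafted features. Dimensions are illustrative, not the paper's.
import torch
import torch.nn as nn

class MTEvalClassifier(nn.Module):
    def __init__(self, vocab=5000, emb_dim=64, feat_dim=12, n_classes=2,
                 pretrained=None):
        super().__init__()
        self.emb = (nn.Embedding.from_pretrained(pretrained, freeze=True)
                    if pretrained is not None
                    else nn.Embedding(vocab, emb_dim))
        self.head = nn.Sequential(
            nn.Linear(self.emb.embedding_dim + feat_dim, 32),
            nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, token_ids, features):
        pooled = self.emb(token_ids).mean(dim=1)  # average-pool token vectors
        return self.head(torch.cat([pooled, features], dim=1))

# Smoke test on random data (batch of 4 sentences, 10 tokens, 12 features).
model = MTEvalClassifier()
logits = model(torch.randint(0, 5000, (4, 10)), torch.randn(4, 12))
print(logits.shape)  # torch.Size([4, 2])
```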


2021, Vol. 15 (1)
Author(s): Ch Ram Anirudh, Kavi Narayana Murthy

Machine-translated texts are often far from perfect, and post-editing is essential to reach publishable quality. Post-editing may not always be a pleasant task. However, modern machine translation (MT) approaches like statistical MT (SMT) and neural MT (NMT) seem to hold greater promise. In this work, we present a quantitative method for scoring translations and computing the post-editability of MT system outputs. We show that the scores we obtain correlate well with MT evaluation metrics as well as with the actual time and effort required for post-editing. We compare the outputs of three modern MT systems, namely phrase-based SMT (PBMT), NMT, and Google Translate, for their post-editability for English-to-Hindi translation. Further, we explore the effect of various kinds of errors in MT outputs on post-editing time and effort. Including an Indian language in this kind of post-editability study and analyzing the influence of errors on post-editing time and effort for NMT are highlights of this work.
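The semi-automatic quantity such studies typically lean on, HTER, is simple to state: the word-level edit distance between the MT output and its human post-edit, normalized by the length of the post-edit. A self-contained sketch (full TER also counts phrase shifts as single edits, omitted here):

```python
# Sketch: HTER = word-level edit distance between MT output and its
# human post-edit, divided by the post-edit length.
def edit_distance(a, b):
    """Word-level Levenshtein distance via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (wa != wb)))  # substitution
        prev = curr
    return prev[-1]

def hter(mt_output: str, post_edit: str) -> float:
    mt, pe = mt_output.split(), post_edit.split()
    return edit_distance(mt, pe) / len(pe)

print(hter("he go to school yesterday",
           "he went to school yesterday"))  # 0.2 (1 edit / 5 words)
```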

