Morpho-syntactic information for automatic error analysis of statistical machine translation output

Author(s):  
Maja Popović ◽  
Hermann Ney ◽  
Adrià de Gispert ◽  
José B. Mariño ◽  
Deepa Gupta ◽  
...  

2011 ◽
Vol 37 (4) ◽  
pp. 657-688 ◽  
Author(s):  
Maja Popović ◽  
Hermann Ney

Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on the identification of actual erroneous words using the algorithms for computation of Word Error Rate (WER) and Position-independent word Error Rate (PER), which is just a first step towards the development of automatic evaluation measures that provide more specific information about certain translation problems. The proposed approach enables the use of various types of linguistic knowledge in order to classify translation errors in many different ways. This work focuses on one possible set-up, namely, on five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each of the categories, we analyze the contribution of various POS classes. We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: estimating the contribution of each error type in a given translation output in order to identify the main sources of errors for a given translation system, and comparing different translation outputs using the introduced error categories in order to obtain more information about the advantages and disadvantages of different systems and the possibilities for improvement, as well as about the advantages and disadvantages of the applied improvement methods. We used Arabic–English Newswire and Broadcast News and Chinese–English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German–English outputs generated in the framework of the fourth Machine Translation Workshop. We show that our results correlate very well with the results of a human error analysis, and that all our metrics except the extra words reflect well the differences between different versions of the same translation system as well as the differences between different translation systems.
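As an illustration of the two quantities the framework builds on, the sketch below computes WER via Levenshtein alignment and PER via bag-of-words matching for a single hypothesis/reference pair. It is a minimal, self-contained example with assumed function names and whitespace tokenization, not the authors' error-classification tool.

```python
# Illustrative sketch only: WER via edit distance and PER via bag-of-words
# matching, the two quantities on which the proposed error classification rests.
# Function names and whitespace tokenization are assumptions, not the paper's code.
from collections import Counter


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word Error Rate: Levenshtein distance divided by reference length."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[-1][-1] / max(len(reference), 1)


def per(reference: list[str], hypothesis: list[str]) -> float:
    """Position-independent Error Rate (one common formulation):
    words are matched as multisets, ignoring order."""
    ref_counts, hyp_counts = Counter(reference), Counter(hypothesis)
    matches = sum((ref_counts & hyp_counts).values())
    errors = max(len(reference), len(hypothesis)) - matches
    return errors / max(len(reference), 1)


if __name__ == "__main__":
    ref = "the house is small".split()
    hyp = "the small house".split()
    print(f"WER = {wer(ref, hyp):.2f}, PER = {per(ref, hyp):.2f}")
```

Classifying errors into the five categories discussed in the abstract would additionally require the alignment paths and, for inflectional errors, base forms and POS tags, which this sketch omits.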


2016 ◽  
Vol 9 (3) ◽  
pp. 13 ◽  
Author(s):  
Hadis Ghasemi ◽  
Mahmood Hashemian

Both lack of time and the need to translate texts for numerous reasons have brought about an increase in the study of machine translation, a field with a history spanning over 65 years. During the last decades, Google Translate, as a statistical machine translation (SMT) system, has been at the center of attention for supporting 90 languages. Although there are many studies on Google Translate, few researchers have considered Persian-English translation pairs. This study used Keshavarz's (1999) model of error analysis to carry out a comparative study of the raw English-Persian and Persian-English translations produced by Google Translate. Based on the criteria presented in the model, 100 systematically selected sentences from an interpreter app called Motarjem Hamrah were translated by Google Translate and then evaluated and tabulated. Analyzing the error frequencies and conducting a chi-square test showed no significant difference between the quality of Google Translate from English to Persian and from Persian to English. In addition, lexicosemantic and active/passive voice errors were the most and least frequent errors, respectively. Directions for future research to improve the system are identified in the paper.
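As a rough sketch of the statistical comparison described above (not the study's code or data), the example below runs a chi-square test of independence over a hypothetical error-frequency table for the two translation directions using SciPy; all counts and category labels are invented placeholders.

```python
# Illustrative sketch: comparing error-frequency distributions of two
# translation directions with a chi-square test of independence.
# The counts below are invented placeholders, not the study's data.
from scipy.stats import chi2_contingency

# Rows: error categories; columns: English->Persian, Persian->English.
error_table = [
    [42, 38],  # lexicosemantic errors (hypothetical counts)
    [17, 21],  # word-order errors
    [9, 12],   # tense/aspect errors
    [3, 2],    # active/passive voice errors
]

chi2, p_value, dof, expected = chi2_contingency(error_table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, dof = {dof}")
if p_value > 0.05:
    print("No significant difference between the two translation directions.")
```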


2011 ◽  
Vol 45 (2) ◽  
pp. 181-208 ◽  
Author(s):  
Mireia Farrús ◽  
Marta R. Costa-jussà ◽  
José B. Mariño ◽  
Marc Poch ◽  
Adolfo Hernández ◽  
...  

2014 ◽  
Vol 101 (1) ◽  
pp. 71-96 ◽  
Author(s):  
Ondřej Bojar ◽  
Daniel Zeman

We present various achievements in statistical machine translation from English, German, Spanish and French into Czech. We discuss specific properties of the individual source languages and describe techniques that exploit these properties and address language-specific errors. Besides the translation proper, we also present our contribution to error analysis.


Author(s):  
Ignatius Ikechukwu Ayogu ◽  
Adebayo Olusola Adetunmbi ◽  
Bolanle Adefowoke Ojokoh

The global demand for translation and translation tools currently surpasses the capacity of available solutions. Besides, there is no one-size-fits-all, off-the-shelf solution for all languages. Thus, the need and urgency to scale up research on the development of translation tools and devices continue to grow, especially for languages suffering under the pressure of globalisation. This paper discusses our experiments on translation systems between English and two Nigerian languages: Igbo and Yorùbá. The study was set up to build parallel corpora and to train and evaluate English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá phrase-based statistical machine translation systems. The systems were trained on parallel corpora that were created for each language pair, in the course of this research, from text in the religious domain. BLEU scores of 30.04, 29.01 and 18.72 were recorded for the English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá MT systems, respectively. An error analysis of the systems' outputs was conducted using a linguistically motivated MT error analysis approach; it showed that errors occurred mostly at the lexical, grammatical and semantic levels. While the study reveals the potential of our corpora, it also shows that corpus size remains an issue requiring further attention. Thus, an important target in the immediate future is to increase the quantity and quality of the data.
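For readers unfamiliar with the metric, the sketch below shows how corpus-level BLEU scores like those reported above can be computed with the sacrebleu package; the sentences are invented placeholders and this is not the authors' evaluation pipeline.

```python
# Illustrative sketch: corpus-level BLEU scoring with sacrebleu.
# The hypothesis/reference sentences are generic placeholders; the study's
# own test sets and tooling are not reproduced here.
import sacrebleu

hypotheses = [
    "this is a small test sentence",       # system output (placeholder)
    "the translation quality is measured",
]
references = [[
    "this is a short test sentence",       # reference translation (placeholder)
    "the translation quality is measured",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```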


2014 ◽  
Vol 5 (3) ◽  
pp. 36-45
Author(s):  
Quang-Hung LE ◽  
Anh-Cuong LE

Word alignment is the task of aligning bilingual words in a corpus of parallel sentences and determining the probabilities of these aligned bilingual word pairs. It is the most important factor affecting the quality of any Statistical Machine Translation (SMT) system. The IBM word alignment models are the most well-known in the SMT research community. These models are purely statistical and therefore do not perform well for some language pairs that differ in linguistic aspects (e.g., grammatical structure). This paper aims to improve the IBM models by using syntactic information. The authors first propose a new type of constraint based on bilingual syntactic patterns and then integrate it into the IBM models. Finally, they show how to estimate the models' parameters using this new type of constraint. The experiments are conducted on the English-Vietnamese language pair for evaluation.
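To make the baseline concrete, here is a minimal sketch of IBM Model 1 EM training of lexical translation probabilities t(f|e) on a toy corpus. It illustrates the purely statistical nature of the baseline that the authors extend with syntactic-pattern constraints; it is not their implementation, and the constraint integration itself is not shown.

```python
# Illustrative sketch: IBM Model 1 EM training of lexical translation
# probabilities t(f|e) on a toy parallel corpus. This is the purely
# statistical baseline; the paper's syntactic constraints are not modeled here.
from collections import defaultdict

corpus = [  # (source e, target f) toy sentence pairs, whitespace-tokenized
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

# Uniform (unnormalized) initialization of t(f|e); relative values suffice.
t = defaultdict(lambda: 1.0)

for _ in range(10):  # EM iterations
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # normalizer per source word e
    for e_sent, f_sent in corpus:
        for f in f_sent:
            # E-step: distribute each target word over candidate source words.
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                delta = t[(f, e)] / z
                count[(f, e)] += delta
                total[e] += delta
    # M-step: renormalize expected counts into probabilities t(f|e).
    for (f, e) in count:
        t[(f, e)] = count[(f, e)] / total[e]

for (f, e) in sorted(t):
    print(f"t({f} | {e}) = {t[(f, e)]:.3f}")
```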


2018 ◽  
Vol 5 (1) ◽  
pp. 37-45
Author(s):  
Darryl Yunus Sulistyan

Machine translation automatically translates sentences from one language into another. This paper aims to test the effectiveness of a newer model of machine translation, namely factored machine translation. We compare the performance of an unfactored system, used as our baseline, to that of the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The tool we use is called MOSES; it is freely available to download and use. We found, however, that the unfactored model scored above 24 BLEU and outperformed the factored model, which scored below 24 BLEU, in all cases. In terms of the number of words translated, however, all of the factored models outperformed the unfactored model.
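As a hedged illustration of what factored input looks like in a Moses-style setup (the paper does not specify its preprocessing pipeline), the sketch below annotates each surface word with a lemma and POS tag using spaCy and joins the factors with the `|` separator that Moses' factored models expect; the choice of spaCy and the model name are assumptions.

```python
# Illustrative sketch: turning plain text into Moses-style factored input
# (surface|lemma|POS). spaCy and the "en_core_web_sm" model are assumptions
# used for demonstration; the paper's actual factor pipeline is not described.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed to be installed


def to_factored(sentence: str) -> str:
    """Annotate each token as surface|lemma|pos, Moses' default factor format."""
    doc = nlp(sentence)
    return " ".join(f"{tok.text}|{tok.lemma_}|{tok.pos_}" for tok in doc)


print(to_factored("The resumed session was opened yesterday"))
# e.g. The|the|DET resumed|resume|VERB session|session|NOUN ...
```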

