Morpho-syntactic information for automatic error analysis of statistical machine translation output

Author(s):  
Maja Popović ◽  
Hermann Ney ◽  
Adrià de Gispert ◽  
José B. Mariño ◽  
Deepa Gupta ◽  
...  

2011 ◽
Vol 37 (4) ◽  
pp. 657-688 ◽  
Author(s):  
Maja Popović ◽  
Hermann Ney

Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on the identification of actual erroneous words using the algorithms for computation of Word Error Rate (WER) and Position-independent word Error Rate (PER), which is just a first step towards the development of automatic evaluation measures that provide more specific information about certain translation problems. The proposed approach enables the use of various types of linguistic knowledge in order to classify translation errors in many different ways. This work focuses on one possible set-up, namely, on five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each of the categories, we analyze the contribution of various POS classes. We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: estimating the contribution of each error type in a given translation output in order to identify the main sources of errors for a given translation system, and comparing different translation outputs using the introduced error categories in order to obtain more information about the advantages and disadvantages of different systems and the possibilities for improvement, as well as about the advantages and disadvantages of the applied improvement methods. We used Arabic–English Newswire and Broadcast News and Chinese–English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German–English outputs generated in the framework of the fourth Machine Translation Workshop. We show that our results correlate very well with the results of a human error analysis, and that all our metrics except the extra words reflect well the differences between different versions of the same translation system as well as the differences between different translation systems.
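As an illustration of the two quantities the framework builds on, the sketch below computes WER via Levenshtein alignment and PER via bag-of-words matching for a single hypothesis/reference pair. It is a minimal, self-contained example with assumed function names and whitespace tokenization, not the authors' error-classification tool.

```python
# Illustrative sketch only: WER via edit distance and PER via bag-of-words
# matching, the two quantities on which the proposed error classification rests.
# Function names and whitespace tokenization are assumptions, not the paper's code.
from collections import Counter


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word Error Rate: Levenshtein distance divided by reference length."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[-1][-1] / max(len(reference), 1)


def per(reference: list[str], hypothesis: list[str]) -> float:
    """Position-independent Error Rate (one common formulation):
    words are matched as multisets, ignoring order."""
    ref_counts, hyp_counts = Counter(reference), Counter(hypothesis)
    matches = sum((ref_counts & hyp_counts).values())
    errors = max(len(reference), len(hypothesis)) - matches
    return errors / max(len(reference), 1)


if __name__ == "__main__":
    ref = "the house is small".split()
    hyp = "the small house".split()
    print(f"WER = {wer(ref, hyp):.2f}, PER = {per(ref, hyp):.2f}")
```

Classifying errors into the five categories discussed in the abstract would additionally require the alignment paths and, for inflectional errors, base forms and POS tags, which this sketch omits.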


2016 ◽  
Vol 9 (3) ◽  
pp. 13 ◽  
Author(s):  
Hadis Ghasemi ◽  
Mahmood Hashemian

Both lack of time and the need to translate texts for numerous reasons have brought about an increase in the study of machine translation, a field with a history spanning over 65 years. During the last decades, Google Translate, as a statistical machine translation (SMT) system, has been at the center of attention for supporting 90 languages. Although there are many studies on Google Translate, few researchers have considered Persian-English translation pairs. This study used Keshavarz's (1999) model of error analysis to carry out a comparative study of the raw English-Persian and Persian-English translations produced by Google Translate. Based on the criteria presented in the model, 100 systematically selected sentences from an interpreter app called Motarjem Hamrah were translated by Google Translate and then evaluated and tabulated. Analyzing the error frequencies and conducting a chi-square test showed no significant difference between the quality of Google Translate from English to Persian and from Persian to English. In addition, lexicosemantic and active/passive voice errors were the most and least frequent errors, respectively. Directions for future research to improve the system are identified in the paper.
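As a rough sketch of the statistical comparison described above (not the study's code or data), the example below runs a chi-square test of independence over a hypothetical error-frequency table for the two translation directions using SciPy; all counts and category labels are invented placeholders.

```python
# Illustrative sketch: comparing error-frequency distributions of two
# translation directions with a chi-square test of independence.
# The counts below are invented placeholders, not the study's data.
from scipy.stats import chi2_contingency

# Rows: error categories; columns: English->Persian, Persian->English.
error_table = [
    [42, 38],  # lexicosemantic errors (hypothetical counts)
    [17, 21],  # word-order errors
    [9, 12],   # tense/aspect errors
    [3, 2],    # active/passive voice errors
]

chi2, p_value, dof, expected = chi2_contingency(error_table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, dof = {dof}")
if p_value > 0.05:
    print("No significant difference between the two translation directions.")
```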


2011 ◽  
Vol 45 (2) ◽  
pp. 181-208 ◽  
Author(s):  
Mireia Farrús ◽  
Marta R. Costa-jussà ◽  
José B. Mariño ◽  
Marc Poch ◽  
Adolfo Hernández ◽  
...  

2014 ◽  
Vol 101 (1) ◽  
pp. 71-96 ◽  
Author(s):  
Ondřej Bojar ◽  
Daniel Zeman

We present various achievements in statistical machine translation from English, German, Spanish and French into Czech. We discuss specific properties of the individual source languages and describe techniques that exploit these properties and address language-specific errors. Besides the translation proper, we also present our contribution to error analysis.


Author(s):  
Ignatius Ikechukwu Ayogu ◽  
Adebayo Olusola Adetunmbi ◽  
Bolanle Adefowoke Ojokoh

The global demand for translation and translation tools currently surpasses the capacity of available solutions. Besides, there is no one-size-fits-all, off-the-shelf solution for all languages. Thus, the need and urgency to scale up research on the development of translation tools and devices continue to grow, especially for languages suffering under the pressure of globalisation. This paper discusses our experiments on translation systems between English and two Nigerian languages: Igbo and Yorùbá. The study was set up to build parallel corpora and to train and evaluate English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá phrase-based statistical machine translation systems. The systems were trained on parallel corpora that were created for each language pair, in the course of this research, from text in the religious domain. BLEU scores of 30.04, 29.01 and 18.72 were recorded for the English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá MT systems, respectively. An error analysis of the systems' outputs was conducted using a linguistically motivated MT error analysis approach; it showed that errors occurred mostly at the lexical, grammatical and semantic levels. While the study reveals the potential of our corpora, it also shows that corpus size remains an issue requiring further attention. Thus, an important target in the immediate future is to increase the quantity and quality of the data.
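For readers unfamiliar with the metric, the sketch below shows how corpus-level BLEU scores like those reported above can be computed with the sacrebleu package; the sentences are invented placeholders and this is not the authors' evaluation pipeline.

```python
# Illustrative sketch: corpus-level BLEU scoring with sacrebleu.
# The hypothesis/reference sentences are generic placeholders; the study's
# own test sets and tooling are not reproduced here.
import sacrebleu

hypotheses = [
    "this is a small test sentence",       # system output (placeholder)
    "the translation quality is measured",
]
references = [[
    "this is a short test sentence",       # reference translation (placeholder)
    "the translation quality is measured",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```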


2014 ◽  
Vol 5 (3) ◽  
pp. 36-45
Author(s):  
Quang-Hung LE ◽  
Anh-Cuong LE

Word alignment is the task of aligning bilingual words in a corpus of parallel sentences and determining the probabilities of these aligned bilingual word pairs. It is the most important factor affecting the quality of any Statistical Machine Translation (SMT) system. The IBM word alignment models are the most well-known in the SMT research community. These models are purely statistical and therefore do not perform well for some language pairs that differ in linguistic aspects (e.g., grammatical structure). This paper aims to improve the IBM models by using syntactic information. The authors first propose a new type of constraint based on bilingual syntactic patterns and then integrate it into the IBM models. Finally, they show how to estimate the models' parameters using this new type of constraint. The experiments are conducted on the English-Vietnamese language pair for evaluation.
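To make the baseline concrete, here is a minimal sketch of IBM Model 1 EM training of lexical translation probabilities t(f|e) on a toy corpus. It illustrates the purely statistical nature of the baseline that the authors extend with syntactic-pattern constraints; it is not their implementation, and the constraint integration itself is not shown.

```python
# Illustrative sketch: IBM Model 1 EM training of lexical translation
# probabilities t(f|e) on a toy parallel corpus. This is the purely
# statistical baseline; the paper's syntactic constraints are not modeled here.
from collections import defaultdict

corpus = [  # (source e, target f) toy sentence pairs, whitespace-tokenized
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

# Uniform (unnormalized) initialization of t(f|e); relative values suffice.
t = defaultdict(lambda: 1.0)

for _ in range(10):  # EM iterations
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # normalizer per source word e
    for e_sent, f_sent in corpus:
        for f in f_sent:
            # E-step: distribute each target word over candidate source words.
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                delta = t[(f, e)] / z
                count[(f, e)] += delta
                total[e] += delta
    # M-step: renormalize expected counts into probabilities t(f|e).
    for (f, e) in count:
        t[(f, e)] = count[(f, e)] / total[e]

for (f, e) in sorted(t):
    print(f"t({f} | {e}) = {t[(f, e)]:.3f}")
```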


2018 ◽  
Vol 5 (1) ◽  
pp. 37-45
Author(s):  
Darryl Yunus Sulistyan

Machine translation automatically translates sentences from one language into another. This paper aims to test the effectiveness of a newer model of machine translation, namely factored machine translation. We compare the performance of an unfactored system, used as our baseline, to that of the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The tool we use is called MOSES; it is freely available to download and use. We found, however, that the unfactored model scored above 24 BLEU and outperformed the factored model, which scored below 24 BLEU, in all cases. In terms of the number of words translated, however, all of the factored models outperformed the unfactored model.
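As a hedged illustration of what factored input looks like in a Moses-style setup (the paper does not specify its preprocessing pipeline), the sketch below annotates each surface word with a lemma and POS tag using spaCy and joins the factors with the `|` separator that Moses' factored models expect; the choice of spaCy and the model name are assumptions.

```python
# Illustrative sketch: turning plain text into Moses-style factored input
# (surface|lemma|POS). spaCy and the "en_core_web_sm" model are assumptions
# used for demonstration; the paper's actual factor pipeline is not described.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed to be installed


def to_factored(sentence: str) -> str:
    """Annotate each token as surface|lemma|pos, Moses' default factor format."""
    doc = nlp(sentence)
    return " ".join(f"{tok.text}|{tok.lemma_}|{tok.pos_}" for tok in doc)


print(to_factored("The resumed session was opened yesterday"))
# e.g. The|the|DET resumed|resume|VERB session|session|NOUN ...
```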

