Measuring Machine Translation Errors in New Domains

2013 · Vol 1 · pp. 429-440
Author(s): Ann Irvine, John Morgan, Marine Carpuat, Hal Daumé, Dragos Munteanu

We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.
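
As a rough illustration of the two levels of analysis described above, here is a minimal Python sketch, not the authors' actual procedure: corpus-level BLEU (via sacrebleu) stands in for the macro-level measurement, and the rate of source words unseen in the training data stands in for one kind of word-level error. The file names and the OOV proxy are assumptions.

```python
# Minimal sketch, not the paper's method: macro-level = corpus BLEU per domain,
# micro-level = rate of source tokens unseen in the training data.
# The sacrebleu dependency and all file names below are assumptions.
import sacrebleu  # pip install sacrebleu

def macro_bleu(hypotheses, references):
    """Corpus-level score; comparing it across domains shows the macro effect."""
    return sacrebleu.corpus_bleu(hypotheses, [references]).score

def micro_oov_rate(source_sentences, training_vocab):
    """Word-level proxy: share of source tokens the system never saw in training."""
    tokens = [tok for sent in source_sentences for tok in sent.split()]
    unseen = sum(1 for tok in tokens if tok not in training_vocab)
    return unseen / max(len(tokens), 1)

# Hypothetical usage for a Parliament-trained system applied to new domains.
training_vocab = set(open("parliament.train.src").read().split())
for domain in ("news", "medical", "science", "subtitles"):
    hyps = open(f"{domain}.hyp").read().splitlines()
    refs = open(f"{domain}.ref").read().splitlines()
    srcs = open(f"{domain}.src").read().splitlines()
    print(domain, macro_bleu(hyps, refs), micro_oov_rate(srcs, training_vocab))
```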

2020 · Vol 44 (1) · pp. 33-50
Author(s): Ivan Dunđer

Machine translation is an increasingly active research topic in information and communication sciences, computer science and computational linguistics, because it enables communication and the transfer of meaning across different languages. As Croatian can be considered low-resourced in terms of available services and technology, the development of new domain-specific machine translation systems is important, especially given the growing interest and needs of industry, academia and everyday users. Machine translation is not perfect, but it is crucial to ensure acceptable quality, which is purpose-dependent. In this research, different statistical machine translation systems were built; one of them applied domain adaptation in particular, with the intention of improving the machine translation output. Afterwards, extensive evaluation was performed, combining several automatic quality metrics with human evaluation focused on various aspects, in order to assess the quality of the machine-translated text.
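
The evaluation setup described here, several automatic metrics combined with human judgments, can be illustrated with a small sketch. The choice of BLEU and chrF via sacrebleu and the 1-5 adequacy/fluency scale are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: automatic metrics plus aggregated human ratings.
# Metric choices (BLEU, chrF via sacrebleu) and the 1-5 scale are assumptions.
import statistics
import sacrebleu

def automatic_scores(hypotheses, references):
    """Corpus-level automatic quality metrics for one MT system."""
    return {
        "BLEU": sacrebleu.corpus_bleu(hypotheses, [references]).score,
        "chrF": sacrebleu.corpus_chrf(hypotheses, [references]).score,
    }

def human_scores(ratings):
    """ratings: one dict per sentence, e.g. {"adequacy": 4, "fluency": 3}."""
    return {
        "adequacy": statistics.mean(r["adequacy"] for r in ratings),
        "fluency": statistics.mean(r["fluency"] for r in ratings),
    }

# Hypothetical comparison of a baseline and a domain-adapted system.
# print(automatic_scores(baseline_output, references))
# print(automatic_scores(adapted_output, references))
```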


2017 · Vol 5 · pp. 487-500
Author(s): Benjamin Marie, Atsushi Fujita

We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English–French, both translation directions, in two domains and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis that showed the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system.
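
The pipeline outlined in this abstract (separately compiled phrase sets, Cartesian product, inexpensive features, supervised filtering) can be written down compactly. The sketch below is only a schematic reading of that description: the features, the logistic-regression filter from scikit-learn and the threshold are assumptions, not the paper's actual feature set or classifier.

```python
# Schematic sketch of the described induction pipeline; the features,
# classifier (scikit-learn LogisticRegression) and threshold are assumptions.
from itertools import product
from sklearn.linear_model import LogisticRegression

def candidate_pairs(src_phrases, tgt_phrases):
    """Cartesian product of the separately compiled source/target phrase sets."""
    return list(product(src_phrases, tgt_phrases))

def cheap_features(src, tgt):
    """Inexpensive per-pair features (illustrative only)."""
    len_diff = abs(len(src.split()) - len(tgt.split()))
    char_overlap = len(set(src) & set(tgt)) / max(len(set(src) | set(tgt)), 1)
    return [len_diff, char_overlap]

def induce_phrase_table(src_phrases, tgt_phrases, labelled_pairs, threshold=0.5):
    """Train a supervised filter, then keep the candidate pairs it accepts."""
    X = [cheap_features(s, t) for s, t, _ in labelled_pairs]
    y = [label for _, _, label in labelled_pairs]
    clf = LogisticRegression().fit(X, y)

    table = []
    for src, tgt in candidate_pairs(src_phrases, tgt_phrases):
        prob = clf.predict_proba([cheap_features(src, tgt)])[0][1]
        if prob >= threshold:
            table.append((src, tgt, prob))
    return table
```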


2020 · Vol 10 (4) · pp. 408
Author(s): Noureldin Mohamed Abdelaal, Abdulkhaliq Alazzawie

This study aims to identify the common types of errors made by Google Translate (GT) when translating informative news texts from Arabic to English, to measure the severity of those errors, to assess the fluency and semantic adequacy of the translation output, and thereby to determine the extent to which a human translator is needed to rectify the output. For this purpose, examples were purposively selected from online newspapers. The collected data were analyzed using a mixed-methods approach: errors were identified qualitatively, guided by Hsu’s (2014) classification of machine translation errors, and a quantitative descriptive approach was used to measure error severity, using the Multidimensional Quality Metrics and Localization Quality Evaluation frameworks. Semantic adequacy and fluency were assessed with a questionnaire adapted from Dorr, Snover, and Madnani (2011). The results show that omission (a lexical error) and inappropriate lexical choice (a semantic error) are the most common error types. Inappropriate lexical choice sometimes results from the homophonic nature of some source-text words, which the machine translation system can misinterpret. The study concludes that machine translation systems are useful for expediting the translation process, but accuracy is sacrificed for ease (less work for the human) and speed. If greater accuracy is required or desired, a human translator must at least proofread and revise the output.
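
For readers unfamiliar with MQM-style scoring of the kind the study applies, the following sketch shows how severity-weighted error counts are typically turned into a quality score. The severity weights and per-word normalisation are common MQM conventions assumed here; they are not values taken from the study.

```python
# Illustrative MQM-style scoring; severity weights and normalisation are
# assumed conventions, not values reported in the study.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_score(errors, word_count):
    """errors: list of (category, severity) tuples found in one translation."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    return max(0.0, 1.0 - penalty / max(word_count, 1))  # 1.0 = no penalised errors

# Hypothetical 25-word segment with one omission and one wrong lexical choice.
errors = [("omission", "major"), ("inappropriate lexical choice", "minor")]
print(mqm_score(errors, word_count=25))  # -> 0.76
```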


2016 · Vol 1 (1) · pp. 45-49
Author(s): Avinash Singh, Asmeet Kour, Shubhnandan S. Jamwal

The objective of this paper is to analyze translation based on an English-Dogri parallel corpus. Machine translation is the automatic translation of text from one language into another, and it is one of the major applications of Natural Language Processing (NLP). Moses is a statistical machine translation toolkit that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
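
The abstract reports accuracy figures without specifying how they were computed; the sketch below shows one plausible word-overlap measure against reference translations, purely as an assumption about what such a percentage could mean. The file names are hypothetical.

```python
# Assumed word-overlap accuracy; the paper does not state its actual measure.
def word_accuracy(hypothesis, reference):
    """Fraction of reference positions matched by the system output."""
    hyp, ref = hypothesis.split(), reference.split()
    matches = sum(1 for h, r in zip(hyp, ref) if h == r)
    return matches / max(len(ref), 1)

def corpus_accuracy(hyp_lines, ref_lines):
    """Average sentence accuracy over a test set, as a percentage."""
    scores = [word_accuracy(h, r) for h, r in zip(hyp_lines, ref_lines)]
    return 100 * sum(scores) / max(len(scores), 1)

# Hypothetical usage on held-out English->Dogri and Dogri->English test sets.
# print(corpus_accuracy(open("test.en-doi.hyp").readlines(),
#                       open("test.en-doi.ref").readlines()))
```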

