Measuring Machine Translation Errors in New Domains

2013 · Vol 1 · pp. 429-440
Author(s): Ann Irvine, John Morgan, Marine Carpuat, Hal Daumé, Dragos Munteanu

We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.
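
As a rough illustration of the two levels of analysis described above, here is a minimal Python sketch, not the authors' actual procedure: corpus-level BLEU (via sacrebleu) stands in for the macro-level measurement, and the rate of source words unseen in the training data stands in for one kind of word-level error. The file names and the OOV proxy are assumptions.

```python
# Minimal sketch, not the paper's method: macro-level = corpus BLEU per domain,
# micro-level = rate of source tokens unseen in the training data.
# The sacrebleu dependency and all file names below are assumptions.
import sacrebleu  # pip install sacrebleu

def macro_bleu(hypotheses, references):
    """Corpus-level score; comparing it across domains shows the macro effect."""
    return sacrebleu.corpus_bleu(hypotheses, [references]).score

def micro_oov_rate(source_sentences, training_vocab):
    """Word-level proxy: share of source tokens the system never saw in training."""
    tokens = [tok for sent in source_sentences for tok in sent.split()]
    unseen = sum(1 for tok in tokens if tok not in training_vocab)
    return unseen / max(len(tokens), 1)

# Hypothetical usage for a Parliament-trained system applied to new domains.
training_vocab = set(open("parliament.train.src").read().split())
for domain in ("news", "medical", "science", "subtitles"):
    hyps = open(f"{domain}.hyp").read().splitlines()
    refs = open(f"{domain}.ref").read().splitlines()
    srcs = open(f"{domain}.src").read().splitlines()
    print(domain, macro_bleu(hyps, refs), micro_oov_rate(srcs, training_vocab))
```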

2020 · Vol 44 (1) · pp. 33-50
Author(s): Ivan Dunđer

Machine translation is an increasingly active research topic in information and communication sciences, computer science and computational linguistics, because it enables communication and the transfer of meaning across different languages. As Croatian can be considered low-resourced in terms of available services and technology, the development of new domain-specific machine translation systems is important, especially given the growing interest and needs of industry, academia and everyday users. Machine translation is not perfect, but it is crucial to ensure acceptable quality, which is purpose-dependent. In this research, different statistical machine translation systems were built; one of them applied domain adaptation in particular, with the intention of improving the machine translation output. Afterwards, extensive evaluation was performed, combining several automatic quality metrics with human evaluation focused on various aspects, in order to assess the quality of the machine-translated text.
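
The evaluation setup described here, several automatic metrics combined with human judgments, can be illustrated with a small sketch. The choice of BLEU and chrF via sacrebleu and the 1-5 adequacy/fluency scale are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: automatic metrics plus aggregated human ratings.
# Metric choices (BLEU, chrF via sacrebleu) and the 1-5 scale are assumptions.
import statistics
import sacrebleu

def automatic_scores(hypotheses, references):
    """Corpus-level automatic quality metrics for one MT system."""
    return {
        "BLEU": sacrebleu.corpus_bleu(hypotheses, [references]).score,
        "chrF": sacrebleu.corpus_chrf(hypotheses, [references]).score,
    }

def human_scores(ratings):
    """ratings: one dict per sentence, e.g. {"adequacy": 4, "fluency": 3}."""
    return {
        "adequacy": statistics.mean(r["adequacy"] for r in ratings),
        "fluency": statistics.mean(r["fluency"] for r in ratings),
    }

# Hypothetical comparison of a baseline and a domain-adapted system.
# print(automatic_scores(baseline_output, references))
# print(automatic_scores(adapted_output, references))
```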


2017 · Vol 5 · pp. 487-500
Author(s): Benjamin Marie, Atsushi Fujita

We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English–French, both translation directions, in two domains and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis that showed the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system.
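
The pipeline outlined in this abstract (separately compiled phrase sets, Cartesian product, inexpensive features, supervised filtering) can be written down compactly. The sketch below is only a schematic reading of that description: the features, the logistic-regression filter from scikit-learn and the threshold are assumptions, not the paper's actual feature set or classifier.

```python
# Schematic sketch of the described induction pipeline; the features,
# classifier (scikit-learn LogisticRegression) and threshold are assumptions.
from itertools import product
from sklearn.linear_model import LogisticRegression

def candidate_pairs(src_phrases, tgt_phrases):
    """Cartesian product of the separately compiled source/target phrase sets."""
    return list(product(src_phrases, tgt_phrases))

def cheap_features(src, tgt):
    """Inexpensive per-pair features (illustrative only)."""
    len_diff = abs(len(src.split()) - len(tgt.split()))
    char_overlap = len(set(src) & set(tgt)) / max(len(set(src) | set(tgt)), 1)
    return [len_diff, char_overlap]

def induce_phrase_table(src_phrases, tgt_phrases, labelled_pairs, threshold=0.5):
    """Train a supervised filter, then keep the candidate pairs it accepts."""
    X = [cheap_features(s, t) for s, t, _ in labelled_pairs]
    y = [label for _, _, label in labelled_pairs]
    clf = LogisticRegression().fit(X, y)

    table = []
    for src, tgt in candidate_pairs(src_phrases, tgt_phrases):
        prob = clf.predict_proba([cheap_features(src, tgt)])[0][1]
        if prob >= threshold:
            table.append((src, tgt, prob))
    return table
```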


2020 · Vol 10 (4) · pp. 408
Author(s): Noureldin Mohamed Abdelaal, Abdulkhaliq Alazzawie

This study aims to identify the common types of errors made by Google Translate (GT) when translating informative news texts from Arabic to English, to measure the severity of those errors, to assess the fluency and semantic adequacy of the translation output, and thereby to determine the extent to which a human translator is needed to rectify the output. For this purpose, examples were purposively selected from online newspapers. The collected data were analyzed using a mixed-methods approach: errors were identified qualitatively, guided by Hsu’s (2014) classification of machine translation errors, and a quantitative descriptive approach was used to measure error severity, using the Multidimensional Quality Metrics and Localization Quality Evaluation frameworks. Semantic adequacy and fluency were assessed with a questionnaire adapted from Dorr, Snover, and Madnani (2011). The results show that omission (a lexical error) and inappropriate lexical choice (a semantic error) are the most common error types. Inappropriate lexical choice sometimes results from the homophonic nature of some source-text words, which the machine translation system can misinterpret. The study concludes that machine translation systems are useful for expediting the translation process, but accuracy is sacrificed for ease (less work for the human) and speed. If greater accuracy is required or desired, a human translator must at least proofread and revise the output.
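
For readers unfamiliar with MQM-style scoring of the kind the study applies, the following sketch shows how severity-weighted error counts are typically turned into a quality score. The severity weights and per-word normalisation are common MQM conventions assumed here; they are not values taken from the study.

```python
# Illustrative MQM-style scoring; severity weights and normalisation are
# assumed conventions, not values reported in the study.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_score(errors, word_count):
    """errors: list of (category, severity) tuples found in one translation."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    return max(0.0, 1.0 - penalty / max(word_count, 1))  # 1.0 = no penalised errors

# Hypothetical 25-word segment with one omission and one wrong lexical choice.
errors = [("omission", "major"), ("inappropriate lexical choice", "minor")]
print(mqm_score(errors, word_count=25))  # -> 0.76
```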


2016 · Vol 1 (1) · pp. 45-49
Author(s): Avinash Singh, Asmeet Kour, Shubhnandan S. Jamwal

The objective of this paper is to analyze translation based on an English-Dogri parallel corpus. Machine translation is the automatic translation of text from one language into another, and it is one of the major applications of Natural Language Processing (NLP). Moses is a statistical machine translation toolkit that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
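
The abstract reports accuracy figures without specifying how they were computed; the sketch below shows one plausible word-overlap measure against reference translations, purely as an assumption about what such a percentage could mean. The file names are hypothetical.

```python
# Assumed word-overlap accuracy; the paper does not state its actual measure.
def word_accuracy(hypothesis, reference):
    """Fraction of reference positions matched by the system output."""
    hyp, ref = hypothesis.split(), reference.split()
    matches = sum(1 for h, r in zip(hyp, ref) if h == r)
    return matches / max(len(ref), 1)

def corpus_accuracy(hyp_lines, ref_lines):
    """Average sentence accuracy over a test set, as a percentage."""
    scores = [word_accuracy(h, r) for h, r in zip(hyp_lines, ref_lines)]
    return 100 * sum(scores) / max(len(scores), 1)

# Hypothetical usage on held-out English->Dogri and Dogri->English test sets.
# print(corpus_accuracy(open("test.en-doi.hyp").readlines(),
#                       open("test.en-doi.ref").readlines()))
```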

