Lexical and Structural Ambiguity in Machine Translation: An Analytical Study

2021 ◽  
Vol 1 (1) ◽  
pp. 59-79
Author(s):  
Yaseen Alzeebaree

The aim of this study is to investigate the difficulties facing machine translation (Google Translate), particularly those related to lexis and structure. The researcher randomly chose two English and two Arabic texts covering various types of translation: media, scientific, general, and economic. They were taken from several sources (websites, magazines) and translated both automatically (by Google Translate) and by human translators, from Arabic to English and vice versa. The translations were then analyzed to identify the challenges facing machine translation. The results indicate that MT remains problematic, with many lexical challenges (deletions, non-vocalizations, multiple meanings, collocations, additions, and acronyms) and syntactic ones (word order, verb-subject agreement, passive voice, etc.). On the basis of these results, the researcher recommends further work toward a system that encompasses the syntax, morphology, and semantics of all languages.

2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled well by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
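The paper's source-side pre-ordering idea can be illustrated with a toy sketch: children of selected parse-tree nodes are reordered toward the target language's (here SOV) order before phrase-based translation. The `Node` class, the rule format, and the example tree are all hypothetical simplifications, not the authors' actual system.

```python
class Node:
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word

def words(node):
    """Left-to-right terminal yield of the tree."""
    if node.word is not None:
        return [node.word]
    return [w for c in node.children for w in words(c)]

def preorder_tree(node, rules):
    """Stably reorder each node's children per a parent-label -> child-label-order rule."""
    if node.label in rules:
        order = rules[node.label]
        node.children.sort(
            key=lambda c: order.index(c.label) if c.label in order else len(order)
        )
    for c in node.children:
        preorder_tree(c, rules)
    return node

# English SVO clause: "he reads books"
tree = Node("S", [
    Node("NP", word="he"),
    Node("VP", [Node("V", word="reads"), Node("NP", word="books")]),
])

# Toy rule moving the object before the verb, approximating Urdu SOV order.
preorder_tree(tree, {"VP": ["NP", "V"]})
print(words(tree))  # ['he', 'books', 'reads']
```

Because the reordering is applied to the source tree before training and decoding, the phrase-based system then only needs to model local, monotone correspondences.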


Author(s):  
Riyad Al-Shalabi ◽  
Ghassan Kanaan ◽  
Huda Al-Sarhan ◽  
Alaa Drabsh ◽  
Islam Al-Husban

Abstract—Machine translation (MT) allows direct communication between two people without the need for a third party or a pocket dictionary, which can bring significant performance improvements. Since most traditional translation approaches are word-sensitive, it is very important to consider word order in addition to word selection when evaluating any machine translation. To evaluate MT performance, it is necessary to observe the output of the MT tool with respect to word order, word selection, and sentence length. However, a thorough evaluation covering all of these points is a very challenging task. In this paper, we first summarize various approaches to evaluating machine translation. We then propose a practical solution: using an appropriate, powerful tool called iBLEU to evaluate the accuracy of well-known MT tools (Google, Bing, Systranet, and Babylon). Based on this evaluation, we further discuss the relative performance of these tools in both directions, Arabic to English and English to Arabic, and determine which direction yields more accurate translations for the selected MT systems. Finally, the results show Google to be the best-performing system and Systranet the worst.

Index Terms: Machine Translation, MTs, Evaluation for Machine Translation, Google, Bing, Systranet and Babylon, Machine Translation tools, BLEU, iBLEU.
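iBLEU is a front-end over the standard BLEU metric, whose core is a brevity-penalized geometric mean of clipped n-gram precisions. A minimal sentence-level sketch in pure Python (with simple add-one smoothing, which the official corpus-level metric does not use) looks like this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: brevity penalty times the geometric mean
    of clipped n-gram precisions for n = 1..max_n."""
    c, r = len(candidate), len(reference)
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        clipped = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        total = max(sum(cand.values()), 1)
        # Add-one smoothing so one empty n-gram order does not zero the score.
        log_prec += math.log((clipped + 1) / (total + 1)) / max_n
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_prec)

reference = "the cat sat on the mat".split()
candidate = "a cat sat on a mat".split()
print(bleu(candidate, reference))
```

An identical candidate scores 1.0, and any divergence in word choice or order lowers the clipped precisions, which is exactly why BLEU-style metrics are sensitive to the word-order and word-selection issues the paper emphasizes.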


2019 ◽  
Vol 7 ◽  
pp. 661-676 ◽  
Author(s):  
Jiatao Gu ◽  
Qi Liu ◽  
Kyunghyun Cho

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend the Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, including word order recovery, machine translation, image captioning, and code generation, demonstrate that our algorithm can generate sequences in arbitrary orders while achieving competitive or even better performance than conventional left-to-right generation. The generated sequences show that InDIGO adapts its generation order to the input.
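The key observation behind insertion-based decoding is that the same final sequence can be produced by many different orders of insertion operations. A toy illustration (this only shows the mechanics; InDIGO itself learns relative-position representations and the insertion policy with a Transformer):

```python
def generate_by_insertion(ops):
    """Build a sequence by applying (token, position) insertion operations in turn."""
    seq = []
    for token, pos in ops:
        seq.insert(pos, token)
    return seq

# Conventional left-to-right order:
l2r = [("a", 0), ("b", 1), ("c", 2), ("d", 3)]
# A non-monotonic, adaptive order producing the same sequence:
adaptive = [("c", 0), ("a", 0), ("d", 2), ("b", 1)]

assert generate_by_insertion(l2r) == generate_by_insertion(adaptive) == ["a", "b", "c", "d"]
```

Left-to-right decoding is thus just one point in the space of generation orders the model is allowed to search over.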


2016 ◽  
Vol 9 (3) ◽  
pp. 13 ◽  
Author(s):  
Hadis Ghasemi ◽  
Mahmood Hashemian

Both lack of time and the need to translate texts for numerous reasons have brought about an increase in the study of machine translation, a field with a history spanning over 65 years. In recent decades, Google Translate, as a statistical machine translation (SMT) system supporting 90 languages, has been at the center of attention. Although there are many studies on Google Translate, few researchers have considered Persian-English translation pairs. This study used Keshavarzʼs (1999) model of error analysis to compare the raw English-Persian and Persian-English translations produced by Google Translate. Based on the criteria presented in the model, 100 systematically selected sentences from an interpreter app called Motarjem Hamrah were translated by Google Translate, then evaluated and tabulated. Analyzing the error frequencies and conducting a chi-square test showed no significant difference between the quality of Google Translate from English to Persian and from Persian to English. In addition, lexicosemantic and active/passive voice errors were the most and least frequent error types, respectively. Directions for future research on improving the system are identified in the paper.
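The study's statistical step, comparing error-frequency distributions across the two translation directions, is a standard Pearson chi-square test of independence. A sketch with entirely hypothetical counts (the paper's actual frequencies are not reproduced here):

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    return sum(
        (obs - row_tot[i] * col_tot[j] / total) ** 2
        / (row_tot[i] * col_tot[j] / total)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

# Hypothetical error counts per category for the two translation directions.
# Columns: lexicosemantic, word order, active/passive voice.
errors = [
    [40, 25, 5],   # English -> Persian
    [45, 22, 6],   # Persian -> English
]

stat = chi_square(errors)
df = (len(errors) - 1) * (len(errors[0]) - 1)  # 2 degrees of freedom here
# Compare stat against the critical value 5.991 (df=2, alpha=0.05);
# a smaller statistic means no significant difference between directions.
print(stat, df)
```

A finding of "no significant difference", as reported, corresponds to the statistic falling below the critical value for the table's degrees of freedom.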


2015 ◽  
Vol 16 (32) ◽  
pp. 43-49
Author(s):  
John F. White

Summary
Development of the machine translator Blitz Latin between the years 2002 and 2015 is discussed. Key issues remain the ambiguity in meaning of Latin stems and inflections, and the word order of the Latin language. The programmer describes attempts to improve machine translation of Latin.


2013 ◽  
Vol 411-414 ◽  
pp. 1923-1929
Author(s):  
Ren Fen Hu ◽  
Yun Zhu ◽  
Yao Hong Jin ◽  
Jia Yong Chen

This paper presents a rule-based model to handle the long-distance reordering of special Chinese sentence types. In this model, we first identify special prepositions and their syntactic levels. Sentences are then parsed and transformed with reordering rules to be much closer to English word order. We evaluate our method within a patent MT system, where it shows a great advantage over statistical reordering methods. With the presented reordering model, the performance of patent machine translation for these Chinese sentence types is effectively improved.
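One family of such rules moves a preverbal prepositional phrase to a position after the verb phrase, since Chinese places many PPs before the verb while English places them clause-finally. A toy, hand-indexed illustration (the glossed tokens and the rule are hypothetical, not the paper's actual rule set):

```python
def move_pp_to_end(tokens, pp_start, pp_end):
    """Move the preverbal PP tokens[pp_start:pp_end] to clause-final position,
    approximating English word order."""
    pp = tokens[pp_start:pp_end]
    return tokens[:pp_start] + tokens[pp_end:] + pp

# Glossed Chinese-style order: Subject PP Verb Object
source = ["he", "at", "library", "read", "book"]
print(move_pp_to_end(source, 1, 3))  # ['he', 'read', 'book', 'at', 'library']
```

In the paper's actual system, the PP span and its syntactic level come from parsing rather than hand-supplied indices, but the transformation applied to the source string is of this shape.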


2021 ◽  
Vol 47 (1) ◽  
pp. 9-42
Author(s):  
Miloš Stanojević ◽  
Mark Steedman

Abstract Steedman (2020) proposes as a formal universal of natural language grammar that grammatical permutations of the kind that have given rise to transformational rules are limited to a class known to mathematicians and computer scientists as the "separable" permutations. This class of permutations is exactly the class that can be expressed in combinatory categorial grammars (CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of studies of crosslinguistic variation in word order in nominal and verbal constructions. The number of permutations that are separable grows with the number n of lexical elements in the construction as the Large Schröder number S_{n−1}. Because that number grows much more slowly than the number n! of all permutations, this generalization is also of considerable practical interest for computational applications such as parsing and machine translation. The present article examines the mathematical and computational origins of this restriction, and the reason it is exactly captured in CCG without the imposition of any further constraints.
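The count can be checked directly: separable permutations are exactly the permutations avoiding the patterns 2413 and 3142, and their number for n elements equals the Large Schröder number S_{n−1}. A brute-force verification for small n (the pattern-avoidance characterization and the Schröder recurrence are both standard results, not taken from this article):

```python
from itertools import permutations, combinations

def contains_pattern(perm, pat):
    """True if perm contains a subsequence order-isomorphic to pat."""
    for idxs in combinations(range(len(perm)), len(pat)):
        sub = [perm[i] for i in idxs]
        ranks = sorted(sub)
        if tuple(ranks.index(v) + 1 for v in sub) == tuple(pat):
            return True
    return False

def separable_count(n):
    """Count permutations of 1..n avoiding both 2413 and 3142."""
    return sum(
        1 for p in permutations(range(1, n + 1))
        if not contains_pattern(p, (2, 4, 1, 3))
        and not contains_pattern(p, (3, 1, 4, 2))
    )

def schroeder(n):
    """Large Schröder number S_n via the standard three-term recurrence."""
    S = [1, 2]
    for k in range(2, n + 1):
        S.append((3 * (2 * k - 1) * S[k - 1] - (k - 2) * S[k - 2]) // (k + 1))
    return S[n]

for n in range(1, 7):
    assert separable_count(n) == schroeder(n - 1)
print([schroeder(k) for k in range(6)])  # [1, 2, 6, 22, 90, 394]
```

Already at n = 6 the gap is visible: 394 separable permutations against 6! = 720 in total, and the ratio shrinks rapidly as n grows.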


2012 ◽  
Vol 7 ◽  
Author(s):  
Kristiina Muhonen ◽  
Tanja Purtonen

In this article, we investigate ambiguity in syntactic annotation. The ambiguity in question is inherent in the sense that even human annotators interpret the meaning differently. In our experiment, we detect potentially structurally ambiguous sentences with Constraint Grammar rules. In the linguistic phenomena we investigate, structural ambiguity is primarily caused by word order: the potentially ambiguous particle or adverbial is located between the main verb and the (participial) NP. After detecting the structures, we analyze how many of the potentially ambiguous cases are actually ambiguous using a double-blind method: we rank the sentences captured by the rules on a 1-to-5 scale to indicate which reading each annotator regards as the primary one. The results indicate that 67% of the sentences are ambiguous. Introducing ambiguity in the treebank/parsebank increases the informativeness of the representation, since both correct analyses are presented.


2021 ◽  
Vol 13 (2) ◽  
pp. 271
Author(s):  
Nadia Khumairo Ma'shumah ◽  
Isra F. Sianipar ◽  
Cynthia Yanda Salsabila

Some Google Translate users and researchers remain skeptical of Google Translate's current performance as a machine translation tool. Since the English passive voice often poses translation problems, especially when translated into Indonesian, a language rich in affixes, this study analyzes how Google Translate (MT) renders the English passive voice in Indonesian and investigates whether it can perform modulation. The data in this research took the form of clauses and sentences in the passive voice drawn from a corpus of 497 news articles from the online news platform 'GlobalVoices', processed with AntConc 3.5.8 software. The data were analyzed quantitatively and qualitatively to achieve broad objectives, depth of understanding, and corroboration, while comparative methods were used to analyze both source and target texts. The results showed that (1) GT (via NMT) was able to translate the English passive voice, correctly producing the morphological changes of the Indonesian passive voice, and (2) GT was able to modulate the English passive voice into Indonesian base verbs and the Indonesian active voice.
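The morphological change at issue is that Indonesian commonly marks the passive with the verbal prefix di- (e.g. "dibaca", "is read"). A toy sketch of that single rule, with a hypothetical three-entry lexicon standing in for a real bilingual dictionary:

```python
# Hypothetical mini-lexicon of English verb -> Indonesian verb stem.
LEXICON = {"read": "baca", "write": "tulis", "send": "kirim"}

def english_passive_to_indonesian(verb):
    """Render an English passive verb as an Indonesian di- passive form."""
    return "di" + LEXICON[verb]

print(english_passive_to_indonesian("read"))  # dibaca ("is read")
```

What the study credits the NMT system with is recognizing, without any such explicit rule, when this affixation is required and when modulation into a base verb or an active construction reads more naturally.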


2021 ◽  
Vol 11 (16) ◽  
pp. 7662
Author(s):  
Yong-Seok Choi ◽  
Yo-Han Park ◽  
Seung Yun ◽  
Sang-Hun Kim ◽  
Kong-Joo Lee

Korean and Japanese use different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using the Masked Sequence-to-Sequence pre-training (MASS) method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we allow only the smallest possible number of shared vocabulary items between the two languages. Then, we build an unsupervised Neural Machine Translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different writing scripts and few shared vocabulary items, the unsupervised NMT system performs well compared to other language pairs. Our interest is in the common characteristics of the two languages that make unsupervised NMT perform so well. In this study, we propose a new method of analyzing cross-attention between a source and target language to estimate the difference between languages from the perspective of machine translation. We compute cross-attention measurements for the Korean–Japanese and Korean–English pairs and compare their performance and characteristics. Because the Korean–Japanese pair differs little in word order and morphology, unsupervised NMT between Korean and Japanese can be trained well even without parallel sentences or shared vocabularies.

