Lexical and Structural Ambiguity in Machine Translation: An Analytical Study

2021 ◽  
Vol 1 (1) ◽  
pp. 59-79
Author(s):  
Yaseen Alzeebaree

The aim of this study is to investigate the difficulties facing machine translation (Google Translate), particularly those related to lexis and structure. The researcher randomly chose two English and two Arabic texts covering various types of translation: media, scientific, general, and economic. They were taken from several sources (websites, magazines) and translated both automatically (by Google Translate) and by human translators, from Arabic to English and vice versa. The translations were then analyzed to identify the challenges facing machine translation. The results indicate that MT remains problematic, with many lexical challenges (deletions, non-vocalizations, multiple meanings, collocations, additions, and acronyms) and syntactic ones (word order, verb-subject agreement, passive voice, etc.). On the basis of these results, the researcher recommends further work toward a system that encompasses the syntax, morphology, and semantics of all languages.

2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled well by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
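The paper's source-side pre-ordering idea can be illustrated with a toy sketch: children of selected parse-tree nodes are reordered toward the target language's (here SOV) order before phrase-based translation. The `Node` class, the rule format, and the example tree are all hypothetical simplifications, not the authors' actual system.

```python
class Node:
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word

def words(node):
    """Left-to-right terminal yield of the tree."""
    if node.word is not None:
        return [node.word]
    return [w for c in node.children for w in words(c)]

def preorder_tree(node, rules):
    """Stably reorder each node's children per a parent-label -> child-label-order rule."""
    if node.label in rules:
        order = rules[node.label]
        node.children.sort(
            key=lambda c: order.index(c.label) if c.label in order else len(order)
        )
    for c in node.children:
        preorder_tree(c, rules)
    return node

# English SVO clause: "he reads books"
tree = Node("S", [
    Node("NP", word="he"),
    Node("VP", [Node("V", word="reads"), Node("NP", word="books")]),
])

# Toy rule moving the object before the verb, approximating Urdu SOV order.
preorder_tree(tree, {"VP": ["NP", "V"]})
print(words(tree))  # ['he', 'books', 'reads']
```

Because the reordering is applied to the source tree before training and decoding, the phrase-based system then only needs to model local, monotone correspondences.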


Author(s):  
Riyad Al-Shalabi ◽  
Ghassan Kanaan ◽  
Huda Al-Sarhan ◽  
Alaa Drabsh ◽  
Islam Al-Husban

Abstract—Machine translation (MT) allows direct communication between two people without the need for a third party or a pocket dictionary, which can bring significant performance improvements. Since most traditional translation approaches are word-sensitive, it is very important to consider word order in addition to word selection when evaluating any machine translation. To evaluate MT performance, it is necessary to observe the output of the MT tool with respect to word order, word selection, and sentence length. However, a thorough evaluation covering all of these points is a very challenging task. In this paper, we first summarize various approaches to evaluating machine translation. We then propose a practical solution: using an appropriate, powerful tool called iBLEU to evaluate the accuracy of well-known MT tools (Google, Bing, Systranet, and Babylon). Based on this evaluation, we further discuss the relative performance of these tools in both directions, Arabic to English and English to Arabic, and determine which direction yields more accurate translations for the selected MT systems. Finally, the results show Google to be the best-performing system and Systranet the worst.

Index Terms: Machine Translation, MTs, Evaluation for Machine Translation, Google, Bing, Systranet and Babylon, Machine Translation tools, BLEU, iBLEU.
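iBLEU is a front-end over the standard BLEU metric, whose core is a brevity-penalized geometric mean of clipped n-gram precisions. A minimal sentence-level sketch in pure Python (with simple add-one smoothing, which the official corpus-level metric does not use) looks like this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: brevity penalty times the geometric mean
    of clipped n-gram precisions for n = 1..max_n."""
    c, r = len(candidate), len(reference)
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        clipped = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        total = max(sum(cand.values()), 1)
        # Add-one smoothing so one empty n-gram order does not zero the score.
        log_prec += math.log((clipped + 1) / (total + 1)) / max_n
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_prec)

reference = "the cat sat on the mat".split()
candidate = "a cat sat on a mat".split()
print(bleu(candidate, reference))
```

An identical candidate scores 1.0, and any divergence in word choice or order lowers the clipped precisions, which is exactly why BLEU-style metrics are sensitive to the word-order and word-selection issues the paper emphasizes.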


2019 ◽  
Vol 7 ◽  
pp. 661-676 ◽  
Author(s):  
Jiatao Gu ◽  
Qi Liu ◽  
Kyunghyun Cho

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend the Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, including word order recovery, machine translation, image captioning, and code generation, demonstrate that our algorithm can generate sequences in arbitrary orders while achieving competitive or even better performance than conventional left-to-right generation. The generated sequences show that InDIGO adapts its generation order to the input.
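The key observation behind insertion-based decoding is that the same final sequence can be produced by many different orders of insertion operations. A toy illustration (this only shows the mechanics; InDIGO itself learns relative-position representations and the insertion policy with a Transformer):

```python
def generate_by_insertion(ops):
    """Build a sequence by applying (token, position) insertion operations in turn."""
    seq = []
    for token, pos in ops:
        seq.insert(pos, token)
    return seq

# Conventional left-to-right order:
l2r = [("a", 0), ("b", 1), ("c", 2), ("d", 3)]
# A non-monotonic, adaptive order producing the same sequence:
adaptive = [("c", 0), ("a", 0), ("d", 2), ("b", 1)]

assert generate_by_insertion(l2r) == generate_by_insertion(adaptive) == ["a", "b", "c", "d"]
```

Left-to-right decoding is thus just one point in the space of generation orders the model is allowed to search over.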


2016 ◽  
Vol 9 (3) ◽  
pp. 13 ◽  
Author(s):  
Hadis Ghasemi ◽  
Mahmood Hashemian

Both lack of time and the need to translate texts for numerous reasons have brought about an increase in the study of machine translation, a field with a history spanning over 65 years. In recent decades, Google Translate, as a statistical machine translation (SMT) system supporting 90 languages, has been at the center of attention. Although there are many studies on Google Translate, few researchers have considered Persian-English translation pairs. This study used Keshavarzʼs (1999) model of error analysis to compare the raw English-Persian and Persian-English translations produced by Google Translate. Based on the criteria presented in the model, 100 systematically selected sentences from an interpreter app called Motarjem Hamrah were translated by Google Translate, then evaluated and tabulated. Analyzing the error frequencies and conducting a chi-square test showed no significant difference between the quality of Google Translate from English to Persian and from Persian to English. In addition, lexicosemantic and active/passive voice errors were the most and least frequent error types, respectively. Directions for future research on improving the system are identified in the paper.
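The study's statistical step, comparing error-frequency distributions across the two translation directions, is a standard Pearson chi-square test of independence. A sketch with entirely hypothetical counts (the paper's actual frequencies are not reproduced here):

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    return sum(
        (obs - row_tot[i] * col_tot[j] / total) ** 2
        / (row_tot[i] * col_tot[j] / total)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

# Hypothetical error counts per category for the two translation directions.
# Columns: lexicosemantic, word order, active/passive voice.
errors = [
    [40, 25, 5],   # English -> Persian
    [45, 22, 6],   # Persian -> English
]

stat = chi_square(errors)
df = (len(errors) - 1) * (len(errors[0]) - 1)  # 2 degrees of freedom here
# Compare stat against the critical value 5.991 (df=2, alpha=0.05);
# a smaller statistic means no significant difference between directions.
print(stat, df)
```

A finding of "no significant difference", as reported, corresponds to the statistic falling below the critical value for the table's degrees of freedom.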


2015 ◽  
Vol 16 (32) ◽  
pp. 43-49
Author(s):  
John F. White

Summary
Development of the machine translator Blitz Latin between the years 2002 and 2015 is discussed. Key issues remain the ambiguity in meaning of Latin stems and inflections, and the word order of the Latin language. The programmer describes attempts to improve machine translation of Latin.


2013 ◽  
Vol 411-414 ◽  
pp. 1923-1929
Author(s):  
Ren Fen Hu ◽  
Yun Zhu ◽  
Yao Hong Jin ◽  
Jia Yong Chen

This paper presents a rule-based model to handle the long-distance reordering of special Chinese sentence types. In this model, we first identify special prepositions and their syntactic levels. Sentences are then parsed and transformed with reordering rules to be much closer to English word order. We evaluate our method within a patent MT system, where it shows a great advantage over statistical reordering methods. With the presented reordering model, the performance of patent machine translation for these Chinese sentence types is effectively improved.
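One family of such rules moves a preverbal prepositional phrase to a position after the verb phrase, since Chinese places many PPs before the verb while English places them clause-finally. A toy, hand-indexed illustration (the glossed tokens and the rule are hypothetical, not the paper's actual rule set):

```python
def move_pp_to_end(tokens, pp_start, pp_end):
    """Move the preverbal PP tokens[pp_start:pp_end] to clause-final position,
    approximating English word order."""
    pp = tokens[pp_start:pp_end]
    return tokens[:pp_start] + tokens[pp_end:] + pp

# Glossed Chinese-style order: Subject PP Verb Object
source = ["he", "at", "library", "read", "book"]
print(move_pp_to_end(source, 1, 3))  # ['he', 'read', 'book', 'at', 'library']
```

In the paper's actual system, the PP span and its syntactic level come from parsing rather than hand-supplied indices, but the transformation applied to the source string is of this shape.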


2021 ◽  
Vol 47 (1) ◽  
pp. 9-42
Author(s):  
Miloš Stanojević ◽  
Mark Steedman

Abstract Steedman (2020) proposes as a formal universal of natural language grammar that grammatical permutations of the kind that have given rise to transformational rules are limited to a class known to mathematicians and computer scientists as the "separable" permutations. This class of permutations is exactly the class that can be expressed in combinatory categorial grammars (CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of studies of crosslinguistic variation in word order in nominal and verbal constructions. The number of permutations that are separable grows with the number n of lexical elements in the construction as the Large Schröder number S_{n−1}. Because that number grows much more slowly than the number n! of all permutations, this generalization is also of considerable practical interest for computational applications such as parsing and machine translation. The present article examines the mathematical and computational origins of this restriction, and the reason it is exactly captured in CCG without the imposition of any further constraints.
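The count can be checked directly: separable permutations are exactly the permutations avoiding the patterns 2413 and 3142, and their number for n elements equals the Large Schröder number S_{n−1}. A brute-force verification for small n (the pattern-avoidance characterization and the Schröder recurrence are both standard results, not taken from this article):

```python
from itertools import permutations, combinations

def contains_pattern(perm, pat):
    """True if perm contains a subsequence order-isomorphic to pat."""
    for idxs in combinations(range(len(perm)), len(pat)):
        sub = [perm[i] for i in idxs]
        ranks = sorted(sub)
        if tuple(ranks.index(v) + 1 for v in sub) == tuple(pat):
            return True
    return False

def separable_count(n):
    """Count permutations of 1..n avoiding both 2413 and 3142."""
    return sum(
        1 for p in permutations(range(1, n + 1))
        if not contains_pattern(p, (2, 4, 1, 3))
        and not contains_pattern(p, (3, 1, 4, 2))
    )

def schroeder(n):
    """Large Schröder number S_n via the standard three-term recurrence."""
    S = [1, 2]
    for k in range(2, n + 1):
        S.append((3 * (2 * k - 1) * S[k - 1] - (k - 2) * S[k - 2]) // (k + 1))
    return S[n]

for n in range(1, 7):
    assert separable_count(n) == schroeder(n - 1)
print([schroeder(k) for k in range(6)])  # [1, 2, 6, 22, 90, 394]
```

Already at n = 6 the gap is visible: 394 separable permutations against 6! = 720 in total, and the ratio shrinks rapidly as n grows.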


2012 ◽  
Vol 7 ◽  
Author(s):  
Kristiina Muhonen ◽  
Tanja Purtonen

In this article, we investigate ambiguity in syntactic annotation. The ambiguity in question is inherent in the sense that even human annotators interpret the meaning differently. In our experiment, we detect potentially structurally ambiguous sentences with Constraint Grammar rules. In the linguistic phenomena we investigate, structural ambiguity is primarily caused by word order: the potentially ambiguous particle or adverbial is located between the main verb and the (participial) NP. After detecting the structures, we analyze how many of the potentially ambiguous cases are actually ambiguous using a double-blind method: we rank the sentences captured by the rules on a 1-to-5 scale to indicate which reading each annotator regards as the primary one. The results indicate that 67% of the sentences are ambiguous. Introducing ambiguity in the treebank/parsebank increases the informativeness of the representation, since both correct analyses are presented.


2021 ◽  
Vol 13 (2) ◽  
pp. 271
Author(s):  
Nadia Khumairo Ma'shumah ◽  
Isra F. Sianipar ◽  
Cynthia Yanda Salsabila

Some Google Translate users and researchers remain skeptical of Google Translate's current performance as a machine translation tool. Since the English passive voice often poses translation problems, especially when translated into Indonesian, a language rich in affixes, this study analyzes how Google Translate (MT) renders the English passive voice in Indonesian and investigates whether it can perform modulation. The data in this research took the form of clauses and sentences in the passive voice drawn from a corpus of 497 news articles from the online news platform 'GlobalVoices', processed with AntConc 3.5.8 software. The data were analyzed quantitatively and qualitatively to achieve broad objectives, depth of understanding, and corroboration, while comparative methods were used to analyze both source and target texts. The results showed that (1) GT (via NMT) was able to translate the English passive voice, correctly producing the morphological changes of the Indonesian passive voice, and (2) GT was able to modulate the English passive voice into Indonesian base verbs and the Indonesian active voice.
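The morphological change at issue is that Indonesian commonly marks the passive with the verbal prefix di- (e.g. "dibaca", "is read"). A toy sketch of that single rule, with a hypothetical three-entry lexicon standing in for a real bilingual dictionary:

```python
# Hypothetical mini-lexicon of English verb -> Indonesian verb stem.
LEXICON = {"read": "baca", "write": "tulis", "send": "kirim"}

def english_passive_to_indonesian(verb):
    """Render an English passive verb as an Indonesian di- passive form."""
    return "di" + LEXICON[verb]

print(english_passive_to_indonesian("read"))  # dibaca ("is read")
```

What the study credits the NMT system with is recognizing, without any such explicit rule, when this affixation is required and when modulation into a base verb or an active construction reads more naturally.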


2021 ◽  
Vol 11 (16) ◽  
pp. 7662
Author(s):  
Yong-Seok Choi ◽  
Yo-Han Park ◽  
Seung Yun ◽  
Sang-Hun Kim ◽  
Kong-Joo Lee

Korean and Japanese use different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using the Masked Sequence-to-Sequence pre-training (MASS) method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we allow only the smallest possible number of shared vocabulary items between the two languages. Then, we build an unsupervised Neural Machine Translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different writing scripts and few shared vocabulary items, the unsupervised NMT system performs well compared to other language pairs. Our interest is in the common characteristics of the two languages that make unsupervised NMT perform so well. In this study, we propose a new method of analyzing cross-attention between a source and target language to estimate the difference between languages from the perspective of machine translation. We compute cross-attention measurements for the Korean–Japanese and Korean–English pairs and compare their performance and characteristics. Because the Korean–Japanese pair differs little in word order and morphology, unsupervised NMT between Korean and Japanese can be trained well even without parallel sentences or shared vocabularies.

