Blitz Latin Revisited

2015 ◽  
Vol 16 (32) ◽  
pp. 43-49
Author(s):  
John F. White

Summary: Development of the machine translator Blitz Latin between the years 2002 and 2015 is discussed. Key issues remain the ambiguity in meaning of Latin stems and inflections, and the word order of the Latin language. Attempts to improve machine translation of Latin are described by the programmer.

2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
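
To make the general idea of source-side syntactic reordering concrete, here is a toy sketch in Python. The parse representation and the single "move the verb after its object" rule are invented for the example; they are not the authors' Urdu-specific rules, which operate on full parse trees of real sentences.

```python
# Toy illustration of source-side tree reordering for an SOV target language.
# The rule below is a deliberately simplified stand-in for real reordering rules.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def leaves(self) -> List[str]:
        return [self.word] if self.word else [w for c in self.children for w in c.leaves()]


def reorder_vp(node: Node) -> Node:
    """Toy rule: inside a VP, move the verb (V) after its object, giving SOV-like order."""
    for child in node.children:
        reorder_vp(child)
    if node.label == "VP" and len(node.children) >= 2 and node.children[0].label == "V":
        node.children = node.children[1:] + node.children[:1]
    return node


# "she reads books"  ->  "she books reads"  (verb-final order, closer to Urdu)
tree = Node("S", [Node("NP", word="she"),
                  Node("VP", [Node("V", word="reads"), Node("NP", word="books")])])
print(" ".join(reorder_vp(tree).leaves()))
```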


Author(s):  
Riyad Al-Shalabi ◽  
Ghassan Kanaan ◽  
Huda Al-Sarhan ◽  
Alaa Drabsh ◽  
Islam Al-Husban

Abstract: Machine translation (MT) allows direct communication between two people without the need for a third party or a pocket dictionary, which could bring significant performance improvements. Since most traditional translation approaches are word-sensitive, it is very important to consider word order in addition to word selection when evaluating any machine translation. To evaluate MT performance, it is necessary to observe the output of the machine translation tool with respect to word order, word selection and, furthermore, sentence length. However, a sound evaluation that respects all of these points is a very challenging task. In this paper, we first summarize various approaches to evaluating machine translation. We then propose a practical solution by selecting an appropriate, powerful tool, iBLEU, to evaluate the accuracy of well-known MT tools (Google, Bing, Systranet and Babylon). Based on this solution, we further discuss the performance ranking of these tools in both directions, Arabic to English and English to Arabic. After extensive testing, we can determine which direction gives more accurate results for the selected machine translation systems. Finally, our results identify Google as the best-performing system and Systranet as the worst.

Index Terms: Machine Translation, MTs, Evaluation for Machine Translation, Google, Bing, Systranet and Babylon, Machine Translation tools, BLEU, iBLEU.
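
A minimal sketch of the kind of automatic scoring such an evaluation relies on. It uses the sacrebleu Python library rather than the iBLEU tool named above, and the hypothesis and reference sentences are invented placeholders, purely to illustrate the mechanics:

```python
# Hedged sketch: corpus-level BLEU scoring of MT output against references.
import sacrebleu

# Hypothetical example data: system outputs and one reference per sentence.
hypotheses = [
    "the cat sat on the mat",
    "machine translation is useful",
]
references = [
    "the cat sat on the mat",
    "machine translation is very useful",
]

# sacrebleu expects one list per reference set.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```

In a comparison like the one described, the same reference set would be scored against the output of each MT system (Google, Bing, Systranet, Babylon) in each translation direction, and the resulting scores ranked.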


2019 ◽  
Vol 7 ◽  
pp. 661-676 ◽  
Author(s):  
Jiatao Gu ◽  
Qi Liu ◽  
Kyunghyun Cho

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend the Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, including word order recovery, machine translation, image captioning, and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared with conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
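
To make the idea of order-free generation through insertion concrete, here is a toy illustration. It is not the InDIGO model itself (which learns where and what to insert); it only shows that the same final sequence can be produced by many different insertion orders, including the usual left-to-right one.

```python
# Toy illustration of insertion-based sequence generation: each step inserts a
# token at a chosen slot, so many generation orders yield the same sentence.
from typing import List, Tuple


def generate_by_insertion(steps: List[Tuple[str, int]]) -> List[str]:
    """Apply (token, position) insertion operations to an initially empty sequence."""
    seq: List[str] = []
    for token, pos in steps:
        seq.insert(pos, token)  # place the token at the chosen slot
    return seq


# Two different generation orders that both yield the sentence "a b c d".
left_to_right = [("a", 0), ("b", 1), ("c", 2), ("d", 3)]
inside_out = [("c", 0), ("b", 0), ("d", 2), ("a", 0)]

assert generate_by_insertion(left_to_right) == generate_by_insertion(inside_out)
print(generate_by_insertion(inside_out))  # ['a', 'b', 'c', 'd']
```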


2014 ◽  
Vol 926-930 ◽  
pp. 3645-3648
Author(s):  
Yan Di

The basic framework of example-based machine translation is discussed in this paper. Some key issues, such as bilingual alignment, the similarity measure between the input sentence and stored examples, and template acquisition, are introduced.
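
As an illustration of the "similarity measure between input sentence and example" issue, here is a minimal sketch. The word-level normalized edit distance used below is an assumption chosen for the example, not the measure used in the paper.

```python
# Hedged sketch: retrieve the most similar stored example for an input sentence
# using word-level Levenshtein distance, normalized to [0, 1].
def edit_distance(a: list, b: list) -> int:
    """Word-level Levenshtein distance between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[len(a)][len(b)]


def similarity(sentence: str, example: str) -> float:
    a, b = sentence.split(), example.split()
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


examples = ["he opened the door", "she closed the window"]
query = "he closed the door"
best = max(examples, key=lambda e: similarity(query, e))
print(best, round(similarity(query, best), 2))  # "he opened the door" 0.75
```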


2013 ◽  
Vol 411-414 ◽  
pp. 1923-1929
Author(s):  
Ren Fen Hu ◽  
Yun Zhu ◽  
Yao Hong Jin ◽  
Jia Yong Chen

This paper presents a rule-based model to handle the long-distance reordering of Chinese special sentences. In this model, we first identify special prepositions and their syntactic levels. After that, sentences are parsed and transformed to be much closer to English word order using reordering rules. We evaluate our method within a patent MT system, where it shows a clear advantage over reordering with statistical methods. With the presented reordering model, the performance of patent machine translation of Chinese special sentences is effectively improved.


2021 ◽  
Vol 47 (1) ◽  
pp. 9-42
Author(s):  
Miloš Stanojević ◽  
Mark Steedman

Abstract: Steedman (2020) proposes as a formal universal of natural language grammar that grammatical permutations of the kind that have given rise to transformational rules are limited to a class known to mathematicians and computer scientists as the "separable" permutations. This class of permutations is exactly the class that can be expressed in combinatory categorial grammars (CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of studies of crosslinguistic variation in word order in nominal and verbal constructions. The number of separable permutations of the n lexical elements in a construction grows as the large Schröder number S_{n-1}. Because that number grows much more slowly than the n! count of all permutations, this generalization is also of considerable practical interest for computational applications such as parsing and machine translation. The present article examines the mathematical and computational origins of this restriction, and the reason it is exactly captured in CCG without the imposition of any further constraints.
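
As an informal check of the growth claim, the following sketch counts separable permutations by brute force, using the standard characterization (known independently of the paper) that separable permutations are exactly those avoiding the patterns 2413 and 3142, and compares the counts with n!.

```python
# Hedged sketch: count separable permutations for small n and compare with n!.
from itertools import permutations, combinations
from math import factorial

# 0-based encodings of the forbidden patterns 2413 and 3142.
FORBIDDEN = {(1, 3, 0, 2), (2, 0, 3, 1)}


def pattern(vals):
    """Relative-order pattern of a sequence of distinct values."""
    ranks = sorted(vals)
    return tuple(ranks.index(v) for v in vals)


def is_separable(perm):
    """A permutation is separable iff no 4 positions form the pattern 2413 or 3142."""
    return all(pattern([perm[i] for i in idxs]) not in FORBIDDEN
               for idxs in combinations(range(len(perm)), 4))


for n in range(1, 7):
    separable = sum(is_separable(p) for p in permutations(range(n)))
    print(n, separable, factorial(n))
# Prints the counts 1, 2, 6, 22, 90, 394 against n! = 1, 2, 6, 24, 120, 720:
# the separable counts are the large Schröder numbers S_{n-1}, growing far
# more slowly than n!.
```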


2021 ◽  
Vol 11 (16) ◽  
pp. 7662
Author(s):  
Yong-Seok Choi ◽  
Yo-Han Park ◽  
Seung Yun ◽  
Sang-Hun Kim ◽  
Kong-Joo Lee

Korean and Japanese have different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using the Masked Sequence-to-Sequence pre-training (MASS) method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we allow only a minimal shared vocabulary between the two languages. Then, we build an unsupervised Neural Machine Translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different writing scripts and small shared vocabulary, the unsupervised NMT system performs well compared to other pairs of languages. Our interest is in the common characteristics of both languages that make the unsupervised NMT perform so well. In this study, we propose a new method of analyzing cross-attentions between a source and target language to estimate the language differences from the perspective of machine translation. We calculate cross-attention measurements for the Korean-Japanese and Korean-English pairs and compare their performances and characteristics. The Korean-Japanese pair shows little difference in word order and morphology, and thus the unsupervised NMT between Korean and Japanese can be trained well even without parallel sentences or shared vocabulary.
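
To give a concrete sense of what inspecting cross-attention involves, here is a hedged sketch using the Hugging Face transformers library. It is not the authors' MASS-based model or their measurement; the MarianMT checkpoint name is a placeholder assumed to be available, and the target-side tokenization is simplified.

```python
# Hedged sketch: extract cross-attention weights from a seq2seq translation model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-ko-en"  # placeholder checkpoint, assumed available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = tokenizer("안녕하세요", return_tensors="pt")
labels = tokenizer("Hello", return_tensors="pt").input_ids  # simplified target tokenization

with torch.no_grad():
    out = model(**src, labels=labels, output_attentions=True)

# out.cross_attentions holds one tensor per decoder layer,
# shaped [batch, heads, target_len, source_len].
last_layer = out.cross_attentions[-1].mean(dim=1)[0]  # average over attention heads
print(last_layer.shape)  # (target_len, source_len) alignment-like matrix
```

Aggregate statistics over such target-by-source attention matrices (for example, how concentrated or monotone they are) are one way to compare how closely two language pairs align, which is the spirit of the comparison described above.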


2004 ◽  
Vol 30 (4) ◽  
pp. 417-449 ◽  
Author(s):  
Franz Josef Och ◽  
Hermann Ney

A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source-channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.
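
For reference, the log-linear formulation mentioned above can be written as follows; this is the standard textbook form of the model, not text taken from the paper:

```latex
\hat{e} = \arg\max_{e} \Pr(e \mid f)
        = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)
```

Here f is the source sentence, e a candidate translation, the h_m are feature functions (phrase translation scores, language model, reordering, and so on), and the λ_m are their weights. The classical source-channel model is the special case with two feature functions, log Pr(e) and log Pr(f | e), with unit weights, which is why the log-linear model is described as its generalization.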


2012 ◽  
Vol 44 ◽  
pp. 179-222 ◽  
Author(s):  
P. Nakov ◽  
H. T. Ng

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y, where X_2 is a resource-rich language closely related to X_1. This is achieved by taking advantage of the vocabulary overlap and the similarities between X_1 and X_2 in spelling, word order, and syntax: (1) we improve the word alignments for the resource-poor language, (2) we further augment them with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. The evaluation for Indonesian->English using Malay, and for Spanish->English using Portuguese while treating Spanish as resource-poor, shows absolute gains of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary "real" training data by a factor of 2-5.


2021 ◽  
Vol 22 (1) ◽  
pp. 78-92
Author(s):  
Katarína Welnitzová ◽  
Daša Munková

Abstract: The study identifies, classifies and analyses errors in machine translation (MT) outputs of journalistic texts from English into Slovak, using error analysis. The research results presented in the study are pioneering, since the issue of machine translation, with its strong interdisciplinary character and novelty, has not yet been studied in the Slovak academic environment. The evaluation of the errors is based on a framework for the classification of MT errors devised by Vaňko and adapted for the Slovak language. The study discusses and explains issues of sentence structure, including predicativeness, syntactic-semantic correlativeness, and the modal and communicative sentence framework. We found that the majority of the errors are related to the categories of agreement, word order and nominal morpho-syntax. This fact clearly correlates with the features of journalistic texts, in which nominal structures and nouns in all their realizations are used to a great extent. Moreover, there are some serious differences between the two languages which limit and affect the quality of the translation.

