Blitz Latin Revisited

2015 ◽  
Vol 16 (32) ◽  
pp. 43-49
Author(s):  
John F. White

Summary: Development of the machine translator Blitz Latin between the years 2002 and 2015 is discussed. Key issues remain the ambiguity in meaning of Latin stems and inflections, and the word order of the Latin language. Attempts to improve machine translation of Latin are described by the programmer.

2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
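
To make the general idea of source-side syntactic reordering concrete, here is a toy sketch in Python. The parse representation and the single "move the verb after its object" rule are invented for the example; they are not the authors' Urdu-specific rules, which operate on full parse trees of real sentences.

```python
# Toy illustration of source-side tree reordering for an SOV target language.
# The rule below is a deliberately simplified stand-in for real reordering rules.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def leaves(self) -> List[str]:
        return [self.word] if self.word else [w for c in self.children for w in c.leaves()]


def reorder_vp(node: Node) -> Node:
    """Toy rule: inside a VP, move the verb (V) after its object, giving SOV-like order."""
    for child in node.children:
        reorder_vp(child)
    if node.label == "VP" and len(node.children) >= 2 and node.children[0].label == "V":
        node.children = node.children[1:] + node.children[:1]
    return node


# "she reads books"  ->  "she books reads"  (verb-final order, closer to Urdu)
tree = Node("S", [Node("NP", word="she"),
                  Node("VP", [Node("V", word="reads"), Node("NP", word="books")])])
print(" ".join(reorder_vp(tree).leaves()))
```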


Author(s):  
Riyad Al-Shalabi ◽  
Ghassan Kanaan ◽  
Huda Al-Sarhan ◽  
Alaa Drabsh ◽  
Islam Al-Husban

Abstract: Machine translation (MT) allows direct communication between two people without the need for a third party or a pocket dictionary, which could bring significant performance improvements. Since most traditional translation approaches are word-sensitive, it is very important to consider word order in addition to word selection when evaluating any machine translation. To evaluate MT performance, it is necessary to observe the output of the machine translation tool with respect to word order, word selection and, furthermore, sentence length. However, a sound evaluation that respects all of these points is a very challenging task. In this paper, we first summarize various approaches to evaluating machine translation. We then propose a practical solution by selecting an appropriate, powerful tool, iBLEU, to evaluate the accuracy of well-known MT tools (Google, Bing, Systranet and Babylon). Based on this solution, we further discuss the performance ranking of these tools in both directions, Arabic to English and English to Arabic. After extensive testing, we can determine which direction gives more accurate results for the selected machine translation systems. Finally, our results identify Google as the best-performing system and Systranet as the worst.

Index Terms: Machine Translation, MTs, Evaluation for Machine Translation, Google, Bing, Systranet and Babylon, Machine Translation tools, BLEU, iBLEU.
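
A minimal sketch of the kind of automatic scoring such an evaluation relies on. It uses the sacrebleu Python library rather than the iBLEU tool named above, and the hypothesis and reference sentences are invented placeholders, purely to illustrate the mechanics:

```python
# Hedged sketch: corpus-level BLEU scoring of MT output against references.
import sacrebleu

# Hypothetical example data: system outputs and one reference per sentence.
hypotheses = [
    "the cat sat on the mat",
    "machine translation is useful",
]
references = [
    "the cat sat on the mat",
    "machine translation is very useful",
]

# sacrebleu expects one list per reference set.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```

In a comparison like the one described, the same reference set would be scored against the output of each MT system (Google, Bing, Systranet, Babylon) in each translation direction, and the resulting scores ranked.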


2019 ◽  
Vol 7 ◽  
pp. 661-676 ◽  
Author(s):  
Jiatao Gu ◽  
Qi Liu ◽  
Kyunghyun Cho

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend the Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, including word order recovery, machine translation, image captioning, and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared with conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
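
To make the idea of order-free generation through insertion concrete, here is a toy illustration. It is not the InDIGO model itself (which learns where and what to insert); it only shows that the same final sequence can be produced by many different insertion orders, including the usual left-to-right one.

```python
# Toy illustration of insertion-based sequence generation: each step inserts a
# token at a chosen slot, so many generation orders yield the same sentence.
from typing import List, Tuple


def generate_by_insertion(steps: List[Tuple[str, int]]) -> List[str]:
    """Apply (token, position) insertion operations to an initially empty sequence."""
    seq: List[str] = []
    for token, pos in steps:
        seq.insert(pos, token)  # place the token at the chosen slot
    return seq


# Two different generation orders that both yield the sentence "a b c d".
left_to_right = [("a", 0), ("b", 1), ("c", 2), ("d", 3)]
inside_out = [("c", 0), ("b", 0), ("d", 2), ("a", 0)]

assert generate_by_insertion(left_to_right) == generate_by_insertion(inside_out)
print(generate_by_insertion(inside_out))  # ['a', 'b', 'c', 'd']
```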


2014 ◽  
Vol 926-930 ◽  
pp. 3645-3648
Author(s):  
Yan Di

The basic framework of example-based machine translation is discussed in this paper. Some key issues, such as bilingual alignment, the similarity measure between the input sentence and stored examples, and template acquisition, are introduced.
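
As an illustration of the "similarity measure between input sentence and example" issue, here is a minimal sketch. The word-level normalized edit distance used below is an assumption chosen for the example, not the measure used in the paper.

```python
# Hedged sketch: retrieve the most similar stored example for an input sentence
# using word-level Levenshtein distance, normalized to [0, 1].
def edit_distance(a: list, b: list) -> int:
    """Word-level Levenshtein distance between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[len(a)][len(b)]


def similarity(sentence: str, example: str) -> float:
    a, b = sentence.split(), example.split()
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


examples = ["he opened the door", "she closed the window"]
query = "he closed the door"
best = max(examples, key=lambda e: similarity(query, e))
print(best, round(similarity(query, best), 2))  # "he opened the door" 0.75
```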


2013 ◽  
Vol 411-414 ◽  
pp. 1923-1929
Author(s):  
Ren Fen Hu ◽  
Yun Zhu ◽  
Yao Hong Jin ◽  
Jia Yong Chen

This paper presents a rule-based model to handle the long-distance reordering of Chinese special sentences. In this model, we first identify special prepositions and their syntactic levels. After that, sentences are parsed and transformed to be much closer to English word order using reordering rules. We evaluate our method within a patent MT system, where it shows a clear advantage over reordering with statistical methods. With the presented reordering model, the performance of patent machine translation of Chinese special sentences is effectively improved.


2021 ◽  
Vol 47 (1) ◽  
pp. 9-42
Author(s):  
Miloš Stanojević ◽  
Mark Steedman

Abstract: Steedman (2020) proposes as a formal universal of natural language grammar that grammatical permutations of the kind that have given rise to transformational rules are limited to a class known to mathematicians and computer scientists as the "separable" permutations. This class of permutations is exactly the class that can be expressed in combinatory categorial grammars (CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of studies of crosslinguistic variation in word order in nominal and verbal constructions. The number of separable permutations of the n lexical elements in a construction grows as the large Schröder number S_{n-1}. Because that number grows much more slowly than the n! count of all permutations, this generalization is also of considerable practical interest for computational applications such as parsing and machine translation. The present article examines the mathematical and computational origins of this restriction, and the reason it is exactly captured in CCG without the imposition of any further constraints.
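
As an informal check of the growth claim, the following sketch counts separable permutations by brute force, using the standard characterization (known independently of the paper) that separable permutations are exactly those avoiding the patterns 2413 and 3142, and compares the counts with n!.

```python
# Hedged sketch: count separable permutations for small n and compare with n!.
from itertools import permutations, combinations
from math import factorial

# 0-based encodings of the forbidden patterns 2413 and 3142.
FORBIDDEN = {(1, 3, 0, 2), (2, 0, 3, 1)}


def pattern(vals):
    """Relative-order pattern of a sequence of distinct values."""
    ranks = sorted(vals)
    return tuple(ranks.index(v) for v in vals)


def is_separable(perm):
    """A permutation is separable iff no 4 positions form the pattern 2413 or 3142."""
    return all(pattern([perm[i] for i in idxs]) not in FORBIDDEN
               for idxs in combinations(range(len(perm)), 4))


for n in range(1, 7):
    separable = sum(is_separable(p) for p in permutations(range(n)))
    print(n, separable, factorial(n))
# Prints the counts 1, 2, 6, 22, 90, 394 against n! = 1, 2, 6, 24, 120, 720:
# the separable counts are the large Schröder numbers S_{n-1}, growing far
# more slowly than n!.
```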


2021 ◽  
Vol 11 (16) ◽  
pp. 7662
Author(s):  
Yong-Seok Choi ◽  
Yo-Han Park ◽  
Seung Yun ◽  
Sang-Hun Kim ◽  
Kong-Joo Lee

Korean and Japanese have different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using the Masked Sequence-to-Sequence pre-training (MASS) method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we allow only a minimal shared vocabulary between the two languages. Then, we build an unsupervised Neural Machine Translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different writing scripts and small shared vocabulary, the unsupervised NMT system performs well compared to other pairs of languages. Our interest is in the common characteristics of both languages that make the unsupervised NMT perform so well. In this study, we propose a new method of analyzing cross-attentions between a source and target language to estimate the language differences from the perspective of machine translation. We calculate cross-attention measurements for the Korean-Japanese and Korean-English pairs and compare their performances and characteristics. The Korean-Japanese pair shows little difference in word order and morphology, and thus the unsupervised NMT between Korean and Japanese can be trained well even without parallel sentences or shared vocabulary.
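
To give a concrete sense of what inspecting cross-attention involves, here is a hedged sketch using the Hugging Face transformers library. It is not the authors' MASS-based model or their measurement; the MarianMT checkpoint name is a placeholder assumed to be available, and the target-side tokenization is simplified.

```python
# Hedged sketch: extract cross-attention weights from a seq2seq translation model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-ko-en"  # placeholder checkpoint, assumed available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = tokenizer("안녕하세요", return_tensors="pt")
labels = tokenizer("Hello", return_tensors="pt").input_ids  # simplified target tokenization

with torch.no_grad():
    out = model(**src, labels=labels, output_attentions=True)

# out.cross_attentions holds one tensor per decoder layer,
# shaped [batch, heads, target_len, source_len].
last_layer = out.cross_attentions[-1].mean(dim=1)[0]  # average over attention heads
print(last_layer.shape)  # (target_len, source_len) alignment-like matrix
```

Aggregate statistics over such target-by-source attention matrices (for example, how concentrated or monotone they are) are one way to compare how closely two language pairs align, which is the spirit of the comparison described above.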


2004 ◽  
Vol 30 (4) ◽  
pp. 417-449 ◽  
Author(s):  
Franz Josef Och ◽  
Hermann Ney

A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source-channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.
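
For reference, the log-linear formulation mentioned above can be written as follows; this is the standard textbook form of the model, not text taken from the paper:

```latex
\hat{e} = \arg\max_{e} \Pr(e \mid f)
        = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)
```

Here f is the source sentence, e a candidate translation, the h_m are feature functions (phrase translation scores, language model, reordering, and so on), and the λ_m are their weights. The classical source-channel model is the special case with two feature functions, log Pr(e) and log Pr(f | e), with unit weights, which is why the log-linear model is described as its generalization.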


2012 ◽  
Vol 44 ◽  
pp. 179-222 ◽  
Author(s):  
P. Nakov ◽  
H. T. Ng

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y, where X_2 is a resource-rich language closely related to X_1. This is achieved by taking advantage of the vocabulary overlap and the similarities between X_1 and X_2 in spelling, word order, and syntax: (1) we improve the word alignments for the resource-poor language, (2) we further augment them with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. The evaluation for Indonesian->English using Malay, and for Spanish->English using Portuguese while treating Spanish as resource-poor, shows absolute gains of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary "real" training data by a factor of 2-5.


2021 ◽  
Vol 22 (1) ◽  
pp. 78-92
Author(s):  
Katarína Welnitzová ◽  
Daša Munková

Abstract: The study identifies, classifies and analyses errors in machine translation (MT) outputs of journalistic texts from English into Slovak, using error analysis. The research results presented in the study are pioneering, since the issue of machine translation, with its strong interdisciplinary character and novelty, has not yet been studied in the Slovak academic environment. The evaluation of the errors is based on a framework for the classification of MT errors devised by Vaňko and adapted for the Slovak language. The study discusses and explains issues of sentence structure, including predicativeness, syntactic-semantic correlativeness, and the modal and communicative sentence framework. We found that the majority of the errors are related to the categories of agreement, word order and nominal morpho-syntax. This fact clearly correlates with the features of journalistic texts, in which nominal structures and nouns in all their realizations are used to a great extent. Moreover, there are some serious differences between the two languages which limit and affect the quality of the translation.

