Formal Basis of a Language Universal

Abstract: Steedman (2020) proposes, as a formal universal of natural-language grammar, that the grammatical permutations of the kind that gave rise to transformational rules are limited to a class known to mathematicians and computer scientists as the "separable" permutations. This is exactly the class of permutations that can be expressed in combinatory categorial grammars (CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of studies of crosslinguistic variation in word order in nominal and verbal constructions. The number of separable permutations of the $n$ lexical elements in a construction is the large Schröder number $S_{n-1}$. Because that number grows much more slowly than $n!$, the number of all permutations, this generalization is also of considerable practical interest for computational applications such as parsing and machine translation. The present article examines the mathematical and computational origins of this restriction, and why CCG captures it exactly without the imposition of any further constraints.
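The separability property and the Schröder count can be checked by brute force for small $n$. The following is a minimal Python sketch, not code from the article; the function names are illustrative. `is_separable` recursively looks for a cut point where every value on one side is smaller (a "straight" combination) or larger (an "inverted" combination) than every value on the other side, and `large_schroeder` computes the large Schröder numbers with the standard recurrence $S_n = S_{n-1} + \sum_{k=0}^{n-1} S_k S_{n-1-k}$. Permutations are written 0-based, so `(1, 3, 0, 2)` is the pattern 2413.

```python
from itertools import permutations
from math import factorial


def is_separable(perm):
    """Return True if `perm` (a sequence of distinct numbers) is separable.

    A permutation is separable if it has length 1, or if it can be cut into a
    prefix and a suffix such that every prefix value is smaller than every
    suffix value ("straight") or larger than every suffix value ("inverted"),
    and both parts are themselves separable.
    """
    n = len(perm)
    if n <= 1:
        return True
    for k in range(1, n):
        left, right = perm[:k], perm[k:]
        if max(left) < min(right) or min(left) > max(right):
            # Every sub-pattern of a separable permutation is separable, so
            # the first valid cut decides the question.
            return is_separable(left) and is_separable(right)
    return False


def large_schroeder(m):
    """Return the large Schröder numbers [S_0, ..., S_m]."""
    s = [1]
    for n in range(1, m + 1):
        # S_n = S_{n-1} + sum_{k=0}^{n-1} S_k * S_{n-1-k}
        s.append(s[n - 1] + sum(s[k] * s[n - 1 - k] for k in range(n)))
    return s


if __name__ == "__main__":
    schroeder = large_schroeder(6)
    for n in range(1, 8):
        separable = sum(is_separable(p) for p in permutations(range(n)))
        print(f"n={n}  separable={separable}  S_(n-1)={schroeder[n - 1]}  n!={factorial(n)}")
```

For $n = 4$ the only non-separable permutations are 2413 and 3142, so 22 of the 24 orders survive; the printed counts follow 1, 2, 6, 22, 90, 394, 1806, while $n!$ reaches 5040 at $n = 7$.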

Word-Order Issues in English-to-Urdu Statistical Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-011-0007-0 ◽

2011 ◽

Vol 95 (1) ◽

pp. 87-106 ◽

Cited By ~ 3

Author(s):

Bushra Jawaid ◽

Daniel Zeman

Keyword(s):

Machine Translation ◽

Word Order ◽

Statistical Machine Translation ◽

Parse Tree ◽

Hard Problem ◽

Long Distance ◽

Translation Process ◽

English Sentence ◽

European Languages

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While phrase-based systems model local reordering well, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present a novel approach, Urdu-aware yet generalizable, based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in BLEU score and in subjective human judgments.
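To make the idea of reordering the source parse tree concrete, here is a toy Python sketch. The nested-tuple tree format and the two rules (move the head verb to the end of its VP, and move each preposition to the end of its PP so that it acts as a postposition) are hypothetical simplifications of head-final Urdu order, not the rule set used in the paper.

```python
from typing import List, Tuple, Union

# A node is (label, children); a leaf is a plain word string.
Tree = Union[str, Tuple[str, List["Tree"]]]

# Hypothetical head-final rules: which child label is the head of which phrase.
HEAD_FINAL = {"VP": "V", "PP": "P"}


def reorder(tree: Tree) -> Tree:
    """Recursively move the head of every VP and PP to the end of the phrase."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [reorder(child) for child in children]
    head = HEAD_FINAL.get(label)
    if head is not None:
        is_head = [not isinstance(c, str) and c[0] == head for c in children]
        children = ([c for c, h in zip(children, is_head) if not h]
                    + [c for c, h in zip(children, is_head) if h])
    return (label, children)


def words(tree: Tree) -> List[str]:
    """Read the leaves of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in words(child)]


sentence = ("S", [
    ("NP", ["John"]),
    ("VP", [
        ("V", ["ate"]),
        ("NP", ["an", "apple"]),
        ("PP", [("P", ["in"]), ("NP", ["the", "garden"])]),
    ]),
])

print(" ".join(words(reorder(sentence))))
```

This prints `John an apple the garden in ate`, a verb-final, postpositional order. Only word order is handled; Urdu case marking and agreement are outside the scope of the sketch.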

Evaluating Machine Translations from Arabic into English and Vice Versa

International Research Journal of Electronics and Computer Engineering ◽

10.24178/irjece.2017.3.2.01 ◽

2017 ◽

Vol 3 (2) ◽

pp. 1

Author(s):

Riyad Al-Shalabi ◽

Ghassan Kanaan ◽

Huda Al-Sarhan ◽

Alaa Drabsh ◽

Islam Al-Husban

Keyword(s):

Machine Translation ◽

Word Order ◽

Solution Structure ◽

Sentence Length ◽

Third Party ◽

Abstract Machine ◽

Translation Tools ◽

Index Terms ◽

Word Selection ◽

Accuracy Degree

Abstract: Machine translation (MT) allows direct communication between two people without the need for a third party or a pocket dictionary, which could bring significant improvements. Since most traditional translation approaches are word-sensitive, it is very important to consider word order in addition to word selection when evaluating any machine translation. To evaluate MT performance, it is necessary to observe the translations produced by each MT tool dynamically with respect to word order, word selection, and sentence length. However, an evaluation that accounts for all of these points is very challenging. In this paper, we first summarize various approaches to evaluating machine translation. We then propose a practical solution that uses a tool called iBLEU to evaluate the accuracy of well-known MT tools (namely Google, Bing, Systranet, and Babylon). Based on this setup, we further discuss the ranking of these tools in both directions, Arabic to English and English to Arabic. After extensive testing, we determine which direction yields more accurate translations for the selected MT systems. Finally, we show that Google performs best and Systranet performs worst. Index Terms: Machine Translation, MTs, Evaluation for Machine Translation, Google, Bing, Systranet and Babylon, Machine Translation tools, BLEU, iBLEU.
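Since the evaluation hinges on word order, it helps to see why BLEU-style metrics are order-sensitive: they count matching n-grams, so scrambled words lose higher-order matches. Below is a minimal, unsmoothed, single-reference sentence-level BLEU in Python, written only for illustration; it is not iBLEU's implementation, and practical toolkits add smoothing and corpus-level aggregation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU: geometric mean of clipped n-gram
    precisions for n = 1..max_n, multiplied by a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if matches == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precision += math.log(matches / total) / max_n
    # Penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_precision)


reference = "the cat sat on the mat"
print(sentence_bleu(reference, reference))                                    # 1.0
print(round(sentence_bleu("the mat sat on the cat", reference, max_n=2), 3))  # 0.894
print(sentence_bleu("the mat sat on the cat", reference))                     # 0.0
```

The scrambled hypothesis uses exactly the reference words, so its unigram precision is perfect, but only 4 of its 5 bigrams match (score about 0.894 with `max_n=2`), and with the default `max_n=4` it has no matching 4-gram at all.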

Insertion-based Decoding with Automatically Inferred Generation Order

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00292 ◽

2019 ◽

Vol 7 ◽

pp. 661-676 ◽

Cited By ~ 3

Author(s):

Jiatao Gu ◽

Qi Liu ◽

Kyunghyun Cho

Keyword(s):

Machine Translation ◽

Real World ◽

Word Order ◽

Code Generation ◽

State Of The Art ◽

Generation Model ◽

Beam Search ◽

Input Information ◽

Sequence Generation ◽

Image Caption

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend the Transformer, a state-of-the-art sequence generation model, to implement the proposed approach efficiently, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, namely word order recovery, machine translation, image captioning, and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders while achieving competitive or even better performance compared with conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
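The core idea of insertion-based generation can be illustrated with a toy Python sketch: any generation order over a target sentence can be turned into a sequence of insertion operations, and replaying those insertions always yields the same sentence. For simplicity this sketch uses absolute slot indices; InDIGO's actual position representation and its learned model for choosing words and positions are not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def insertion_ops(target: Sequence[str], order: Sequence[int]) -> List[Tuple[int, str]]:
    """Convert a generation order into (slot, token) insertion operations.

    `order` is a permutation of target indices; each token is inserted at the
    slot that keeps the partial sequence in final left-to-right order.
    """
    placed: List[int] = []
    ops: List[Tuple[int, str]] = []
    for i in order:
        # The slot equals the number of already-generated tokens that precede i.
        slot = sum(1 for j in placed if j < i)
        ops.append((slot, target[i]))
        placed.append(i)
    return ops


def apply_ops(ops: Sequence[Tuple[int, str]]) -> List[str]:
    """Replay insertion operations starting from an empty sequence."""
    seq: List[str] = []
    for slot, token in ops:
        seq.insert(slot, token)
    return seq


target = ["the", "cat", "sat", "down"]
left_to_right = insertion_ops(target, [0, 1, 2, 3])
middle_out = insertion_ops(target, [2, 1, 3, 0])
print(middle_out)  # [(0, 'sat'), (0, 'cat'), (2, 'down'), (0, 'the')]
print(apply_ops(left_to_right) == apply_ops(middle_out) == target)  # True
```

Left-to-right decoding is the special case in which every insertion goes at the end of the partial sequence.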

Decoding Middle Welsh clauses or “Avoid Ambiguity”

Indogermanische Forschungen ◽

10.1515/if-2014-0008 ◽

2014 ◽

Vol 119 (1) ◽

pp. 125-148

Author(s):

Axel Harlos ◽

Erich Poppe ◽

Paul Widmer

Keyword(s):

Present Article ◽

Word Order ◽

Morphological Features ◽

Syntactic Ambiguity ◽

Verb Agreement ◽

Lower Tier ◽

Grammatical Relations ◽

Narrative Context ◽

Free Word

Abstract: Middle Welsh is a language with a restricted set of morphosyntactic distinctions for grammatical relations and with relatively free word order in positive main declarative clauses. However, syntactic ambiguity rarely, if ever, arises in natural texts. The present article shows, in a corpus-based study, how syntactic ambiguity is prevented and how morphological features interact with two referential properties, namely animacy and accessibility, to identify grammatical relations successfully in Middle Welsh. Further lower-tier factors are the semantics of the verb and the wider narrative context. The article complements recent insights suggesting that subject-verb agreement is determined not only by word-order patterns but also by referential properties of subjects.

Blitz Latin Revisited

Journal of Classics Teaching ◽

10.1017/s2058631015000203 ◽

2015 ◽

Vol 16 (32) ◽

pp. 43-49

Author(s):

John F. White

Keyword(s):

Machine Translation ◽

Word Order ◽

Latin Language ◽

Key Issues

Summary: The development of the machine translator Blitz Latin between 2002 and 2015 is discussed. Key issues remain the ambiguity in meaning of Latin stems and inflections, and the word order of the Latin language. The programmer describes attempts to improve machine translation of Latin.

Late Placement of the Finite Verb in Old Norse Fornyrðislag Meter

Journal of Germanic Linguistics ◽

10.1017/s1470542712000037 ◽

2012 ◽

Vol 24 (3) ◽

pp. 233-269 ◽

Cited By ~ 2

Author(s):

Haukur Þorgeirsson

Keyword(s):

Present Article ◽

Word Order ◽

Old Norse ◽

West Germanic ◽

Subordinate Clauses ◽

Syntactic Difference ◽

Finite Verb

In Old Norse poetry, there is a syntactic difference between bound clauses (subordinate clauses and main clauses introduced by a conjunction) and unbound clauses (main clauses not introduced by a conjunction). In bound clauses, the finite verb is often placed late in the sentence, violating the V2 requirement upheld in prose. In unbound clauses, the V2 requirement is normally adhered to, but in fornyrðislag poetry, late placement of the finite verb is occasionally found. Hans Kuhn explained these instances as a result of influence from West Germanic poetry. The present article argues that these instances can be explained as a remnant of the Proto-Norse word order, and that this explanation is better supported by the data.

A Syntactic Reordering Model of Chinese Special Sentences for Patent Machine Translation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.411-414.1923 ◽

2013 ◽

Vol 411-414 ◽

pp. 1923-1929

Author(s):

Ren Fen Hu ◽

Yun Zhu ◽

Yao Hong Jin ◽

Jia Yong Chen

Keyword(s):

Machine Translation ◽

Statistical Methods ◽

Word Order ◽

English Word ◽

Long Distance ◽

Rule Based

This paper presents a rule-based model for the long-distance reordering of Chinese special sentences. In this model, we first identify special prepositions and their syntactic levels. Sentences are then parsed and transformed with reordering rules into an order much closer to English word order. We evaluate our method within a patent MT system, where it shows a great advantage over reordering with statistical methods. With the presented reordering model, the performance of patent machine translation on Chinese special sentences is effectively improved.
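As a rough illustration of rule-based reordering toward English order, the Python sketch below rewrites the 把 (ba) construction, S 把 O V ..., as S V O .... Treating 把 as one of the "special prepositions" is an assumption, not something the abstract states, and the single hypothetical rule works on a POS-tagged token list (Penn Chinese Treebank-style tags such as `BA` and `VV`) rather than on a full parse as the paper's model does.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, Penn Chinese Treebank-style POS tag)


def reorder_ba(tokens: List[Token]) -> List[Token]:
    """Toy rule: S 把 O V ... -> S V O ... (the marker 把 is dropped)."""
    ba: Optional[int] = next(
        (i for i, (_, tag) in enumerate(tokens) if tag == "BA"), None)
    if ba is None:
        return list(tokens)
    verb = next(
        (i for i in range(ba + 1, len(tokens)) if tokens[i][1] == "VV"), None)
    if verb is None:
        return list(tokens)
    # Assumption: everything between 把 and the first following verb is the object.
    obj = tokens[ba + 1:verb]
    return tokens[:ba] + [tokens[verb]] + obj + tokens[verb + 1:]


sentence = [("我", "PN"), ("把", "BA"), ("书", "NN"), ("放", "VV"),
            ("在", "P"), ("桌子", "NN"), ("上", "LC")]
print(" ".join(word for word, _ in reorder_ba(sentence)))
```

The output `我 放 书 在 桌子 上` glosses word by word as "I put book at table on", which is closer to English "I put the book on the table" than the original S 把 O V order; sentences without 把 pass through unchanged.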

Rabbi Samson Raphael Hirsch as a Peshat Commentator: Literary Aspects of His Commentary on the Pentateuch

Review of Rabbinic Judaism ◽

10.1163/15700704-12341237 ◽

2012 ◽

Vol 15 (2) ◽

pp. 190-200

Author(s):

Jonathan Jacobs

Keyword(s):

Present Article ◽

Word Order ◽

Good Reason ◽

Modern World ◽

Multiple Points ◽

Jewish Communities ◽

Vast Literature ◽

Points Of View ◽

German Jewish

Abstract: The heart of Samson Raphael Hirsch’s literary corpus is his great commentary on the Pentateuch. In the commentary’s heyday, the German Jewish communities treated it with the reverence traditionally accorded Rashi on the Torah, and it was always to be found on the desks of both scholars and laypersons. Although the vast literature on Hirsch focuses on his life and his doctrine of Torah im Derech Eretz, much has been written about various aspects of the commentary on the Pentateuch, including Hirsch’s approach to the reasons for the precepts, his etymological method, his attitude toward the modern world, his treatment of the patriarchs’ transgressions, and his method as a translator. In addition to these interests, Hirsch’s commentary on the Pentateuch is marked by a fine and well-developed literary sensitivity that comes to the fore in many places. Not only has this not been studied in detail; it is never even mentioned in the various introductions to and studies of Hirsch. It must be acknowledged that the literary elements of Hirsch’s commentary are heavily outnumbered by what can be defined as derash. Still, the extensive attention to other facets of his personality and exegesis has led to the total neglect of the literary aspects of Hirsch’s commentaries and has overshadowed his aesthetic and literary sensitivity. Thus there is good reason for examining this aspect of his work, and that is the goal of the present article. I focus on four literary phenomena that Hirsch addresses systematically: multiple points of view; the designations applied to biblical characters; the phenomenon of consecutive statements; and word order.

Factors Behind the Effectiveness of an Unsupervised Neural Machine Translation System between Korean and Japanese

Applied Sciences ◽

10.3390/app11167662 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7662

Author(s):

Yong-Seok Choi ◽

Yo-Han Park ◽

Seung Yun ◽

Sang-Hun Kim ◽

Kong-Joo Lee

Keyword(s):

Machine Translation ◽

Word Order ◽

Target Language ◽

Translation System ◽

Generation Model ◽

Neural Machine Translation ◽

Language Differences ◽

Machine Translation System ◽

The Common ◽

Morphological System

Korean and Japanese have different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using the Masked Sequence-to-Sequence pre-training (MASS) method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we keep the vocabulary shared between the two languages as small as possible. We then build an unsupervised Neural Machine Translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different scripts and the small shared vocabulary, the unsupervised NMT system performs well compared with other language pairs. We are interested in the characteristics common to both languages that make unsupervised NMT perform so well. To this end, we propose a new method that analyzes cross-attention between a source and a target language to estimate the differences between the languages from the perspective of machine translation. We calculate cross-attention measurements for Korean–Japanese and Korean–English pairs and compare their performance and characteristics. The Korean–Japanese pair differs little in word order and morphological system, so unsupervised NMT between Korean and Japanese can be trained well even without parallel sentences or a shared vocabulary.
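The abstract does not spell out the proposed cross-attention measure, so the Python sketch below is only a hypothetical proxy for one aspect of it, word-order similarity. Each target token is aligned to its most-attended source position, and the score is the fraction of target-token pairs whose aligned source positions keep the same relative order. The function name and the toy matrices are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def monotonicity(attention: Sequence[Sequence[float]]) -> float:
    """Hypothetical word-order proxy from a cross-attention matrix.

    Row t holds target token t's attention weights over source positions.
    The score is the fraction of (non-tied) target pairs whose most-attended
    source positions appear in the same order: 1.0 is fully monotone and
    0.0 is fully reversed.
    """
    aligned = [max(range(len(row)), key=row.__getitem__) for row in attention]
    pairs = concordant = 0
    for a in range(len(aligned)):
        for b in range(a + 1, len(aligned)):
            if aligned[a] == aligned[b]:
                continue  # two tokens on the same source word give no order information
            pairs += 1
            concordant += aligned[a] < aligned[b]
    return concordant / pairs if pairs else 1.0


# Toy 3x3 matrices (target rows x source columns), not measured attention.
same_order = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
verb_moved = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
print(monotonicity(same_order))            # 1.0
print(round(monotonicity(verb_moved), 3))  # 0.667
```

In the second matrix an SVO target attends to an SOV source as subject, verb, object (source positions 0, 2, 1), so one of the three token pairs is out of order; a pair of languages with the same word order, such as Korean and Japanese, would be expected to score closer to 1.0 on a measure of this kind.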

The Development of Latin Clause Structure

10.1093/oso/9780198759522.001.0001 ◽

2017 ◽

Cited By ~ 7

Author(s):

Lieven Danckaert

Keyword(s):

Word Order ◽

Phrase Structure ◽

Phrase Structure Grammar ◽

Ample Evidence ◽

Latin Word ◽

Modal Verb ◽

Language Universal ◽

Diachronic Development ◽

High Degree ◽

Order Four

The focus of this book is Latin word order, and in particular the relative ordering of direct objects and lexical verbs (OV vs. VO), and auxiliaries and non-finite verbs (VAux vs. AuxV). One aim of the book is to offer a first detailed, corpus-based description of these two word order alternations, with special emphasis on their diachronic development in the period from ca. 200 BC until 600 AD. The corpus data reveal that some received wisdom needs to be reconsidered. For one thing, there is no evidence for any major increase in productivity of the order VO during the eight centuries under investigation. In addition, the order AuxV only becomes more frequent in clauses with a modal verb and an infinitive, not in clauses with a BE-auxiliary and a past participle. A second goal is to answer a more fundamental question about Latin syntax, namely whether or not the language is ‘configurational’, in the sense that a phrase structure grammar (with ‘higher-order constituents’ such as verb phrases) is needed to describe and analyse facts of Latin word order. Four pieces of evidence are presented which suggest that Latin is indeed a fully configurational language, despite its high degree of word order flexibility. Specifically, it is shown that there is ample evidence for the existence of a verb phrase constituent. The book thus contributes to the ongoing debate whether configurationality (phrase structure) is a language universal or not.
