Formal Basis of a Language Universal

2021 ◽  
Vol 47 (1) ◽  
pp. 9-42
Author(s):  
Miloš Stanojević ◽  
Mark Steedman

Abstract Steedman (2020) proposes as a formal universal of natural language grammar that grammatical permutations of the kind that have given rise to transformational rules are limited to a class known to mathematicians and computer scientists as the “separable” permutations. This class of permutations is exactly the class that can be expressed in combinatory categorial grammars (CCGs). The excluded non-separable permutations do in fact seem to be absent in a number of studies of crosslinguistic variation in word order in nominal and verbal constructions. The number of permutations that are separable grows in the number n of lexical elements in the construction as the Large Schröder Number S_{n−1}. Because that number grows much more slowly than the n! number of all permutations, this generalization is also of considerable practical interest for computational applications such as parsing and machine translation. The present article examines the mathematical and computational origins of this restriction, and the reason it is exactly captured in CCG without the imposition of any further constraints.
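The counting claim in this abstract can be checked by brute force for small n. The sketch below is illustrative only and is not taken from the article: it assumes the standard characterization of separable permutations as those avoiding the patterns 2413 and 3142 (a known result that the abstract itself does not state), and it computes the large Schröder numbers with a standard three-term recurrence. The function names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial

# Assumption (not stated in the abstract): a permutation is separable
# exactly when it contains neither the pattern 2413 nor the pattern 3142.
FORBIDDEN = {(2, 4, 1, 3), (3, 1, 4, 2)}


def is_separable(perm):
    """Return True if perm avoids both forbidden length-4 patterns."""
    for idx in combinations(range(len(perm)), 4):
        vals = [perm[i] for i in idx]
        ordered = sorted(vals)
        # Relative order of the four chosen values, as ranks 1..4.
        ranks = tuple(ordered.index(v) + 1 for v in vals)
        if ranks in FORBIDDEN:
            return False
    return True


def count_separable(n):
    """Count separable permutations of 1..n by exhaustive enumeration."""
    return sum(is_separable(p) for p in permutations(range(1, n + 1)))


def large_schroeder(k):
    """Large Schroeder number S_k, with S_0 = 1 and S_1 = 2.

    Uses the recurrence (m + 1) S_m = 3 (2m - 1) S_{m-1} - (m - 2) S_{m-2}.
    """
    s = [1, 2]
    for m in range(2, k + 1):
        s.append((3 * (2 * m - 1) * s[m - 1] - (m - 2) * s[m - 2]) // (m + 1))
    return s[k]


# Compare the brute-force count with S_{n-1} and with n! for small n.
for n in range(1, 7):
    print(n, count_separable(n), large_schroeder(n - 1), factorial(n))
```

For n = 4 the enumeration finds 22 separable permutations out of 24, since only 2413 and 3142 themselves are excluded; the gap to n! widens quickly as n grows.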

2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.


Author(s):  
Riyad Al-Shalabi ◽  
Ghassan Kanaan ◽  
Huda Al-Sarhan ◽  
Alaa Drabsh ◽  
Islam Al-Husban

Abstract: Machine translation (MT) allows direct communication between two people without the need for a third party or a pocket dictionary, which could bring a significant improvement in performance. Since most traditional translation methods are word-sensitive, it is very important to consider word order in addition to word selection in the evaluation of any machine translation. To evaluate MT performance, it is necessary to observe the output of the machine translation tool dynamically with respect to word order, word selection, and sentence length. However, performing a good evaluation with respect to all of these points is very challenging. In this paper, we first summarize various approaches to evaluating machine translation. We propose a practical solution that uses a powerful tool called iBLEU to evaluate the accuracy of well-known MT tools (i.e., Google, Bing, Systranet, and Babylon). Based on this solution, we further discuss the performance ranking of these tools in both directions, Arabic to English and English to Arabic. After extensive testing, we determine which direction gives more accurate results for the selected MT tools. Finally, we find that Google performs best and Systranet worst.

Index Terms: Machine Translation, MTs, Evaluation for Machine Translation, Google, Bing, Systranet and Babylon, Machine Translation tools, BLEU, iBLEU.


2019 ◽  
Vol 7 ◽  
pp. 661-676 ◽  
Author(s):  
Jiatao Gu ◽  
Qi Liu ◽  
Kyunghyun Cho

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam-search. Experiments on four real-world tasks, including word order recovery, machine translation, image caption, and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared with the conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
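The idea of generating a sequence through insertion operations can be illustrated with a toy example. The sketch below is not the InDIGO algorithm: it involves no model, and it assumes a simplified representation in which each step names a token and an absolute slot index in the partial sequence. The function name `apply_insertions` and this representation are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def apply_insertions(ops: List[Tuple[str, int]]) -> List[str]:
    """Build a sequence by applying (token, slot) insertions in order.

    Assumption for this sketch: slot is an absolute index into the
    partial sequence built so far (0 = front, len(seq) = end).
    """
    seq: List[str] = []
    for token, slot in ops:
        if not 0 <= slot <= len(seq):
            raise ValueError(f"slot {slot} out of range for length {len(seq)}")
        seq.insert(slot, token)
    return seq


# The same target sentence produced in two different generation orders.
left_to_right = [("the", 0), ("cat", 1), ("sat", 2), ("down", 3)]
verb_first = [("sat", 0), ("cat", 0), ("down", 2), ("the", 0)]

print(apply_insertions(left_to_right))
print(apply_insertions(verb_first))
```

Both operation lists yield the same final word order, which is the sense in which the generation order is decoupled from the output order.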


2014 ◽  
Vol 119 (1) ◽  
pp. 125-148
Author(s):  
Axel Harlos ◽  
Erich Poppe ◽  
Paul Widmer

Abstract Middle Welsh is a language with a restricted set of morphosyntactic distinctions for grammatical relations and with relatively free word order in positive main declarative clauses. However, syntactic ambiguity rarely, if ever, arises in natural texts. The present article shows in a corpus-based study how syntactic ambiguity is prevented and how morphological features interact with two referential properties, namely animacy and accessibility, in order to successfully identify grammatical relations in Middle Welsh. Further lower-tier factors are the semantics of the verb and the wider narrative context. The article complements recent insights suggesting that subject-verb agreement is not only determined by word-order patterns, but also by referential properties of subjects.


2015 ◽  
Vol 16 (32) ◽  
pp. 43-49
Author(s):  
John F. White

Summary: Development of the machine translator Blitz Latin between the years 2002 and 2015 is discussed. Key issues remain the ambiguity in meaning of Latin stems and inflections, and the word order of the Latin language. Attempts to improve machine translation of Latin are described by the programmer.


2012 ◽  
Vol 24 (3) ◽  
pp. 233-269 ◽  
Author(s):  
Haukur Þorgeirsson

In Old Norse poetry, there is a syntactic difference between bound clauses (subordinate clauses and main clauses introduced by a conjunction) and unbound clauses (main clauses not introduced by a conjunction). In bound clauses, the finite verb is often placed late in the sentence, violating the V2 requirement upheld in prose. In unbound clauses, the V2 requirement is normally adhered to, but in fornyrðislag poetry, late placement of the finite verb is occasionally found. Hans Kuhn explained these instances as a result of influence from West Germanic poetry. The present article argues that these instances can be explained as a remnant of the Proto-Norse word order, and that this explanation is better supported by the data.


2013 ◽  
Vol 411-414 ◽  
pp. 1923-1929
Author(s):  
Ren Fen Hu ◽  
Yun Zhu ◽  
Yao Hong Jin ◽  
Jia Yong Chen

This paper presents a rule-based model to deal with the long-distance reordering of Chinese special sentences. In this model, we first identify special prepositions and their syntax levels. After that, sentences are parsed and transformed with reordering rules to be much closer to English word order. We evaluate our method within a patent MT system, where it shows a great advantage over reordering with statistical methods. With the presented reordering model, the performance of patent machine translation of Chinese special sentences is effectively improved.


2012 ◽  
Vol 15 (2) ◽  
pp. 190-200
Author(s):  
Jonathan Jacobs

Abstract The heart of Samson Raphael Hirsch’s literary corpus is his great commentary on the Pentateuch. In the commentary’s heyday, the German Jewish communities treated it with the reverence traditionally accorded Rashi on the Torah, and it was always to be found on the desks of both scholars and laypersons. Although the vast literature on Hirsch focuses on his life and his doctrine of Torah im Derech Eretz, much has been written about various aspects of the commentary on the Pentateuch, including Hirsch’s approach to the reasons for the precepts, his etymological method, his attitude toward the modern world, his treatment of the patriarchs’ transgressions, and his method as a translator. In addition to these interests, Hirsch’s commentary on the Pentateuch is marked by a fine and well-developed literary sensitivity that comes to the fore in many places. Not only has this not been studied in detail; it is never even mentioned in the various introductions to and studies of Hirsch. It must be acknowledged that the literary elements of Hirsch’s commentary are heavily outnumbered by what can be defined as derash. Still, the extensive attention to other facets of his personality and exegesis has led to the total neglect of the literary aspects of Hirsch’s commentaries and has overshadowed his aesthetic and literary sensitivity. Thus there is good reason for examining this aspect of his work—and that is the goal of the present article. I focus on four literary phenomena that Hirsch addresses systematically: multiple points of view; the designations applied to biblical characters; the phenomenon of consecutive statements; and word order.


2021 ◽  
Vol 11 (16) ◽  
pp. 7662
Author(s):  
Yong-Seok Choi ◽  
Yo-Han Park ◽  
Seung Yun ◽  
Sang-Hun Kim ◽  
Kong-Joo Lee

Korean and Japanese have different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using a Masked Sequence-to-Sequence pre-training (MASS) method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we allow the smallest number of shared vocabularies between the two languages. Then, we build an unsupervised Neural Machine Translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different writing scripts and few shared vocabularies, the unsupervised NMT system performs well compared to other pairs of languages. Our interest is in the common characteristics of both languages that make the unsupervised NMT perform so well. In this study, we propose a new method to analyze cross-attentions between a source and target language to estimate the language differences from the perspective of machine translation. We calculate cross-attention measurements between Korean–Japanese and Korean–English pairs and compare their performances and characteristics. The Korean–Japanese pair differs little in word order and morphological system, and thus the unsupervised NMT between Korean and Japanese can be trained well even without parallel sentences and shared vocabularies.


Author(s):  
Lieven Danckaert

The focus of this book is Latin word order, and in particular the relative ordering of direct objects and lexical verbs (OV vs. VO), and auxiliaries and non-finite verbs (VAux vs. AuxV). One aim of the book is to offer a first detailed, corpus-based description of these two word order alternations, with special emphasis on their diachronic development in the period from ca. 200 BC until 600 AD. The corpus data reveal that some received wisdom needs to be reconsidered. For one thing, there is no evidence for any major increase in productivity of the order VO during the eight centuries under investigation. In addition, the order AuxV only becomes more frequent in clauses with a modal verb and an infinitive, not in clauses with a BE-auxiliary and a past participle. A second goal is to answer a more fundamental question about Latin syntax, namely whether or not the language is ‘configurational’, in the sense that a phrase structure grammar (with ‘higher-order constituents’ such as verb phrases) is needed to describe and analyse facts of Latin word order. Four pieces of evidence are presented which suggest that Latin is indeed a fully configurational language, despite its high degree of word order flexibility. Specifically, it is shown that there is ample evidence for the existence of a verb phrase constituent. The book thus contributes to the ongoing debate whether configurationality (phrase structure) is a language universal or not.

