Modeling word and morpheme order in natural language as an efficient tradeoff of memory and surprisal

2020
Author(s): Michael Hahn, Judith Degen, Richard Futrell

Memory limitations are known to constrain language comprehension and production, and have been argued to account for crosslinguistic word order regularities. However, a systematic assessment of the role of memory limitations in language structure has proven elusive, in part because it is hard to extract precise large-scale quantitative generalizations about language from existing mechanistic models of memory use in sentence processing. We provide an architecture-independent information-theoretic formalization of memory limitations which enables a simple calculation of the memory efficiency of languages. Our notion of memory efficiency is based on the idea of a memory-surprisal tradeoff: a certain level of average surprisal per word can only be achieved at the cost of storing some amount of information about past context. Based on this notion of memory usage, we advance the Efficient Tradeoff Hypothesis: the order of elements in natural language is under pressure to enable favorable memory-surprisal tradeoffs. We derive that languages enable more efficient tradeoffs when they exhibit information locality: when predictive information about an element is concentrated in its recent past. We provide empirical evidence from three test domains in support of the Efficient Tradeoff Hypothesis: a reanalysis of a miniature artificial language learning experiment, a large-scale study of word order in corpora of 54 languages, and an analysis of morpheme order in two agglutinative languages. These results suggest that principles of order in natural language can be explained via highly generic cognitively motivated principles and lend support to efficiency-based models of the structure of human language.
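
To make the tradeoff concrete, the following minimal Python sketch (illustrative only, not the authors' estimator) computes a plug-in estimate of a memory-surprisal curve on a toy corpus. It uses the relationship sketched above: surprisal(T) = H(word | previous T words), and the memory needed to reach that surprisal is bounded below by the sum over t = 1..T of t * I_t, where I_t = H(word | t-1 previous words) - H(word | t previous words). The toy corpus and the simple n-gram counting are assumptions for demonstration.

```python
# Minimal sketch (not the authors' code): plug-in estimate of a
# memory-surprisal tradeoff curve on a toy corpus.
#   surprisal(T) = H(w | previous T words)
#   memory(T)   >= sum_{t=1..T} t * I_t, with I_t = H_{t-1} - H_t
from collections import Counter
import math

def cond_entropy(tokens, order):
    """Plug-in estimate of H(w_i | previous `order` words), in bits."""
    ctx_counts, joint_counts = Counter(), Counter()
    for i in range(order, len(tokens)):
        ctx = tuple(tokens[i - order:i])
        ctx_counts[ctx] += 1
        joint_counts[ctx + (tokens[i],)] += 1
    n = sum(joint_counts.values())
    return -sum((c / n) * math.log2(c / ctx_counts[joint[:-1]])
                for joint, c in joint_counts.items())

def tradeoff_curve(tokens, max_lag=4):
    """(memory bound, surprisal) pairs for context lengths T = 1..max_lag."""
    h = [cond_entropy(tokens, t) for t in range(max_lag + 1)]
    curve, memory = [], 0.0
    for t in range(1, max_lag + 1):
        i_t = h[t - 1] - h[t]         # information contributed at lag t
        memory += t * i_t             # bits of the past that must be stored
        curve.append((memory, h[t]))  # achievable surprisal at this memory cost
    return curve

corpus = ("the dog chased the cat and the cat chased the mouse " * 50).split()
for mem, surp in tradeoff_curve(corpus):
    print(f"memory >= {mem:.3f} bits  ->  surprisal {surp:.3f} bits/word")
```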

2020
Author(s): Zachariah Reuben Cross, Lena Zou-Williams, Erica Wilkinson, Matthias Schlesewsky, Ina Bornkessel-Schlesewsky

Artificial grammar learning (AGL) paradigms are used extensively to characterise the (neuro-)cognitive bases of language learning. However, despite their effectiveness in characterising the capacity to learn complex structured sequences, AGL paradigms lack ecological validity and typically do not account for cross-linguistic differences in sentence comprehension. Here, we describe a new modified miniature language paradigm – Mini Pinyin – that mimics natural language in that it is based on an existing language (Mandarin Chinese) and includes both structure and meaning. Mini Pinyin contains a number of cross-linguistic elements, including varying word orders and classifier-noun rules. To evaluate the effectiveness of Mini Pinyin, 76 monolingual native English speakers (mean age = 24.9 years; 26 female) completed a learning phase followed by a sentence acceptability judgement task. Generalised mixed-effects modelling revealed that participants attained a moderate degree of accuracy on the judgement task, with performance ranging from 25% to 100% accuracy depending on the word order of the sentence. Further, sentences compatible with the canonical English word order were learned more efficiently than those with non-canonical word orders. We controlled for inter-individual differences in statistical learning ability, which accounted for approximately 20% of the variance in performance on the sentence judgement task. We provide the stimuli and statistical analysis scripts as open-source resources and discuss how future research can use this paradigm to study the neurobiological basis of language learning. Mini Pinyin offers a convenient tool for future language learning research, building on the parameters of traditional AGL and existing miniature language paradigms.
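
As a rough illustration of the kind of analysis reported (the authors release their own stimuli and analysis scripts), the hedged sketch below fits a logistic mixed-effects model of trial-level judgement accuracy with a by-subject random intercept using statsmodels. The simulated data, column names (subject, canonical, sl_score, correct), and effect sizes are invented for demonstration and are not the study's materials or code.

```python
# Hedged sketch (not the authors' analysis scripts): a logistic mixed-effects
# model of judgement accuracy with a by-subject random intercept, fit to
# simulated data. Variable names and effect sizes are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(0)
rows = []
for subj in range(30):
    subj_intercept = rng.normal(0, 0.8)      # by-subject variability
    sl_score = rng.normal(0, 1)              # statistical-learning covariate
    for _ in range(40):
        canonical = int(rng.integers(0, 2))  # 1 = English-like word order
        logit = 0.3 + subj_intercept + 1.0 * canonical + 0.6 * sl_score
        p_correct = 1 / (1 + np.exp(-logit))
        rows.append({"subject": subj, "canonical": canonical,
                     "sl_score": sl_score,
                     "correct": int(rng.binomial(1, p_correct))})
df = pd.DataFrame(rows)

# Accuracy ~ word order + statistical learning, random intercept per subject,
# fit by variational Bayes.
model = BinomialBayesMixedGLM.from_formula(
    "correct ~ canonical + sl_score",
    vc_formulas={"subject": "0 + C(subject)"},
    data=df)
print(model.fit_vb().summary())
```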


Author(s): Pauline Jacobson

This chapter examines the currently fashionable notion of ‘experimental semantics’ and argues that most work in natural language semantics has always been experimental. The oft-cited dichotomy between ‘theoretical’ (or ‘armchair’) and ‘experimental’ work is bogus and should be dropped from the discourse. The same holds for dichotomies like ‘intuition-based’ (or ‘thought experiments’) vs. ‘empirical’ work (and ‘real experiments’). The so-called new ‘empirical’ methods often amount to nothing more than collecting large-scale ‘intuitions’ or doing multiple thought experiments. Of course, the use of multiple subjects could well allow for a better experiment than the more traditional single-subject or few-subject methodologies. But whether or not this is the case depends entirely on the question at hand. In fact, the chapter considers several multiple-subject studies and shows that the particular methodology in those cases does not necessarily provide important insights, and it argues that some of its claimed benefits are incorrect.


Probus · 2020 · Vol 32 (1) · pp. 93-127
Author(s): Bradley Hoot, Tania Leal

Linguists have keenly studied the realization of focus – the part of the sentence introducing new information – because it involves the interaction of different linguistic modules. Syntacticians have argued that Spanish uses word order for information-structural purposes, marking focused constituents via rightmost movement. However, recent studies have challenged this claim. To contribute sentence-processing evidence, we conducted a self-paced reading task and a judgment task with Mexican and Catalonian Spanish speakers. We found that movement to final position can signal focus in Spanish, in contrast to the aforementioned work. We contextualize our results within the literature, identifying three basic facts that theories of Spanish focus and theories of language processing should explain, and advance a fourth: that mismatches in information-structural expectations can induce processing delays. Finally, we propose that some differences in the existing experimental results may stem from methodological differences.


2020
Author(s): Fridah Katushemererwe, Andrew Caines, Paula Buttery

This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. We therefore first need to collect corpora for these languages before we can proceed to the design of a spell-checker, a grammar-checker, and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.
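
To give a flavour of what a first building block of such a morphological analyser might look like, here is a toy Python sketch (not the project's analyser) that segments a noun into a class prefix and a stem. The prefix table is a tiny illustrative subset, and the example words and class labels are included only as assumptions for demonstration; a real analyser must handle far more morphology and ambiguity.

```python
# Toy sketch only (not the project's analyser): naive noun-class prefix
# segmentation of the sort a Runyankore-Rukiga morphological analyser might
# build on. The prefix table is a small illustrative subset.
NOUN_CLASS_PREFIXES = {
    "aba": "2",  # e.g. aba-ntu 'people'
    "omu": "1",  # e.g. omu-ntu 'person' (also class 3; a real analyser disambiguates)
    "ebi": "8",  # e.g. ebi-tabo 'books'
    "eki": "7",  # e.g. eki-tabo 'book'
}

def segment_noun(word):
    """Return (noun_class, prefix, stem) for the first matching prefix, else None."""
    for prefix, noun_class in sorted(NOUN_CLASS_PREFIXES.items(),
                                     key=lambda kv: -len(kv[0])):
        if word.startswith(prefix) and len(word) > len(prefix):
            return noun_class, prefix, word[len(prefix):]
    return None  # unknown or unprefixed form

for w in ["omuntu", "abantu", "ekitabo", "ebitabo"]:
    print(w, "->", segment_noun(w))
```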


2021 · pp. 026765832199790
Author(s): Anna Chrabaszcz, Elena Onischik, Olga Dragoy

This study examines the role of cross-linguistic transfer versus a general processing strategy in two groups of heritage speakers (n = 28 per group) with the same heritage language – Russian – and typologically different dominant languages: English and Estonian. A group of homeland Russian speakers (n = 36) is tested to provide a baseline comparison. Within the framework of the Competition Model (MacWhinney, 2012), cross-linguistic transfer is defined as reliance on the processing cue prevalent in the heritage speaker’s dominant language (e.g. word order in English) for comprehension of the heritage language. In accordance with the Isomorphic Mapping Hypothesis (O’Grady and Lee, 2005), the general processing strategy is defined in terms of isomorphism as a linear alignment between the order of the sentence constituents and the temporal sequence of events. Participants were asked to match pictures on the computer screen with auditorily presented sentences. Sentences included locative or instrumental constructions in which two cues – word order (basic vs. inverted) and isomorphism mapping (isomorphic vs. nonisomorphic) – were fully crossed. The results revealed that (1) Russian native speakers are sensitive to isomorphism in sentence processing; (2) English-dominant heritage speakers experience dominant-language transfer, as evidenced by their reliance primarily on the word order cue; and (3) Estonian-dominant heritage speakers show no significant effects of isomorphism or word order but incur significant processing costs in all conditions.


Author(s): Siva Reddy, Mirella Lapata, Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.
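
The grounding step can be illustrated with a deliberately tiny sketch (not the paper's system): an ungrounded graph edge from a parsed question is mapped onto candidate knowledge-base relations, and the candidate whose denotation matches the expected answer set is kept, i.e. denotations serve as weak supervision. The toy knowledge base, edge labels, and candidate relations below are invented for demonstration.

```python
# Illustrative sketch only (not the paper's system): grounding an ungrounded
# semantic graph to a tiny Freebase-like KB, using the denotation (answer set)
# as weak supervision. KB triples, edge labels, and candidates are invented.
from itertools import product

KB = {  # toy knowledge graph: (subject, relation, object)
    ("honolulu", "location.containedby", "hawaii"),
    ("hilo", "location.containedby", "hawaii"),
    ("honolulu", "location.capital_of", "hawaii"),
}

# Ungrounded graph for "What cities are in Hawaii?": the edge label comes from
# the parse, the entity is already linked, and ?x is the answer node.
UNGROUNDED_EDGES = [("?x", "be.in.arg", "hawaii")]

# Candidate KB relations for each ungrounded edge label.
CANDIDATES = {"be.in.arg": ["location.containedby", "location.capital_of"]}

def denotation(grounded_edges):
    """Answer set for ?x under a grounded graph (toy single-variable case)."""
    answers = None
    for _, rel, obj in grounded_edges:
        matches = {s for (s, r, o) in KB if r == rel and o == obj}
        answers = matches if answers is None else answers & matches
    return answers or set()

def ground(edges, gold_answers):
    """Pick the grounding whose denotation matches the weak supervision signal."""
    labels = [edge[1] for edge in edges]
    for choice in product(*(CANDIDATES[label] for label in labels)):
        grounded = [(x, rel, o) for (x, _, o), rel in zip(edges, choice)]
        if denotation(grounded) == gold_answers:
            return grounded
    return None

print(ground(UNGROUNDED_EDGES, {"honolulu", "hilo"}))
# -> [('?x', 'location.containedby', 'hawaii')]
```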

