Computational learning of construction grammars

2016, Vol 9 (2), pp. 254-292
Author(s): Jonathan Dunn

This paper presents an algorithm for learning the construction grammar of a language from a large corpus. This grammar induction algorithm has two goals: first, to show that construction grammars are learnable without highly specified innate structure; second, to develop a model of which units do or do not constitute constructions in a given dataset. The basic task of construction grammar induction is to identify the minimum set of constructions that represents the language in question with maximum descriptive adequacy. These constructions must (1) generalize across an unspecified number of units, (2) contain mixed levels of representation internally (e.g., both item-specific and schematized representations), and (3) allow for unfilled and partially filled slots. Additionally, these constructions may (4) contain recursive structure within a given slot that must be reduced in order to produce a sufficiently schematic representation. In other words, these constructions are multi-length, multi-level, possibly discontinuous co-occurrences that generalize across internal recursive structures. These co-occurrences are modeled using frequency and the ΔP measure of association, extended in novel ways to cover multi-unit sequences. This work provides important new evidence for the learnability of construction grammars as well as a tool for the automated corpus analysis of constructions.
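ΔP is a directional association measure: for a cue x and an outcome y, ΔP(y | x) = P(y | x) − P(y | not x), and the reverse direction generally gives a different value. The Python sketch below computes the standard two-unit (bigram) version from a 2×2 contingency table of adjacent tokens. It is only an illustration under that assumption; it does not reproduce the paper's extension of ΔP to multi-unit sequences, and the function names and toy corpus are hypothetical.

```python
from typing import Sequence, Tuple


def contingency(tokens: Sequence[str], x: str, y: str) -> Tuple[int, int, int, int]:
    """2x2 counts over adjacent bigrams (w1, w2):
    a = x then y, b = x then not-y, c = not-x then y, d = neither."""
    a = b = c = d = 0
    for w1, w2 in zip(tokens, tokens[1:]):
        if w1 == x:
            if w2 == y:
                a += 1
            else:
                b += 1
        elif w2 == y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def _ratio(num: int, den: int) -> float:
    # Treat an empty row or column as probability 0 rather than dividing by zero.
    return num / den if den else 0.0


def delta_p_forward(tokens: Sequence[str], x: str, y: str) -> float:
    """Delta-P(y | x) = P(y | x) - P(y | not x): how well x predicts the next word y."""
    a, b, c, d = contingency(tokens, x, y)
    return _ratio(a, a + b) - _ratio(c, c + d)


def delta_p_backward(tokens: Sequence[str], x: str, y: str) -> float:
    """Delta-P(x | y) = P(x | y) - P(x | not y): how well y predicts the preceding word x."""
    a, b, c, d = contingency(tokens, x, y)
    return _ratio(a, a + c) - _ratio(b, b + d)


if __name__ == "__main__":
    toy = "the cat sat on the mat the cat ran".split()
    print(delta_p_forward(toy, "the", "cat"), delta_p_backward(toy, "the", "cat"))
```

In this toy corpus, "the" predicts a following "cat" with ΔP ≈ 0.67, while "cat" points back to a preceding "the" with ΔP ≈ 0.83, which illustrates why the direction of the measure matters.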

2019, Vol 40 (1), pp. 24-52
Author(s): Stephanie Horch

Usage-based research in linguistics has relied to a large extent on corpus data. However, a feature’s “failure to appear in even a very large corpus (such as the Web) is not evidence for ungrammaticality, nor is appearance evidence for grammaticality” (Schütze and Sprouse 2013: 29). It is therefore advisable to complement corpus-based analyses with experimental data so as to (ideally) obtain converging evidence. This paper reviews reasons for combining corpus-linguistic and psycholinguistic experimental methods, and demonstrates how research on varieties of English can profit from experimentation. For a study of conversion in Asian Englishes, the maze task (Forster, Guerrera, and Elliot 2009; Forster 2010) was implemented with web-based, open-source software. The results of the experiment dovetail with a previous analysis of the Corpus of Global Web-based English (Davies 2013). These results should encourage researchers not to base findings exclusively on corpus evidence, but to corroborate them with experimental data.


2005, Vol 10 (4), pp. 469-488
Author(s): Sang-suk Oh

The purpose of this paper is to discuss the actual usage of the two Korean causal conjunctive suffixes -(e)se and -(u)nikka, and to propose a multi-layered semantics for them based on analysis of corpus data. To account for the functional differences between the two conjunctives, most previous studies focused on differences in syntactic distribution or on semantic contrasts from an objectivist viewpoint, failing to incorporate the polyfunctionality, semantic overlap, and pragmatic ambiguity of these suffixes. This paper argues that the meanings of the two causal suffixes are distributed across four cognitive-discourse levels: content, epistemic, speech act, and discourse. Corpus analysis reveals that all four levels are accessible to both conjunctive suffixes, but that the two suffixes differ in the degree to which each of these levels is accessible in their sentence semantics. This finding suggests that these linguistic categories should be treated more flexibly, accepting their gradient and pragmatically ambiguous status.


2018, Vol 29 (2), pp. 275-311
Author(s): Jonathan Dunn

This paper develops a construction-based dialectometry capable of identifying previously unknown constructions and measuring the degree to which a given construction is subject to regional variation. The central idea is to learn a grammar of constructions (a CxG) using construction grammar induction and then to use these constructions as features for dialectometry. This offers a method for measuring the aggregate similarity between regional CxGs without limiting in advance the set of constructions subject to variation. The learned CxG is evaluated on how well it describes held-out test corpora, while dialectometry is evaluated on how well it can model regional varieties of English. The method is tested using two distinct datasets: first, the International Corpus of English, representing eight outer-circle varieties; second, a web-crawled corpus representing five inner-circle varieties. Results show that the method (1) produces a grammar with stable quality across subsets of a single corpus that is (2) capable of distinguishing between regional varieties of English with a high degree of accuracy, thus (3) supporting dialectometric methods for measuring the similarity between varieties of English and (4) measuring the degree to which each construction is subject to regional variation. This is important for cognitive sociolinguistics because it operationalizes the idea that competition between constructions is organized at the functional level, so that dialectometry needs to represent as much of the available functional space as possible.
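One way to picture construction-based dialectometry is to represent each regional corpus as a vector of construction frequencies. The sketch below assumes that construction matches have already been extracted from each corpus (the grammar induction step itself is not shown). It uses cosine similarity as the aggregate similarity between varieties and the population standard deviation of a construction's relative frequency across varieties as a rough index of regional variation. Both measures are illustrative choices, not necessarily those used in the paper, and the variety names and construction labels are hypothetical.

```python
import math
from collections import Counter
from statistics import pstdev
from typing import Dict, List, Sequence, Tuple


def relative_frequencies(matches: Sequence[str], constructions: Sequence[str]) -> List[float]:
    """Share of each construction among all matched construction tokens in one corpus."""
    counts = Counter(matches)
    total = sum(counts[c] for c in constructions)
    return [counts[c] / total if total else 0.0 for c in constructions]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compare_varieties(
    corpora: Dict[str, Sequence[str]], constructions: Sequence[str]
) -> Tuple[Dict[Tuple[str, str], float], Dict[str, float]]:
    vectors = {name: relative_frequencies(m, constructions) for name, m in corpora.items()}
    names = sorted(vectors)
    # Aggregate similarity for every unordered pair of varieties.
    similarity = {
        (x, y): cosine(vectors[x], vectors[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
    # Spread of each construction's relative frequency across varieties
    # (a crude proxy for how strongly it is subject to regional variation).
    variation = {
        c: pstdev([vectors[n][k] for n in names]) for k, c in enumerate(constructions)
    }
    return similarity, variation
```

Because every construction contributes a feature, the comparison does not require choosing in advance which constructions are expected to vary, which is the property the abstract emphasizes.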


2020, Vol 6 (26), article eaaz1002
Author(s): Stephen Ferrigno, Samuel J. Cheyette, Steven T. Piantadosi, Jessica F. Cantlon

The question of what computational capacities, if any, differ between humans and nonhuman animals has been at the core of foundational debates in cognitive psychology, anthropology, linguistics, and animal behavior. The capacity to form nested hierarchical representations is hypothesized to be essential to uniquely human thought, but its origins in evolution, development, and culture are controversial. We used a nonlinguistic sequence generation task to test whether subjects generalize sequential groupings of items to a center-embedded, recursive structure. Children (3 to 5 years old), U.S. adults, and adults from a Bolivian indigenous group spontaneously induced recursive structures from ambiguous training data. In contrast, monkeys did so only with additional exposure. We quantify these patterns using a Bayesian mixture model over logically possible strategies. Our results show that recursive hierarchical strategies are robust in human thought, both early in development and across cultures, but the capacity itself is not unique to humans.
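The difference between a center-embedded (recursive) grouping and non-recursive alternatives can be made concrete with paired items such as ( ) [ ] { }. For a six-item sequence, a center-embedded (mirror) strategy yields ([{}]), a crossed strategy yields ([{)]}, and a tail (concatenative) strategy yields ()[]{}. The hypothetical classifier below labels a single produced sequence by these three illustrative patterns. It covers only this labeling step, not the Bayesian mixture model the authors fit over strategies, and the bracket symbols merely stand in for whatever items a task actually uses.

```python
from typing import Dict

# Hypothetical item pairs: each opener has exactly one matching closer.
PAIRS: Dict[str, str] = {"(": ")", "[": "]", "{": "}"}


def classify(seq: str) -> str:
    """Label a sequence as 'center-embedded', 'crossed', 'tail', 'ambiguous', or 'other'."""
    if len(seq) % 2:
        return "other"
    half = len(seq) // 2
    openers, closers = seq[:half], seq[half:]
    # Tail: every opener is immediately followed by its own closer, e.g. ()[]{}.
    is_tail = all(
        seq[i] in PAIRS and PAIRS[seq[i]] == seq[i + 1] for i in range(0, len(seq), 2)
    )
    if half < 2:
        # A single pair is consistent with every strategy.
        return "ambiguous" if is_tail else "other"
    if is_tail:
        return "tail"
    if all(o in PAIRS for o in openers):
        expected = [PAIRS[o] for o in openers]
        # Center-embedded: closers mirror the openers, e.g. ([{}]).
        if list(closers) == expected[::-1]:
            return "center-embedded"
        # Crossed: closers repeat the opener order, e.g. ([{)]}.
        if list(closers) == expected:
            return "crossed"
    return "other"
```

Note that a length-2 sequence cannot discriminate between strategies, which is why generalization has to be probed with longer sequences than the training pairs.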


2003, Vol 8 (2), pp. 245-282
Author(s): Stefanie Wulff

This paper is concerned with the question of which factors govern prenominal adjective order (AO) in English. In particular, the analysis aims to overcome shortfalls of previous analyses in two ways. First, it adopts a multifactorial approach integrating all variables postulated in the literature, thereby doing justice to the well-established fact that cognitive and psychological processes are multivariate and complex. Second, the phenomenon is investigated on the basis of a large corpus, making the results more representative of naturally occurring language than those of previous studies. To this end, corpus-linguistic operationalizations of phonological, syntactic, semantic, and pragmatic determinants of AO are devised and entered into a Linear Discriminant Analysis, which determines the relative influence of all variables (semantic variables being most important) and yields a classification accuracy of 78%. Moreover, using the operationalizations developed in this analysis, the order of previously unanalyzed adjective strings can be predicted with about equal accuracy (73.5%).
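Linear Discriminant Analysis finds a weighted combination of predictors that best separates the classes. In Fisher's two-class version, the weights are w = S⁻¹(μ₁ − μ₀), where S is the pooled within-class covariance matrix and μ₀, μ₁ are the class means, and a point x is assigned to class 1 when w·x exceeds the projection of the midpoint between the means (this assumes equal priors). The pure-Python sketch below assumes two hypothetical numeric predictors per adjective pair, such as a length difference and a semantic-class score difference, and a binary outcome such as whether the first adjective precedes the second. It does not reproduce the paper's operationalizations or its accuracy figures.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mean(rows: Sequence[Vector]) -> Vector:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(groups: Sequence[Sequence[Vector]]) -> Matrix:
    """Within-class covariance pooled over all groups (denominator N - k)."""
    d = len(groups[0][0])
    s = [[0.0] * d for _ in range(d)]
    n_total = 0
    for rows in groups:
        m = mean(rows)
        for r in rows:
            diff = [r[j] - m[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += diff[i] * diff[j]
        n_total += len(rows)
    dof = n_total - len(groups)
    return [[v / dof for v in row] for row in s]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lda(class0: Sequence[Vector], class1: Sequence[Vector]) -> Tuple[Vector, float]:
    """Fisher LDA: w = S^-1 (mu1 - mu0); threshold = w . (mu0 + mu1) / 2."""
    m0, m1 = mean(class0), mean(class1)
    w = solve(pooled_covariance([class0, class1]), [b - a for a, b in zip(m0, m1)])
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m0, m1))
    return w, threshold


def predict(w: Vector, threshold: float, x: Vector) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) > threshold)
```

In practice a library implementation such as scikit-learn's `LinearDiscriminantAnalysis` would normally be used; the hand-written version only makes the computation behind the classification visible.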


2021, Vol 0 (0)
Author(s): Dirk Pijpops, Dirk Speelman, Freek Van de Velde, Stefan Grondelaers

Construction grammar organizes its basic elements of description, its constructions, into networks that range from concrete, lexically filled constructions to fully schematic ones, with several levels of partially schematic constructions in between. However, few corpus studies with a constructionist background take this multi-level nature fully into account. In this paper, we argue that the understanding of language variation can be advanced considerably by systematically formulating and testing hypotheses at various levels in the constructional network. To illustrate the approach, we present a corpus study of the Dutch naar-alternation. We find that this alternation operates primarily at an intermediate level in the constructional network.
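Testing hypotheses at several levels of a constructional network presupposes that corpus observations can be aggregated from lexically specific constructions up to intermediate and fully schematic ones. The sketch below models the network as a simple tree (real constructional networks may allow multiple inheritance) and computes, at any node, the share of one alternation variant among all observations beneath it. The node names, the variant labels "naar" and "direct", and all counts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A construction in a (tree-shaped) network, with variant counts observed directly at it."""
    name: str
    counts: Dict[str, int] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    def total(self) -> Dict[str, int]:
        """Variant counts at this node plus everything it dominates."""
        agg = dict(self.counts)
        for child in self.children:
            for variant, n in child.total().items():
                agg[variant] = agg.get(variant, 0) + n
        return agg


def proportion(node: Node, variant: str) -> float:
    totals = node.total()
    n = sum(totals.values())
    return totals.get(variant, 0) / n if n else 0.0


if __name__ == "__main__":
    root = Node("schematic")
    mid = root.add_child(Node("intermediate-A"))
    lex1 = mid.add_child(Node("lexical-1", {"naar": 8, "direct": 2}))
    lex2 = mid.add_child(Node("lexical-2", {"naar": 2, "direct": 8}))
    other = root.add_child(Node("intermediate-B"))
    other.add_child(Node("lexical-3", {"naar": 0, "direct": 10}))
    for node in (lex1, lex2, mid, root):
        print(node.name, proportion(node, "naar"))
```

With these toy counts the "naar" share is 0.8 and 0.2 at the two lexical nodes, 0.5 at the intermediate node that groups them, and 1/3 at the root, showing how a pattern visible at one level can be diluted or masked at another.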


Author(s): David J. Lobina

The role of recursion in language is universal and unique. It is universal because the (Specifier)-Head-Complement(s) geometry is the type of structuring that all phrases and all languages unequivocally adhere to, and complexes of such phrases constitute a general recursive structure. It is unique because the asymmetric nature of [(Specifier)-[Head-Complement(s)]] structures is unattested in other domains of human cognition or in the cognition of other animal species. The common claim that not all languages manifest recursive structures is usually couched in terms of self-embedded sentences, a particular sub-type of the (Specifier)-Head-Complement(s) geometry. The increasingly common claim that certain representations in human general cognition or in the animal kingdom are isomorphic to language’s recursive structures results from a drastic simplification of the representations under comparison, which undercuts the force of the argument. Linguistic structures in the form of bundles of (Specifier)-Head-Complement(s) remain quirky through and through, and universal in language.
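The (Specifier)-Head-Complement(s) template can be read as a recursive data type: a phrase has a head, an optional specifier, and zero or more complements, and specifiers and complements are themselves phrases. The toy Python encoding below is a simplification for illustration, not the author's formalism, and the example words are arbitrary. It shows how embedding follows from the definition alone: a clause inside a complement is simply one more phrase, so self-embedded sentences are one special case of the general structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phrase:
    """Toy [(Specifier) [Head Complement*]] phrase; specifier and complements are phrases."""
    head: str
    specifier: Optional["Phrase"] = None
    complements: List["Phrase"] = field(default_factory=list)

    def bracket(self) -> str:
        # Head-complement core first, then wrap it with the specifier if there is one.
        inner = " ".join([self.head] + [c.bracket() for c in self.complements])
        if self.specifier is not None:
            return f"[{self.specifier.bracket()} [{inner}]]"
        return f"[{inner}]"


def depth(p: Phrase) -> int:
    """Number of phrase layers on the longest path from p down to a leaf phrase."""
    children = ([p.specifier] if p.specifier is not None else []) + p.complements
    return 1 + max((depth(c) for c in children), default=0)


if __name__ == "__main__":
    clause = Phrase("left", specifier=Phrase("John"))
    vp = Phrase("think", complements=[Phrase("that", complements=[clause])])
    print(vp.bracket(), depth(vp))
```

Here `vp.bracket()` returns `[think [that [[John] [left]]]]` with a depth of 4; nothing beyond the single `Phrase` definition is needed to build arbitrarily deep structures.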


2019, Vol 23 (3), pp. 435-466
Author(s): Françoise Rose

Well-attested diachronic sources for applicative markers are adpositions and verbs. Nominal sources are regarded as dubious, although nouns have been argued to have developed into applicatives in some languages (such as Murrinhpatha; see Nordlinger, this issue). In this paper, I argue for a previously unreported source for applicatives by presenting the possible applicative function of the classifiers of Mojeño Trinitario (Arawak, Bolivia), based on a large corpus of texts collected in the field. While classifiers within verbs derive prototypical applicative constructions, they show unusual properties as applicatives, notably in their semantics: the applicative markers are selected according to the physical properties of the referent of the applied object rather than its semantic role within the sentence. And although most of the classifiers show no similarity to free nominal lexemes in the present state of the language, the classifiers found in Mojeño Trinitario verbs are very likely derived historically from nominal incorporation, a typical path of development. The Mojeño Trinitario data offer new evidence that elements derived from nouns can be reanalyzed as morphological applicative markers.


2020, Vol 48 (2), pp. 166-198
Author(s): Maciej Grabski

The present article examines different patterns of adjectival postmodification in Old English. A detailed corpus analysis is performed, and its results are interpreted within the framework of Construction Grammar. This study contributes to previous research on the subject by using a large set of corpus data, which paves the way for a usage-based approach. The results indicate that the patterns analyzed fulfilled different functions, which in the adopted framework is sufficient grounds for assigning them to different conceptual categories, i.e., “constructions.” Further, I investigate the mutual relations between these constructions as well as the internal dynamics of their functions and development. The findings support the basic constructionist notion that language is most effectively described as a complex and dynamic network of interrelated constructions.

