Corpus Linguistics and Linguistic Theory

Abstract This paper reports on the state of the art in the application of multidimensional scaling (MDS) techniques to create semantic maps in linguistic research. MDS refers to a statistical technique that represents objects (lexical items, linguistic contexts, languages, etc.) as points in a space so that greater similarity between objects corresponds to smaller distances between the corresponding points in the representation. We focus on the use of MDS in combination with parallel corpus data as used in research on cross-linguistic variation. We first introduce the mathematical foundations of MDS and then give an exhaustive overview of past research that employs MDS techniques in combination with parallel corpus data. We propose a set of terminology to succinctly describe the key parameters of a particular MDS application. We then show that this computational methodology is theory-neutral, i.e. it can be employed to answer research questions in a variety of linguistic theoretical frameworks. Finally, we show how this leads to two lines of future developments for MDS research in linguistics.
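As an illustration of the general idea of MDS described in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes a small, made-up dissimilarity matrix for four hypothetical contexts (`ctx_a` to `ctx_d`) and places them in two dimensions by plain gradient descent on raw stress. The function names, learning rate, and step count are illustrative assumptions.

```python
import math
import random


def stress(points, target):
    """Raw stress: squared mismatch between map distances and target dissimilarities."""
    n = len(points)
    return sum(
        (math.dist(points[i], points[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def mds(target, dims=2, steps=2000, lr=0.01, seed=0):
    """Place n objects in `dims` dimensions by gradient descent on raw stress."""
    rng = random.Random(seed)
    n = len(target)
    # Random starting configuration (assumption: any reproducible start will do).
    points = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(points[i], points[j]) or 1e-12  # avoid division by zero
                coeff = 2 * (d - target[i][j]) / d
                for k in range(dims):
                    grads[i][k] += coeff * (points[i][k] - points[j][k])
        for i in range(n):
            for k in range(dims):
                points[i][k] -= lr * grads[i][k]
    return points


# Hypothetical dissimilarities between four contexts (toy values, not from the paper).
labels = ["ctx_a", "ctx_b", "ctx_c", "ctx_d"]
dissim = [
    [0.0, 1.0, 1.4, 1.0],
    [1.0, 0.0, 1.0, 1.4],
    [1.4, 1.0, 0.0, 1.0],
    [1.0, 1.4, 1.0, 0.0],
]

coords = mds(dissim)
for label, point in zip(labels, coords):
    print(label, [round(x, 2) for x in point])
```

Objects with small dissimilarities should end up close together on the resulting map. Published applications typically rely on established MDS implementations rather than a hand-written optimizer like this one.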

The theme-recipient alternation in Chinese: tracking syntactic variation across seven centuries

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2021-0048, 2021, Vol 0 (0)

Author(s): Yi Li, Benedikt Szmrecsanyi, Weiwei Zhang

Keyword(s): Regression Analysis, External Factors, Theory Building, Probabilistic Constraints, Syntactic Variation, Dative Alternation, History Of, Linguistic Constraints, Chinese Writing, Ditransitive Constructions

Abstract Previous research has tracked the history of the theme-recipient alternation (or: “dative” alternation) in Chinese, but few studies have embedded their analysis in a probabilistic variationist framework. Against this backdrop, we explore the language-internal and language-external factors that probabilistically influence the alternation between theme-first and recipient-first ordering in a large diachronic corpus of Chinese writing (1300s–1900s). Our analysis reveals that the recipient-first variant is consistently more frequent than its competitor and even more common in more recent texts than in older texts. Regression analysis also suggests that there are stable linguistic constraints (i.e., animacy and definiteness of theme) and fluid constraints (i.e., end-weight, recipient animacy). Notably, the diachronic instability of end-weight and animacy points to cross-linguistic parallels for ditransitive constructions, including the English dative alternation. We thus contribute to theory building in variationist linguistics by advancing the field’s knowledge about the comparative fluidity versus stability of probabilistic constraints.

Frontmatter

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2021-frontmatter2, 2021, Vol 17 (2), pp. i-iii

Primed progressives? Predicting aspectual choice in World Englishes

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2021-0012, 2021, Vol 0 (0)

Author(s): Paula Rautionaho, Marianne Hundt

Keyword(s): Mixed Methods, World Englishes, Syntactic Priming, Tree Analysis

Abstract This corpus-based study focuses on the progressive:nonprogressive alternation from a novel perspective, i.e. the effect of syntactic priming. We annotated a dataset of 5,000 progressive and nonprogressive occurrences in ten different varieties of English from the International Corpus of English for variables such as Aktionsart categories and elements related to priming and subjected the data to a generalized linear mixed methods tree analysis. The results indicate that the progressive is most likely to occur in situations that are durative in nature and when it is preceded by another progressive; overall, we find some evidence of probabilistic indigenization with regard to the use of progressives in different varieties. However, while syntactic priming seems to play a role overall in the choice of the progressive over the nonprogressive, we do not find evidence supporting the idea that priming may explain the use of nonstandard stative progressives.

Transitivity on a continuum: the transitivity index as a predictor of Spanish causatives

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2021-0019, 2021, Vol 0 (0)

Author(s): Gustavo Guajardo

Keyword(s): Present Article, General Property, Continuous Measure, Linguistic Features, The Subject

Abstract This paper contributes to the study of transitivity as a general property of the clause. Unlike most previous work on the subject, however, transitivity in the present article is used to study a lexical alternation, namely the two causative predicates dejar ‘let’ and hacer ‘make’ in Spanish. To do this, I use the transitivity index (TI), a weighted continuous measure of transitivity based on Hopper and Thompson’s (1980, transitivity in grammar and discourse, Language 56, 251–299) transitivity parameters. The advantage of the TI is that it assigns different weights to each of the transitivity parameters and it is therefore sensitive to the particular construction it is applied to. I show that the TI can correctly predict the two Spanish causatives dejar ‘let’ and hacer ‘make’ with 80% accuracy and demonstrate that hacer is associated with higher transitivity contexts. In addition, linguistic features of the causer such as grammatical person and number are found to help distinguish between the two predicates. The finding that a lexical alternation can be reduced to a difference in transitivity raises important questions regarding the structure of the lexicon and the type of information it may contain.

Switch-reference and its role in referential choice in Mbyá Guaraní narratives

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2020-0028, 2021, Vol 0 (0)

Author(s): Guillaume Thomas, Gregory Antono, Laurestine Bradford, Angelika Kiss, Darragh Winkelman

Keyword(s): Main Function, Reference Tracking, Oxford University, Referential Expressions, Mbya Guarani, Referential Choice, Oxford University Press

Abstract Switch-reference has been analyzed as a reference tracking mechanism, whose main function is to avoid ambiguity of reference. One domain where this function has been argued to manifest itself is referential choice. Kibrik (Kibrik, Andrej. 2011. Reference in discourse. Oxford: Oxford University Press) notably proposed that switch-reference marking plays the role of a referential aid, which helps to prevent referential conflict, thereby enabling the production of reduced referential expressions such as pronouns and zeros. The present study probes this theory through an analysis of the role of switch-reference marking in multifactorial models of referential choice in Mbyá Guaraní. We show that while switch-reference increases the likelihood of mention reduction in Mbyá Guaraní, this effect is marginal relative to other predictors of referential choice. We argue that this result is compatible with the analysis of switch-reference as a referential aid, but also supports analyses that emphasize the multiplicity of its functions, beyond the disambiguation of reference.

Frontmatter

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2021-frontmatter1, 2021, Vol 17 (1), pp. i-iv

Exploring semantic differences between the Indonesian prefixes PE- and PEN- using a vector space model

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2020-0023, 2021, Vol 0 (0)

Author(s): Karlina Denistia, Elnaz Shafaei-Bajestan, R. Harald Baayen

Keyword(s): Vector Space, Vector Space Model, Space Model

Abstract Indonesian has two prefixes, PE- and PEN-, that are similar in form and meaning, but are probably not allomorphs. In this study, we applied a distributional vector space model to clarify whether these prefixes have discriminable semantics. Comparisons of pairs of words within and across morphologically defined sets of words revealed that cosine similarities of pairs consisting of a word with PE- and a word with PEN- were reduced compared to pairs of only PE- words, or of only PEN- words. Furthermore, nouns with PE- were more similar to their base words than was the case for words with PEN-. The specialized use of PE- for words denoting agents, and the specialized use of PEN- for denoting instruments, was also visible in the semantic vector space. These differences in the semantics of PE- and PEN- thus provide further quantitative support for the independent status of PE- as opposed to PEN-.
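The kind of comparison described in this abstract (cosine similarity of word pairs within and across morphological sets) can be sketched as follows. This is an illustrative example only: the three-dimensional vectors are invented toy values, and the variable names `pe_vectors` and `pen_vectors` are assumptions rather than data or code from the study.

```python
import math
from itertools import combinations, product


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_within(group):
    """Mean cosine similarity over all distinct pairs inside one set of words."""
    sims = [cosine_similarity(u, v) for u, v in combinations(group, 2)]
    return sum(sims) / len(sims)


def mean_across(group_a, group_b):
    """Mean cosine similarity over all pairs with one word from each set."""
    sims = [cosine_similarity(u, v) for u, v in product(group_a, group_b)]
    return sum(sims) / len(sims)


# Toy vectors standing in for distributional vectors of PE- and PEN- words.
pe_vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
pen_vectors = [[0.1, 1.0, 0.2], [0.0, 0.9, 0.3]]

print("within PE-: ", round(mean_within(pe_vectors), 3))
print("within PEN-:", round(mean_within(pen_vectors), 3))
print("across sets:", round(mean_across(pe_vectors, pen_vectors), 3))
```

A lower mean similarity across sets than within each set is the pattern the abstract reports for the real data. Here it holds by construction of the toy vectors.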

Dependency network-based approach to the implicit structure and semantic diffusion modes of semantic prosody

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2020-0021, 2021, Vol 0 (0)

Author(s): Jianpeng Liu, Luyao Zhang, Xiaohui Bai

Keyword(s): Internal Structure, Shortest Path, Path Length, Cognitive Approach, Function Words, The Core, Dependency Networks, Dependency Network, Large Clusters, Semantic Prosody

Abstract This paper studies the implicit structures and the diffusion modes of semantic prosody on the dependency networks of some English words such as cause and their Chinese equivalents. It is found that the structure of semantic prosody is a bi-stratified network consisting of a few large clusters gathering in the center with most nodes of low dependency capability scattered around. With regard to the diffusion modes, results show that: (i) within one shortest path length, the core words directly attract the nodes with the same or similar semantic characteristics and exclude those with conflicting ones, creating the clearest and the most intense semantic diffusion; (ii) over one shortest path length, semantic diffusion is achieved through content words or function words, and the semantic diffusion modes created with function words as bridges are relatively vaguer and more complicated. This conclusion also holds for the semantic prosodies of other English words and their Chinese equivalent words, revealing, to some extent, a common cognitive approach to understanding the internal structure and the diffusion modes of semantic prosody.

Adjective–noun compounds in Mandarin: a study on productivity

Corpus Linguistics and Linguistic Theory, 10.1515/cllt-2020-0059, 2021, Vol 0 (0)

Author(s): Tian Shen, R. Harald Baayen

Keyword(s): Formation Process, Word Formation, Distributional Semantics, Semantic Transparency, Noun Compounds, Hapax Legomena

Abstract In structuralist linguistics, compounds are argued not to constitute morphological categories, due to the absence of systematic form-meaning correspondences. This study investigates subsets of compounds for which systematic form-meaning correspondences are present: adjective–noun compounds in Mandarin. We show that there are substantial differences in the productivity of these compounds. One set of productivity measures (the count of types, the count of hapax legomena, and the estimated count of unseen types) reflects compounds’ profitability. By contrast, the category-conditioned degree of productivity is found to correlate with the internal semantic transparency of the words belonging to a morphological category. Greater semantic transparency, gauged by distributional semantics, predicts greater category-conditioned productivity. This dovetails well with the hypothesis that semantic transparency is a prerequisite for a word formation process to be productive.
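Two of the count-based measures named in this abstract (types and hapax legomena) can be computed from a token list as sketched below. The sketch also computes the category-conditioned degree of productivity in its usual formulation, the number of hapax legomena divided by the number of tokens in the category. Whether the study uses exactly this formula is an assumption here, and the token labels are invented placeholders rather than corpus data. The estimated count of unseen types is not covered.

```python
from collections import Counter


def productivity_measures(tokens):
    """Count-based productivity measures for the tokens of one morphological category."""
    freq = Counter(tokens)
    n_tokens = sum(freq.values())
    n_types = len(freq)
    # Hapax legomena: types that occur exactly once in the sample.
    n_hapax = sum(1 for count in freq.values() if count == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "hapax": n_hapax,
        # Assumed formula: hapax legomena divided by category tokens.
        "category_conditioned_P": n_hapax / n_tokens,
    }


# Placeholder tokens for one hypothetical adjective-noun compound category.
sample = ["cmp_a", "cmp_a", "cmp_b", "cmp_c", "cmp_c", "cmp_c", "cmp_d"]
print(productivity_measures(sample))
```

In this toy sample there are seven tokens, four types, and two hapax legomena (`cmp_b` and `cmp_d`).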

Corpus Linguistics and Linguistic Theory
Latest Publications

Published By Walter De Gruyter Gmbh

Generating semantic maps through multidimensional scaling: linguistic applications and theory

The theme-recipient alternation in Chinese: tracking syntactic variation across seven centuries

Frontmatter

Primed progressives? Predicting aspectual choice in World Englishes

Transitivity on a continuum: the transitivity index as a predictor of Spanish causatives

Switch-reference and its role in referential choice in Mbyá Guaraní narratives

Frontmatter

Exploring semantic differences between the Indonesian prefixes PE- and PEN- using a vector space model

Dependency network-based approach to the implicit structure and semantic diffusion modes of semantic prosody

Adjective–noun compounds in Mandarin: a study on productivity
