Crosslinguistic Corpus Studies in Linguistic Typology
Corpus-based studies have become increasingly common in linguistic typology in recent years, amounting to the emergence of a new field that we call corpus-based typology. Its core idea is to treat languages as populations of utterances and to systematically investigate text production across languages in this sense. From a usage-based perspective, investigating variation and preferences of use is central to understanding how conventionalized structures are distributed across languages and how they develop diachronically. Specific findings of corpus-based typological studies pertain to universals of text production, for example, in prosodic partitioning; to cognitive biases that constrain diverse patterns of use, for example, in constituent order; and to correlations between diverse patterns of use and language-specific structures and conventions. We also consider the remaining challenges for corpus-based typology, in particular the development of crosslinguistically more representative corpora that include spoken (or signed) texts, as well as the field's vast potential for the future.