Diachronic syntax based on constituency and dependency annotated corpora

2018 ◽  
Vol 18 (1) ◽  
pp. 74-99
Author(s):  
Achim Stein

Abstract This contribution presents two syntactically annotated corpora of Old French, Modéliser le changement: les voies du français (MCVF) and the Syntactic Reference Corpus of Medieval French (SRCMF). The focus is on how the underlying syntactic theory (constituency vs. dependency) influences the grammar model and how this choice is reflected in the syntactic annotations of the corpora. The comparison relates to the most relevant general properties of the corpora as well as to two phenomena, null subjects and cleft constructions. Null subjects highlight possible conflicts between syntactic annotation models and syntactic theory, and the information-structural properties of cleft constructions pose a particular problem for the interpretation and annotation of historical corpora. Both phenomena are major instances of diachronic variation in French. The study is relevant for corpus users working on diachronic syntax, as well as for corpus builders wishing to design a grammar model for annotation.

Author(s):  
Richard Ingham

Abstract Old French subject pronouns (Spro) were omissible if postverbal (Foulet 1928), but not freely so (Vance 1997, Zimmermann 2014). This article addresses their partial omissibility in discourse-syntax terms, following work on partial null subject languages by Holmberg and Nikanne (2002) and Modesto (2008). An observational study of dialogic responses in 13th century prose romances is first reported, finding strong indications of covariation between the Topic/Focus status of an initial non-subject constituent and the expression/omission of post-verbal Spro. A quantitative investigation, in such texts, of preposed discourse-linked anaphoric constituents and preposed intensifiers, taken as diagnostic of Topichood and Focushood respectively, confirmed this analysis. We take null Spro to be available (i) when a null Topic operator targets left-peripheral TopicP, and (ii) with a left-peripheral Focused expression. When a discourse-linked non-subject constituent occupies TopicP, however, Spro must be overt.


Author(s):  
Larraitz Uria ◽  
Ainara Estarrona ◽  
Izaskun Aldezabal ◽  
Maria Jesús Aranzabe ◽  
Arantza Díaz de Ilarraza ◽  
...  

Author(s):  
Alexandru Nicolae

Chapter 6 highlights the novel theoretical and empirical facts brought about by the word order changes occurring in the passage from old to modern Romanian, showing how the diachrony of Romanian may contribute to a better understanding of the history of the Romance languages and of the Balkan Sprachbund, as well as to syntactic theory and syntactic change in general. One important dimension of diachronic variation and change is the height of nouns and verbs along their extended projections (lower vs higher V- and N-movement). The two perspectives from which language contact proves relevant in the diachronic development of word order in Romanian, language contact by means of translation and areal language contact, are discussed. The chapter also addresses the issue of surface analogy vs deep structural properties; once again, Romanian emerges as a Romance language in a Balkan suit, as Romance deep structural properties are instantiated by means of Balkan word order patterns.


Author(s):  
Ulrike Mosel

This chapter analyzes the specific characteristics of corpora of endangered languages from a corpus linguistic perspective. It starts with a definition of the central notions of corpus and text and then investigates how the heterogeneous language documentation corpora may fit into a general typology of corpora. The third section looks at the genres and registers that for methodological and theoretical reasons are typical for language documentations, whereas the fourth section deals with the structure of corpora and how texts of a particular content, genre or register can be accessed in archives. The fifth section describes the format of the texts, which are typically annotated audio and video recordings, and deals with metadata, transcription, orthography, translation, glossing, and syntactic annotation. The sixth section shows how annotated corpora can be analyzed for grammatical and lexical research. The last section summarizes the specific features of language documentation corpora.


This book considers the null-subject phenomenon, whereby some languages lack an overtly realized referential subject in specific contexts. In generative syntax—the approach adopted in this volume—the phenomenon has traditionally been explained in terms of a ‘pro-drop’ parameter with associated cluster properties; more recently, however, it has become clear that pro-drop phenomena do not always correlate with all the initially predicted cluster properties. This volume returns to the centre of the debate surrounding the empirical phenomena associated with null subjects. Experts in the field explore the cluster properties associated with pro-drop; the types of null category involved in null-subject phenomena and their identification; and the typology of null-subject languages, with a special focus on partial null-subject languages. Chapters include both novel empirical data and new theoretical analyses covering the major approaches to null subjects in generative grammar. A wide range of languages are examined, ranging from the most commonly studied in research into null subjects, such as Finnish and Italian, to lesser-studied languages such as Vietnamese and Polish, minority languages such as Cimbrian and Kashubian, and historical varieties such as Old French and Old High German. The research presented also contributes to the understanding of other key syntactic phenomena, such as the nature of control, the role of information structure and semantics in syntax, the mechanisms of language change, and the formalization of language variation.


2018 ◽  
Vol 23 (4) ◽  
pp. 467-493
Author(s):  
Jūratė Ruzaitė

Abstract The present study accounts for the use of general extenders (GEs) in spoken and written registers. The repertoire and usage of GEs are analysed in Lithuanian by focusing on their distribution across different registers, their structural properties, and discourse-pragmatic functions. The study is based on a reference corpus of Lithuanian, which includes four subcorpora of written discourse and a subcorpus of spoken discourse. The findings indicate that there are some significant cross-generic differences in GE frequency, but most frequently GEs in Lithuanian are used in written academic discourse. With regard to the structural types of GEs, adjunctives are considerably more frequent than disjunctives. GE structure allows for a large degree of variation, and in spoken interaction GEs can include deictic elements. Concerning discourse-pragmatic functions, GEs are predominantly used to serve textual and interpersonal functions, which appear to be strongly related to the structural type of the GE and discourse settings.


Author(s):  
Laurie Zaring

Abstract Old French (OF) is often characterized as a Germanic-style asymmetric V2 language, although this characterization is often questioned. The present study evaluates the nature of OF V2 from a quantitative perspective. An extensive set of data provided by syntactically annotated corpora shows that both IP and CP structure change over the OF period. Focusing on Germanic inversion – XVS word order – I argue that most of the attested inversion in OF occurs within an elaborated IP structure and that this type of subject inversion dwindles over time due to the decreasing use of null expletives. True Germanic-style embedded V2 does not appear until the late 12th century, and is only rarely used throughout the 13th century. Thus, OF is an asymmetric V2 language, but with a difference, namely in having an IP field that allows for apparent V2 orders and a CP field that is only marginally employed.


2017 ◽  
Vol 22 (1) ◽  
pp. 107-140
Author(s):  
Mariya Koleva ◽  
Melissa Farasyn ◽  
Bart Desmet ◽  
Anne Breitbarth ◽  
Véronique Hoste

Abstract Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them.


Author(s):  
Christine Meklenborg

Old French and old Occitan both have access to the resumptive particle SI that occurs in second position of the clause, between a fronted constituent and the finite verb. In this paper, the information-structural properties of the particle are compared and it is shown that it may be used to introduce new information in old Occitan, an option that is not available in old French. Further, an analysis for SI is proposed. It is suggested that SI may be both a head and a phrase, something which has consequences for the analysis of the fronted element of the clause.

