Syntactic Annotation

Author(s):  
Niladri Sekhar Dash
2020 ◽  
Vol 8 (2) ◽  
pp. 133-158
Author(s):  
María José López-Couso ◽  
Belén Méndez-Naya

This article discusses some of the potential problems derived from the syntactic annotation of historical corpora, especially in connection with low-frequency phenomena. By way of illustration, we examine the parsing scheme used in the Penn Parsed Corpora of Historical English (PPCHE) for clauses introduced by so-called ‘minor declarative complementizers’, originally adverbial links which come to be occasionally used in complementizer function. We show that the functional similarities between canonical declarative complement clauses introduced by the major declarative links that and zero and those headed by minor declarative complementizers are not captured by the PPCHE parsing, where the latter constructions are not tagged as complement clauses, but rather as adverbial clauses. The examples discussed reveal that, despite the obvious advantages of parsed corpora, annotation may sometimes mask interesting linguistic facts.


Author(s):  
Larraitz Uria ◽  
Ainara Estarrona ◽  
Izaskun Aldezabal ◽  
Maria Jesús Aranzabe ◽  
Arantza Díaz de Ilarraza ◽  
...  

Author(s):  
Ulrike Mosel

This chapter analyzes the specific characteristics of corpora of endangered languages from a corpus linguistic perspective. It starts with a definition of the central notions of corpus and text and then investigates how the heterogeneous language documentation corpora may fit into a general typology of corpora. The third section looks at the genres and registers that, for methodological and theoretical reasons, are typical of language documentations, whereas the fourth section deals with the structure of corpora and how texts of a particular content, genre, or register can be accessed in archives. The fifth section describes the format of the texts, which are typically annotated audio and video recordings, and covers metadata, transcription, orthography, translation, glossing, and syntactic annotation. The sixth section shows how annotated corpora can be analyzed for grammatical and lexical research. The last section summarizes the specific features of language documentation corpora.


2018 ◽  
Vol 23 (1) ◽  
pp. 28-54 ◽  
Author(s):  
Yan Huang ◽  
Akira Murakami ◽  
Theodora Alexopoulou ◽  
Anna Korhonen

Current syntactic annotation of large-scale learner corpora mainly relies on “standard parsers” trained on native language data. Understanding how these parsers perform on learner data is important for downstream research and applications related to learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are threefold. First, we demonstrate that the common practice of constructing a gold standard by manually correcting the pre-annotation of a single parser can introduce bias into parser evaluation. We propose an alternative annotation method that can control for this annotation bias. Second, we quantify the influence of learner errors on parsing errors and identify the learner errors that affect parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications for how to select a standard parser for learner English.


2003 ◽  
Vol 29 (1) ◽  
pp. 73-96 ◽  
Author(s):  
Derrick Higgins ◽  
Jerrold M. Sadock

This article describes a corpus-based investigation of quantifier scope preferences. Following recent work on multimodular grammar frameworks in theoretical linguistics and a long history of combining multiple information sources in natural language processing, scope is treated as a module of grammar distinct from syntax. This module incorporates multiple sources of evidence regarding the most likely scope reading for a sentence and is entirely data-driven. The experiments discussed in this article evaluate the performance of our models in predicting the most likely scope reading for a particular sentence, using Penn Treebank data both with and without syntactic annotation. We wish to focus attention on the issue of determining scope preferences, which has largely been ignored in theoretical linguistics, and to explore different models of the interaction between syntax and quantifier scope.


Author(s):  
Thorsten Brants ◽  
Wojciech Skut ◽  
Hans Uszkoreit
