ET: A Workstation for Querying, Editing and Evaluating Annotated Corpora

Author(s):  
Elvis de Souza ◽  
Cláudia Freitas
Author(s):  
Ulrike Mosel

This chapter analyzes the specific characteristics of corpora of endangered languages from a corpus-linguistic perspective. It starts with a definition of the central notions of corpus and text and then investigates how the heterogeneous language documentation corpora may fit into a general typology of corpora. The third section looks at the genres and registers that, for methodological and theoretical reasons, are typical of language documentations, whereas the fourth section deals with the structure of corpora and how texts of a particular content, genre, or register can be accessed in archives. The fifth section describes the format of the texts, which are typically annotated audio and video recordings, and covers metadata, transcription, orthography, translation, glossing, and syntactic annotation. The sixth section shows how annotated corpora can be analyzed for grammatical and lexical research. The last section summarizes the specific features of language documentation corpora.


2010 ◽  
Vol 35-36 (2) ◽  
pp. 255-265 ◽  
Author(s):  
Amina Mettouchi ◽  
Christian Chanard

2019 ◽  
Vol 55 (2) ◽  
pp. 239-269
Author(s):  
Michał Marcińczuk ◽  
Aleksander Wawer

In this article we discuss the current state of the art in named entity recognition for Polish. We present publicly available resources and open-source tools for named entity recognition. The overview includes various kinds of resources, i.e., guidelines, annotated corpora (NKJP, KPWr, CEN, PST), and lexicons (NELexiconS, PNET, Gazetteer). We present the major NER tools for Polish (Sprout, NERF, Liner2, Parallel LSTM-CRFs, and PolDeepNer) and discuss their performance on the reference datasets. We cover the identification of named entity mentions in running text, local and global entity categorization, fine- and coarse-grained categorization, and lemmatization of proper names.

