linguistic annotation
Recently Published Documents


2021 ◽  
Vol 72 (2) ◽  
pp. 590-602
Author(s):  
Kirill I. Semenov ◽  
Armine K. Titizian ◽  
Aleksandra O. Piskunova ◽  
Yulia O. Korotkova ◽  
Alena D. Tsvetkova ◽  
...  

Abstract The article addresses problems of linguistic annotation of the Chinese texts in Ruzhcorp, the Russian-Chinese Parallel Corpus of the RNC, and ways to solve them. Particular attention is paid to the processing of Russian loanwords. On the one hand, we present a theoretical comparison of widespread standards of Chinese text processing. On the other hand, we describe our experiments in three areas (word segmentation, grapheme-to-phoneme conversion, and PoS tagging) on specific corpus data that contains many transliterations and loanwords. As a result, we propose a preprocessing pipeline for the Chinese texts that will be implemented in Ruzhcorp.


2021 ◽  
Vol 5 (CHI PLAY) ◽  
pp. 1-16
Author(s):  
Federico Bonetti ◽  
Sara Tonelli

Gamification has recently been growing in popularity among researchers investigating Information and Communication Technologies. Scholars have been trying to take advantage of this approach in the field of natural language processing (NLP), developing Games With A Purpose (GWAPs) for corpus annotation that have obtained encouraging results in both annotation quality and overall cost. However, GWAPs implement gamification in different ways and to different degrees. We propose a new framework to investigate the mechanics employed in the gamification process and their magnitude in terms of complexity. This framework is based on an analysis of some of the most important contributions in the field of NLP-related gamified applications and GWAP theory. Its primary purpose is to provide a first step towards classifying mechanics that mimic mainstream video games and may require skills that are not relevant to the annotation task, defined as orthogonal mechanics. In order to test our framework, we develop and evaluate Spacewords, a linguistic space game for synonymy annotation.


Author(s):  
Adriana Picoral ◽  
Shelley Staples ◽  
Randi Reppen

Abstract This paper explores the use of natural language processing (NLP) tools and their utility for learner language analyses through a comparison of automatic linguistic annotation against a gold standard produced by humans. While there are a number of automated annotation tools for English currently available, little research is available on the accuracy of these tools when annotating learner data. We compare the performance of three linguistic annotation tools (a tagger and two parsers) on academic writing in English produced by learners (both L1 and L2 English speakers). We focus on lexico-grammatical patterns, including both phrasal and clausal features, since these are frequently investigated in applied linguistics studies. Our results report both precision and recall of annotation output for argumentative texts in English across four L1s: Arabic, Chinese, English, and Korean. We close with a discussion of the benefits and drawbacks of using automatic tools to annotate learner language.
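As an illustration of the kind of comparison the abstract above describes, the following minimal sketch computes precision and recall of automatic annotations against a human gold standard. It is not the authors' evaluation code: the function name `precision_recall`, the `(token_index, label)` representation, and the sample labels are all assumptions made for this example.

```python
from collections import Counter


def precision_recall(gold, predicted):
    """Return (precision, recall) for predicted annotations vs. a gold standard.

    Both arguments are iterables of hashable annotations; here each one is a
    hypothetical (token_index, label) pair.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: annotations present in both gold and predicted.
    true_positives = sum((gold_counts & pred_counts).values())
    total_predicted = sum(pred_counts.values())
    total_gold = sum(gold_counts.values())
    precision = true_positives / total_predicted if total_predicted else 0.0
    recall = true_positives / total_gold if total_gold else 0.0
    return precision, recall


# Invented sample data: four gold annotations, three tool annotations.
gold = [(0, "NOUN_PHRASE"), (3, "RELATIVE_CLAUSE"), (7, "NOUN_PHRASE"), (9, "THAT_CLAUSE")]
predicted = [(0, "NOUN_PHRASE"), (3, "RELATIVE_CLAUSE"), (5, "NOUN_PHRASE")]

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented sample, two of the three tool annotations match the gold standard and two of the four gold annotations are recovered, which is why a tool can score differently on the two measures.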


2021 ◽  
Author(s):  
Ishani Mondal ◽  
Kalika Bali ◽  
Mohit Jain ◽  
Monojit Choudhury ◽  
Ashish Sharma ◽  
...  

2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Jacqueline Serigos

Abstract Traditionally, automated methods for loanword detection have not received an abundance of attention within the field of language contact. However, as research on loanwords has begun utilizing corpora with word counts in the millions, these generous quantities of data pose challenges for traditional methods of linguistic annotation. This paper presents a method for automatically detecting anglicisms within Spanish text and presents a case study, applying this method to explore the social stratification of anglicisms in Argentine media. The findings of the case study suggest that anglicisms may function as prestige markers in Argentina, which may be a logical consequence of the mode of contact: those of upper socio-economic status have greater access to outlets where loanwords seem to emerge, such as the media, Internet, and second language education.


Author(s):  
Ana Salgado ◽  
Rute Costa

This paper presents the Digital Edition of the Vocabularies of the Academy of Sciences project, which aims to digitise the spelling vocabularies of the Lisbon Academy of Sciences (ACL) in order to create a digital lexicographic corpus bringing together the printed versions of all these lexicographical reference works – the 1940, 1947, 1970, and finally the 2012 editions. The first stage started with the Vocabulário Ortográfico da Língua Portuguesa [Orthographic Vocabulary of the Portuguese Language] (VOLP-1940), our case study. After digitising this vocabulary, the work described here focuses on the linguistic annotation of VOLP-1940 using eXtensible Markup Language (XML), an annotation metalanguage, and following the annotation directives of the Text Encoding Initiative (TEI), more specifically the application of TEI Lex-0, a new TEI sub-format. We aim to highlight the need for rigorous linguistic data processing in the creation of new lexical resources to increase the quality of their description and applicability.

