learner corpora
Recently Published Documents


TOTAL DOCUMENTS

183
(FIVE YEARS 43)

H-INDEX

14
(FIVE YEARS 2)

2021 ◽  
Vol 60 (3) ◽  
pp. 762-775
Author(s):  
Anita Ferreira ◽  
Lorena Blanco ◽  
Jessica Elejalde

ABSTRACT This article addresses the difficulties observed in the use of the particle como among learners of Spanish as a foreign language (ELE) at proficiency levels A2 and B1 whose L1 is French, German, or English. To this end, a Computer Learner Corpora (CLC) study is carried out. The variables are the tendency of use of the particle como according to proficiency level (A2 and B1) and mother tongue. A set of 1578 digital textual utterances by English-speaking, French-speaking, and German-speaking learners of ELE was compiled from the CAELE corpus. The main objectives are (1) to determine the most frequent uses of the particle como among its attributive (exemplifying), modal relative, and comparative uses, and (2) to identify the most frequent uses according to the proficiency level and mother tongue of the learners under study. The results show that como is used most often as an exemplifying attribute, followed by the modal relative use and the comparative of equality. With regard to L1 and proficiency level, French-speaking B1 learners show the highest number of uses of the structures under study, followed by German learners at level A2 and English-speaking learners at level B1.


2021 ◽  
pp. 162-177
Author(s):  
Antra Kļavinska ◽  

Several text corpora have been created in Latvia, including learner corpora. One of the latest projects is the Latvian Language Learner Corpus (LaVA), which contains the works of international students studying in Latvian higher education institutions who are learning Latvian as a foreign language. The texts are morphologically tagged automatically, and learner errors are tagged manually. A sufficient body of publications provides the theoretical basis for the creation of Latvian language learner corpora; however, there is a lack of studies or practical methodological guidelines on how such corpora can be applied, and there is little data about the use of text corpora in language acquisition. The aim of this study is to explain, from a theoretical perspective, the purposes for which learner corpus data may be used, and to illustrate the methodological groundwork with examples from the LaVA corpus. Analysis of the theoretical literature has demonstrated the functions and significance of learner corpora in research, and experience with the use of corpora in acquiring a foreign language has been analysed. Examples of the use of the LaVA corpus as a didactic resource have been prepared using Corpus Linguistics methods. The study was conducted within the state research programme project “The Latvian Language”. After studying the functions of learner corpora from a theoretical perspective, it was concluded that the target audience of the LaVA corpus mainly includes teachers of Latvian as a foreign language (LATS), authors of teaching materials, and Latvian language learners. To use the LaVA corpus effectively, it is important to have basic knowledge of Corpus Linguistics, an understanding of the theory of language, and an understanding of foreign language teaching methodology.
LATS teachers can use the LaVA corpus data in the creation of curricula and teaching materials, in the preparation of language proficiency tests, etc. Using the inductive approach in language acquisition, language learners can also become language researchers, can analyse the errors of other learners, etc. Undeniably, the LaVA corpus can be used in broader linguistic research, for example, in contrastive interlanguage analysis, comparing the data of language learners with the data of native speakers or the data of different groups of language learners.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Katrin Wisniewski

Abstract This contribution focuses on the use of the multifunctional German word form es in the learner corpora MERLIN and DISKO (1,452 texts; 3,700 manually annotated occurrences of es). These corpora cover a wide proficiency range (A1-C1), and they include an L1 control group. Due to its multiple functions, using es is assumed to be challenging for learners. After laying out its main functional features, this paper first addresses the question of whether the frequency patterns of es actually differ between L1 and L2 texts, which is shown to be true only for beginning learners, and whether differences related to learners’ L1 can be observed, which seems to be the case. Secondly, the study links the emerging use of different es types and their relative frequencies to CEFR proficiency levels. A third focus regards the accuracy of es usage, which is generally high but differs among the various es functions, with anaphoric es presenting the greatest challenge for learners. A closer look at interlanguage structures reveals that learners often omit compulsory es and that they use redundant es in peculiar syntactic slots. Furthermore, the use of anaphoric es without clear textual reference regularly encumbers the reading process of the texts.


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 199
Author(s):  
Alexander König ◽  
Jennifer-Carmen Frey ◽  
Egon W. Stemle

To date, research in various educational and linguistic domains such as learner corpus research, writing research, or second language acquisition has produced a substantial amount of research data in the form of L1 and L2 learner corpora. However, the multitude of individual solutions combined with domain-inherent obstacles in data sharing have so far hampered comparability, reusability and reproducibility of data and research results. In this article, we present work in creating a digital infrastructure for L1 and L2 learner corpora and populating it with data collected in the past. We embed our infrastructure efforts in the broader field of infrastructures for scientific research, drawing from technical solutions and frameworks from research data management, among them the FAIR guiding principles for data stewardship. We share our experiences from integrating some L1 and L2 learner corpora from concluded projects into the infrastructure while trying to ensure compliance with the FAIR principles and the standards we established for reproducibility, discussing how far research data that has been collected in the past can be made comparable, reusable and reproducible. Our results show that some basic needs for providing comparable and reusable data are covered by existing general infrastructure solutions and can be exploited for domain-specific infrastructures such as the one presented in this article. Other aspects need genuinely domain-driven approaches. The solutions found for the corpora in the presented infrastructure can only be a preliminary attempt, and further community involvement would be needed to provide templates and models acknowledged and promoted by the community. Furthermore, forward-looking data management would be needed starting from the beginning of new corpus creation projects to ensure that all requirements for FAIR data can be met.

