Creating a learner corpus infrastructure: Experiences from making learner corpora available

2020 ◽  
Vol 33 ◽  
pp. 03006
Author(s):  
Jennifer-Carmen Frey ◽  
Alexander König ◽  
Darja Fišer

Language resources are collected in many projects in learner corpus research, including small ones, with considerable amounts of time and effort spent on this activity. Making these types of data available in a FAIR way, with standardized and reasoned methods, would contribute substantially to the advancement of the field. Additionally, it would answer current demands for transparency, replicability and reusability. In this article, we discuss some of the challenges of making learner corpora FAIR and report on our experiences in fulfilling this aim while creating a learner corpus infrastructure at a research institution hosting five different learner corpora.

Author(s):  
Deliang Man ◽  
Kok Yueh Lee ◽  
Meng Huat Chau ◽  
Esther Smidt

The advent of technology has facilitated the study of language development and writing development in the form of learner corpora. While learner corpus studies have flourished in recent years, few consider evaluative language development. This paper reports on a study which examines the use of evaluative that-clauses, a linguistic structure that is regularly used to express evaluation in academic writing, in a longitudinal corpus of 304 argumentative essays written by a group of undergraduate students at a university in Brunei. Results suggest students' dynamic use of language resources over time, and support the findings of previous research on the use of evaluative that-clauses by undergraduate students in other contexts of learning. This study, based on an approach to treating learner language in its own right, contributes to the understanding of the nature of language development. Implications for language teaching, including a revised role for teacher feedback and the use of longitudinal learner corpora for students' learning, are considered.


2016 ◽  
Vol 9 (9) ◽  
pp. 139 ◽  
Author(s):  
Katsunori Kotani ◽  
Takehiko Yoshimi ◽  
Hiroaki Nanjo ◽  
Hitoshi Isahara

In order to develop effective teaching methods and computer-assisted language teaching systems for learners of English as a foreign language who need to study the basic linguistic competences for writing, pronunciation, reading, and listening, it is necessary to first investigate which vocabulary and grammar they have or have not yet learned. Identifying such vocabulary and grammar requires a learner corpus for analyzing the accuracy and fluency of learners’ linguistic competences. However, it is difficult to use previous learner corpora for this purpose because they have not compiled all the types of linguistic data that we need. Therefore, this study aimed to solve this problem by designing and developing a new learner corpus that compiles linguistic data regarding the accuracy and fluency of the four basic linguistic competences of writing, pronunciation, reading, and listening. The reliability and validity of the learner corpus were partially confirmed, and practical application of the learner corpus is reported here as case studies.


ReCALL ◽  
2014 ◽  
Vol 26 (2) ◽  
pp. 202-224 ◽  
Author(s):  
Elena Cotos

Learner corpora have become prominent in language teaching and learning, enhancing data-driven learning (DDL) pedagogy by promoting ‘learning driven data’ in the classroom. This study explores the potential of a local learner corpus by investigating the effects of two types of DDL activities, one relying on a native-speaker corpus (NSC) and the second combining native-speaker and learner corpora. Both types of activities aimed at improving second language writers’ knowledge of linking adverbials and were based on a preliminary analysis of adverbial use in the local learner corpus produced by 31 study participants. Quantitative and qualitative data, obtained from writing samples, pre/post-tests, and questionnaires, were converged through concurrent triangulation. The results showed an increase in frequency, diversity and accuracy in all participants’ use of adverbials, but more significant improvement was made by the students who were exposed to the corpus containing their own writing. The findings of this study are thus interpreted as suggestive that combining learner and native-speaker data is a feasible and effective practice, which can be readily integrated in DDL-based instruction with positive impact.


ICAME Journal ◽  
2014 ◽  
Vol 38 (1) ◽  
pp. 115-135 ◽  
Author(s):  
Ute Römer ◽  
Audrey Roberson ◽  
Matthew B. O’Donnell ◽  
Nick C. Ellis

This paper combines data from learner corpora and psycholinguistic experiments in an attempt to find out what advanced learners of English (first language backgrounds German and Spanish) know about a range of common verb-argument constructions (VACs), such as the ‘V about n’ construction (e.g. she thinks about chocolate a lot). Learners’ dominant verb-VAC associations are examined based on evidence retrieved from the German and Spanish subcomponents of ICLE and LINDSEI and collected in lexical production tasks in which participants complete VAC frames (e.g. ‘he ___ about the...’) with verbs that may fill the blank (e.g. talked, thought, wondered). The paper compares findings from the different data sets and highlights the value of linking corpus and experimental evidence in studying linguistic phenomena.


Author(s):  
Gyu-Ho Shin ◽  
Boo Kyung Jung

The present study aims to explore the applicability of automatic analysis to L2-Korean learner corpora, with a special focus on learners’ use of a clause-level construction. For this purpose, we investigate L1-Mandarin L2-Korean learners’ written production of two passive construction types in Korean – suffixal and periphrastic – by devising a pattern-extraction process through NLP techniques. We focus on reporting how the passive constructions are identified and extracted from learner writing automatically, given language-specific features involving the passive. A total of 72 essays were analysed by adapting an existing pipeline (developed by Shin, forthcoming), with enhanced tokenisation and annotation through manual revision of the data. Results showed that our automatic pattern-finder identified more instances than manual extraction for the suffixal passive and yielded a perfect match with manual extraction for the periphrastic passive. Implications of the findings are discussed in regard to strengths and drawbacks of the automatic analysis of learner writing, with suggestions for improving currently available tools for learner corpus research in Korean.


2019 ◽  
Vol 2 (3) ◽  
pp. p159
Author(s):  
Katerina Florou

The aim of this study is to compare various lexical structures between a learner corpus of students with Italian as a foreign language and a reference monolingual Italian corpus. More specifically, the first is a learner corpus (part of a wider learner corpus) compiled from Greek students studying Italian as a foreign language, while the second is the CWIC reference corpus of native Italian speakers. The research findings help us explain the role of didactic material in comprehending linguistic structures that are found in informal letters/emails and, moreover, they provide us with valuable information regarding the use of the same lexical structures by native speakers.


2021 ◽  
pp. 162-177
Author(s):  
Antra Kļavinska ◽  

Several text corpora have been created in Latvia, including learner corpora. One of the latest projects is the Latvian Language Learner Corpus (LaVA), which contains the works of international students studying in Latvian higher education institutions who are learning Latvian as a foreign language. The texts are morphologically tagged automatically, and learner errors are tagged manually. A sufficient scope of publications is available, which provides the theoretical basis for the creation of Latvian language learner corpora; however, there is a lack of studies or practical methodological guidelines concerning the opportunities for their application, and there is little data about the use of text corpora in language acquisition. The aim of this study is to explain from the theoretical perspective for what purposes learner corpus data may be used, as well as to illustrate the methodological groundwork with examples from the LaVA corpus. Analysis of theoretical literature has demonstrated the functions and meaning of learner corpora in research, and experience with the use of corpora in acquiring a foreign language has been analysed. Examples of the use of the LaVA corpus as a didactic resource have been prepared using Corpus Linguistics methods. The study was conducted within the state research programme project “The Latvian Language”. After studying the functions of learner corpora from the theoretical perspective, it was concluded that the target audience of the LaVA corpus mainly includes teachers of Latvian as a foreign language (LATS), authors of teaching materials, as well as Latvian language learners. To facilitate the use of the LaVA corpus, it is important to have basic knowledge of Corpus Linguistics, an understanding of the theory of language, as well as an understanding of foreign language teaching methodology. 
LATS teachers can use the LaVA corpus data in the creation of curricula and teaching materials, in the preparation of language proficiency tests, etc. Using the inductive approach in language acquisition, language learners can also become language researchers, can analyse the errors of other learners, etc. Undeniably, the LaVA corpus can be used in broader linguistic research, for example, in contrastive interlanguage analysis, comparing the data of language learners with the data of native speakers or the data of different groups of language learners.


2018 ◽  
Vol 32 (3) ◽  
pp. 326-361 ◽  
Author(s):  
Jun Gao ◽  
Haitao Liu

Learners’ thesauri do not simply offer an inventory of semantically related lexical items but explicate their nuances and furnish users with rich syntactic, semantic, and pragmatic information. Adopting the theoretical framework of valency, this study examines the distinctive features of two English learners’ thesauri, the Oxford Learner’s Thesaurus: A Dictionary of Synonyms (OLT) and the Longman Language Activator (LLA). Furthermore, the study, supported by learner corpus evidence, empirically assesses the usefulness of OLT and LLA in Chinese learners’ writing. The results demonstrate that learners’ thesauri can generally meet the practical needs of users in writing through providing a range of synonyms and syntactic patterns, including abundant information on semantic collocations, and offering rich pragmatic information regarding registers and emotive variables. The results also show some defects in OLT and LLA, such as their failure to present specific syntactic patterns, including those frequently used in Chinese learners’ compositions. It is then suggested that the compilation of learners’ thesauri draw upon the ways in which lexical information is presented in the English Valency Dictionary, and that learner corpora and native speaker corpora be combined to improve their usefulness.


2012 ◽  
Vol 32 ◽  
pp. 130-149 ◽  
Author(s):  
Magali Paquot ◽  
Sylviane Granger

Formulaic language is at the heart of corpus linguistic research, and learner corpus research (LCR) is no exception. As multiword units of all kinds (e.g., collocations, phrasal verbs, speech formulae) are notoriously difficult for learners, and corpus linguistic techniques are an extremely powerful way of exploring them, they were an obvious area for investigation by researchers from the very early days of LCR. In the first part of this article, the focus is on the types of learner corpus data investigated and the most popular method used to analyze them. The second section describes the types of word sequences analyzed in learner corpora and the methodologies used to extract them. In the rest of the article, we summarize some of the main findings of LCR studies of the learner phrasicon, distinguishing between co-occurrence and recurrence. Particular emphasis is also placed on the relationship between learners’ use of formulaic sequences and transfer from the learner's first language. The article concludes with some proposals for future research in the field.

