Studies in Learner Corpus Linguistics

2021 ◽  
pp. 162-177
Author(s):  
Antra Kļavinska ◽  

Several text corpora have been created in Latvia, including learner corpora. One of the latest projects is the Latvian Language Learner Corpus (LaVA), which contains texts written by international students who study at Latvian higher education institutions and are learning Latvian as a foreign language. The texts are morphologically tagged automatically, and learner errors are tagged manually. A sufficient body of publications provides the theoretical basis for the creation of Latvian language learner corpora; however, there is a lack of studies or practical methodological guidelines on how such corpora can be applied, and there is little data about the use of text corpora in language acquisition. The aim of this study is to explain, from a theoretical perspective, the purposes for which learner corpus data may be used, and to illustrate the methodological groundwork with examples from the LaVA corpus. An analysis of the theoretical literature has demonstrated the functions and meaning of learner corpora in research, and experience with the use of corpora in foreign language acquisition has been analysed. Examples of the use of the LaVA corpus as a didactic resource have been prepared using Corpus Linguistics methods. The study was conducted within the state research programme project “The Latvian Language”. After studying the functions of learner corpora from the theoretical perspective, it was concluded that the target audience of the LaVA corpus mainly includes teachers of Latvian as a foreign language (LATS), authors of teaching materials, and Latvian language learners. To use the LaVA corpus effectively, it is important to have basic knowledge of Corpus Linguistics, an understanding of the theory of language, and an understanding of foreign language teaching methodology.
LATS teachers can use the LaVA corpus data in the creation of curricula and teaching materials, in the preparation of language proficiency tests, etc. Using the inductive approach in language acquisition, language learners can also become language researchers, can analyse the errors of other learners, etc. Undeniably, the LaVA corpus can be used in broader linguistic research, for example, in contrastive interlanguage analysis, comparing the data of language learners with the data of native speakers or the data of different groups of language learners.


Author(s):  
Barry Kavanagh

This study aims to explore potential reasons why the use of the tools and methods of corpus linguistics is not prevalent in English teaching in Norway, using the research question “What do in-service English teachers in Norway find useful about corpora and what do they find challenging?” The study provides interview data from in-service teachers, contributing to our understanding of the in-service perspective on corpora. The research design consists of teaching corpus use in seminars for in-service English teachers (featuring LancsLex, the concordancer AntConc and the OANC), integrated into a language course that is part of a further education programme, and semi-structured interviews with four of the students who took the course, during which they also interacted with Netspeak, SKELL and COCA. In line with previous research, the in-service teachers found corpora particularly useful for teaching and learning vocabulary. The challenges they reported are categorized here as usability (criticism of AntConc), IT challenges (a lack of IT skills among teachers), learner-corpus interaction challenges (the complexity of software and concordance lines for pupils; pupil uninterest in language), and lack of teacher need (mistakes being “obvious” to teachers in the lower years). The article discusses some implications of these findings. Keywords: English language teaching, pedagogical corpus application, corpora


2018 ◽  
Vol 1 (2) ◽  
pp. 277-309 ◽  
Author(s):  
Stefan Th. Gries

Abstract: This paper critically discusses how corpus linguistics in general, but learner corpus research in particular, has been dealing with all sorts of frequency data in general, but over- and underuse frequencies in particular. I demonstrate on the basis of learner corpus data the pitfalls of using aggregate data and lacking statistical control that much work is unfortunately characterized by. In fact, I will demonstrate that monofactorial methods have very little to offer at all to research on observational data. While this paper is admittedly very didactic and methodological, I think the discussion of the empirical data offered here (a reanalysis of previously published work) shows how misleading many studies potentially are and provides far-reaching implications for much of corpus linguistics and learner corpus research. Ideally/maximally, this paper together with Paquot & Plonsky (2017, International Journal of Learner Corpus Research) would lead to a complete revision of how learner corpus linguists use quantitative methods and study over-/underuse; minimally, this paper would stimulate a much-needed discussion of currently lacking methodological sophistication.
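The pitfall of aggregate frequencies named in this abstract can be illustrated with a minimal sketch. The per-text counts below are invented for illustration and are not taken from the paper, and the function names are hypothetical. The two toy corpora have identical pooled rates per 1,000 words, so an aggregate comparison would report neither over- nor underuse, yet in the learner data a single text contributes almost all occurrences.

```python
from statistics import stdev

def per_thousand(hits, words):
    """Normalized frequency: occurrences per 1,000 words."""
    return 1000 * hits / words

def pooled_rate(texts):
    """Aggregate rate: pool all hits and all words before normalizing."""
    total_hits = sum(hits for hits, _ in texts)
    total_words = sum(words for _, words in texts)
    return per_thousand(total_hits, total_words)

def per_text_rates(texts):
    """Rate computed separately for each text (i.e. each writer)."""
    return [per_thousand(hits, words) for hits, words in texts]

# Invented (hits, words) pairs, one per text; not real corpus data.
learner = [(1, 500), (0, 500), (19, 500), (0, 500)]
native = [(5, 500), (5, 500), (5, 500), (5, 500)]

if __name__ == "__main__":
    for name, texts in (("learner", learner), ("native", native)):
        rates = per_text_rates(texts)
        print(name, "pooled:", pooled_rate(texts),
              "per text:", rates, "sd:", round(stdev(rates), 2))
```

The sketch only exposes the variation that pooling hides; it does not implement the multifactorial modelling the paper argues for.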


2010 ◽  
Vol 1 ◽  
Author(s):  
John A. Hawkins ◽  
Paula Buttery

Abstract: One of the major goals of the Cambridge English Profile Programme is to identify ‘criterial features’ for each of the Common European Framework of Reference (CEFR) proficiency levels as they apply to English, and to assess the impact of different first languages on these features (through ‘transfer’ effects). The present paper defines what is meant by criterial features and proposes an initial taxonomy of four types. Numerous illustrations are given from our collaborative research to date on the Cambridge Learner Corpus. The benefits and challenges posed by these features for corpus linguistics and for theories of second language acquisition are briefly outlined, as are the benefits and challenges for language assessment practices and for publishing ventures that make use of them as supplements to the current CEFR descriptors.


2019 ◽  
Vol 39 ◽  
pp. 74-92 ◽  
Author(s):  
Tony McEnery ◽  
Vaclav Brezina ◽  
Dana Gablasova ◽  
Jayanti Banerjee

Abstract: In this article we explore the relationship between learner corpus and second language acquisition research. We begin by considering the origins of learner corpus research, noting its roots in smaller scale studies of learner language. This development of learner corpus studies is considered in the broader context of the development of corpus linguistics. We then consider the aspirations that learner corpus researchers have had to engage with second language acquisition research and explore why, to date, the interaction between the two fields has been minimal. By exploring some of the corpus building practices of learner corpus research, and the theoretical goals of second language acquisition studies, we identify reasons for this lack of interaction and make proposals for how this situation could be fruitfully addressed.


2021 ◽  
Vol 29 (2) ◽  
pp. 1443
Author(s):  
Rafaela Rigaud Peixoto ◽  
Patrícia Tosqui-Lucks

Abstract: Weather events affect air traffic control (ATC) in many ways, as many situations need to be reported in pilot-controller communication. This paper analyzes the language used to express the impact of meteorological phenomena on air traffic operations, particularly in aeronautical English, that is, the communication used during radiotelephony by air traffic controllers in training situations. Two types of analysis are carried out: one regarding the formulaic structure of lexical units using 11 Aeronautical Meteorology terms within the ATC context (phase 1); and another concerning the use of these terms by students in three ATC courses (for TWR, ACC and APP facilities) and how it affects their performance during communication activities in a learning environment (phase 2). These analyses are based on rationales of lexical semantics for terminology; corpus linguistics (CL), comprising English for Specific Purposes (ESP) and learner corpora; and considerations about vocabulary assessment in aeronautical English exams. Results suggest that the terminological patterns discussed in this paper show how meaning depends on context, and how lexical semantic analysis of terms may help reveal nuances of language used in a specialized context. They also indicate that the courses have been efficient in teaching and practicing the use of the main meteorological terms related to aeronautical English and that, despite some mistakes students make, the evidence shows that they are able to report weather conditions to pilots and to understand pilots’ requests at a proficient level with respect to vocabulary. Keywords: meteorology; aeronautical English; terminology; learner corpus; language assessment.


2017 ◽  
Vol 10 (2) ◽  
pp. 31-49
Author(s):  
Priscilla Tulipa da Costa

ABSTRACT: This study aims to analyze the use of English phrasal verbs in the academic writing of Brazilian learners. To that end, two corpora containing essays written by university students were used: one as the study corpus (Br-ICLE) and the other as the reference corpus (LOCNESS). The methodology, which is based on Corpus Linguistics, consists of quantitative analyses carried out with the AntConc software for the treatment and analysis of the data. The results suggest that, compared with other types of multi-word verbs, phrasal verbs are used little in the learners' texts. However, some of the verb + particle structures found have become typical of this type of textual production, which indicates that their use is increasingly common in more formal writing. In addition, the research pointed out similarities and differences in the use of phrasal verbs between the groups examined, and found that native and non-native writers use the same phrasal verbs in very similar proportions, although Brazilian students sometimes use phrasal verbs in ways that do not conform to the standards of the English language. KEYWORDS: phrasal verbs; learner corpus; academic writing; corpus linguistics.


2018 ◽  
Vol 6 (5) ◽  
pp. 77 ◽  
Author(s):  
Cem Can

This paper illustrates the use of learner corpus data (extracted from the Cambridge Learner Corpus, CLC) to carry out an error analysis that investigates authentic learner errors and their respective frequencies in terms of types and tokens, as well as the contexts in which they regularly occur, across four distinct proficiency levels (B1, B2, C1 and C2) as defined by the Common European Framework of Reference for Languages (henceforth CEFR) (Council of Europe, 2001). As a variety of learner corpora compiled by researchers become relatively accessible, it is possible to explore interlanguage errors and conduct error analysis (EA) on learner-generated texts. The need to take these authentic learner errors into account in designing foreign language learning programs and remedial teaching materials has been widely emphasized by many researchers (see e.g., Juozulynas, 1994; Mitton, 1996; Cowan, Choi, & Kim, 2003; Ndiaye & Vandeventer Faltin, 2003; Allerton et al., 2004). This study aims at conducting a corpus-based error analysis of agreement errors to reveal the related error categories between Greek and Turkish EFL learners, the distribution of agreement errors along the B1-C2 proficiency range according to CEFR, and the distribution of agreement error types with respect to the L1 of the learners. The data analyzed in this study is extracted from the Cambridge Learner Corpus (CLC), the largest annotated test performance corpus, which enables the investigation of the linguistic and rhetorical features of the learner performances in the above-stated proficiency bands. The findings from this study reveal that, across B1-C2 proficiency levels and across different registers and genres, the most common agreement error categories, in order of frequency, are Verb Agreement (AGV), Noun Agreement (AGN), Anaphor Agreement (AGA), Determiner Agreement (AGD), Agreement Error (AG), and Quantifier Agreement (AGQ) errors.
This study’s approach uses the techniques of computer corpus linguistics and follows the steps of the Error Analysis framework proposed by Corder (1971): identification, description, classification and explanation of errors.
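As a rough illustration of the classification and counting step described in this abstract, the sketch below tallies agreement-error tags by CEFR level. The tag names (AGV, AGN, AGD, AGA) come from the abstract; the annotation records, the `(level, tag)` data layout, and the function names are assumptions made for the example and do not reflect the actual CLC format or the study's data.

```python
from collections import Counter, defaultdict

def tally_by_level(annotations):
    """Group error tags by proficiency level and count each tag."""
    by_level = defaultdict(Counter)
    for level, tag in annotations:
        by_level[level][tag] += 1
    return by_level

def summarize(counter):
    """Report error tokens (all occurrences), error types (distinct tags)
    and the tags ranked from most to least frequent."""
    return {
        "tokens": sum(counter.values()),
        "types": len(counter),
        "ranked": counter.most_common(),
    }

# Invented (CEFR level, error tag) records; not real corpus data.
sample = [
    ("B1", "AGV"), ("B1", "AGV"), ("B1", "AGN"), ("B1", "AGV"),
    ("B2", "AGV"), ("B2", "AGD"),
    ("C1", "AGA"),
]

if __name__ == "__main__":
    for level, counter in sorted(tally_by_level(sample).items()):
        print(level, summarize(counter))
```

Raw counts like these would still need to be normalized by the amount of text at each level before levels or L1 groups are compared.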


EduLingua ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 47-59
Author(s):  
József Horváth

Corpus linguistics studies have by now become a staple of linguists and teachers worldwide. Even practitioners who are not directly involved with corpus development or analysis are increasingly aware of this domain and its results. Thus, we can say that the time has come to investigate the long-term effects of the findings connected to corpus linguistics. This paper focuses on a specific sort of corpus: the learner corpus. It argues that what used to be a more traditional approach represented in the EFL (English as a foreign language) discipline has evolved into a perhaps more appropriate one represented in ELF (English as a lingua franca) partly because of the work of learner corpus research. To demonstrate any existing long-term effects of work with learner corpora on language education, an L2 corpus, the JPU Corpus, is presented. Five of the ten hypotheses originally set up in the early 2000s are revisited and critiqued by applying both quantitative and qualitative investigations. The results indicate that a diachronic learner corpus approach further establishes the shift from EFL to ELF approaches, a potentially useful and relevant change for students and their teachers across the world, especially within the framework of writing pedagogy.

