Subcategorization frame identification for learner English

Abstract: As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP) technology is developed to provide rich linguistic annotations necessary for second language (L2) research. We present a system for automatically analyzing subcategorization frames (SCFs) for learner English. SCFs link lexis with morphosyntax, shedding light on the interplay between lexical and structural information in learner language. Meanwhile, SCFs are crucial to the study of a wide range of phenomena including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this diversity changes with L2 proficiency.

Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness

ReCALL ◽

10.1017/s0958344007000237 ◽

2007 ◽

Vol 19 (3) ◽

pp. 252-268 ◽

Cited By ~ 22

Author(s):

Sylviane Granger ◽

Olivier Kraif ◽

Claude Ponton ◽

Georges Antoniadis ◽

Virginie Zampa

Keyword(s):

Language Learners ◽

Language Processing ◽

Error Detection ◽

Mother Tongue ◽

Learner Corpus ◽

POS Tagging ◽

Foreign Language Learners ◽

Learner Corpora ◽

Wide Range ◽

Learner Language

Abstract: Learner corpora, electronic collections of spoken or written data from foreign language learners, offer unparalleled access to many hitherto uncovered aspects of learner language, particularly in their error-tagged format. This article aims to demonstrate the role that the learner corpus can play in CALL, particularly when used in conjunction with web-based interfaces which provide flexible access to error-tagged corpora that have been enhanced with simple NLP techniques such as POS-tagging or lemmatization and linked to a wide range of learner and task variables such as mother tongue background or activity type. This new resource is of interest to three main types of users: teachers wishing to prepare pedagogical materials that target learners' attested difficulties; learners themselves, for editing or language awareness purposes; and NLP researchers, for whom it serves as a benchmark for testing automatic error detection systems.

Corpora and L2 acquisition: the L1 Portuguese – L2 Spanish subcorpus of CEDEL2

Revista da Associação Portuguesa de Linguística ◽

10.26334/2183-9077/rapln8ano2021a10 ◽

2021 ◽

pp. 121-136

Author(s):

Cristóbal Lozano ◽

Joana Teixeira ◽

Ana Madeira

Keyword(s):

Second Language Acquisition ◽

Large Scale ◽

Web Interface ◽

L2 Acquisition ◽

Total Size ◽

Learner Corpus ◽

Corpus Design ◽

Wide Range ◽

Relevant Variables ◽

L2 Spanish

This paper presents the L1 Portuguese – L2 Spanish subcorpus of Corpus Escrito del Español L2 (CEDEL2), a new methodological resource for second language acquisition (SLA) research, which is freely searchable and downloadable (http://cedel2.learnercorpora.com). CEDEL2 is a large-scale, multi-L1 learner corpus of L2 Spanish which contains written productions from learners at all proficiency levels as well as 6 native control subcorpora (total size: over 1,100,000 words from over 4,000 participants). CEDEL2 follows strict corpus design criteria (Sinclair, 2005) and learner corpus design recommendations (Tracy-Ventura & Paquot, 2021a). In its current version (CEDEL2 v. 2), its Portuguese component includes an L1 Portuguese – L2 Spanish subcorpus, with 21,662 words written by 164 participants, and an L1 Portuguese native subcorpus, with 3,500 words from 16 L1 speakers of European Portuguese. Thanks to their design features (e.g., same design across subcorpora, inclusion of metadata about SLA-relevant variables, dual native control subcorpora) and freely available web interface, CEDEL2 and its Portuguese subcorpora allow researchers to investigate a wide range of topics in SLA.

Corpus Linguistics, Learner Corpora, and SLA: Employing Technology to Analyze Language Use

Annual Review of Applied Linguistics ◽

10.1017/s0267190519000096 ◽

2019 ◽

Vol 39 ◽

pp. 74-92 ◽

Cited By ~ 6

Author(s):

Tony McEnery ◽

Vaclav Brezina ◽

Dana Gablasova ◽

Jayanti Banerjee

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Language Acquisition ◽

Corpus Linguistics ◽

Language Use ◽

Learner Corpus ◽

Learner Corpora ◽

Corpus Studies ◽

Learner Language ◽

The Relationship

Abstract: In this article we explore the relationship between learner corpus and second language acquisition research. We begin by considering the origins of learner corpus research, noting its roots in smaller scale studies of learner language. This development of learner corpus studies is considered in the broader context of the development of corpus linguistics. We then consider the aspirations that learner corpus researchers have had to engage with second language acquisition research and explore why, to date, the interaction between the two fields has been minimal. By exploring some of the corpus building practices of learner corpus research, and the theoretical goals of second language acquisition studies, we identify reasons for this lack of interaction and make proposals for how this situation could be fruitfully addressed.

The Major Developments of Learner Language From Second Language Acquisition to Learner Corpus Research

Redefining the Role of Language in a Globalized World - Advances in Linguistics and Communication Studies ◽

10.4018/978-1-7998-2831-0.ch010 ◽

2021 ◽

pp. 184-196

Author(s):

Aicha Rahal

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Language Acquisition ◽

Standard Variety ◽

Learner Corpus ◽

Language Research ◽

Learner Language

Given the constant debate between monolinguists and pluralists, this chapter aims to explore the main developments in learner language. It focuses on the changes from second language research to learner corpus research. It is an attempt to present second language theories. Then, the chapter draws particular attention to the limitations of second language acquisition. The discussion turns to learner corpus research to show how language changes from heterogeneity to diversity. Language is no longer seen as a monolithic entity or a standard variety but as a multilingual entity.

Commentary: Have Learner Corpus Research and Second Language Acquisition Finally Met?

Learner Corpus Research Meets Second Language Acquisition ◽

10.1017/9781108674577.012 ◽

2020 ◽

pp. 243-257

Author(s):

Sylviane Granger

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Language Acquisition ◽

Learner Corpus

Multi‐Word Expressions in Second Language Writing: A Large‐Scale Longitudinal Learner Corpus Study

Language Learning ◽

10.1111/lang.12383 ◽

2019 ◽

Vol 70 (2) ◽

pp. 420-463 ◽

Cited By ~ 1

Author(s):

Anna Siyanova‐Chanturia ◽

Stefania Spina

Keyword(s):

Second Language ◽

Large Scale ◽

Second Language Writing ◽

Corpus Study ◽

Learner Corpus ◽

Language Writing

INTERFERENCE AND NATURAL LANGUAGE PROCESSING IN SECOND LANGUAGE ACQUISITION

Language Learning ◽

10.1111/j.1467-1770.1983.tb00986.x ◽

1983 ◽

Vol 33 (1) ◽

pp. 55-76 ◽

Cited By ~ 35

Author(s):

Fernando Tarallo ◽

John Myhill

Keyword(s):

Second Language ◽

Natural Language Processing ◽

Second Language Acquisition ◽

Language Acquisition ◽

Natural Language ◽

Language Processing

Managing Second Language Acquisition Data with Natural Language Processing Tools

10.7551/mitpress/12200.003.0039 ◽

2022 ◽

Keyword(s):

Second Language ◽

Natural Language Processing ◽

Second Language Acquisition ◽

Language Acquisition ◽

Natural Language ◽

Language Processing

Multi-Word Expressions in Second Language Writing: A Large-Scale Longitudinal Learner Corpus Study

10.26686/wgtn.13670590 ◽

2021 ◽

Author(s):

Anna Siyanova ◽

S Spina

Keyword(s):

Second Language ◽

Language Learning ◽

Large Scale ◽

Second Language Writing ◽

Low Frequency ◽

University Of Michigan ◽

Corpus Study ◽

Learner Corpus ◽

Language Writing ◽

Complex Picture

© 2019 Language Learning Research Club, University of Michigan. In the present study, we sought to advance the field of learner corpus research by tracking the development of phrasal vocabulary in essays produced at two different points in time. To this aim, we employed a large pool of second language (L2) learners (N = 175) from three proficiency levels (beginner, elementary, and intermediate) and focused on an underrepresented L2 (Italian). Employing mixed-effects models, a flexible and powerful tool for corpus data analysis, we analyzed learner combinations in terms of five different measures: phrase frequency, mutual information, lexical gravity, delta P forward, and delta P backward. Our findings suggest a complex picture, in which higher proficiency and greater exposure to the L2 do not result in more idiomatic and targetlike output, and may, in fact, result in greater reliance on low-frequency combinations whose constituent words are non-associated or mutually attracted.

Review of Early instructed second language acquisition: Pathways to competence; Editors: Joanna Rokita-Jaśkow, Melanie Ellis; Publisher: Multilingual Matters, 2019; ISBN: 9781788922494; Pages: 257

Studies in Second Language Learning and Teaching ◽

10.14746/ssllt.2019.9.4.8 ◽

2019 ◽

Vol 9 (4) ◽

pp. 737-744

Author(s):

Paweł Scheffler

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Language Acquisition ◽

Large Scale ◽

Special Educational Needs ◽

Teachers' Perceptions ◽

Instructed Second Language Acquisition ◽

Key Issues ◽

Large Scale Survey ◽

Teaching Grammar

In a large scale survey of teachers’ perceptions of the challenges they face in teaching English to young primary school learners (Copland, Garton, & Burns, 2014), some of the key issues that are identified are as follows: teaching speaking, using only English in the classroom, enhancing motivation, maintaining discipline, catering for different individual needs (including special educational needs), dealing with parents, and teaching grammar as well as reading and writing. The relevance of Early Instructed Second Language Acquisition, edited by Rokita-Jaśkow and Ellis, is clearly shown by the fact that it addresses most of these central issues.
