NATIVE LANGUAGE IDENTIFICATION FOR RUSSIAN USING ERRORS TYPES

Author(s):  
N. V. Remnev ◽  

Native Language Identification (NLI) is the task of automatically recognizing an author’s native language (L1) from texts written in a language that is non-native to the author. The NLI task has been studied in detail for English, and two shared tasks were conducted in 2013 and 2017, where TOEFL English essays and essay samples were used as data. A small number of works also address the NLI problem for other languages. For Russian, the NLI problem was investigated by Ladygina (2017) and Remnev (2019). This paper discusses the use of approaches that proved successful in the NLI Shared Task 2013 and 2017 competitions to recognize the author’s native language, as well as the type of speaker: learners of Russian or heritage speakers of Russian. The native language identification task is also solved based on the types of errors specific to different languages. This study is data-driven and is made possible by the Russian Learner Corpus, developed by the Higher School of Economics (HSE) Learner Russian Research Group, on which the experiments are conducted.
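As a rough illustration of how error types could serve as classification features, the sketch below turns the error annotations of one text into a normalized feature vector. The tag names in `ERROR_TAGS` and the function `error_vector` are hypothetical and chosen only for this example; the actual annotation scheme of the Russian Learner Corpus and the features used in the paper may differ.

```python
from collections import Counter

# Hypothetical error-tag inventory (an assumption for illustration only);
# the real corpus tag set is not given in the abstract.
ERROR_TAGS = ["case", "aspect", "agreement", "spelling"]

def error_vector(tags, inventory=ERROR_TAGS):
    """Return the share of each error type among a text's annotated errors."""
    counts = Counter(tags)
    total = sum(counts[t] for t in inventory)
    if total == 0:
        # No annotated errors from the inventory: return an all-zero vector.
        return [0.0] * len(inventory)
    return [counts[t] / total for t in inventory]

# Example: one text annotated with four errors.
vec = error_vector(["case", "case", "aspect", "spelling"])
```

A vector like this could be passed to any standard classifier; normalizing by the total number of errors keeps long and short texts comparable.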

2017 ◽  
Author(s):  
Shervin Malmasi ◽  
Keelan Evanini ◽  
Aoife Cahill ◽  
Joel Tetreault ◽  
Robert Pugh ◽  
...  

Author(s):  
Anand Kumar M. ◽  
Shivkaran Singh ◽  
Praveena Ramanan ◽  
Vaithehi Sinthiya ◽  
Soman K. P.

In recent times, the paraphrase identification task has attracted the attention of the research community. A paraphrase is a phrase or sentence that conveys the same information using different words or syntactic structure. The Microsoft Research Paraphrase Corpus (MSRP) is a well-known, openly available paraphrase corpus for the English language. There is no such publicly available paraphrase corpus for any Indian language (as of now). This chapter explains the creation of a paraphrase corpus for the Hindi, Tamil, Malayalam, and Punjabi languages. This is the first publicly available corpus for any Indian language. It was used in the shared task on detecting paraphrases for Indian languages (DPIL) held in conjunction with the Forum for Information Retrieval & Evaluation (FIRE) 2016. The annotation was performed by a postgraduate student, followed by two-step proofreading by a linguist and a language expert.


2020 ◽  
Vol 18 (4) ◽  
pp. 395-405
Author(s):  
Evgeniya V. Aleshniskaya

The paper considers translation as an intermediate stage in the creation of English-language song lyrics by native Russian speakers. Russian songwriters quite often rely on their native language and translate their thoughts from Russian into English. This leads to the use of a “russified” variety of English, which performs poetic and pragmatic functions and serves as a medium harmonizing content, sound, and music. Drawing evidence from 214 songs in various musical genres, as well as 10 ethnographic interviews with Russian songwriters, it examines the specific features of the Russian variety of English used in song lyrics, and discusses the main views on the authenticity of translation in song lyrics depending on the musical genre.


2012 ◽  
Vol 17 (3) ◽  
pp. 190-198 ◽  
Author(s):  
Günter Krampen ◽  
Thomas Huckert ◽  
Gabriel Schui

As an example for psychology journals published in languages other than English, this study analyzes the impact of the recent Anglicization of five former German-language psychology journals on (1) authorship (nationality, i.e., native language, and number of authors, i.e., single or multiple authorships), (2) formal characteristics of the journal (number of articles per volume and length of articles), and (3) the number of citations of the articles in other journal articles, the language of the citing publications, and the impact factors (IF). Scientometric data on these variables are gathered for all articles published in the four years before and the four years after the Anglicization of the same journal. Results reveal rather quick changes: Citations per year since the original articles’ publication increase significantly, and the IFs of the journals go up markedly. The frequency of citations in German-language journals decreases, the frequency of citations in English-language journals increases significantly after the Anglicization of former German-language psychology journals, and there is a general trend of increasing citations in other languages as well. Side effects of anglicizing former German-language psychology journals include the publication of shorter papers, their availability to a more international authorship, and a slight but significant increase in multiple authorships.


2015 ◽  
Author(s):  
Shervin Malmasi ◽  
Joel Tetreault ◽  
Mark Dras

Multilingua ◽  
2018 ◽  
Vol 37 (3) ◽  
pp. 275-304 ◽  
Author(s):  
Jette G. Hansen Edwards

The study employs a case study approach to examine the impact of educational backgrounds on nine Hong Kong tertiary students’ English and Cantonese language practices and identifications as native speakers of English and Cantonese. The study employed both survey and interview data to probe the participants’ English and Cantonese language use at home, school, and with peers/friends. Leung, Harris, and Rampton’s (1997, The idealized native speaker, reified ethnicities, and classroom realities. TESOL Quarterly 31(3). 543–560) framework of language affiliation, language expertise, and inheritance was used to examine the construction of a native language identity in a multilingual setting. The study found that educational background – and particularly international school experience in contrast to local government school education – had an impact on the participants’ English language usage at home and with peers, and also affected their language expertise in Cantonese. English language use at school also impacted their identifications as native speakers of both Cantonese and English, with Cantonese being viewed largely as native language based on inheritance while English was being defined as native based on their language expertise, affiliation and use, particularly in contrast to their expertise in, affiliation with, and use of Cantonese.


2015 ◽  
Vol 1 (2) ◽  
pp. 187-209 ◽  
Author(s):  
Kristopher Kyle ◽  
Scott A. Crossley ◽  
YouJin Kim

This study evaluates the impact of writing proficiency on native language identification (NLI), a topic that has important implications for the generalizability of NLI models and detection-based arguments for cross-linguistic influence (Jarvis 2010, 2012; CLI). The study uses multinomial logistic regression to classify the first language (L1) group membership of essays at two proficiency levels based on systematic lexical and phrasal choices made by members of five L1 groups. The results indicate that lower proficiency essays are significantly easier to classify than higher proficiency essays, suggesting that lower proficiency writers make lexical and phrasal choices that are more similar to other lower proficiency writers that share an L1 than higher proficiency writers that share an L1. A close analysis of the findings also indicates that the relationship between NLI accuracy and proficiency differed across L1 groups.
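The classification step described above can be pictured with a small multinomial logistic regression (softmax) classifier. The sketch below is a minimal pure-Python version trained by stochastic gradient descent on invented toy vectors; the feature values, learning rate, and epoch count are assumptions for illustration and do not reproduce the study's data, features, or software.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(X, y, n_classes, lr=0.5, epochs=200):
    """Fit one weight row per class (last weight is the bias) with SGD."""
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            xb = list(x) + [1.0]  # append constant bias input
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. class k's score.
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features + 1):
                    W[k][j] -= lr * err * xb[j]
    return W

def predict(W, x):
    """Return the index of the highest-scoring class (L1 group)."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return max(range(len(W)), key=scores.__getitem__)

# Toy data (invented): relative frequencies of three hypothetical phrases
# in six essays, with labels 0, 1, 2 standing for three L1 groups.
X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
     [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
     [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
y = [0, 0, 1, 1, 2, 2]
W = train(X, y, n_classes=3)
```

In practice a library implementation with regularization and cross-validation would be the usual choice; the hand-written version only makes the model's mechanics visible.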


2020 ◽  
pp. 1-31
Author(s):  
Ilia Markov ◽  
Vivi Nastase ◽  
Carlo Strapparava

Native language identification (NLI), the task of automatically identifying the native language (L1) of persons based on their writings in the second language (L2), is based on the hypothesis that characteristics of L1 will surface and interfere in the production of texts in L2 to the extent that L1 is identifiable. We present an in-depth investigation of features that model a variety of linguistic phenomena potentially involved in native language interference in the context of the NLI task: the languages’ structuring of information through punctuation usage, emotion expression in language, and similarities of form with the L1 vocabulary through the use of anglicized words, cognates, and other misspellings. The results of experiments with different combinations of features in a variety of settings allow us to quantify the native language interference value of these linguistic phenomena and show how robust they are in cross-corpus experiments and with respect to proficiency in L2. These experiments provide a deeper insight into the NLI task, showing how native language interference explains the gap between baseline, corpus-independent features, and the state of the art that relies on features/representations that cover (indiscriminately) a variety of linguistic phenomena.
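One simple way to model punctuation usage is a per-text profile of how often each punctuation mark occurs. The sketch below is an assumption about what such a feature might look like, not the authors' feature definition; the function name `punctuation_profile` is hypothetical, and only ASCII punctuation is counted.

```python
from collections import Counter
import string

def punctuation_profile(text):
    """Map each ASCII punctuation mark in `text` to its per-character rate."""
    total = len(text)
    if total == 0:
        return {}
    counts = Counter(ch for ch in text if ch in string.punctuation)
    # Divide by text length so texts of different sizes are comparable.
    return {mark: n / total for mark, n in counts.items()}

# Example: a 20-character text with two commas, one semicolon, one period.
profile = punctuation_profile("Hi, there; ok, done.")
```

Rates like these could be combined with other feature groups before classification; extending the character set beyond ASCII would be needed for typographic quotation marks and dashes.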

