Exploiting native language interference for native language identification

2020 ◽  
pp. 1-31
Author(s):  
Ilia Markov ◽  
Vivi Nastase ◽  
Carlo Strapparava

Abstract Native language identification (NLI)—the task of automatically identifying the native language (L1) of persons based on their writings in the second language (L2)—is based on the hypothesis that characteristics of L1 will surface and interfere in the production of texts in L2 to the extent that L1 is identifiable. We present an in-depth investigation of features that model a variety of linguistic phenomena potentially involved in native language interference in the context of the NLI task: the languages’ structuring of information through punctuation usage, emotion expression in language, and similarities of form with the L1 vocabulary through the use of anglicized words, cognates, and other misspellings. The results of experiments with different combinations of features in a variety of settings allow us to quantify the native language interference value of these linguistic phenomena and show how robust they are in cross-corpus experiments and with respect to proficiency in L2. These experiments provide a deeper insight into the NLI task, showing how native language interference explains the gap between baseline, corpus-independent features, and the state of the art that relies on features/representations that cover (indiscriminately) a variety of linguistic phenomena.

2016 ◽  
Vol 42 (3) ◽  
pp. 491-525 ◽  
Author(s):  
Radu Tudor Ionescu ◽  
Marius Popescu ◽  
Aoife Cahill

The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using multiple kernel learning. Despite the fact that the approach based on string kernels performs so well, several questions about this method remain unanswered. First, it is not clear why such a simple approach can compete with far more complex approaches that take words, lemmas, syntactic information, or even semantics into account. Second, although the approach is designed to be language independent, all experiments to date have been on English. This work is an extensive study that aims to systematically present the string kernel approach and to clarify the open questions mentioned above. A broad set of native language identification experiments were conducted to compare the string kernels approach with other state-of-the-art methods. The empirical results obtained in all of the experiments conducted in this work indicate that the proposed approach achieves state-of-the-art performance in NLI, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the results obtained on both the Arabic and the Norwegian corpora demonstrate that the proposed approach is language independent. In the Arabic native language identification task, string kernels show an increase of more than 17% over the best accuracy reported so far. The results of string kernels on Norwegian native language identification are also significantly better than the state-of-the-art approach. In addition, in a cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state-of-the-art system by 32.3%. 
To gain additional insights about the string kernels approach, the features selected by the classifier as being more discriminating are analyzed in this work. The analysis also offers information about localized language transfer effects, since the features used by the proposed model are p-grams of various lengths. The features captured by the model typically include stems, function words, and word prefixes and suffixes, which have the potential to generalize over purely word-based features. By analyzing the discriminating features, this article offers insights into two kinds of language transfer effects, namely, word choice (lexical transfer) and morphological differences. The goal of the current study is to give a full view of the string kernels approach and shed some light on why this approach works so well.
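As a rough illustration of what a character p-gram string kernel computes, here is a minimal sketch of one common variant, the p-spectrum kernel, blended over a range of p-gram lengths. The abstract does not say which kernels the study combines, and the function names and the choice of a plain sum over lengths are assumptions made for this example, not the authors' implementation.

```python
from collections import Counter


def char_pgrams(text: str, p: int) -> Counter:
    """Count every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int) -> int:
    """p-spectrum kernel: dot product of the two p-gram count vectors."""
    counts_a, counts_b = char_pgrams(a, p), char_pgrams(b, p)
    # A Counter returns 0 for p-grams that never occur in b.
    return sum(count * counts_b[gram] for gram, count in counts_a.items())


def blended_kernel(a: str, b: str, p_min: int, p_max: int) -> int:
    """Sum spectrum kernels over p-gram lengths p_min..p_max (assumed blend)."""
    return sum(spectrum_kernel(a, b, p) for p in range(p_min, p_max + 1))
```

Because the features are raw character sequences, shared stems, affixes, and function words all contribute to the similarity without any tokenization or tagging, which is consistent with the kinds of discriminating features the abstract mentions.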


2021 ◽  
Vol 54 (7) ◽  
pp. 1-39
Author(s):  
Ankur Lohachab ◽  
Saurabh Garg ◽  
Byeong Kang ◽  
Muhammad Bilal Amin ◽  
Junmin Lee ◽  
...  

Unprecedented attention towards blockchain technology is serving as a game-changer in fostering the development of blockchain-enabled distinctive frameworks. However, fragmentation unleashed by its underlying concepts hinders different stakeholders from effectively utilizing blockchain-supported services, resulting in the obstruction of its wide-scale adoption. Exploring synergies among the isolated frameworks requires comprehensively studying inter-blockchain communication approaches. These approaches broadly come under the umbrella of the Blockchain Interoperability (BI) notion, which can facilitate a novel paradigm of an integrated blockchain ecosystem that connects state-of-the-art disparate blockchains. Currently, there is a lack of studies that comprehensively review BI, which is a stumbling block to its development. Therefore, this article aims to articulate the potential of BI by reviewing it from diverse perspectives. Beginning with a glance at blockchain architecture fundamentals, this article discusses its associated platforms, taxonomy, and consensus mechanisms. Subsequently, it argues for the need for BI by exemplifying its potential opportunities and application areas. Concerning BI, an architecture seems to be a missing link. Hence, this article introduces a layered architecture for the effective development of protocols and methods for interoperable blockchains. Furthermore, this article proposes an in-depth BI research taxonomy and provides an insight into the state-of-the-art projects. Finally, it determines possible open challenges and future research in the domain.


Author(s):  
Kristina Štrkalj Despot ◽  
Lana Hudeček ◽  
Tomislav Stojanov ◽  
Nikola Ljubešić

In this minireview, the state of the art of Croatian monolingual lexicography is presented. A brief overview and classification of all existing lexicographic resources is provided in the first part of the minireview, followed by a somewhat more detailed insight into the existing Croatian monolingual dictionaries and monolingual lexicographic projects, orthography dictionaries, and dictionary writing systems used.


2021 ◽  
Vol X (3) ◽  
pp. 95-100
Author(s):  
Tamar Makharoblidze ◽  

As stated in the title, the paper is devoted to the issue of second language acquisition by Deaf people in Georgia, describing the current situation and the challenges. There are about 2500 Deaf and hard of hearing residents in Georgia. As a linguistic minority in the country, these people communicate with each other in Georgian Sign Language (GESL). The second native language for local Deaf and hard of hearing people is spoken Georgian, the state language. In many countries Deaf people are bilingual, but it is hard to consider the local Deaf and hard of hearing people bilingual, as native-level knowledge of spoken Georgian is not observed among Deaf residents. Unfortunately, there are no studies in Georgia concerning second language acquisition by Deaf and hard of hearing people. The main problems are agrammatism in written communication in the state language and ignorance of the different hierarchical levels of spoken Georgian. This short paper outlines the key issues for a strategic plan for the acquisition of spoken Georgian by local Deaf and hard of hearing residents.


2018 ◽  
Vol 51 (4) ◽  
pp. 553-566 ◽  
Author(s):  
Naoko Taguchi ◽  
Joseph Collentine

Isabelli-García, Bown, Plew & Dewey (forthcoming) presented the ‘state of the art’ in research on language learning abroad. Beginning with Carroll's (1967) claim that ‘time spent abroad is one of the most potent variables’ predicting second language (L2) abilities (p. 137), the scope of study-abroad research has grown multifold in guiding theoretical frameworks, empirical methods, and objects of examination. A half-century of work surveyed in Isabelli-García et al.’s review reveals diverse goals of investigation, ranging from studies focusing on documenting learning outcomes, to studies aiming to unveil the process and nature of learning in a study-abroad context.


Electronics ◽  
2019 ◽  
Vol 8 (5) ◽  
pp. 480 ◽  
Author(s):  
Andrea Ballo ◽  
Alfio Dario Grasso ◽  
Gaetano Palumbo

This paper reviews charge pump (CP) topologies for the power management of Internet of Things (IoT) nodes, with the aim of providing designers with guidelines for choosing the most suitable solution for given design specifications. Power management of IoT nodes is a challenging task, especially when the output of the energy harvester is on the order of a few hundred millivolts. In these applications, the power management section can be profitably implemented using CPs, and many different CP topologies have been presented in the literature. Finally, a data-driven comparison is provided, allowing for quantitative insight into the state of the art of integrated CPs.


2009 ◽  
Vol 31 (2) ◽  
pp. 291-321 ◽  
Author(s):  
Laurent Dekydtspotter

This article presents evidence that supports the claim that second language (L2) grammars arise in a domain-specific, informationally encapsulated module with contents provided by Universal Grammar and enriched by native language knowledge, as entertained by Schwartz (1986, 1987, 1999) contra Bley-Vroman (1990). I consider state-of-the-art evidence representative of a body of research on the poverty of the stimulus (POS) that argues for the domain-specificity of L2 representations, with a main focus on interpretation. Then I examine interpretive evidence relevant to the role of informational encapsulation and compositionality in SLA. I seek to demonstrate that the acquisition of syntax-linked interpretive properties where the POS is severe provides opportunities for a type of fingerprinting of mental organization that can inform a variety of epistemologically relevant questions.


2020 ◽  
Author(s):  
D. Michieletto

Systems of “living” polymers are ubiquitous in industry and are traditionally realised using surfactants. Here I review the state of the art of living polymers and discuss non-equilibrium extensions that may be realised with advanced synthetic chemistry or DNA functionalised by proteins. These systems are not only interesting as a route to novel “living” soft matter but can also shed light on how genomes are (topologically) regulated in vivo.

