Creating a grammar checker for CALL by constraint relaxation: a feasibility study

ReCALL ◽  
2001 ◽  
Vol 13 (1) ◽  
pp. 110-120 ◽  
Author(s):  
ANNE VANDEVENTER

Intelligent feedback on learners’ full written sentence productions requires the use of Natural Language Processing (NLP) tools and, in particular, of a diagnosis system. Most syntactic parsers, on which grammar checkers are based, are designed to parse grammatical sentences and/or native speaker productions. They are therefore not necessarily suitable for language learners. In this paper, we concentrate on the transformation of a French syntactic parser into a grammar checker geared towards intermediate to advanced learners of French. Several techniques are envisaged to allow the parser to handle ill-formed input, including constraint relaxation. By the very nature of this technique, parsers can generate complete analyses for ungrammatical sentences. Proper labelling of the points where the analysis was able to proceed thanks to a specific constraint relaxation forms the basis of the error diagnosis. Parsers with relaxed constraints tend to produce more complete, although incorrect, analyses for grammatical sentences, and several complete analyses for ungrammatical sentences. This increased number of analyses per sentence has one major drawback: it slows down the system and requires more memory. An experiment was conducted to observe the behaviour of our parser in the context of constraint relaxation. Three specific constraints, agreement in number, gender, and person, were selected and relaxed in different combinations. A learner corpus was parsed with each combination. The evolution of the number of correct diagnoses and of parsing speed, among other factors, was monitored. We then evaluated, by comparing the results, whether large-scale constraint relaxation is a viable option for transforming our syntactic parser into an efficient grammar checker for CALL.
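As an illustration of the general mechanism only (not the authors' actual French parser or grammar formalism), a minimal sketch of constraint relaxation over agreement features might look like the following; the feature structures, constraint names, and diagnosis strings are hypothetical.

```python
# Illustrative sketch of constraint relaxation for agreement checking.
# The feature structures and constraint names are hypothetical; the paper's
# parser and grammar formalism are not reproduced here.

AGREEMENT_FEATURES = ("number", "gender", "person")

def combine(head, dependent, relax=frozenset()):
    """Try to combine two constituents under agreement constraints.

    Constraints listed in `relax` may be violated; each violation is
    recorded as a diagnosis instead of blocking the analysis.
    """
    diagnoses = []
    for feat in AGREEMENT_FEATURES:
        h, d = head.get(feat), dependent.get(feat)
        if h is not None and d is not None and h != d:
            if feat in relax:
                diagnoses.append(f"{feat} agreement violated: {h} vs {d}")
            else:
                return None  # hard failure: no analysis at all
    return {"analysis": (head, dependent), "diagnoses": diagnoses}

# "les petit chats": the determiner is plural, the adjective wrongly singular.
det = {"number": "plural", "gender": "masculine"}
adj = {"number": "singular", "gender": "masculine"}

print(combine(det, adj))                    # None: strict parsing fails
print(combine(det, adj, relax={"number"}))  # complete analysis + number diagnosis
```

Relaxing a constraint thus trades strict rejection for a complete analysis annotated with the violations that made it possible, which is precisely the information an error diagnosis needs.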

2012 ◽  
Vol 2 (3) ◽  
pp. 95-124 ◽  
Author(s):  
Bor HODOŠČEK ◽  
Kikuko NISHINA

In this report, we introduce the Hinoki project, which set out to develop web-based Computer-Assisted Language Learning (CALL) systems for Japanese language learners more than a decade ago. Utilizing Natural Language Processing technologies and other linguistic resources, the project has come to encompass three systems, two corpora and many other resources. Beginning with the reading assistance system Asunaro, we describe the construction of Asunaro's multilingual dictionary and its dependency grammar-based approach to reading assistance. The second system, Natsume, is a writing assistance system that uses large-scale corpora to provide an easy-to-use collocation search feature, notable for its inclusion of the concept of genre. The final system, Nutmeg, is an extension of Natsume and the Natane learner corpus. It provides automatic correction of learners' errors in compositions, drawing on Natsume for its large corpus and genre-aware collocation data and on Natane for its data on learner errors.


2021 ◽  
Author(s):  
Xinxu Shen ◽  
Troy Houser ◽  
David Victor Smith ◽  
Vishnu P. Murty

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields to characterize memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the reliability of scoring between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected measures yielded by hand-scoring, and further that results using USE outperformed another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across both age groups and anxiety groups, demonstrating the efficacy of our approach for assessing narrative recall in large-scale individual difference analyses. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual difference analyses in research using naturalistic stimuli.
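As a rough sketch of how USE-based scoring can work (assuming the publicly available TensorFlow Hub module, not necessarily the exact model configuration or scoring pipeline used in the study), a recall transcript can be embedded alongside a reference description of the clip and compared with cosine similarity; the texts below are invented.

```python
# Sketch: score a narrative recall against a reference description with the
# Universal Sentence Encoder (public TF-Hub module; illustrative texts only).
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

reference = "The magician places a coin under a cup and it vanishes."
recall = "He hid a coin under the cup, and when he lifted it the coin was gone."

vectors = embed([reference, recall]).numpy()
cosine = float(
    np.dot(vectors[0], vectors[1])
    / (np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1]))
)
print(f"Recall similarity score: {cosine:.3f}")
```

Such similarity scores can then be correlated with hand-scored ratings to estimate reliability.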


10.29007/pc58 ◽  
2018 ◽  
Author(s):  
Julia Lavid ◽  
Marta Carretero ◽  
Juan Rafael Zamorano

In this paper we set forth an annotation model for dynamic modality in English and Spanish, given its relevance not only for contrastive linguistic purposes but also for practical annotation tasks in the Natural Language Processing (NLP) community. An annotation scheme is proposed which captures both the functional-semantic meanings and the language-specific realisations of dynamic meanings in both languages. The scheme is validated through a reliability study performed on a randomly selected set of one hundred and twenty sentences from the MULTINOT corpus, resulting in a high degree of inter-annotator agreement. We discuss our main findings and pay special attention to the difficult cases, which are currently being used to develop detailed guidelines for the large-scale annotation of dynamic modality in English and Spanish.
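A short sketch of the kind of reliability computation involved, using scikit-learn's Cohen's kappa on hypothetical labels; the label set and sentences shown here are invented and need not match the MULTINOT annotation scheme or the agreement statistic actually reported.

```python
# Sketch: inter-annotator agreement on hypothetical dynamic-modality labels.
from sklearn.metrics import cohen_kappa_score

# One label per sentence from each of two annotators (invented data).
annotator_a = ["ability", "volition", "ability", "necessity", "volition", "ability"]
annotator_b = ["ability", "volition", "necessity", "necessity", "volition", "ability"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```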


Author(s):  
Kaan Ant ◽  
Ugur Sogukpinar ◽  
Mehmet Fatih Amasyali

The use of databases containing semantic relationships between words is becoming increasingly widespread as a way of making natural language processing more effective. Unlike the bag-of-words approach, semantic spaces give the distances between words, but they do not express the type of the relation. In this study, it is shown how semantic spaces can be used to find the type of relationship, and the approach is compared with the template method. According to the results obtained on a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of, and non-relational types.
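A toy sketch of the contrast described above: in a semantic (vector) space, a relation type can be guessed by comparing a word pair's offset vector against labelled example pairs, whereas the template approach matches lexical patterns in text. The vectors, example pairs, and patterns below are invented purely for illustration; real systems would use pretrained embeddings and large template inventories.

```python
# Toy sketch: relation typing from word-pair offsets in a semantic space
# vs. a lexical template. All vectors and patterns are invented.
import re
import numpy as np

VEC = {  # tiny hand-made "semantic space"
    "dog": np.array([0.9, 0.1, 0.0]),  "animal": np.array([0.7, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.0]),  "vehicle": np.array([-0.1, 1.0, 0.15]),
    "hot": np.array([0.0, 0.1, 0.9]),  "cold": np.array([0.0, 0.1, -0.9]),
}

EXAMPLES = {"is_a": ("dog", "animal"), "opposite": ("hot", "cold")}

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relation_by_offset(w1, w2):
    """Pick the relation whose example pair has the most similar offset vector."""
    offset = VEC[w2] - VEC[w1]
    return max(EXAMPLES,
               key=lambda r: cos(offset, VEC[EXAMPLES[r][1]] - VEC[EXAMPLES[r][0]]))

def relation_by_template(sentence):
    """Template approach: match an explicit lexical pattern in text."""
    if re.search(r"\b(\w+) is a kind of (\w+)\b", sentence):
        return "is_a"
    if re.search(r"\b(\w+) is located in (\w+)\b", sentence):
        return "at_location"
    return "non_relational"

print(relation_by_offset("car", "vehicle"))                # is_a (offset resembles dog->animal)
print(relation_by_template("a car is a kind of vehicle"))  # is_a (pattern match)
```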


2019 ◽  
Vol 6 ◽  
Author(s):  
Catharina Marie Stille ◽  
Trevor Bekolay ◽  
Peter Blouw ◽  
Bernd J. Kröger


Author(s):
Subasish Das ◽  
Anandi Dutta ◽  
Tomas Lindheimer ◽  
Mohammad Jalayer ◽  
Zachary Elgart

The automotive industry is currently experiencing a revolution with the advent and deployment of autonomous vehicles. Several countries are conducting large-scale testing of autonomous vehicles on private and even public roads. It is important to examine the attitudes and potential concerns of end users towards autonomous cars before mass deployment. To facilitate the transition to autonomous vehicles, the automotive industry produces many videos on its products and technologies. The largest video sharing website, YouTube.com, hosts many videos on autonomous vehicle technology. Content analysis and text mining of the comments on the most-viewed videos can provide insight into potential end-user feedback. This study examines two questions: first, how do people view autonomous vehicles? Second, what polarities exist regarding (a) content and (b) automation level? The researchers found 107 videos on YouTube using a related keyword search and examined comments on the 15 most-viewed videos, which had a total of 60.9 million views and around 25,000 comments. The videos were manually clustered based on their content and automation level. This study used two natural language processing (NLP) tools to perform knowledge discovery from a bag of approximately seven million words. The key issues in the comment threads were mostly associated with efficiency, performance, trust, comfort, and safety. Perceptions of safety and risk were more prominent in the textual content when videos presented a full automation level. Sentiment analysis shows mixed sentiments towards autonomous vehicle technologies; however, positive sentiments were higher than negative ones.
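As an illustration of the kind of comment-level sentiment analysis described (the abstract does not name the study's actual toolchain; NLTK's VADER analyzer is used here only as a stand-in, and the comments are invented):

```python
# Sketch: polarity of YouTube-style comments with NLTK's VADER analyzer.
# The comments below are invented; the study's actual NLP tools may differ.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

comments = [
    "I would trust this car more than most human drivers, amazing tech.",
    "No way I'm letting a computer drive my kids around, way too risky.",
]
for comment in comments:
    scores = sia.polarity_scores(comment)
    print(f"{scores['compound']:+.2f}  {comment}")
```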


Author(s):  
Yan Huang ◽  
Akira Murakami ◽  
Theodora Alexopoulou ◽  
Anna Korhonen

As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP) technology is developed to provide rich linguistic annotations necessary for second language (L2) research. We present a system for automatically analyzing subcategorization frames (SCFs) for learner English. SCFs link lexis with morphosyntax, shedding light on the interplay between lexical and structural information in learner language. Meanwhile, SCFs are crucial to the study of a wide range of phenomena including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this diversity changes with L2 proficiency.
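A rough sketch of what SCF extraction can look like with an off-the-shelf dependency parser (spaCy is used here as a stand-in; this is not the authors' system, and real SCF inventories are far more fine-grained than the dependency labels collected below):

```python
# Sketch: approximate a verb's subcategorization frame from a dependency parse.
# Requires spaCy and the en_core_web_sm model; the frame labels are simplified
# and do not reproduce the authors' SCF inventory.
import spacy

nlp = spacy.load("en_core_web_sm")
FRAME_DEPS = {"nsubj", "dobj", "iobj", "dative", "prep", "ccomp", "xcomp"}

def scfs(sentence):
    doc = nlp(sentence)
    frames = {}
    for token in doc:
        if token.pos_ == "VERB":
            deps = sorted(child.dep_ for child in token.children
                          if child.dep_ in FRAME_DEPS)
            frames[token.lemma_] = "+".join(deps) or "intransitive"
    return frames

print(scfs("She gave the student a book."))  # e.g. {'give': 'dative+dobj+nsubj'}
print(scfs("They decided to leave early."))  # e.g. {'decide': 'nsubj+xcomp', 'leave': 'intransitive'}
```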

