Associating Automatic Natural Language Processing to Serious Games and Virtual Worlds

2011, Vol 4 (3)
Author(s): Treveur Bretaudière, Samuel Cruz-Lara, Lina María Rojas Barahona

We present our current research activities associating automatic natural language processing with serious games and virtual worlds. Several interesting scenarios have been developed: language learning, natural language generation, multilingual information, emotion detection, real-time translation, and non-intrusive access to linguistic information such as definitions or synonyms. Part of our work has contributed to the specification of the Multilingual Information Framework (MLIF, ISO FDIS 24616:2011). Standardization will ensure the stability, interoperability and sustainability of an important part of our research activities, particularly in the framework of representing and managing multilingual textual information.
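As an illustration of the "non-intrusive access to linguistic information" scenario, here is a minimal sketch of a definition-and-synonym lookup that a game client could call when the player selects a word. It uses NLTK's English WordNet as a stand-in lexical resource; the function names are illustrative and unrelated to the MLIF specification itself.

```python
# Minimal sketch: on-demand lookup of definitions and synonyms for a word
# the player selects, without interrupting gameplay. Uses NLTK's English
# WordNet as a stand-in resource (requires: nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def lookup(word: str) -> list[dict]:
    """Return a definition/synonym record for each sense of `word`."""
    records = []
    for synset in wn.synsets(word):
        records.append({
            "sense": synset.name(),
            "definition": synset.definition(),
            "synonyms": sorted({l.name() for l in synset.lemmas()} - {word}),
        })
    return records

if __name__ == "__main__":
    for record in lookup("game"):
        print(record["sense"], "->", record["definition"])
```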

2020, Vol 0 (0)
Author(s): Fridah Katushemererwe, Andrew Caines, Paula Buttery

This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. We therefore first need to collect corpora for these languages before we can proceed to the design of a spell-checker, grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.
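As a sketch of the kind of corpus-driven spell-checker such a project might build, the snippet below treats every word form attested in the collected corpus as the lexicon and suggests close attested forms for unattested input. This is a generic illustration under our own assumptions, not the authors' design, and it ignores Runyakitara's rich agglutinative morphology, which the planned morphological analyser would have to handle.

```python
# Minimal sketch of a corpus-driven spell-checker: the lexicon is whatever
# word forms occur in the corpus, and suggestions are attested forms close
# to the unattested input (difflib's similarity ratio, cutoff 0.6).
import re
from difflib import get_close_matches

def build_lexicon(corpus_text: str) -> set[str]:
    """Collect every word form attested in the corpus."""
    return set(re.findall(r"\w+", corpus_text.lower()))

def suggest(word: str, lexicon: set[str], n: int = 3) -> list[str]:
    """Offer the closest attested forms for an unattested word."""
    if word.lower() in lexicon:
        return []  # already attested, nothing to correct
    return get_close_matches(word.lower(), lexicon, n=n)

# Toy stand-in text for the planned Runya Corpus.
corpus = "abantu bakuru bagyenda omu kibuga"
lexicon = build_lexicon(corpus)
print(suggest("bagyend", lexicon))  # -> ['bagyenda']
```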


2020, Vol 6
Author(s): David Owen, Laurence Livermore, Quentin Groom, Alex Hardisty, Thijs Leegwater, ...

We describe an effective approach to automated text digitisation of natural history specimen labels. These labels contain much useful data about the specimen, including its collector, country of origin, and collection date. Our approach to automatically extracting these data takes the form of a pipeline, and we recommend component parts for the pipeline based on some of the state-of-the-art technologies. Optical Character Recognition (OCR) can be used to digitise the text on images of specimens. However, recognising text quickly and accurately from these images can be a challenge for OCR. We show that OCR performance can be improved by prior segmentation of specimen images into their component parts. This ensures that only text-bearing labels are submitted for OCR processing, as opposed to whole specimen images, which inevitably contain non-textual information that may lead to false positive readings. In our testing, Tesseract OCR version 4.0.0 offered promising text recognition accuracy on segmented images.

Not all the text on specimen labels is printed. Handwritten text varies much more and does not conform to standard shapes and sizes of individual characters, which poses an additional challenge for OCR. Recently, deep learning has enabled significant advances in this area. Google's Cloud Vision, which is based on deep learning and trained on large-scale datasets, proved quite adept at this task. This may take us some way towards obviating the need for humans to routinely transcribe handwritten text.

Determining the countries and collectors of specimens has been the goal of previous automated text digitisation research, and our approach also focuses on these two pieces of information. An area of Natural Language Processing (NLP) known as Named Entity Recognition (NER) has matured enough to semi-automate this task. Our experiments demonstrated that existing approaches can accurately recognise location and person names within the text extracted from segmented images by Tesseract version 4.0.0. Potentially, NER could be used in conjunction with other online services, such as those of the Biodiversity Heritage Library (https://www.biodiversitylibrary.org/docs/api3.html), to map the named entities to entities in the biodiversity literature.

We have highlighted the main recommendations for potential pipeline components, and the document also provides guidance on selecting appropriate software solutions, including automatic language identification, terminology extraction, and integrating all pipeline components into a scientific workflow to automate the overall digitisation process.
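A minimal sketch of the core of such a pipeline follows: OCR one pre-segmented label image with Tesseract, then extract person and location names with off-the-shelf NER. It assumes the pytesseract, Pillow and spacy packages plus the en_core_web_sm model; the file name and the choice of spaCy for NER are illustrative, as the paper evaluates several alternatives.

```python
# Minimal sketch of the label-digitisation pipeline: OCR a pre-segmented
# label image, then pull person (collector) and location (country)
# candidates out of the recognised text with named entity recognition.
import pytesseract
import spacy
from PIL import Image

nlp = spacy.load("en_core_web_sm")  # off-the-shelf English NER model

def digitise_label(image_path: str) -> dict:
    """OCR one segmented label image and extract collector/country candidates."""
    text = pytesseract.image_to_string(Image.open(image_path))
    doc = nlp(text)
    return {
        "text": text,
        "persons": [ent.text for ent in doc.ents if ent.label_ == "PERSON"],
        "places": [ent.text for ent in doc.ents if ent.label_ == "GPE"],
    }

if __name__ == "__main__":
    # Path is illustrative: a label image already cropped out of the
    # full specimen photograph by the segmentation step.
    print(digitise_label("specimen_label_01.png"))
```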


Author(s): Andrew M. Olney, Natalie K. Person, Arthur C. Graesser

The authors discuss Guru, a conversational expert intelligent tutoring system (ITS). Guru is designed to mimic expert human tutors using advanced applied natural language processing techniques, including natural language understanding, knowledge representation, and natural language generation.


2017, Vol 13 (1)
Author(s): Ewa Rudnicka, Francis Bond, Łukasz Grabowski, Maciej Piasecki, Tadeusz Piotrowski

The paper focuses on the issue of creating equivalence links in the domain of bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units, i.e. lemma and sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicography and translation, with special attention paid to cognitive and translational equivalents. A proposal for mapping lexical units is presented. Three types of links are defined: super-strong equivalence, strong equivalence and weak implied equivalence. The super-strong and strong equivalences share a common set of formal, semantic and usage features, with some feature values slightly loosened for strong equivalence. These links will be introduced manually by trained lexicographers. The sense mapping will partly draw on the results of the existing synset mapping: the lexicographers will analyse lists of pairs of synsets linked by interlingual relations such as synonymy, partial synonymy, hyponymy and hypernymy. They will also consult bilingual dictionaries and check translation probabilities in a parallel corpus. The results of the proposed mapping have great application potential in natural language processing, translation and language learning.
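The sketch below illustrates, under invented features and thresholds, how sense-level links of the three proposed types might be assigned from feature agreement. The feature set and cutoffs are our own assumptions for illustration; the paper's actual criteria are richer and are applied manually by lexicographers.

```python
# Minimal sketch: classify a candidate interlingual pair of lexical units
# as super-strong, strong, or weak implied equivalence, depending on how
# many formal/semantic/usage features agree. Features and thresholds are
# invented for illustration, not the paper's actual criteria.
from dataclasses import dataclass

@dataclass
class LexicalUnit:
    lemma: str
    pos: str        # part of speech
    register: str   # e.g. "neutral", "colloquial"
    domain: str     # e.g. "general", "law"

def equivalence_type(src: LexicalUnit, tgt: LexicalUnit,
                     same_synset_link: bool) -> str:
    """Classify one candidate pair of lexical units."""
    if not same_synset_link:
        return "weak implied equivalence"
    features = [src.pos == tgt.pos,
                src.register == tgt.register,
                src.domain == tgt.domain]
    if all(features):
        return "super-strong equivalence"
    if sum(features) >= 2:  # loosened criteria
        return "strong equivalence"
    return "weak implied equivalence"

pl = LexicalUnit("pies", "noun", "neutral", "general")
en = LexicalUnit("dog", "noun", "neutral", "general")
print(equivalence_type(pl, en, same_synset_link=True))  # super-strong equivalence
```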


2003, Vol 17 (5)
Author(s): Anne Vandeventer Faltin

This paper illustrates the usefulness of natural language processing (NLP) tools for computer-assisted language learning (CALL) through the presentation of three NLP tools integrated within CALL software for French. These tools are (i) a sentence structure viewer; (ii) an error diagnosis system; and (iii) a conjugation tool. The sentence structure viewer helps language learners grasp the structure of a sentence by providing lexical and grammatical information derived from a deep syntactic analysis; two different outputs are presented. The error diagnosis system is composed of a spell checker, a grammar checker, and a coherence checker. The spell checker makes use of alpha-codes, phonological reinterpretation, and some ad hoc rules to provide correction proposals. The grammar checker employs constraint relaxation and phonological reinterpretation as diagnosis techniques. The coherence checker compares the underlying "semantic" structures of a stored answer and of the learners' input to detect semantic discrepancies. The conjugation tool is a resource whose capabilities are enhanced in electronic format, enabling searches from inflected and ambiguous verb forms.
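To illustrate the conjugation tool's ability to search from inflected and ambiguous verb forms, here is a minimal sketch that inverts a lemma-to-forms table so that an ambiguous surface form retrieves every analysis. The three-entry table is a toy stand-in for a full French conjugation resource, and the data layout is our own assumption.

```python
# Minimal sketch: invert a lemma -> forms table so that an ambiguous
# inflected form retrieves all of its analyses, the "search from
# inflected forms" capability an electronic conjugation tool enables.
from collections import defaultdict

FORMS = {  # lemma -> {(tense, person): surface form}
    "parler": {("present", "1sg"): "parle", ("present", "3sg"): "parle"},
    "finir":  {("present", "1sg"): "finis", ("present", "2sg"): "finis"},
}

def build_index(forms: dict) -> dict:
    """Map each surface form to all of its (lemma, tense, person) analyses."""
    index = defaultdict(list)
    for lemma, table in forms.items():
        for (tense, person), surface in table.items():
            index[surface].append((lemma, tense, person))
    return index

index = build_index(FORMS)
print(index["parle"])  # ambiguous: 1sg and 3sg present of "parler"
```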


2014, Vol 7 (1)
Author(s): Samuel Cruz-Lara, Alexandre Denis, Nadia Bellalem

Within a globalized world, the need for linguistic support is increasing every day. Linguistic information, and in particular multilingual textual information, plays a significant role in describing digital content: information describing pictures or video sequences, general information presented to the user graphically or via a text-to-speech processor, menus in interactive multimedia or TV, subtitles, dialogue prompts, or implicit data appearing on an image, such as captions or tags. It is crucial to associate digital content with multilingual textual information in a non-intrusive way: the user must be able to decide whether or not to display, in any particular language, the textual information related to the digital content at hand.

In this paper we present a general review of linguistic and multilingual issues related to virtual worlds and serious games. The expression "linguistic and multilingual issues" covers not only any kind of linguistic support (such as syntactic and semantic analysis) based on textual information, but also any kind of multilingual and monolingual topics (such as localization or automatic translation) and their association with virtual worlds and serious games. We focus on our ongoing research activities, particularly in the framework of sentiment analysis and emotion detection. We also dedicate special attention to standardization issues, because standards grant interoperability, stability, and durability.

The review is essentially based on our own experience, but some interesting international research projects and applications are also mentioned, in particular those related to sentiment analysis and emotion detection.
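As a minimal sketch of the simplest end of the sentiment-analysis techniques discussed here, the snippet below scores in-world chat messages against small polarity word lists. The lists are toy examples; real systems rely on far larger lexicons or trained classifiers.

```python
# Minimal sketch of lexicon-based sentiment scoring for in-world chat:
# count positive and negative words and compare. Toy word lists only.
POSITIVE = {"great", "fun", "love", "win"}
NEGATIVE = {"boring", "hate", "lost", "bug"}

def sentiment(message: str) -> str:
    """Classify a chat message by counting polar words."""
    words = message.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this quest, great fun"))  # -> positive
```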


2004, Vol 21, pp. 287-317
Author(s): M. J. Nederhof, G. Satta

We propose a formalism for the representation of finite languages, referred to as the class of IDL-expressions, which combines concepts that have only been considered in isolation in existing formalisms. The suggested applications are in natural language processing, more specifically in surface natural language generation and machine translation, where a sentence is obtained by first generating a large set of candidate sentences, represented in a compact way, and then filtering that set through a parser. We study several formal properties of IDL-expressions and compare the new formalism with more standard ones. We also present a novel parsing algorithm for IDL-expressions and prove a non-trivial upper bound on its time complexity.
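To make the operators concrete, here is a naive Python enumerator for IDL-expressions, written under assumed names: disjunction unions sub-languages, interleave shuffles their words together, and lock fuses a word into one unbreakable block so that later interleaving cannot split it. Enumeration is exactly what the formalism is designed to avoid; the paper's parsing algorithm works on the compact representation directly.

```python
# Minimal sketch of enumeration semantics for IDL-expressions. Words are
# kept as tuples of blocks so that lock() can make a subexpression atomic
# with respect to later interleaving.
from itertools import product

def atom(w):
    """The language containing the single one-token word w."""
    return {(w,)}

def cat(*langs):
    """Concatenation of sub-languages."""
    out = {()}
    for lang in langs:
        out = {u + v for u in out for v in lang}
    return out

def disj(*langs):
    """Disjunction (D): union of sub-languages."""
    return set().union(*langs)

def lock(lang):
    """Lock (L): fuse each word into a single unbreakable block."""
    return {(" ".join(w),) for w in lang}

def interleave(lang1, lang2):
    """Interleave (I): all shuffles of the two languages' block sequences."""
    def shuffles(u, v):
        if not u:
            yield v
            return
        if not v:
            yield u
            return
        for s in shuffles(u[1:], v):
            yield (u[0],) + s
        for s in shuffles(u, v[1:]):
            yield (v[0],) + s
    return {s for u, v in product(lang1, lang2) for s in shuffles(u, v)}

# I(L(finally . a . solution), D(was . found, emerged)):
# the locked phrase stays contiguous, while "was"/"found" may land
# before, after, or around it, and "emerged" before or after it.
lang = interleave(
    lock(cat(atom("finally"), atom("a"), atom("solution"))),
    disj(cat(atom("was"), atom("found")), atom("emerged")),
)
for word in sorted(lang):
    print(" ".join(word))
```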


Author(s): Davide Picca, Dominique Jaccard, Gérald Eberlé

In recent decades, Natural Language Processing (NLP) has achieved a high level of success. Interactions between NLP and Serious Games have begun, and some Serious Games already include NLP techniques. The objectives of this paper are twofold: on the one hand, to provide a simple framework enabling analysis of potential uses of NLP in Serious Games; on the other hand, to apply this framework to existing Serious Games and give an overview of the use of NLP in pedagogical Serious Games. We present 11 Serious Games exploiting NLP techniques, describing them systematically according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we link each technique to possible purposes of use for the different actors interacting in the Serious Game.

