Natural Language Processing: History, Evolution, Application, and Future Work

Assessing multilingual multimodal image description: Studies of native speaker preferences and translator choices

Natural Language Engineering ◽

10.1017/s1351324918000074 ◽

2018 ◽

Vol 24 (3) ◽

pp. 393-413 ◽

Cited By ~ 2

Author(s):

STELLA FRANK ◽

DESMOND ELLIOTT ◽

LUCIA SPECIA

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Empirical Evidence ◽

Language Processing ◽

Native Speaker ◽

Native Language ◽

Target Language ◽

Image Description ◽

The Core ◽

Future Work

AbstractTwo studies on multilingual multimodal image description provide empirical evidence towards two questions at the core of the task: (i) whether target language speakers prefer descriptions generated directly in their native language, as compared to descriptions translated from a different language; (ii) whether images improve human translation of descriptions. These results provide guidance for future work in multimodal natural language processing by first showing that on the whole, translations are not distinguished from native language descriptions, and second delineating and quantifying the information gained from the image during the human translation task.

Download Full-text

Event Extraction and Representation: A Case Study for the Portuguese Language

Information ◽

10.3390/info10060205 ◽

2019 ◽

Vol 10 (6) ◽

pp. 205 ◽

Cited By ~ 1

Author(s):

Paulo Quaresma ◽

Vítor Beires Nogueira ◽

Kashyap Raiyani ◽

Roy Bayot

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Event Extraction ◽

Semantic Role Labeling ◽

Knowledge Based ◽

Part Of Speech ◽

Relevant Role ◽

Text Information ◽

Future Work

Text information extraction is an important natural language processing (NLP) task, which aims to automatically identify, extract, and represent information from text. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places, and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge-based reasoning and inference processes. In this work, we will describe, in detail, our proposal for event extraction from Portuguese documents. The proposed approach is based on a pipeline of specialized natural language processing tools; namely, a part-of-speech tagger, a named entities recognizer, a dependency parser, semantic role labeling, and a knowledge extraction module. The architecture is language-independent, but its modules are language-dependent and can be built using adequate AI (i.e., rule-based or machine learning) methodologies. The developed system was evaluated with a corpus of Portuguese texts and the obtained results are presented and analysed. The current limitations and future work are discussed in detail.

Download Full-text

Development of a Custom Spell-Checker for Emergency Department Data

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v11i1.9745 ◽

2019 ◽

Vol 11 (1) ◽

Author(s):

Sophie Rand ◽

Ramona Lall ◽

Ramona Lall

Keyword(s):

Emergency Department ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Syndromic Surveillance ◽

Word List ◽

Chief Complaint ◽

Free Text ◽

Spell Checker ◽

Future Work

ObjectiveTo share progress on a custom spell-checker for emergency department chief complaint free-text data and demonstrate a spell-checker validation Shiny application.IntroductionEmergency department (ED) syndromic surveillance relies on a chief complaint, which is often a free-text field, and may contain misspelled words, syntactic errors, and healthcare-specific and/or facility-specific abbreviations. Cleaning of the chief complaint field may improve syndrome capture sensitivity and reduce misclassification of syndromes. We are building a spell-checker, customized with language found in ED corpora, as our first step in cleaning our chief complaint field. This exercise would elucidate the value of pre-processing text and would lend itself to future work using natural language processing (NLP) techniques, such as topic modeling. Such a tool could be extensible to other datasets that contain free-text fields, including electronic reportable disease lab and case reporting.MethodsChief complaints may contain words that are incorrect if they are misspelled (e.g.,“patient has herpertension”), or, if the word yields a syntactically incorrect phrase (e.g., the word “huts” in the phrase: “my toe huts”).We are developing a spell-checker tool for chief complaint text using the R and Python programming languages. The first stage in the development of the spell-checker is the identifying and handling of misspellings; future work will address syntactic errors. Known abbreviations are identified using regular expressions, and unknown abbreviations are addressed by the spell-checker. The spell checker performs 4 steps on chief complaint data: identification of misspellings, generation of a substitute candidate word list, word sense disambiguation to identify replacement word, and replacement of the misspelled word, based on methods found in the literature.[1] As the spell-checker requires a dictionary of correctly spelled, healthcare-specific terms including all terms that would appear in an ED corpus, we used vocabularies from the Unified Medical Language System, ED-specific terminology, and domain expert user input. Dictionary construction, misspelling identification algorithms, and word list generation algorithms are in the development stage.Simultaneously, we are building an R Shiny interactive web application for syndromic surveillance analysts to manually correct a subset of misspelled words, which we will use to validate and evaluate the performance of the spell-checker tool.[1] Tolentino HD, Matters MD, Walop W, et al. A UMLS-based spell checker for natural language processing in vaccine safety. BMC Medical Informatics and Decision Making. 2007;7(1). doi:10.1186/1472-6947-7-3.ResultsProject still in development phase.ConclusionsThe audience will learn about important considerations for developing a spell-checker, including those for data structure of a dictionary and algorithms for identification of misplaced words and identification of candidate replacement words. We will demonstrate our word list generation algorithm and the Shiny application which uses these words for spell-checker validation. We will share relevant code; after our presentation, audience members should able to apply code and lessons to their own projects and/or to collaborate with the NYC Department of Health and Mental Hygiene.

Download Full-text

Natural Language Processing and Enhanced Clinical Decision Making Radiology and VINCI

PsycEXTRA Dataset ◽

10.1037/e615572012-015 ◽

2012 ◽

Author(s):

Eliot Siegel

Keyword(s):

Decision Making ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Decision Making ◽

Clinical Decision

Download Full-text

Natural Language Processing in the Clinical Setting

PsycEXTRA Dataset ◽

10.1037/e615572012-013 ◽

2012 ◽

Author(s):

Thomas H. Payne

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Setting

Download Full-text

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap of knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba including both rule-based and data-driven methods. Then we apply a state of the art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Download Full-text

An AdaBoost Using a Weak-Learner Generating Several Weak-Hypotheses for Large Training Data of Natural Language Processing

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.130.83 ◽

2010 ◽

Vol 130 (1) ◽

pp. 83-91 ◽

Cited By ~ 1

Author(s):

Tomoya Iwakura ◽

Seishi Okamoto ◽

Kazuo Asakawa

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Training Data ◽

Weak Learner

Download Full-text

1243-P: Novel Use of Natural Language Processing to Identify Reasons for Insulin Discontinuation in Patients with T2DM: A Real-World Evidence Study

Diabetes ◽

10.2337/db19-1243-p ◽

2019 ◽

Vol 68 (Supplement 1) ◽

pp. 1243-P

Author(s):

JIANMIN WU ◽

FRITHA J. MORRISON ◽

ZHENXIANG ZHAO ◽

XUANYAO HE ◽

MARIA SHUBINA ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Real World ◽

Real World Evidence

Download Full-text

WHO DO WE THINK WE ARE? COMPARING INTERSECTIONAL IDENTITY TRENDS IN ASEE AND CEEA-ACEG USING NATURAL LANGUAGE PROCESSING AND REVIEW OF PROCEEDINGS

Proceedings of the Canadian Engineering Education Association (CEEA) ◽

10.24908/pceea.vi0.13830 ◽

2019 ◽

Author(s):

Pamela Rogalski ◽

Eric Mikulin ◽

Deborah Tihanyi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Activity Theory ◽

Language Processing ◽

Division Of Labour ◽

Cultural Historical Activity Theory ◽

Original Question ◽

Micro Level ◽

Historical Activity ◽

Cultural Historical Activity

In 2018, we overheard many CEEA-AGEC members stating that they have "found their people"; this led us to wonder what makes this evolving community unique. Using cultural historical activity theory to view the proceedings of CEEA-ACEG 2004-2018 in comparison with the geographically and intellectually adjacent ASEE, we used both machine-driven (Natural Language Processing, NLP) and human-driven (literature review of the proceedings) methods. Here, we hoped to build on surveys—most recently by Nelson and Brennan (2018)—to understand, beyond what members say about themselves, what makes the CEEA-AGEC community distinct, where it has come from, and where it is going. Engaging in the two methods of data collection quickly diverted our focus from an analysis of the data themselves to the characteristics of the data in terms of cultural historical activity theory. Our preliminary findings point to some unique characteristics of machine- and human-driven results, with the former, as might be expected, focusing on the micro-level (words and language patterns) and the latter on the macro-level (ideas and concepts). NLP generated data within the realms of "community" and "division of labour" while the review of proceedings centred on "subject" and "object"; both found "instruments," although NLP with greater granularity. With this new understanding of the relative strengths of each method, we have a revised framework for addressing our original question.

Download Full-text

Fast Neural Network Engine for Natural Science Language Processing: A Drug-Search Case.

10.26434/chemrxiv.12800348 ◽

2020 ◽

Author(s):

Vadim V. Korolev ◽

Artem Mitrofanov ◽

Kirill Karpov ◽

Valery Tkachenko

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Natural Science ◽

Therapeutic Agent ◽

Semantic Relations ◽

Chemical Data ◽

Processing Methods ◽

Modern Natural

The main advantage of modern natural language processing methods is a possibility to turn an amorphous human-readable task into a strict mathematic form. That allows to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use-cases and applied to a case of searching a therapeutic agent for a COVID-19 disease by analyzing PubMed archive.

Download Full-text