Natural language processing for web browsing analytics: Challenges, lessons learned, and opportunities

2021 ◽  
pp. 108357
Author(s):  
Daniel Perdices ◽  
Javier Ramos ◽  
José L. García-Dorado ◽  
Iván González ◽  
Jorge E. López de Vergara
2020 ◽  
Vol 34 (09) ◽  
pp. 13397-13403
Author(s):  
Narges Norouzi ◽  
Snigdha Chaturvedi ◽  
Matthew Rutledge

This paper describes an experience of teaching Machine Learning (ML) and Natural Language Processing (NLP) to a group of high school students over an intensive one-month period. In this work, we provide an outline of the AI course curriculum we designed for high school students and then evaluate its effectiveness by analyzing students' feedback and student outcomes. After closely observing students, evaluating their responses to our surveys, and analyzing their contributions to the course project, we identified some possible impediments to teaching AI to high school students and propose measures to avoid them. These measures include employing a combination of objectivist and constructivist pedagogies, reviewing/introducing basic programming concepts at the beginning of the course, and addressing gender discrepancies throughout the course.


Author(s):  
Victoria Rubin

Artificially Intelligent (AI) systems are pervasive, but poorly understood by their users and, at times, their developers. It is often unclear how and why certain algorithms make choices, predictions, or conclusions. What does AI transparency mean? What explanations do AI system users desire? This panel discusses AI opaqueness with examples in applied contexts such as natural language processing, people categorization, judicial decision explanations, and system recommendations. We offer insights from interviews with AI system users about their perceptions and developers' lessons learned. What steps should be taken towards AI transparency and accountability for its decisions?


Author(s):  
William Digan ◽  
Aurélie Névéol ◽  
Antoine Neuraz ◽  
Maxime Wack ◽  
David Baudoin ◽  
...  

Abstract Background The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck. Objective To evaluate whether WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks. Materials and Methods Based on the literature across multiple research fields (NLP, bioinformatics, and clinical informatics), we selected articles that (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregated insights from the literature to define reproducibility recommendations. Finally, we assessed the compliance of 7 NLP frameworks with the recommendations. Results We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of the features (26 features for LAPPS Grid, 22 features for OpenMinTeD), compared to 18 features for current clinical NLP frameworks (cTAKES, CLAMP) and 17 features for GATE, ScispaCy, and Textflows. Discussion 34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP framework. Nevertheless, frameworks based on WMS showed better compliance with the features. Conclusion NLP frameworks could benefit from lessons learned in the bioinformatics field (e.g., public repositories of curated tools and workflows, or use of containers for shareability) to enhance reproducibility in a clinical setting.
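One recurring reproducibility recommendation the abstract points to is recording enough provenance (tool versions, input checksums, runtime environment) to re-run a pipeline step exactly. As a framework-agnostic sketch in Python (the function and field names here are illustrative assumptions, not the API of any framework discussed above):

```python
import hashlib
import platform
import sys

def file_sha256(path):
    """Content hash of an input file, read in chunks to handle large corpora."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def provenance_record(input_paths, tool_name, tool_version):
    """Capture the minimal facts needed to reproduce a pipeline step:
    which tool ran, at which version, on which platform, over which exact inputs."""
    return {
        "tool": tool_name,
        "tool_version": tool_version,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "inputs": {p: file_sha256(p) for p in input_paths},
    }
```

A record like this, stored alongside each step's outputs, is the kind of metadata that workflow management systems and container-based setups capture automatically, and that ad hoc clinical NLP pipelines often omit.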


2017 ◽  
Vol 24 (5) ◽  
pp. 986-991 ◽  
Author(s):  
David S Carrell ◽  
Robert E Schoen ◽  
Daniel A Leffler ◽  
Michele Morris ◽  
Sherri Rose ◽  
...  

Abstract Objective: Widespread application of clinical natural language processing (NLP) systems requires taking existing NLP systems and adapting them to diverse and heterogeneous settings. We describe the challenges faced and lessons learned in adapting an existing NLP system for measuring colonoscopy quality. Materials and Methods: We used colonoscopy and pathology reports from 4 settings during 2013–2015, varying by geographic location, practice type, compensation structure, and electronic health record. Results: Though successful, adaptation required considerably more time and effort than anticipated. Typical NLP challenges in assembling corpora, diverse report structures, and idiosyncratic linguistic content were greatly magnified. Discussion: Strategies for addressing adaptation challenges include assessing site-specific diversity, setting realistic timelines, leveraging local electronic health record expertise, and undertaking extensive iterative development. More research is needed on how to make it easier to adapt NLP systems to new clinical settings. Conclusions: A key challenge in the widespread application of NLP is adapting existing systems to new clinical settings.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually related to the source-language input.
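The automatic evaluation above relies on BLEU, which scores a candidate translation by its clipped n-gram overlap with a reference, discounted by a brevity penalty. As a rough illustration of what the metric measures, here is a simplified, self-contained sentence-level BLEU in Python (uniform n-gram weights, no smoothing); real evaluations typically use standard tooling such as sacreBLEU rather than a hand-rolled implementation like this one:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU for one reference.

    Geometric mean of clipped n-gram precisions (n = 1..max_n),
    multiplied by the brevity penalty for short hypotheses."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # no smoothing: any empty n-gram overlap zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    brevity_penalty = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity_penalty * math.exp(log_avg)
```

An exact match scores 1.0 and a hypothesis sharing no 4-grams with the reference scores 0.0, which is why sentence-level BLEU without smoothing is harsh on short outputs; corpus-level BLEU, as reported in the paper, aggregates counts over all sentences before taking the ratio.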

