Using Natural Language Processing to Extract Information from Unstructured Code-Change Version Control Data: Lessons Learned

2021 ◽  
Author(s):  
Elisabetta Ronchieri ◽  
Marco Canaparo ◽  
Yue Yang

2021 ◽
pp. 108357
Author(s):  
Daniel Perdices ◽  
Javier Ramos ◽  
José L. García-Dorado ◽  
Iván González ◽  
Jorge E. López de Vergara

2020 ◽  
Vol 34 (09) ◽  
pp. 13397-13403
Author(s):  
Narges Norouzi ◽  
Snigdha Chaturvedi ◽  
Matthew Rutledge

This paper describes an experience teaching Machine Learning (ML) and Natural Language Processing (NLP) to a group of high school students over an intensive one-month period. We provide an outline of the AI course curriculum we designed for high school students and then evaluate its effectiveness by analyzing students' feedback and outcomes. After closely observing the students, evaluating their survey responses, and analyzing their contributions to the course project, we identified possible impediments to teaching AI to high school students and propose measures to avoid them. These measures include employing a combination of objectivist and constructivist pedagogies, reviewing/introducing basic programming concepts at the beginning of the course, and addressing gender discrepancies throughout the course.


Author(s):  
Victoria Rubin

Artificially Intelligent (AI) systems are pervasive but poorly understood by their users and, at times, by their developers. It is often unclear how and why certain algorithms make choices, predictions, or conclusions. What does AI transparency mean? What explanations do AI system users desire? This panel discusses AI opaqueness with examples in applied contexts such as natural language processing, people categorization, judicial decision explanations, and system recommendations. We offer insights from interviews with AI system users about their perceptions, along with developers' lessons learned. What steps should be taken toward AI transparency and accountability for its decisions?


2015 ◽  
Vol 1 (1) ◽  
Author(s):  
Keith W. Kintigh

Abstract To address archaeology's most pressing substantive challenges, researchers must discover, access, and extract information contained in the reports and articles that codify so much of archaeology's knowledge. These efforts will require application of existing and emerging natural language processing technologies to extensive digital corpora. Automated classification can enable development of the metadata needed for the discovery of relevant documents. Although it is even more technically challenging, automated extraction of and reasoning with information from texts can provide urgently needed access to contextualized information within documents. Effective automated translation is needed for scholars to benefit from research published in other languages.
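The classification step described here is well within reach of standard NLP tooling. The following Python fragment is a minimal sketch, not taken from the article: it trains a TF-IDF text classifier to assign discovery metadata to report excerpts, where the labels and example texts are hypothetical placeholders.

# A minimal sketch of automated classification for discovery metadata.
# The excerpts and subject labels below are hypothetical, not from the article.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: short report excerpts with subject labels.
texts = [
    "ceramic sherds recovered from the pueblo room block",
    "projectile points and lithic debitage from the surface survey",
    "stratigraphic profile of the midden excavation unit",
    "radiocarbon dates from hearth features in the rockshelter",
]
labels = ["ceramics", "lithics", "stratigraphy", "chronometrics"]

# TF-IDF features over unigrams and bigrams, fed to a linear classifier.
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(texts, labels)

# Tag a new excerpt with a discovery-metadata label.
print(classifier.predict(["flaked stone tools collected during excavation"]))

In practice such a model would be trained on a large annotated corpus and the predicted labels stored as searchable metadata alongside each document.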


1998 ◽  
Vol 37 (04/05) ◽  
pp. 334-344 ◽  
Author(s):  
G. Hripcsak ◽  
C. Friedman

Abstract Evaluating natural language processing (NLP) systems in the clinical domain is a difficult task that is important for the advancement of the field. A number of NLP systems that extract information from free-text clinical reports have been reported, but few of these systems have been evaluated. Those that were evaluated reported good performance measures, but the results were often weakened by ineffective evaluation methods. In this paper we describe a set of criteria aimed at improving the quality of NLP evaluation studies. We present an overview of NLP evaluations in the clinical domain and also discuss the Message Understanding Conferences (MUC) [1-4]. Although these conferences constitute a series of NLP evaluation studies performed outside the clinical domain, some of their results are relevant within medicine. In addition, we discuss a number of factors that contribute to the complexity inherent in the task of evaluating natural language systems.
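One concrete ingredient of such evaluations is scoring a system's extracted findings against a reference standard. The sketch below is an illustration of that step, not the authors' protocol; the report identifiers and findings are hypothetical.

# A minimal sketch of scoring extracted clinical findings against a
# reference standard. All report IDs and findings are hypothetical.

# Reference-standard findings per clinical report.
gold = {
    "report1": {"pneumonia", "pleural effusion"},
    "report2": {"cardiomegaly"},
}
# Findings extracted by the NLP system under evaluation.
system = {
    "report1": {"pneumonia"},
    "report2": {"cardiomegaly", "edema"},
}

tp = sum(len(gold[r] & system[r]) for r in gold)  # correct extractions
fp = sum(len(system[r] - gold[r]) for r in gold)  # spurious extractions
fn = sum(len(gold[r] - system[r]) for r in gold)  # missed findings

precision = tp / (tp + fp)  # fraction of extracted findings that are correct
recall = tp / (tp + fn)     # fraction of reference findings that were found
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")

Much of the complexity the paper discusses lies upstream of this arithmetic: building a trustworthy reference standard, handling inter-annotator disagreement, and deciding what counts as a match.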


Author(s):  
William Digan ◽  
Aurélie Névéol ◽  
Antoine Neuraz ◽  
Maxime Wack ◽  
David Baudoin ◽  
...  

Abstract
Background: The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck.
Objective: To evaluate whether WMS and other bioinformatics practices could improve the reproducibility of clinical NLP frameworks.
Materials and Methods: Based on the literature across multiple research fields (NLP, bioinformatics, and clinical informatics), we selected articles that (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregated insights from the literature to define reproducibility recommendations. Finally, we assessed the compliance of 7 NLP frameworks with the recommendations.
Results: We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS matched more than 50% of the features (26 features for LAPPS Grid, 22 for OpenMinted), compared with 18 features for current clinical NLP frameworks (cTAKES, CLAMP) and 17 for GATE, ScispaCy, and Textflows.
Discussion: 34 of the recommendations were endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP framework. Nevertheless, frameworks based on WMS showed better compliance with the features.
Conclusion: NLP frameworks could benefit from lessons learned in the bioinformatics field (e.g., public repositories of curated tools and workflows, or the use of containers for shareability) to enhance reproducibility in a clinical setting.
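The Results section reduces to a simple compliance tally: each framework is scored by how many of the 40 identified features it matches. The sketch below reproduces that arithmetic; the per-framework counts come from the abstract, while the underlying feature lists are not shown there.

# A minimal sketch of the compliance tally from the Results. Counts are
# taken from the abstract; the 40 individual features are not listed there.
TOTAL_FEATURES = 40

features_matched = {
    "LAPPS Grid": 26,  # WMS-based
    "OpenMinted": 22,  # WMS-based
    "cTAKES": 18,
    "CLAMP": 18,
    "GATE": 17,
    "ScispaCy": 17,
    "Textflows": 17,
}

# Report compliance as a percentage; WMS-based frameworks exceed 50%.
for framework, n in sorted(features_matched.items(), key=lambda kv: -kv[1]):
    print(f"{framework:10s} {n}/{TOTAL_FEATURES} = {n / TOTAL_FEATURES:.0%}")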

