Natural Language Processing Enabling COVID-19 Predictive Analytics to Support Data-Driven Patient Advising and Pooled Testing

Abstract Objective: The COVID-19 pandemic response at MUSC included virtual care visits for patients with suspected SARS-CoV-2 infection. The telehealth system used for these visits exports only a text note to integrate with the EHR, but structured and coded information about COVID-19 (e.g., exposure, risk factors, symptoms) was needed to support clinical care and early research as well as predictive analytics for data-driven patient advising and pooled testing. Methods: To capture COVID-19 information from multiple sources, a new data mart and a new Natural Language Processing (NLP) application prototype were developed. The NLP application combined reused components with dictionaries and rules crafted by domain experts. It was deployed as a web service for hourly processing of new data from patients assessed or treated for COVID-19. The extracted information was then used to develop algorithms predicting SARS-CoV-2 diagnostic test results based on symptoms and exposure information. Results: The dedicated data mart and NLP application were developed and deployed in a 10-day sprint in March 2020. In evaluation, the NLP application achieved good accuracy (85.8% recall and 81.5% precision). The SARS-CoV-2 testing predictive analytics algorithms were configured to provide patients with data-driven COVID-19 testing advice with a sensitivity of 81-92% and to enable pooled testing with a negative predictive value of 90-91%, reducing the required tests to about 63%. Conclusion: SARS-CoV-2 testing predictive analytics and NLP successfully enabled data-driven patient advising and pooled testing.
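The recall, precision, sensitivity and negative predictive value quoted in this abstract are standard confusion-matrix ratios. A minimal Python sketch of those definitions follows. The function names and the example counts are illustrative assumptions, not taken from the paper, and the F1 value is derived here from the reported precision and recall rather than stated in the abstract.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were found (also called sensitivity)."""
    return tp / (tp + fn)


def negative_predictive_value(tn: int, fn: int) -> float:
    """Share of predicted negatives that are truly negative."""
    return tn / (tn + fn)


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Hypothetical counts, chosen only to exercise the functions.
tp, fp, fn, tn = 80, 20, 10, 90
print(precision(tp, fp))                  # 0.8
print(negative_predictive_value(tn, fn))  # 0.9

# F1 implied by the reported 81.5% precision and 85.8% recall.
print(round(f1_from(0.815, 0.858), 3))    # 0.836
```

A high negative predictive value matters for pooled testing because samples predicted negative can be grouped with less risk that a pool turns out positive and has to be retested individually.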


A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution toward closing this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
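BLEU, the metric used for the comparison above, combines clipped n-gram precisions with a brevity penalty. The Python sketch below is a simplified single-reference, sentence-level variant without smoothing, with whitespace tokenization assumed. It is not the evaluation script used in the paper, and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one candidate against one reference."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, uniform weights.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(sentence_bleu("the cat sat on the mat", reference))  # 1.0
print(sentence_bleu("the cat sat on the", reference))      # lower: brevity penalty only
```

With a very small parallel corpus, many sentences will have no matching 4-grams, so published evaluations usually report corpus-level BLEU or apply smoothing rather than this bare sentence-level form.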


A deep database of medical abbreviations and acronyms for natural language processing

Scientific Data ◽

10.1038/s41597-021-00929-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Lisa Grossman Liu ◽

Raymond H. Grossman ◽

Elliot G. Mitchell ◽

Chunhua Weng ◽

Karthik Natarajan ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

American English ◽

Substantial Improvement ◽

Future Application ◽

Multiple Sources ◽

High Coverage ◽

Clinical Text ◽

Automated Quality Control

Abstract The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness or coverage of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.


Systematic review of current natural language processing methods and applications in cardiology

Heart ◽

10.1136/heartjnl-2021-319769 ◽

2021 ◽

pp. heartjnl-2021-319769

Author(s):

Meghan Reading Turchioe ◽

Alexander Volodarskiy ◽

Jyotishman Pathak ◽

Drew N Wright ◽

James Enlou Tcheng ◽

...

Keyword(s):

Systematic Review ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Care ◽

Real World Data ◽

Clinical Text ◽

Clinical Notes ◽

Artery Disease ◽

Automated Methods

Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. We excluded duplicates and studies that were not published in English, lacked a description of NLP methods or were not focused on cardiac disease. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.


Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives

Aerospace ◽

10.3390/aerospace7100143 ◽

2020 ◽

Vol 7 (10) ◽

pp. 143

Author(s):

Rodrigo L. Rose ◽

Tejas G. Puranik ◽

Dimitri N. Mavris

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Safety Data ◽

Aviation Safety ◽

Data Driven ◽

Flight Safety ◽

Commercial Aviation ◽

Unique Challenge ◽

Aviation Safety Reporting System

The complexity of commercial aviation operations has grown substantially in recent years, together with a diversification of techniques for collecting and analyzing flight data. As a result, data-driven frameworks for enhancing flight safety have grown in popularity. Data-driven techniques offer efficient and repeatable exploration of patterns and anomalies in large datasets. Text-based flight safety data presents a unique challenge in its subjectivity, and relies on natural language processing tools to extract underlying trends from narratives. In this paper, a methodology is presented for the analysis of aviation safety narratives based on text-based accounts of in-flight events and categorical metadata parameters which accompany them. An extensive pre-processing routine is presented, including a comparison between numeric models of textual representation for the purposes of document classification. A framework for categorizing and visualizing narratives is presented through a combination of k-means clustering and 2-D mapping with t-Distributed Stochastic Neighbor Embedding (t-SNE). A cluster post-processing routine is developed for identifying driving factors in each cluster and building a hierarchical structure of cluster and sub-cluster labels. The Aviation Safety Reporting System (ASRS), which includes over a million de-identified voluntarily submitted reports describing aviation safety incidents for commercial flights, is analyzed as a case study for the methodology. The method results in the identification of 10 major clusters and a total of 31 sub-clusters. The identified groupings are post-processed through metadata-based statistical analysis of the learned clusters. The developed method shows promise in uncovering trends from clusters that are not evident in existing anomaly labels in the data and offers a new tool for obtaining insights from text-based safety data that complement existing approaches.
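The core of the pipeline described above, turning narratives into numeric vectors and grouping them with k-means, can be sketched in plain Python. This is a toy illustration under stated assumptions: TF-IDF is used as the numeric text representation (the abstract compares several without naming the chosen one here), centroids are initialised from the first k documents for determinism, the t-SNE mapping step is omitted, and the four narratives are invented examples rather than ASRS reports.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector per document."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; centroids start at the first k vectors (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical narratives, invented for illustration only.
narratives = [
    "engine failure during climb engine shutdown",
    "runway incursion during taxi runway crossing",
    "engine fire warning engine shutdown",
    "taxi clearance confusion runway incursion",
]
labels = kmeans(tfidf_vectors(narratives), k=2)
print(labels)  # engine reports share one label, runway reports the other
```

On a corpus of realistic size, a library implementation with multiple random restarts would be the more practical choice, and the number of clusters would need to be selected rather than fixed in advance.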


Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods

Computational Linguistics ◽

10.1162/coli_a_00002 ◽

2010 ◽

Vol 36 (3) ◽

pp. 341-387 ◽

Cited By ~ 65

Author(s):

Nitin Madnani ◽

Bonnie J. Dorr

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Recent Work ◽

Language Processing ◽

Data Driven ◽

Future Trends ◽

Automatic Construction ◽

Work Done ◽

Paraphrase Generation ◽

Potential Use

The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language—words, phrases, and sentences—is an important part of natural language processing (NLP) and is being increasingly employed to improve the performance of several NLP applications. In this article, we attempt to conduct a comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods, while also conveying an appreciation for the importance and potential use of paraphrases in the field of NLP research. Recent work done in manual and automatic construction of paraphrase corpora is also examined. We also discuss the strategies used for evaluating paraphrase generation techniques and briefly explore some future trends in paraphrase generation.


Data-driven materials research enabled by natural language processing and information extraction

Applied Physics Reviews ◽

10.1063/5.0021106 ◽

2020 ◽

Vol 7 (4) ◽

pp. 041317

Author(s):

Elsa A. Olivetti ◽

Jacqueline M. Cole ◽

Edward Kim ◽

Olga Kononova ◽

Gerbrand Ceder ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Data Driven ◽

Materials Research


Health Natural Language Processing: Methodology Development and Applications (Preprint)

10.2196/preprints.23898 ◽

2020 ◽

Author(s):

Tianyong Hao ◽

Zhengxing Huang ◽

Likeng Liang ◽

Heng Weng ◽

Buzhou Tang

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Information Technologies ◽

Medical Decision ◽

Eligibility Criteria ◽

Text Data ◽

Methodology Development ◽

Domain Experts ◽

Discharge Summaries

With the rapid growth of information technology, the need to process massive amounts of health and medical data using advanced information technologies has also grown. A large amount of valuable data exists in natural text such as free diagnosis text, discharge summaries, online health discussions, eligibility criteria of clinical trials, and so on. Health natural language processing automatically analyzes the commonalities and differences of large amounts of text data and recommends appropriate actions on behalf of domain experts to assist medical decision making. This editorial shares the methodology innovation of health natural language processing and its applications in the medical domain.


NLPReViz: an interactive tool for natural language processing on clinical text

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocx070 ◽

2017 ◽

Vol 25 (1) ◽

pp. 81-87 ◽

Cited By ~ 11

Author(s):

Gaurav Trivedi ◽

Phuong Pham ◽

Wendy W Chapman ◽

Rebecca Hwa ◽

Janyce Wiebe ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

User Study ◽

Clinical Text ◽

Domain Experts ◽

Expert Review ◽

System Usability Scale ◽

Average System ◽

Colonoscopy Quality

Abstract The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the “appendiceal-orifice” variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for “biopsy” ranged between 0.88 and 0.94 (−1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements.


MalDy: Portable, data-driven malware detection using natural language processing and machine learning techniques on behavioral analysis reports

Digital Investigation ◽

10.1016/j.diin.2019.01.017 ◽

2019 ◽

Vol 28 ◽

pp. S77-S87 ◽

Cited By ~ 8

Author(s):

ElMouatez Billah Karbab ◽

Mourad Debbabi

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Malware Detection ◽

Data Driven ◽

Machine Learning Techniques ◽

Behavioral Analysis ◽

Learning Techniques


Clinical Natural Language Processing in 2015: Leveraging the Variety of Texts of Clinical Interest

Yearbook of Medical Informatics ◽

10.15265/iy-2016-049 ◽

2016 ◽

Vol 25 (01) ◽

pp. 234-239 ◽

Cited By ~ 8

Author(s):

P. Zweigenbaum ◽

A. Névéol

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Information ◽

Bibliographic Databases ◽

Sources Of Information ◽

Multiple Sources ◽

Clinical Interest ◽

Clinical Natural Language Processing ◽

Selection Of

Summary Objective: To summarize recent research and present a selection of the best papers published in 2015 in the field of clinical Natural Language Processing (NLP). Method: A systematic review of the literature was performed by the two section editors of the IMIA Yearbook NLP section by searching bibliographic databases with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Section editors first selected a shortlist of candidate best papers that were then peer-reviewed by independent external reviewers. Results: The clinical NLP best paper selection shows that clinical NLP is making use of a variety of texts of clinical interest to contribute to the analysis of clinical information and the building of a body of clinical knowledge. The full review process highlighted five papers analyzing patient-authored texts or seeking to connect and aggregate multiple sources of information. They provide a contribution to the development of methods, resources, applications, and sometimes a combination of these aspects. Conclusions: The field of clinical NLP continues to thrive through the contributions of both NLP researchers and healthcare professionals interested in applying NLP techniques to impact clinical practice. Foundational progress in the field makes it possible to leverage a larger variety of texts of clinical interest for healthcare purposes.
