Identifying Medication-related Intents from a Bidirectional Text Messaging Platform for Hypertension Management: An Unsupervised Learning Approach

Background: Free-text communication between patients and providers is playing an increasing role in chronic disease management, through platforms varying from traditional healthcare portals to more novel mobile messaging applications. These text data are rich resources for clinical and research purposes, but their sheer volume render them difficult to manage. Even automated approaches such as natural language processing require labor-intensive manual classification for developing training datasets, which is a rate-limiting step. Automated approaches to organizing free-text data are necessary to facilitate the use of free-text communication for clinical care and research. Objective: We applied unsupervised learning approaches to 1) understand the types of topics discussed and 2) to learn medication-related intents from messages sent between patients and providers through a bi-directional text messaging system for managing participant blood pressure. Methods: This study was a secondary analysis of de-identified messages from a remote mobile text-based employee hypertension management program at an academic institution. In experiment 1, we trained a Latent Dirichlet Allocation (LDA) model for each message type (inbound-patient and outbound-provider) and identified the distribution of major topics and significant topics (probability >0.20) across message types. In experiment 2, we annotated all medication-related messages with a single medication intent. Then, we trained a second LDA model (medLDA) to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n-1-3 words) using spaCy, clinical named entities using STANZA, and medication categories using MedEx, and then applied Chi-square feature selection to learn the most informative features associated with each medication intent. Results: A total of 253 participants and 5 providers engaged in the program generating 12,131 total messages: 47% patient messages and 53% provider messages. Most patient messages correspond to blood pressure (BP) reporting, BP encouragement, and appointment scheduling. In contrast, most provider messages correspond to BP reporting, medication adherence, and confirmatory statements. In experiment 1, for both patient and provider messages, most messages contained 1 topic and few with more than 3 topics identified using LDA. However, manual review of some messages within topics revealed significant heterogeneity even within single-topic messages as identified by LDA. In experiment 2, among the 534 medication messages annotated with a single medication intent, most of the 282 patient medication messages referred to medication request (48%; n=134) and medication taking (28%; n=79); most of the 252 provider medication messages referred to medication question (69%; n=173). Although medLDA could identify a majority intent within each topic, the model could not distinguish medication intents with low prevalence within either patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. Conclusion: LDA can be an effective method for generating subgroups of messages with similar term usage and facilitate the review of topics to inform annotations. However, few training cases and shared vocabulary between intents precludes the use of LDA for fully automated deep medication intent classification.

Download Full-text

An exploration of text mining of narrative reports of injury incidents to assess risk

MATEC Web of Conferences ◽

10.1051/matecconf/201825106020 ◽

2018 ◽

Vol 251 ◽

pp. 06020 ◽

Cited By ~ 4

Author(s):

David Passmore ◽

Chungil Chae ◽

Yulia Kustikova ◽

Rose Baker ◽

Jeong-Ha Yim

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Surface Mining ◽

Modeling Processes ◽

Free Text ◽

Text Data ◽

Injury Occurrence ◽

The Usa ◽

Musculoskeletal Systems ◽

Topic Mining

A topic model was explored using unsupervised machine learning to summarized free-text narrative reports of 77,215 injuries that occurred in coal mines in the USA between 2000 and 2015. Latent Dirichlet Allocation modeling processes identified six topics from the free-text data. One topic, a theme describing primarily injury incidents resulting in strains and sprains of musculoskeletal systems, revealed differences in topic emphasis by the location of the mine property at which injuries occurred, the degree of injury, and the year of injury occurrence. Text narratives clustered around this topic refer most frequently to surface or other locations rather than underground locations that resulted in disability and that, also, increased secularly over time. The modeling success enjoyed in this exploratory effort suggests that additional topic mining of these injury text narratives is justified, especially using a broad set of covariates to explain variations in topic emphasis and for comparison of surface mining injuries with injuries occurring during site preparation for construction.

Download Full-text

Open-Ended Questions

Employee Surveys and Sensing ◽

10.1093/oso/9780190939717.003.0013 ◽

2020 ◽

pp. 202-218

Author(s):

Subhadra Dutta ◽

Eric M. O’Rourke

Keyword(s):

Machine Learning ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Written Language ◽

Text Data ◽

Employee Survey ◽

Trade Offs ◽

Word Relatedness ◽

Survey Responses

Natural language processing (NLP) is the field of decoding human written language. This chapter responds to the growing interest in using machine learning–based NLP approaches for analyzing open-ended employee survey responses. These techniques address scalability and the ability to provide real-time insights to make qualitative data collection equally or more desirable in organizations. The chapter walks through the evolution of text analytics in industrial–organizational psychology and discusses relevant supervised and unsupervised machine learning NLP methods for survey text data, such as latent Dirichlet allocation, latent semantic analysis, sentiment analysis, word relatedness methods, and so on. The chapter also lays out preprocessing techniques and the trade-offs of growing NLP capabilities internally versus externally, points the readers to available resources, and ends with discussing implications and future directions of these approaches.

Download Full-text

Extracting Clinical Features From Dictated Ambulatory Consult Notes Using a Commercially Available Natural Language Processing Tool: Pilot, Retrospective, Cross-Sectional Validation Study (Preprint)

10.2196/preprints.12575 ◽

2018 ◽

Author(s):

Jeremy Petch ◽

Jane Batt ◽

Joshua Murray ◽

Muhammad Mamdani

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Clinical Features ◽

Language Processing ◽

A Priori ◽

Cross Sectional Study ◽

Free Text ◽

Cross Sectional ◽

Text Data ◽

Complex Features

BACKGROUND The increasing adoption of electronic health records (EHRs) in clinical practice holds the promise of improving care and advancing research by serving as a rich source of data, but most EHRs allow clinicians to enter data in a text format without much structure. Natural language processing (NLP) may reduce reliance on manual abstraction of these text data by extracting clinical features directly from unstructured clinical digital text data and converting them into structured data. OBJECTIVE This study aimed to assess the performance of a commercially available NLP tool for extracting clinical features from free-text consult notes. METHODS We conducted a pilot, retrospective, cross-sectional study of the accuracy of NLP from dictated consult notes from our tuberculosis clinic with manual chart abstraction as the reference standard. Consult notes for 130 patients were extracted and processed using NLP. We extracted 15 clinical features from these consult notes and grouped them a priori into categories of simple, moderate, and complex for analysis. RESULTS For the primary outcome of overall accuracy, NLP performed best for features classified as simple, achieving an overall accuracy of 96% (95% CI 94.3-97.6). Performance was slightly lower for features of moderate clinical and linguistic complexity at 93% (95% CI 91.1-94.4), and lowest for complex features at 91% (95% CI 87.3-93.1). CONCLUSIONS The findings of this study support the use of NLP for extracting clinical features from dictated consult notes in the setting of a tuberculosis clinic. Further research is needed to fully establish the validity of NLP for this and other purposes.

Download Full-text

Discovering Monogenic Patients with a Confirmed Molecular Diagnosis in Millions of Clinical Notes with MonoMiner

10.1101/2021.07.05.21259995 ◽

2021 ◽

Author(s):

David Wei Wu ◽

Jon A Bernstein ◽

Gill Bejerano

Keyword(s):

Molecular Diagnosis ◽

Language Processing ◽

Clinical Care ◽

Disease Diagnosis ◽

Monogenic Disease ◽

Free Text ◽

Causal Gene ◽

Hospital System ◽

Clinical Notes ◽

Monogenic Diseases

Purpose: Cohort building is a powerful foundation for improving clinical care, performing research, clinical trial recruitment, and many other applications. We set out to build a cohort of all patients with monogenic conditions who have received a definitive causal gene diagnosis in a 3 million patient hospital system. Methods: We define a subset of half (4,461) of OMIM curated diseases for which at least one monogenic causal gene is definitively known. We then introduce MonoMiner, a natural language processing framework to identify molecularly confirmed monogenic patients from free-text clinical notes. Results: We show that ICD-10-CM codes cover only a fraction of known monogenic diseases, and even where available, code-based patient retrieval offers 0.12 precision. Searching by causal gene symbol offers great recall but an even worse 0.09 precision. MonoMiner achieves 7-9 times higher precision (0.82), with 0.88 precision on disease diagnosis alone, tagging 4,259 patients with 560 monogenic diseases and 534 causal genes, at 0.48 recall. Conclusion: MonoMiner enables the discovery of a large, high-precision cohort of monogenic disease patients with an established molecular diagnosis, empowering numerous downstream uses. Because it relies only on clinical notes, MonoMiner is highly portable, and its approach is adaptable to other domains and languages.

Download Full-text

Analysis of the Implementation, User Perspectives, and Feedback From a Mobile Health Intervention for Individuals Living With Hypertension (DREAM-GLOBAL): Mixed Methods Study

JMIR mhealth and uhealth ◽

10.2196/12639 ◽

2019 ◽

Vol 7 (12) ◽

pp. e12639 ◽

Cited By ~ 4

Author(s):

Jordan Barsky ◽

Rebekah Hunter ◽

Colin McAllister ◽

Karen Yeates ◽

Norm Campbell ◽

...

Keyword(s):

Blood Pressure ◽

Health Care ◽

Health Care Providers ◽

Text Messaging ◽

Quantitative Data ◽

Qualitative Data ◽

Community Leadership ◽

Text Messages ◽

Hypertension Management ◽

Care Providers

Background DREAM-GLOBAL (Diagnosing hypertension—Engaging Action and Management in Getting Lower Blood Pressure in Indigenous and low- and middle-income countries) studied a SMS text messaging–based system for blood pressure measurement and hypertension management in Canadian Aboriginal and Tanzanian communities. The use of SMS text messages is an emerging point of interest in global health care initiatives because of their scalability, customizability, transferability, and cost-effectiveness. Objective The study aim was to assess the effect on the difference in blood pressure reduction of active hypertension management messages or passive health behavior messages. The system was designed to be implemented in remote areas with wireless availability. This study described the implementation and evaluation of technical components, including quantitative data from the transmission of blood pressure measurements and qualitative data collected on the operational aspects of the system from participants, health care providers, and community leadership. Methods The study was implemented in six remote Indigenous Canadian and two rural Tanzanian communities. Blood pressure readings were taken by a community health worker and transmitted to a mobile phone via Bluetooth, then by wireless to a programmed central server. From the server, the readings were sent to the participant’s own phone as well. Participants also received biweekly tailored SMS text messages on their phones. Quantitative data on blood pressure reading transmissions were collected from the study central server. Qualitative data were collected by surveys, focus groups, and key informant interviews of participants, health care providers, and health leadership. Results In Canada, between February 2014 and February 2017, 2818 blood pressure readings from 243 patients were transmitted to the central server. In Tanzania, between October 2014 and August 2015, 1165 readings from 130 patients were transmitted to the central server. The use of Bluetooth technology enabled the secure, reliable transmission of information from participants to their health care provider. The timing and frequency were satisfactory to 137 of 187 (73.2%) of participants, supporting the process of sending weekly messages twice on Mondays and Thursdays at 11 am. A total of 97.0% (164/169) of the participants surveyed said they would recommend participation in the DREAM-GLOBAL program to a friend or relative with hypertension. Conclusions In remote communities, the DREAM-GLOBAL study helped local health care providers deliver a blood pressure management program that enabled patients and community workers to feel connected. The technical components of the study were implemented as planned, and patients felt supported in their management through the SMS text messaging and mobile health program. Technological issues were solved with troubleshooting. Overall, the technical aspects of this research program enhanced clinical care and study evaluation and were well received by participants, health care workers, and community leadership. Trial Registration Clinicaltrials.gov NCT02111226; https://clinicaltrials.gov/ct2/show/NCT02111226.

Download Full-text

Web-Based Text Analysis of the Patient Safety Concerns of Various Healthcare Stakeholders

10.3233/shti210711 ◽

2021 ◽

Author(s):

Insook Cho ◽

Minyoung Lee ◽

Yeonjin Kim

Keyword(s):

Patient Safety ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Quality Of Healthcare ◽

Fundamental Aspect ◽

Serious Adverse Events ◽

Text Data ◽

Web Based ◽

Korean Government ◽

Set Up

Patient safety is a fundamental aspect of the quality of healthcare and there is a growing interest in improving safety among healthcare stakeholders in many countries. The Korean government recognized that patient safety is a threat to society following several serious adverse events, and so the Ministry of Health and Welfare of the Korean government set up the Patient Safety Act in January 2015. This study analyzed text data on patient safety collected from web-based, user-generated documents related to the legislation to see if they accurately represent the specific concerns of various healthcare stakeholders. We adopted the unsupervised natural language processing method of probabilistic topic modeling and also Latent Dirichlet Allocation. The results showed that text data are useful for inferring the latent concerns of healthcare consumers, providers, government bodies, and researchers as well as changes therein over time.

Download Full-text

Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports

Bioinformatics ◽

10.1093/bioinformatics/btaa668 ◽

2020 ◽

Author(s):

Keno K Bressem ◽

Lisa C Adams ◽

Robert A Gaudin ◽

Daniel Tröltzsch ◽

Bernd Hamm ◽

...

Keyword(s):

Natural Language ◽

Language Processing ◽

Language Model ◽

Fine Tuning ◽

Supplementary Information ◽

Free Text ◽

Clinical Workflow ◽

Text Data ◽

Unlabelled Data ◽

Medical Reports

Abstract Motivation The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) led to an outperformance of several Natural Language Processing (NLP) benchmarks. Especially in radiology, large amounts of free-text data are generated in daily clinical workflow. These report texts could be of particular use for the generation of labels in machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed to enable accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then only require fine tuning on a small amount of manually labelled data to achieve even better results. Results Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operation characteristics curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports. Availability and implementation We make the source code for fine-tuning the BERT-models freely available at https://github.com/fast-raidiology/bert-for-radiology. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Categorising patient concerns using natural language processing techniques

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100274 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100274

Author(s):

Paul Fairie ◽

Zilong Zhang ◽

Adam G D'Souza ◽

Tara Walsh ◽

Hude Quan ◽

...

Keyword(s):

Language Processing ◽

Latent Dirichlet Allocation ◽

Community Care ◽

Topic Modelling ◽

Text Data ◽

Emergency Department Care ◽

Patient Feedback ◽

Care Facilities ◽

Processing Techniques ◽

Patient Concerns

ObjectivesPatient feedback is critical to identify and resolve patient safety and experience issues in healthcare systems. However, large volumes of unstructured text data can pose problems for manual (human) analysis. This study reports the results of using a semiautomated, computational topic-modelling approach to analyse a corpus of patient feedback.MethodsPatient concerns were received by Alberta Health Services between 2011 and 2018 (n=76 163), regarding 806 care facilities in 163 municipalities, including hospitals, clinics, community care centres and retirement homes, in a province of 4.4 million. Their existing framework requires manual labelling of pre-defined categories. We applied an automated latent Dirichlet allocation (LDA)-based topic modelling algorithm to identify the topics present in these concerns, and thereby produce a framework-free categorisation.ResultsThe LDA model produced 40 topics which, following manual interpretation by researchers, were reduced to 28 coherent topics. The most frequent topics identified were communication issues causing delays (frequency: 10.58%), community care for elderly patients (8.82%), interactions with nurses (8.80%) and emergency department care (7.52%). Many patient concerns were categorised into multiple topics. Some were more specific versions of categories from the existing framework (eg, communication issues causing delays), while others were novel (eg, smoking in inappropriate settings).DiscussionLDA-generated topics were more nuanced than the manually labelled categories. For example, LDA found that concerns with community care were related to concerns about nursing for seniors, providing opportunities for insight and action.ConclusionOur findings outline the range of concerns patients share in a large health system and demonstrate the usefulness of using LDA to identify categories of patient concerns.

Download Full-text

Developing and testing an automated qualitative assistant (AQUA) to support qualitative analysis

Family Medicine and Community Health ◽

10.1136/fmch-2021-001287 ◽

2021 ◽

Vol 9 (Suppl 1) ◽

pp. e001287

Author(s):

Robert P Lennon ◽

Robbie Fraleigh ◽

Lauren J Van Scoy ◽

Aparna Keshaviah ◽

Xindi C Hu ◽

...

Keyword(s):

Latent Dirichlet Allocation ◽

Analytical Techniques ◽

Latent Semantic Indexing ◽

Free Text ◽

Text Data ◽

Graph Theoretic ◽

Data Coding ◽

Survey Responses ◽

Hierarchical Representations ◽

Entire Dataset

Qualitative research remains underused, in part due to the time and cost of annotating qualitative data (coding). Artificial intelligence (AI) has been suggested as a means to reduce those burdens, and has been used in exploratory studies to reduce the burden of coding. However, methods to date use AI analytical techniques that lack transparency, potentially limiting acceptance of results. We developed an automated qualitative assistant (AQUA) using a semiclassical approach, replacing Latent Semantic Indexing/Latent Dirichlet Allocation with a more transparent graph-theoretic topic extraction and clustering method. Applied to a large dataset of free-text survey responses, AQUA generated unsupervised topic categories and circle hierarchical representations of free-text responses, enabling rapid interpretation of data. When tasked with coding a subset of free-text data into user-defined qualitative categories, AQUA demonstrated intercoder reliability in several multicategory combinations with a Cohen’s kappa comparable to human coders (0.62–0.72), enabling researchers to automate coding on those categories for the entire dataset. The aim of this manuscript is to describe pertinent components of best practices of AI/machine learning (ML)-assisted qualitative methods, illustrating how primary care researchers may use AQUA to rapidly and accurately code large text datasets. The contribution of this article is providing guidance that should increase AI/ML transparency and reproducibility.

Download Full-text