Categorising patient concerns using natural language processing techniques

Objectives: Patient feedback is critical to identify and resolve patient safety and experience issues in healthcare systems. However, large volumes of unstructured text data can pose problems for manual (human) analysis. This study reports the results of using a semiautomated, computational topic-modelling approach to analyse a corpus of patient feedback. Methods: Patient concerns were received by Alberta Health Services between 2011 and 2018 (n=76 163), regarding 806 care facilities in 163 municipalities, including hospitals, clinics, community care centres and retirement homes, in a province of 4.4 million. The organisation's existing framework requires manual labelling with pre-defined categories. We applied an automated latent Dirichlet allocation (LDA)-based topic modelling algorithm to identify the topics present in these concerns, and thereby produce a framework-free categorisation. Results: The LDA model produced 40 topics which, following manual interpretation by researchers, were reduced to 28 coherent topics. The most frequent topics identified were communication issues causing delays (frequency: 10.58%), community care for elderly patients (8.82%), interactions with nurses (8.80%) and emergency department care (7.52%). Many patient concerns were categorised into multiple topics. Some topics were more specific versions of categories from the existing framework (eg, communication issues causing delays), while others were novel (eg, smoking in inappropriate settings). Discussion: LDA-generated topics were more nuanced than the manually labelled categories. For example, LDA found that concerns with community care were related to concerns about nursing for seniors, providing opportunities for insight and action. Conclusion: Our findings outline the range of concerns patients share in a large health system and demonstrate the usefulness of LDA for identifying categories of patient concerns.
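
For readers unfamiliar with LDA, the following is a minimal sketch of the technique using collapsed Gibbs sampling in plain Python. It is not the study's implementation: the sampler, the hyperparameters `alpha` and `beta`, the toy documents and every function name are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, topic_word), where
    theta[d][k] is the estimated share of topic k in document d and
    topic_word[k] counts the words currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic, skipping words with zero count."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


if __name__ == "__main__":
    # Hypothetical, already tokenised concerns (not data from the study).
    toy_docs = [
        ["nurse", "rude", "nurse", "delay"],
        ["delay", "communication", "delay", "wait"],
        ["smoking", "entrance", "smoking"],
        ["wait", "emergency", "delay"],
    ]
    theta, topic_word = lda_gibbs(toy_docs, n_topics=2, n_iter=50)
    print(top_words(topic_word))
    print([[round(p, 2) for p in row] for row in theta])
```

Because each document receives a distribution over topics rather than a single label, a concern can load on several topics at once, which is consistent with the abstract's remark that many concerns were categorised into multiple topics.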

Identifying Medication-related Intents from a Bidirectional Text Messaging Platform for Hypertension Management: An Unsupervised Learning Approach

10.1101/2021.12.23.21268061 ◽

2021 ◽

Author(s):

Anahita Davoudi ◽

Natalie Lee ◽

Thaibinh Luong ◽

Timothy Delaney ◽

Elizabeth Asch ◽

...

Keyword(s):

Blood Pressure ◽

Unsupervised Learning ◽

Language Processing ◽

Text Messaging ◽

Latent Dirichlet Allocation ◽

Clinical Care ◽

Hypertension Management ◽

Free Text ◽

Significant Heterogeneity ◽

Text Data

Background: Free-text communication between patients and providers is playing an increasing role in chronic disease management, through platforms varying from traditional healthcare portals to more novel mobile messaging applications. These text data are rich resources for clinical and research purposes, but their sheer volume renders them difficult to manage. Even automated approaches such as natural language processing require labor-intensive manual classification for developing training datasets, which is a rate-limiting step. Automated approaches to organizing free-text data are necessary to facilitate the use of free-text communication for clinical care and research. Objective: We applied unsupervised learning approaches to 1) understand the types of topics discussed and 2) learn medication-related intents from messages sent between patients and providers through a bi-directional text messaging system for managing participant blood pressure. Methods: This study was a secondary analysis of de-identified messages from a remote mobile text-based employee hypertension management program at an academic institution. In experiment 1, we trained a Latent Dirichlet Allocation (LDA) model for each message type (inbound-patient and outbound-provider) and identified the distribution of major topics and significant topics (probability >0.20) across message types. In experiment 2, we annotated all medication-related messages with a single medication intent. Then, we trained a second LDA model (medLDA) to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n = 1-3 words) using spaCy, clinical named entities using STANZA, and medication categories using MedEx, and then applied Chi-square feature selection to learn the most informative features associated with each medication intent.
Results: A total of 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 47% patient messages and 53% provider messages. Most patient messages corresponded to blood pressure (BP) reporting, BP encouragement, and appointment scheduling. In contrast, most provider messages corresponded to BP reporting, medication adherence, and confirmatory statements. In experiment 1, for both patient and provider messages, most messages contained 1 topic and few contained more than 3 topics identified using LDA. However, manual review of some messages within topics revealed significant heterogeneity even within single-topic messages as identified by LDA. In experiment 2, among the 534 medication messages annotated with a single medication intent, most of the 282 patient medication messages referred to medication request (48%; n=134) and medication taking (28%; n=79); most of the 252 provider medication messages referred to medication question (69%; n=173). Although medLDA could identify a majority intent within each topic, the model could not distinguish medication intents with low prevalence within either patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. Conclusion: LDA can be an effective method for generating subgroups of messages with similar term usage and for facilitating the review of topics to inform annotations. However, few training cases and shared vocabulary between intents preclude the use of LDA for fully automated deep medication intent classification.
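
The n-gram and Chi-square steps described in the Methods can be sketched as follows. This is a hedged illustration in plain Python, not the authors' pipeline: it uses whitespace-style token lists instead of spaCy, a presence/absence 2x2 Chi-square statistic (which may differ from the variant the authors applied), and hypothetical messages, labels and function names.

```python
def ngrams(tokens, n_min=1, n_max=3):
    """Return the set of word n-grams of length n_min..n_max."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table.

    a: feature present, target intent    b: feature present, other intent
    c: feature absent,  target intent    d: feature absent,  other intent
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the feature carries no information
    return n * (a * d - b * c) ** 2 / denom


def rank_features(messages, labels, target):
    """Rank n-gram features by their association with one intent."""
    features = {f for m in messages for f in m}
    scores = {}
    for f in features:
        a = sum(1 for m, l in zip(messages, labels) if f in m and l == target)
        b = sum(1 for m, l in zip(messages, labels) if f in m and l != target)
        c = sum(1 for m, l in zip(messages, labels) if f not in m and l == target)
        d = sum(1 for m, l in zip(messages, labels) if f not in m and l != target)
        scores[f] = chi_square_2x2(a, b, c, d)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical messages; intent names follow the abstract's categories.
    raw = [
        (["need", "a", "refill"], "medication request"),
        (["please", "send", "a", "refill"], "medication request"),
        (["took", "my", "pill", "today"], "medication taking"),
        (["took", "it", "this", "morning"], "medication taking"),
    ]
    messages = [ngrams(tokens) for tokens, _ in raw]
    labels = [label for _, label in raw]
    for feature, score in rank_features(messages, labels, "medication request")[:5]:
        print(f"{score:.2f}  {feature}")
```

With only a handful of messages per intent, many features tie at the same score, which mirrors the abstract's caution that low-prevalence intents with shared vocabulary are hard to separate.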

Open-Ended Questions

Employee Surveys and Sensing ◽

10.1093/oso/9780190939717.003.0013 ◽

2020 ◽

pp. 202-218

Author(s):

Subhadra Dutta ◽

Eric M. O’Rourke

Keyword(s):

Machine Learning ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Written Language ◽

Text Data ◽

Employee Survey ◽

Trade Offs ◽

Word Relatedness ◽

Survey Responses

Natural language processing (NLP) is the field of decoding human written language. This chapter responds to the growing interest in using machine learning–based NLP approaches for analyzing open-ended employee survey responses. These techniques address scalability and the ability to provide real-time insights to make qualitative data collection equally or more desirable in organizations. The chapter walks through the evolution of text analytics in industrial–organizational psychology and discusses relevant supervised and unsupervised machine learning NLP methods for survey text data, such as latent Dirichlet allocation, latent semantic analysis, sentiment analysis, word relatedness methods, and so on. The chapter also lays out preprocessing techniques and the trade-offs of growing NLP capabilities internally versus externally, points the readers to available resources, and ends with discussing implications and future directions of these approaches.

Two-stage topic modelling of scientific publications: A case study of University of Nairobi, Kenya

PLoS ONE ◽

10.1371/journal.pone.0243208 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0243208

Author(s):

Leacky Muchene ◽

Wende Safari

Keyword(s):

Hierarchical Clustering ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Two Stage ◽

Scientific Publications ◽

Statistical Tool ◽

Second Stage ◽

The University ◽

Dirichlet Allocation

Unsupervised statistical analysis of unstructured data has gained wide acceptance, especially in the natural language processing and text mining domains. Topic modelling with Latent Dirichlet Allocation is one such statistical tool that has been successfully applied to synthesize collections of legal and biomedical documents and journalistic topics. We applied a novel two-stage topic modelling approach and illustrated the methodology with data from a collection of published abstracts from the University of Nairobi, Kenya. In the first stage, topic modelling with Latent Dirichlet Allocation was applied to derive the per-document topic probabilities. To present the topics more succinctly, in the second stage, hierarchical clustering with Hellinger distance was applied to derive the final clusters of topics. The analysis showed that dominant research themes in the university include HIV and malaria research, research on agricultural and veterinary services, and cross-cutting themes in humanities and social sciences. Further, the use of hierarchical clustering in the second stage reduces the discovered latent topics to clusters of homogeneous topics.
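
The second stage can be illustrated with a small sketch: compute Hellinger distances between probability vectors, then merge the closest groups bottom-up. The average-linkage rule, the function names and the toy vectors below are assumptions; the abstract does not state which linkage the authors used or exactly which vectors were clustered.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Ranges from 0 (identical) to 1 (no overlapping support).
    """
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2
    )


def distance_matrix(vectors):
    """Pairwise Hellinger distances as a nested list."""
    return [[hellinger(p, q) for q in vectors] for p in vectors]


def average_linkage(dists, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain. Returns lists of item indices.
    """
    clusters = [[i] for i in range(len(dists))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dists[i][j] for i in clusters[a] for j in clusters[b])
                mean = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or mean < best[0]:
                    best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Hypothetical probability vectors, one row per item to be clustered.
    vectors = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.1, 0.1, 0.8],
        [0.1, 0.2, 0.7],
    ]
    print(average_linkage(distance_matrix(vectors), n_clusters=2))
```

Hellinger distance is a natural choice here because it is defined on probability distributions and is bounded, so no further normalisation of the LDA output is needed before clustering.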

Trend analysis of online travel review text mining over time

Journal of Modelling in Management ◽

10.1108/jm2-10-2018-0178 ◽

2019 ◽

Vol 15 (2) ◽

pp. 491-508

Author(s):

Kaile Zhang ◽

Ichiro Koshijima

Keyword(s):

Text Mining ◽

Language Processing ◽

Semantic Analysis ◽

Direct Method ◽

Dimensional Space ◽

Single Point ◽

Text Data ◽

Content Type ◽

New Ideas ◽

Processing Techniques

Purpose: Online tourism reviews have not been used effectively because the volume of review text is enormous and in-depth research on it is still in its infancy. Text mining is therefore expected to help process this text data and reveal the implicit information it contains. The purpose of this paper is to help tourism practitioners and tourists use the texts conveniently through appropriate visualization processing techniques. In particular, time-changing reviews can be used to reflect the changes in tourists’ feedback and concerns. Design/methodology/approach: Latent semantic analysis is a new branch of semantics. Every term in the document can be regarded as a single point in multi-dimensional space. When a document with semantics comes into such space, the distribution of the document is not random, but will obey some type of semantic structure. Findings: First, an overall grasp of the big data is achievable. Second, a direct method is proposed that allows more researchers and proprietors outside language processing to use the data. Lastly, the results of changes over different spans of time are investigated. Originality/value: This paper proposes an approach to disclose a significant number of travel comments from different years that may generate new ideas for tourism. The authors put forward a processing approach to deal with large amounts of comment text. Using the case study of Mt. Lushan, the various changes in travel reviews over the years are successfully visualized and displayed.

Web-Based Text Analysis of the Patient Safety Concerns of Various Healthcare Stakeholders

10.3233/shti210711 ◽

2021 ◽

Author(s):

Insook Cho ◽

Minyoung Lee ◽

Yeonjin Kim

Keyword(s):

Patient Safety ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Quality Of Healthcare ◽

Fundamental Aspect ◽

Serious Adverse Events ◽

Text Data ◽

Web Based ◽

Korean Government ◽

Set Up

Patient safety is a fundamental aspect of the quality of healthcare and there is a growing interest in improving safety among healthcare stakeholders in many countries. The Korean government recognized that patient safety is a threat to society following several serious adverse events, and so the Ministry of Health and Welfare of the Korean government enacted the Patient Safety Act in January 2015. This study analyzed text data on patient safety collected from web-based, user-generated documents related to the legislation to see if they accurately represent the specific concerns of various healthcare stakeholders. We adopted the unsupervised natural language processing method of probabilistic topic modeling, specifically Latent Dirichlet Allocation. The results showed that text data are useful for inferring the latent concerns of healthcare consumers, providers, government bodies, and researchers, as well as changes therein over time.

Capturing and analyzing social representations. A first application of Natural Language Processing techniques to reader’s comments in COVID-19 news. Argentina, 2020

10.31235/osf.io/3pcdu ◽

2020 ◽

Author(s):

German Rosati ◽

Laia Domenech ◽

Adriana Chazarreta ◽

Tomas Maguire

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Social Representations ◽

Web Crawler ◽

First Approximation ◽

Processing Techniques ◽

Dirichlet Allocation

We present a first approximation to the quantification of social representations about COVID-19, using news comments. A web crawler was developed for constructing the dataset of readers’ comments. We detect relevant topics in the dataset using Latent Dirichlet Allocation, and analyze their evolution over time. Finally, we show a first prototype for predicting the majority topics, using FastText.

Stopwords in technical language processing

PLoS ONE ◽

10.1371/journal.pone.0254937 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0254937

Author(s):

Serhad Sarica ◽

Jianxi Luo

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Text Classification ◽

Topic Modelling ◽

Inverse Document Frequency ◽

Technical Language ◽

Statistical Measures ◽

Document Frequency ◽

Standard Component ◽

Processing Techniques

There are increasing applications of natural language processing techniques for information retrieval, indexing, topic modelling and text classification in engineering contexts. A standard component of such tasks is the removal of stopwords, which are uninformative components of the data. While researchers use readily available stopwords lists that are derived from non-technical resources, the technical jargon of engineering fields contains its own highly frequent and uninformative words, and there exists no standard stopwords list for technical language processing applications. Here we address this gap by rigorously identifying generic, insignificant, uninformative stopwords in engineering texts beyond the stopwords in general texts, based on the synthesis of alternative statistical measures such as term frequency, inverse document frequency, and entropy, and curating a stopwords dataset ready for technical language processing applications.
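
The three statistical measures named above can be computed per term as in the sketch below. The exact definitions, thresholds and the way the authors combine the measures are not given in the abstract, so the formulas here (raw term frequency, natural-log inverse document frequency, and Shannon entropy of a term's spread across documents) and all names are assumptions.

```python
import math
from collections import Counter


def term_statistics(docs):
    """Compute term frequency, IDF and entropy for every term.

    docs is a list of token lists. A term that is frequent, has low IDF
    and has high entropy (spread evenly over many documents) is a
    stopword candidate under this sketch's assumptions.
    """
    n_docs = len(docs)
    tf = Counter()   # total occurrences of each term in the corpus
    df = Counter()   # number of documents containing each term
    per_doc = {}     # per-document counts for each term
    for doc in docs:
        counts = Counter(doc)
        tf.update(counts)
        df.update(counts.keys())
        for term, c in counts.items():
            per_doc.setdefault(term, []).append(c)

    stats = {}
    for term in tf:
        idf = math.log(n_docs / df[term])
        # Distribution of the term's occurrences over documents.
        probs = [c / tf[term] for c in per_doc[term]]
        entropy = -sum(p * math.log2(p) for p in probs)
        stats[term] = {"tf": tf[term], "idf": idf, "entropy": entropy}
    return stats


if __name__ == "__main__":
    # Hypothetical tokenised engineering texts.
    docs = [
        ["the", "valve", "controls", "the", "flow"],
        ["the", "gear", "transmits", "torque"],
        ["the", "shaft", "supports", "the", "gear"],
    ]
    stats = term_statistics(docs)
    ranked = sorted(stats.items(), key=lambda kv: (kv[1]["idf"], -kv[1]["entropy"]))
    for term, s in ranked[:3]:
        print(term, s)
```

Sorting by ascending IDF and descending entropy is just one possible way to surface candidates; a curated list would still need manual review.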

Exploring the Non-Medical impacts of Covid-19 using Natural Language Processing

10.20944/preprints202011.0056.v1 ◽

2020 ◽

Author(s):

Amol Agade ◽

Samta Balpande

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Global Economy ◽

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Financial Industry ◽

Distance Map ◽

The Media ◽

Non Negative Matrix Factorization

The ongoing COVID-19 pandemic has resulted in massive damage to various platforms of the global economy, which has disrupted human livelihoods. Natural Language Processing has been extensively used in different organizations to categorize sentiments, make recommendations, summarize information and model topics. This research aims to understand the non-medical impact of COVID-19 on the global economy by leveraging natural language processing methodology. This methodology comprises text classification, which includes topic modelling on an unstructured COVID-19 media articles dataset provided by Anacode. Like other Natural Language Processing algorithms, Latent Dirichlet allocation (LDA) and Non-negative matrix factorization (NMF) have been proposed to classify the media articles dataset in order to analyze the impacts of the COVID-19 pandemic in different sectors of the global economy. Model accuracy was examined based on the coherence and perplexity scores, which came out to be 0.51 and -10.90, respectively, using the LDA algorithm. Both the LDA and NMF algorithms identified similar prevalent topics that were impacted by the COVID-19 pandemic in multiple sectors of the economy. The intertopic distance map visualization produced by the LDA algorithm suggests that general industries, which include children's schooling, parental care, and family gatherings, experienced the greatest impact, followed by the business sector and the financial industry.

DERIVATION OF DESCRIPTION FEATURES FOR ENGINEERING CHANGE REQUEST BY AID OF LATENT DIRICHLET ALLOCATION

Proceedings of the Design Society: DESIGN Conference ◽

10.1017/dsd.2020.98 ◽

2020 ◽

Vol 1 ◽

pp. 697-706

Author(s):

M. Riesener ◽

C. Dölle ◽

M. Mendl-Heinisch ◽

G. Schuh ◽

A. Keuper

Keyword(s):

Language Processing ◽

Latent Dirichlet Allocation ◽

Data Driven ◽

Automated Classification ◽

Engineering Change ◽

Complex Products ◽

Data Driven Approach ◽

Engineering Changes ◽

Processing Techniques ◽

Change Requests

Abstract: Complex products and shorter development cycles lead to an increasing number of engineering changes. In order to be able to process these changes more effectively and efficiently, this paper develops a description model as a first step towards a data driven approach of processing engineering change requests. The description model is systematically derived from literature using text mining and natural language processing techniques. An example of the application is given by an automated classification based on similarity calculations between new and historic engineering change requests.
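
A similarity-based classification of this kind might look like the sketch below, which labels a new request with the class of its most similar historic request. The abstract does not say which similarity measure or features the authors used; cosine similarity over bag-of-words counts, the request texts, the class labels and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def bag_of_words(text):
    """Lower-case, whitespace-tokenised word counts."""
    return Counter(text.lower().split())


def classify(new_request, historic):
    """Return (similarity, label) of the closest historic request.

    historic is a list of (text, label) pairs.
    """
    new_vec = bag_of_words(new_request)
    scored = [(cosine(new_vec, bag_of_words(text)), label) for text, label in historic]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical historic engineering change requests and classes.
    historic = [
        ("replace bracket material due to corrosion", "material change"),
        ("update drawing tolerance on housing bore", "documentation change"),
        ("change supplier for fastening screws", "supplier change"),
    ]
    print(classify("bracket corrosion requires new material", historic))
```

A nearest-neighbour rule like this keeps the classification explainable, since each decision can be traced back to a specific historic request.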

GOOD AND BAD SOCIOLOGY: DOES TOPIC MODELLING MAKE A DIFFERENCE?

Society Register ◽

10.14746/sr.2021.5.4.01 ◽

2021 ◽

Vol 5 (4) ◽

pp. 7-22

Author(s):

MARIUSZ BARANOWSKI ◽

PIOTR CICHOCKI

Keyword(s):

Research Methods ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Social Reality ◽

Small Data ◽

Sociological Research ◽

Topic Modelling ◽

Social Sciences And Humanities ◽

Novel Approach ◽

New Research

The changing social reality, which is increasingly digitally networked, requires new research methods capable of analysing large bodies of data (including textual data). This development poses a challenge for sociology, whose ambition is primarily to describe and explain social reality. As traditional sociological research methods focus on analysing relatively small data, the existential challenge of today involves the need to embrace new methods and techniques, which enable valuable insights into big volumes of data at speed. One such emerging area of investigation involves the application of Natural Language Processing and Machine-Learning to text mining, which allows for swift analyses of vast bodies of textual content. The paper’s main aim is to probe whether such a novel approach, namely, topic modelling based on the Latent Dirichlet Allocation (LDA) algorithm, can find meaningful applications within sociology and whether its adaptation makes sociology perform its tasks better. In order to outline the context of the applicability of LDA in the social sciences and humanities, an analysis of abstracts of articles on topic modelling published in journals indexed in Elsevier’s Scopus database was conducted. This study, based on 1,149 abstracts, not only showed the diversity of topics undertaken by researchers but also helped to answer the question of whether sociology using topic modelling is “good” sociology in the sense that it provides opportunities for exploration of topic areas and data that would not otherwise be undertaken.
