Developing and testing an automated qualitative assistant (AQUA) to support qualitative analysis

Qualitative research remains underused, in part due to the time and cost of annotating qualitative data (coding). Artificial intelligence (AI) has been suggested as a means to reduce those burdens and has been used for that purpose in exploratory studies. However, methods to date use AI analytical techniques that lack transparency, potentially limiting acceptance of results. We developed an automated qualitative assistant (AQUA) using a semiclassical approach, replacing Latent Semantic Indexing/Latent Dirichlet Allocation with a more transparent graph-theoretic topic extraction and clustering method. Applied to a large dataset of free-text survey responses, AQUA generated unsupervised topic categories and circle hierarchical representations of free-text responses, enabling rapid interpretation of data. When tasked with coding a subset of free-text data into user-defined qualitative categories, AQUA demonstrated intercoder reliability in several multicategory combinations, with a Cohen’s kappa comparable to that of human coders (0.62–0.72), enabling researchers to automate coding on those categories for the entire dataset. The aim of this manuscript is to describe pertinent components of best practices for AI/machine learning (ML)-assisted qualitative methods, illustrating how primary care researchers may use AQUA to rapidly and accurately code large text datasets. The contribution of this article is to provide guidance that should increase AI/ML transparency and reproducibility.
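
Cohen's kappa, the reliability statistic reported above, corrects the raw agreement between two coders for the agreement expected by chance from each coder's category frequencies. The following is a minimal stdlib Python sketch, assuming each coder assigns exactly one category per response; AQUA's own implementation is not described in the abstract, and the labels below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one category per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same category by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement implied by each coder's marginal category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_e == 1:
        return 1.0  # convention: both coders used one identical category for every item
    return (p_o - p_e) / (1 - p_e)


# Invented codes for 10 free-text responses (not data from the AQUA study).
human = ["A", "A", "B", "B", "A", "C", "C", "B", "A", "C"]
aqua = ["A", "B", "B", "B", "A", "C", "A", "B", "A", "C"]
print(round(cohens_kappa(human, aqua), 2))  # 0.7
```

Here the coders agree on 8 of 10 items (p_o = 0.80), but their marginals imply p_e = 0.34 by chance, so kappa is (0.80 − 0.34)/(1 − 0.34) ≈ 0.70 rather than 0.80.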

An exploration of text mining of narrative reports of injury incidents to assess risk

MATEC Web of Conferences ◽

10.1051/matecconf/201825106020 ◽

2018 ◽

Vol 251 ◽

pp. 06020 ◽

Cited By ~ 4

Author(s):

David Passmore ◽

Chungil Chae ◽

Yulia Kustikova ◽

Rose Baker ◽

Jeong-Ha Yim

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Surface Mining ◽

Modeling Processes ◽

Free Text ◽

Text Data ◽

Injury Occurrence ◽

The Usa ◽

Musculoskeletal Systems ◽

Topic Mining

A topic model was explored using unsupervised machine learning to summarize free-text narrative reports of 77,215 injuries that occurred in coal mines in the USA between 2000 and 2015. Latent Dirichlet Allocation modeling processes identified six topics from the free-text data. One topic, describing primarily injury incidents that resulted in strains and sprains of the musculoskeletal system, revealed differences in topic emphasis by the location of the mine property at which injuries occurred, the degree of injury, and the year of injury occurrence. Text narratives clustered around this topic refer most frequently to surface or other locations rather than underground locations and to injuries that resulted in disability; these narratives also increased secularly over time. The modeling success enjoyed in this exploratory effort suggests that additional topic mining of these injury text narratives is justified, especially using a broad set of covariates to explain variations in topic emphasis and for comparison of surface mining injuries with injuries occurring during site preparation for construction.
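
The abstract names Latent Dirichlet Allocation but not the inference procedure used. As a hedged illustration of what LDA does, the sketch below fits it with collapsed Gibbs sampling, one common inference method, in stdlib Python: every token carries a topic assignment that is repeatedly resampled in proportion to how much its document uses each topic and how much each topic uses that word. The toy narratives, hyperparameters (`alpha`, `beta`, `iterations`), and the helper names `lda_gibbs` and `top_words` are hypothetical; a corpus of 77,215 narratives would call for an established library.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns the sorted vocabulary and a num_topics x len(vocab) matrix of
    topic-word counts from the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k overall
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)              # random initial assignment
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """The n highest-count words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]


# Invented narratives, tokenized by whitespace.
narratives = [
    "strained back lifting roof bolt".split(),
    "sprained ankle stepping off shuttle car".split(),
    "strained shoulder lifting timber".split(),
    "roof fall struck miner underground".split(),
    "rock fell from roof struck hand".split(),
]
vocab, topic_word = lda_gibbs(narratives, num_topics=2)
print(top_words(vocab, topic_word))
```

The number of topics is fixed in advance (the study reports six); inspecting each topic's top words is how a theme such as "strains and sprains" is recognized and named.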

Identifying Medication-related Intents from a Bidirectional Text Messaging Platform for Hypertension Management: An Unsupervised Learning Approach

10.1101/2021.12.23.21268061 ◽

2021 ◽

Author(s):

Anahita Davoudi ◽

Natalie Lee ◽

Thaibinh Luong ◽

Timothy Delaney ◽

Elizabeth Asch ◽

...

Keyword(s):

Blood Pressure ◽

Unsupervised Learning ◽

Language Processing ◽

Text Messaging ◽

Latent Dirichlet Allocation ◽

Clinical Care ◽

Hypertension Management ◽

Free Text ◽

Significant Heterogeneity ◽

Text Data

Background: Free-text communication between patients and providers is playing an increasing role in chronic disease management, through platforms varying from traditional healthcare portals to more novel mobile messaging applications. These text data are rich resources for clinical and research purposes, but their sheer volume renders them difficult to manage. Even automated approaches such as natural language processing require labor-intensive manual classification for developing training datasets, which is a rate-limiting step. Automated approaches to organizing free-text data are necessary to facilitate the use of free-text communication for clinical care and research. Objective: We applied unsupervised learning approaches to 1) understand the types of topics discussed and 2) learn medication-related intents from messages sent between patients and providers through a bidirectional text messaging system for managing participant blood pressure. Methods: This study was a secondary analysis of de-identified messages from a remote mobile text-based employee hypertension management program at an academic institution. In experiment 1, we trained a Latent Dirichlet Allocation (LDA) model for each message type (inbound-patient and outbound-provider) and identified the distribution of major topics and significant topics (probability >0.20) across message types. In experiment 2, we annotated all medication-related messages with a single medication intent. Then, we trained a second LDA model (medLDA) to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n = 1–3 words) using spaCy, clinical named entities using STANZA, and medication categories using MedEx, and then applied chi-square feature selection to learn the most informative features associated with each medication intent.
Results: A total of 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 47% patient messages and 53% provider messages. Most patient messages corresponded to blood pressure (BP) reporting, BP encouragement, and appointment scheduling. In contrast, most provider messages corresponded to BP reporting, medication adherence, and confirmatory statements. In experiment 1, for both patient and provider messages, most messages contained 1 topic and few contained more than 3 topics identified using LDA. However, manual review of some messages within topics revealed significant heterogeneity even within single-topic messages as identified by LDA. In experiment 2, among the 534 medication messages annotated with a single medication intent, most of the 282 patient medication messages referred to medication request (48%; n=134) and medication taking (28%; n=79); most of the 252 provider medication messages referred to medication question (69%; n=173). Although medLDA could identify a majority intent within each topic, the model could not distinguish medication intents with low prevalence within either patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. Conclusion: LDA can be an effective method for generating subgroups of messages with similar term usage and can facilitate the review of topics to inform annotations. However, few training cases and shared vocabulary between intents preclude the use of LDA for fully automated deep medication intent classification.
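
The feature-selection step (word n-grams with n = 1–3, then chi-square scoring per intent) can be sketched in stdlib Python. This is a hedged sketch, not the study's pipeline: it tokenizes on whitespace instead of using spaCy, omits the STANZA and MedEx features, and computes the textbook 2x2 chi-square statistic of n-gram presence versus membership in one intent, which may differ from the exact variant the authors used. The messages and intent labels are invented.

```python
from collections import Counter


def ngrams(text, n_min=1, n_max=3):
    """Set of lower-cased word n-grams, n_min <= n <= n_max (whitespace tokens, not spaCy)."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi_square_scores(messages, labels, target):
    """Rank n-grams by the 2x2 chi-square statistic of presence vs. membership in `target`."""
    feats = [ngrams(m) for m in messages]
    n = len(messages)
    n_target = sum(1 for y in labels if y == target)
    doc_freq = Counter(f for fs in feats for f in fs)  # messages containing each n-gram
    target_freq = Counter(f for fs, y in zip(feats, labels) if y == target for f in fs)
    scores = {}
    for f, present in doc_freq.items():
        a = target_freq[f]           # present, target intent
        b = present - a              # present, other intents
        c = n_target - a             # absent, target intent
        d = n - present - c          # absent, other intents
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented messages and intents, for illustration only.
messages = ["can I get a refill", "need a refill please", "bp 120/80 today", "bp is 130/85"]
labels = ["request", "request", "report", "report"]
for gram, score in chi_square_scores(messages, labels, "request")[:5]:
    print(gram, round(score, 2))
```

Chi-square measures association in either direction, so "bp" scores as highly as "refill" for the request intent even though it signals the absence of a request; checking the sign of `a * d - b * c` tells the two cases apart.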

Open-Ended Questions

Employee Surveys and Sensing ◽

10.1093/oso/9780190939717.003.0013 ◽

2020 ◽

pp. 202-218

Author(s):

Subhadra Dutta ◽

Eric M. O’Rourke

Keyword(s):

Machine Learning ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Written Language ◽

Text Data ◽

Employee Survey ◽

Trade Offs ◽

Word Relatedness ◽

Survey Responses

Natural language processing (NLP) is the field of decoding human written language. This chapter responds to the growing interest in using machine learning–based NLP approaches for analyzing open-ended employee survey responses. These techniques address scalability and the ability to provide real-time insights to make qualitative data collection equally or more desirable in organizations. The chapter walks through the evolution of text analytics in industrial–organizational psychology and discusses relevant supervised and unsupervised machine learning NLP methods for survey text data, such as latent Dirichlet allocation, latent semantic analysis, sentiment analysis, word relatedness methods, and so on. The chapter also lays out preprocessing techniques and the trade-offs of growing NLP capabilities internally versus externally, points the readers to available resources, and ends with discussing implications and future directions of these approaches.

A Numerically Coded File of Operative Procedures Derived from a Free Text Data Collection System: A Measure of the Accuracy

Methods of Information in Medicine ◽

10.1055/s-0038-1635717 ◽

1976 ◽

Vol 15 (01) ◽

pp. 21-28 ◽

Cited By ~ 3

Author(s):

Carmen A. Scudiero ◽

Ruth L. Wong

Keyword(s):

Data Collection ◽

Pap Smear ◽

Operative Procedures ◽

Free Text ◽

Collection System ◽

Process Data ◽

Text Data ◽

Data Collection System ◽

History Of ◽

Correlation System

A free-text data collection system has been developed at the University of Illinois that uses single-word, syntax-free dictionary lookup to process data for retrieval. The source document for the system is the Surgical Pathology Request and Report form. To date, 12,653 documents have been entered into the system. The free-text data were used to create an IRS (Information Retrieval System) database. A program to interrogate this database has been developed to numerically code operative procedures. A total of 16,519 procedure records were generated. Of these, 1.9% could not be fitted into any procedure category, 6.1% could not be specifically coded, and 92% were coded into specific categories. A system of PL/1 programs has been developed to facilitate manual editing of these records, which can be performed in a reasonable length of time (1 week). This manual check reveals that these 92% were coded with precision = 0.931 and recall = 0.924. Correction of the readily correctable errors could improve these figures to precision = 0.977 and recall = 0.987. Syntax errors were relatively unimportant in the overall coding process but did introduce significant error in some categories, such as when a right-left-bilateral distinction was attempted. The coded file that has been constructed will be used as an input file to a gynecological disease/PAP smear correlation system. The outputs of this system will include retrospective information on the natural history of selected diseases and a patient log providing information to the clinician on patient follow-up. Thus, a free-text data collection system can be utilized to produce numerically coded files of reasonable accuracy. Further, these files can be used as a source of useful information both for the clinician and for the medical researcher.
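
Single-word, syntax-free dictionary lookup and the precision/recall check described above can be illustrated with a short stdlib Python sketch. The dictionary `PROCEDURE_CODES`, its numeric codes, and the records are hypothetical (the original system was written in PL/1 and its dictionary is not given); precision is the share of assigned codes confirmed by the manual check, and recall is the share of manually confirmed codes that the lookup found.

```python
import re

# Hypothetical single-word dictionary mapping terms to numeric procedure codes.
PROCEDURE_CODES = {"hysterectomy": 101, "biopsy": 202, "conization": 303}


def code_procedures(record_id, text):
    """Syntax-free lookup: every word is checked against the dictionary on its own."""
    words = re.findall(r"[a-z]+", text.lower())
    return {(record_id, PROCEDURE_CODES[w]) for w in words if w in PROCEDURE_CODES}


def precision_recall(assigned, gold):
    """Precision and recall of assigned (record, code) pairs against a manual check."""
    assigned, gold = set(assigned), set(gold)
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


assigned = code_procedures("r1", "Total abdominal hysterectomy") | code_procedures(
    "r2", "Cervical biopsy, then conization"
)
gold = {("r1", 101), ("r2", 303), ("r3", 202)}  # invented manual-check result
print(precision_recall(assigned, gold))
```

Because each word is looked up in isolation, "left", "right", or "bilateral" has no way to attach to the procedure it modifies, which is consistent with the abstract's observation that syntax mattered mainly for right-left-bilateral distinctions.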

Predicting adult neuroscience intensive care unit admission from emergency department triage using a retrospective, tabular-free text machine learning approach

Scientific Reports ◽

10.1038/s41598-021-80985-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Eyal Klang ◽

Benjamin R. Kummer ◽

Neha S. Dangayach ◽

Amy Zhong ◽

M. Arash Kia ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Emergency Department ◽

Intensive Care ◽

Learning Model ◽

Free Text ◽

Combined Model ◽

Text Data ◽

Machine Learning Model ◽

Record Data

Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, bed movement record data, and free text data from triage notes were extracted from our institutional data warehouse. A machine learning model was trained to predict likelihood of NSICU admission at 30 min from arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200–256) and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80–324). A model trained only with text data had an area under the receiver operating characteristic curve (AUC) of 0.90 (95% confidence interval (CI) 0.87–0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91–0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92–0.95). At a false positive rate of 1:100 (99% specificity), the combined model was 58% sensitive for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival. This may potentially improve ED and NSICU resource allocation. Further studies should validate our findings.
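
Both discrimination figures above, AUC and sensitivity at a fixed 99% specificity, can be computed from any model's risk scores. The stdlib Python sketch below assumes binary labels (1 = NSICU admission) and uses invented scores; it illustrates the two metrics and is not the authors' model or evaluation code. The pairwise AUC is O(positives × negatives), which is fine for a toy example but not for 412,858 presentations.

```python
def auc(scores, labels):
    """ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, specificity=0.99):
    """Sensitivity when at most (1 - specificity) of negatives are flagged (score > threshold)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = sorted((s for s, y in zip(scores, labels) if not y), reverse=True)
    allowed_fp = int((1 - specificity) * len(neg) + 1e-9)  # epsilon guards float round-off
    # At most allowed_fp negatives lie strictly above the (allowed_fp + 1)-th highest negative.
    threshold = neg[allowed_fp] if allowed_fp < len(neg) else float("-inf")
    return sum(s > threshold for s in pos) / len(pos)


# Invented scores for 8 ED presentations (1 = NSICU admission).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(auc(scores, labels), sensitivity_at_specificity(scores, labels))
```

With 1,900 admissions among 412,858 presentations, a 1% false-positive rate still corresponds to roughly 4,100 flagged non-NSICU presentations, so an operating point at fixed specificity is informative alongside the AUC.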

Knowledge needs in the non-profit sector: an evidence-based model of organizational practices

Journal of Knowledge Management ◽

10.1108/jkm-12-2014-0512 ◽

2016 ◽

Vol 20 (1) ◽

pp. 23-48 ◽

Cited By ~ 10

Author(s):

Dinesh Rathi ◽

Lisa M. Given ◽

Eric Forcier

Keyword(s):

Quantitative Data ◽

Qualitative Data ◽

Free Text ◽

Evidence Based ◽

Theory Approach ◽

Organizational Practices ◽

Content Type ◽

Non Profit ◽

Types Of Knowledge ◽

Survey Responses

Purpose – This paper aims to present findings from a study of non-profit organizations (NPOs), including a model of knowledge needs that can be applied by practitioners and scholars to further develop the NPO sector. Design/methodology/approach – A survey was conducted with NPOs operating in Canada and Australia. An analysis of survey responses identified the different types of knowledge essential for each organization. Respondents identified the importance of three pre-determined themes (quantitative data) related to knowledge needs, as well as a fourth option, which was a free text box (qualitative data). The quantitative and qualitative data were analyzed using descriptive statistical analyses and a grounded theory approach, respectively. Findings – Analysis of the quantitative data indicates that NPOs' needs are comparable in both countries. Analysis of qualitative data identified five major categories and multiple sub-categories representing the types of knowledge needs of NPOs. Major categories are knowledge about management and organizational practices, knowledge about resources, community knowledge, sectoral knowledge and situated knowledge. The paper discusses the results using semantic proximity and presents an emergent, evidence-based knowledge management (KM)-NPO model. Originality/value – The findings contribute to the growing body of literature in the KM domain, and in the understudied research domain related to the knowledge needs and experiences of NPOs. NPOs will find the identified categories and sub-categories useful to undertake KM initiatives within their individual organizations. The study is also unique, as it includes data from two countries, Canada and Australia.

The motivation and capacity to go ‘above and beyond’: Qualitative analysis of free-text survey responses in the M@NGO randomised controlled trial of caseload midwifery

Midwifery ◽

10.1016/j.midw.2017.03.012 ◽

2017 ◽

Vol 50 ◽

pp. 148-156 ◽

Cited By ~ 12

Author(s):

Jyai Allen ◽

Sue Kildea ◽

Donna L. Hartz ◽

Mark Tracy ◽

Sally Tracy

Keyword(s):

Randomised Controlled Trial ◽

Qualitative Analysis ◽

Controlled Trial ◽

Free Text ◽

Caseload Midwifery ◽

Randomised Controlled ◽

Survey Responses

What are the concerns and goals of women attending a urogynaecology clinic? Content analysis of free-text data from an electronic pelvic floor assessment questionnaire (ePAQ-PF)

International Urogynecology Journal ◽

10.1007/s00192-018-3697-0 ◽

2018 ◽

Vol 30 (1) ◽

pp. 33-41 ◽

Cited By ~ 3

Author(s):

Thomas Gray ◽

Scarlett Strickland ◽

Sarita Pooranawattanakul ◽

Weiguang Li ◽

Patrick Campbell ◽

...

Keyword(s):

Content Analysis ◽

Pelvic Floor ◽

Free Text ◽

Text Data ◽

Assessment Questionnaire

Use of Electronic Health Record Tools to Facilitate and Audit Infliximab Prescribing

The Journal of Pediatric Pharmacology and Therapeutics ◽

10.5863/1551-6776-23.1.18 ◽

2018 ◽

Vol 23 (1) ◽

pp. 18-25

Author(s):

Bethany R. Sharpless ◽

Fernando del Rosario ◽

Zarela Molle-Rios ◽

Elora Hilmas

Keyword(s):

Literature Review ◽

Electronic Health Record ◽

National Survey ◽

Free Text ◽

Health Record ◽

Order Information ◽

Text Data ◽

Review Analysis ◽

Electronic Health ◽

Implementation Data

OBJECTIVES The objective of this project was to assess a pediatric institution's use of infliximab and develop and evaluate electronic health record tools to improve safety and efficiency of infliximab ordering through auditing and improved communication. METHODS Best use of infliximab was defined through a literature review, analysis of baseline use of infliximab at our institution, and distribution and analysis of a national survey. Auditing and order communication were optimized through implementation of mandatory indications in the infliximab orderable and creation of an interactive flowsheet that collects discrete and free-text data. The value of the implemented electronic health record tools was assessed at the conclusion of the project. RESULTS Baseline analysis determined that 93.8% of orders were dosed appropriately according to the findings of a literature review. After implementation of the flowsheet and indications, the time to perform an audit of use was reduced from 60 minutes to 5 minutes per month. Four months post implementation, data were entered by 60% of the pediatric gastroenterologists at our institution on 15.3% of all encounters for infliximab. Users were surveyed on the value of the tools, with 100% planning to continue using the workflow, and 82% stating the tools frequently improve the efficiency and safety of infliximab prescribing. CONCLUSIONS Creation of a standard workflow by using an interactive flowsheet has improved auditing ability and facilitated the communication of important order information surrounding infliximab. Providers and pharmacists feel these tools improve the safety and efficiency of infliximab ordering, and auditing data reveal that the tools are being used.

Cross-sectional analysis of women in neurosurgery: a Canadian perspective

Neurosurgical FOCUS ◽

10.3171/2020.12.focus20959 ◽

2021 ◽

Vol 50 (3) ◽

pp. E13

Author(s):

Catherine Veilleux ◽

Nardin Samuel ◽

Han Yan ◽

Victoria Bass ◽

Rabab Al-Shahrani ◽

...

Keyword(s):

Career Success ◽

Career Advancement ◽

Fellowship Training ◽

Steady Increase ◽

Free Text ◽

Enabling Factors ◽

Related Factors ◽

Cross Sectional ◽

Quantitative Analyses ◽

Survey Responses

OBJECTIVE Although the past decades have seen a steady increase of women in medicine in general, women continue to represent a minority of the physician-training staff and workforce in neurosurgery in Canada and worldwide. As such, the aim of this study was to analyze the experiences of women faculty practicing neurosurgery across Canada to better understand and address the factors contributing to this disparity. METHODS A historical, cross-sectional, and mixed-method analysis of survey responses was performed using survey results obtained from women attending neurosurgeons across Canada. A web-based survey platform was utilized to collect responses. Quantitative analyses were performed on the responses from the study questionnaire, including summary and comparative statistics. Qualitative analyses of free-text responses were performed using axial and open coding. RESULTS A total of 19 of 31 respondents (61.3%) completed the survey. Positive enabling factors for career success included supportive colleagues and work environment (52.6%); academic accomplishments, including publications and advanced degrees (36.8%); and advanced fellowship training (47.4%). Perceived barriers reported included inequalities with regard to career advancement opportunities (57.8%), conflicting professional and personal interests (57.8%), and lack of mentorship (36.8%). Quantitative analyses demonstrated emerging themes of an increased need for women mentors as well as support and recognition of the contributions to career advancement of personal and family-related factors. CONCLUSIONS This study represents, to the authors’ knowledge, the first analysis of factors influencing career success and satisfaction in women neurosurgeons across Canada. This study highlights several key factors contributing to the low representation of women in neurosurgery and identifies specific actionable items that can be addressed by training programs and institutions.
In particular, female mentorship, opportunities for career advancement, and increased recognition and integration of personal and professional roles were highlighted as areas for future intervention. These findings will provide a framework for addressing these factors and improving the recruitment and retention of women in this specialty.
