Automated Identification of Patients With Immune-Related Adverse Events From Clinical Notes Using Word Embedding and Machine Learning

PURPOSE Although immune checkpoint inhibitors (ICIs) have substantially improved survival in patients with advanced malignancies, they are associated with a unique spectrum of side effects termed immune-related adverse events (irAEs). To ensure treatment safety, research efforts are needed to comprehensively detect and understand irAEs. Retrospective analysis of electronic health record (EHR) data can provide knowledge to characterize these toxicities. However, such information is not captured in a structured format within the EHR and requires manual chart review. MATERIALS AND METHODS In this work, we propose a natural language processing pipeline that can automatically annotate clinical notes and determine whether there is evidence that a patient developed an irAE. Seven hundred eighty-one cases were manually reviewed by clinicians and annotated for irAEs at the patient level. A dictionary of irAE keywords was used to perform text reduction on the clinical notes belonging to each patient; only sentences containing relevant expressions were kept. Word embeddings were then used to generate vector representations of the reduced text, which served as input for the machine learning classifiers. The output of the models was the presence or absence of any irAE. Additional models were built to classify skin-related toxicities, endocrine toxicities, and colitis. RESULTS The model for any irAE achieved an average F1-score of 0.75 and an area under the receiver operating characteristic curve of 0.85, outperforming a basic keyword filtering approach. Although the classifier for any irAE achieved good accuracy, classification of individual irAE types still has room for improvement. CONCLUSION We demonstrate that patient-level annotations combined with a machine learning approach using keyword filtering and word embeddings can achieve promising accuracy in classifying irAEs in clinical notes. This model may facilitate annotation and analysis of large irAE data sets.
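The text-reduction and embedding steps described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' pipeline: the keyword set, the naive sentence splitter, the toy embeddings, and mean pooling of word vectors are all assumptions, since the abstract does not say how the reduced text was turned into a single patient vector.

```python
import re

# Hypothetical keyword dictionary for illustration; NOT the study's irAE dictionary.
IRAE_KEYWORDS = {"colitis", "rash", "pruritus", "hypothyroidism", "pneumonitis", "hepatitis"}
TOKEN = re.compile(r"[a-z]+")


def reduce_text(notes, keywords=IRAE_KEYWORDS):
    """Text reduction: keep only sentences that mention at least one keyword."""
    kept = []
    for note in notes:
        # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", note):
            if set(TOKEN.findall(sentence.lower())) & keywords:
                kept.append(sentence)
    return kept


def patient_vector(sentences, embeddings, dim):
    """Average the embeddings of all in-vocabulary tokens in the kept sentences.

    Mean pooling is an assumed choice; out-of-vocabulary tokens are skipped.
    """
    total = [0.0] * dim
    count = 0
    for sentence in sentences:
        for token in TOKEN.findall(sentence.lower()):
            vector = embeddings.get(token)
            if vector is not None:
                total = [a + b for a, b in zip(total, vector)]
                count += 1
    return [x / count for x in total] if count else total


# Toy 2-dimensional embeddings, made up for this example.
toy_embeddings = {"colitis": [1.0, 0.0], "grade": [0.0, 1.0], "diarrhea": [0.5, 0.5]}
notes = [
    "Patient seen for follow-up. Grade 2 colitis with diarrhea noted.",
    "No new complaints today.",
]
kept = reduce_text(notes)  # ["Grade 2 colitis with diarrhea noted."]
vec = patient_vector(kept, toy_embeddings, dim=2)  # [0.5, 0.5]
```

In a full pipeline, `vec` would be one row of the feature matrix handed to a classifier. Returning a zero vector for patients with no matching sentences is one possible convention, assumed here for simplicity.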

Automated Identification of Patients with Immune-related Adverse Events from Clinical Notes using Word embedding and Machine Learning

10.1101/2020.05.19.20106583 ◽

2020 ◽

Author(s):

Samir Gupta ◽

Anas Belouali ◽

Neil J Shah ◽

Michael B Atkins ◽

Subha Madhavan

Keyword(s):

Machine Learning ◽

Adverse Events ◽

Checkpoint Inhibitors ◽

High Accuracy ◽

Automated Identification ◽

Real World Data ◽

Clinical Notes ◽

Safety Research ◽

Advanced Malignancies ◽

Treatment Safety

Immune checkpoint inhibitors (ICIs) have substantially improved survival in patients with advanced malignancies. However, ICIs are associated with a unique spectrum of side effects termed immune-related adverse events (irAEs). To ensure treatment safety, research efforts are needed to comprehensively detect and understand irAEs from real-world data (RWD). The goal of this work is to evaluate a machine learning-based phenotyping approach that can identify patients with irAEs from a large volume of retrospective clinical notes representing RWD. Evaluation shows promising results, with an average F1-score of 0.75 and an AUC-ROC of 0.78. While extraction of any irAE documented in the charts achieves high accuracy, extraction of individual irAEs has room for further improvement.
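The F1-score reported here is the harmonic mean of precision and recall. A minimal sketch of that computation is shown below; the fold counts are hypothetical, and averaging F1 across cross-validation folds is only an assumed reading of "average F1-score", since the abstract does not say what was averaged.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 (their harmonic mean) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def mean_f1(fold_counts):
    """Average F1 over cross-validation folds (one possible meaning of "average F1")."""
    scores = [precision_recall_f1(tp, fp, fn)[2] for tp, fp, fn in fold_counts]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for two folds.
folds = [(3, 1, 1), (1, 1, 1)]
print(precision_recall_f1(*folds[0]))  # (0.75, 0.75, 0.75)
print(mean_f1(folds))  # 0.625
```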

Abstract P259: Using Natural Language Processing and Machine Learning to Identify Incident Stroke From Electronic Health Records

Circulation ◽

10.1161/circ.141.suppl_1.p259 ◽

2020 ◽

Vol 141 (Suppl_1) ◽

Author(s):

Yiqing Zhao ◽

Sunyang Fu ◽

Suzette J Bielinski ◽

Paul Decker ◽

Alanna M Chamberlain ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Natural Language Processing ◽

Random Forest ◽

Natural Language ◽

Language Processing ◽

Stroke Incidence ◽

Clinical Notes ◽

Feature Sets ◽

Electronic Health

Background: The focus of most existing phenotyping algorithms based on electronic health record (EHR) data has been to accurately identify cases and non-cases of specific diseases. However, a more challenging task is to accurately identify disease incidence, as identifying the first occurrence of disease is more important for efficient and valid clinical and epidemiological research. Moreover, stroke is a challenging phenotype because of diagnostic difficulty and common miscoding. This task generally requires multiple types of EHR data (e.g., diagnosis and procedure codes, unstructured clinical notes) and a more robust algorithm integrating both natural language processing and machine learning. In this study, we developed and validated an EHR-based classifier to accurately identify stroke incidence among a cohort of patients with atrial fibrillation (AF). Methods: We developed a stroke phenotyping algorithm using International Classification of Diseases, Ninth Revision (ICD-9) codes, Current Procedural Terminology (CPT) codes, and expert-provided keywords as model features. Structured data were extracted from the Rochester Epidemiology Project (REP) database. Natural language processing (NLP) was used to extract and validate keyword occurrences in clinical notes. A window of ±30 days was considered when including or excluding keywords and codes in the input vector. Frequencies of keywords and codes were used as input feature sets for model training. Multiple competing models were trained using various combinations of feature sets and two machine learning algorithms: logistic regression and random forest. Training data were provided by two nurse abstractors and included validated stroke incidences from a previously established AF cohort. Precision, recall, and F-score were calculated to assess and compare model performance. Results: Among 4,914 patients with AF, 1,773 were screened; the remaining 3,141 had no stroke-related codes or keywords and were presumed to be free of stroke during follow-up. Among the screened patients, 740 had validated strokes and 1,033 did not have a stroke based on review of the EHR by trained nurse abstractors. The best-performing stroke incidence classifier used Keywords+ICD-9+CPT features with a random forest, achieving a precision of 0.942, recall of 0.943, and F-score of 0.943. Conclusion: We developed and validated a stroke algorithm that performed well for identifying stroke incidence in an enriched population (an AF cohort), extending beyond the typical binary case/non-case stroke identification problem. Future work will test the generalizability of this algorithm in a general population.
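The ±30-day windowing and frequency features described in the Methods can be sketched as below. This is a hedged illustration only: centering the window on a per-patient index date is an assumption (the abstract does not say what the window is anchored to), and the code strings and keyword are placeholders chosen for the example, not the study's feature list.

```python
from collections import Counter
from datetime import date


def window_features(events, index_date, feature_names, days=30):
    """Count each keyword/code occurring within +/- `days` of `index_date`.

    events: iterable of (event_date, feature_name) pairs, e.g. diagnosis codes,
    procedure codes, or NLP-validated keyword mentions.
    Returns one frequency per entry in `feature_names`, in the same order.
    """
    wanted = set(feature_names)
    counts = Counter(
        name
        for when, name in events
        if name in wanted and abs((when - index_date).days) <= days
    )
    return [counts[name] for name in feature_names]


# Placeholder code strings and keyword; not the study's feature list.
features = ["434.91", "70450", "stroke"]
index_date = date(2015, 3, 5)
events = [
    (date(2015, 3, 1), "434.91"),
    (date(2015, 3, 2), "70450"),
    (date(2015, 3, 2), "stroke"),
    (date(2015, 3, 10), "stroke"),
    (date(2015, 6, 1), "stroke"),  # 88 days after the index date: excluded
]
vector = window_features(events, index_date, features)  # [1, 1, 2]
```

Each patient's vector would then be one row of input to the logistic regression or random forest models.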

Acronym Disambiguation in Clinical Notes from Electronic Health Records

10.1101/2020.11.25.20221648 ◽

2020 ◽

Author(s):

Nicholas B. Link ◽

Selena Huang ◽

Tianrun Cai ◽

Zeling He ◽

Jiehuan Sun ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Electronic Health Records ◽

Language Processing ◽

Learning Approaches ◽

Health Records ◽

Hospital System ◽

Clinical Notes ◽

Unsupervised Method ◽

Electronic Health

Objective: The use of electronic health record (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings), and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study, we introduce an unsupervised method for acronym disambiguation, the task of classifying the correct sense of acronyms in clinical EHR notes. Methods: We developed an unsupervised ensemble machine learning algorithm (CASEml) to automatically classify acronyms by leveraging semantic embeddings, visit-level text, and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard unsupervised method and a baseline that selects the most frequent acronym sense. We also evaluated the effect of RA disambiguation on NLP-driven phenotyping of rheumatoid arthritis. Results: CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than the most-frequent-sense baseline and, on average, higher than a state-of-the-art unsupervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotyping algorithm for rheumatoid arthritis. Conclusion: CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and unsupervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that depend on ambiguous acronyms, such as phenotyping.
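The embedding-based idea behind this kind of disambiguation, and the most-frequent-sense baseline it is compared against, can be sketched as follows. This is not CASEml, which is an ensemble that also uses visit-level text and billing information; it is a single cosine-similarity component under the assumption that context and sense embeddings are already available. The 2-dimensional sense vectors and the "room air" sense are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(context_vector, sense_vectors):
    """Choose the sense whose embedding is most similar to the context embedding."""
    return max(sense_vectors, key=lambda sense: cosine(context_vector, sense_vectors[sense]))


def most_frequent_sense(observed_senses):
    """Baseline: always predict the sense seen most often."""
    return Counter(observed_senses).most_common(1)[0][0]


# Hypothetical 2-dimensional sense embeddings for the acronym "RA".
ra_senses = {"rheumatoid arthritis": [1.0, 0.1], "room air": [0.1, 1.0]}
print(disambiguate([0.9, 0.2], ra_senses))  # rheumatoid arthritis
print(disambiguate([0.0, 1.0], ra_senses))  # room air
```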

Natural language processing and machine learning to identify alcohol misuse from the electronic health record in trauma patients: development and internal validation

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy166 ◽

2019 ◽

Vol 26 (3) ◽

pp. 254-261 ◽

Author(s):

Majid Afshar ◽

Andrew Phillips ◽

Niranjan Karnik ◽

Jeanne Mueller ◽

Daniel To ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Electronic Health Record ◽

Language Processing ◽

Alcohol Misuse ◽

Health Record ◽

Trauma Patients ◽

Clinical Notes ◽

Electronic Health

Objective: Alcohol misuse is present in over a quarter of trauma patients. Information in the clinical notes of the electronic health records of trauma patients may be used for phenotyping tasks with natural language processing (NLP) and supervised machine learning. The objective of this study was to train and validate an NLP classifier for identifying patients with alcohol misuse. Materials and Methods: We studied an observational cohort of 1422 adult patients admitted to a trauma center between April 2013 and November 2016. Linguistic processing of clinical notes was performed using the clinical Text Analysis and Knowledge Extraction System. The primary analysis was the binary classification of alcohol misuse. The Alcohol Use Disorders Identification Test served as the reference standard. Results: The data corpus comprised 91,045 electronic health record notes and 16,091 features. In the final machine learning classifier, 16 features were selected from the first 24 hours of notes for identifying alcohol misuse. In the validation cohort, the classifier had an area under the receiver operating characteristic curve of 0.78 (95% confidence interval [CI], 0.72 to 0.85). Sensitivity and specificity were 56.0% (95% CI, 44.1% to 68.0%) and 88.9% (95% CI, 84.4% to 92.8%), respectively. The Hosmer-Lemeshow goodness-of-fit test showed no evidence of poor fit (P = .17). A simpler rule-based keyword approach had lower sensitivity than the NLP classifier (18.2% vs 56.0%). Conclusions: The NLP classifier has adequate predictive validity for identifying alcohol misuse in trauma centers. External validation is needed before it is applied to augment screening.
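Sensitivity and specificity depend on the threshold used to turn classifier scores into yes/no predictions. The sketch below, with hypothetical labels (standing in for the reference standard) and hypothetical scores, shows how they are computed and how lowering the threshold trades specificity for sensitivity; the threshold values are assumptions, not the study's operating point.

```python
def confusion_rates(labels, scores, threshold=0.5):
    """Sensitivity and specificity of `scores >= threshold` against binary labels."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical reference-standard labels (1 = misuse) and classifier scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.1, 0.3, 0.45, 0.8]
print(confusion_rates(labels, scores, threshold=0.5))  # (0.5, 0.75)
print(confusion_rates(labels, scores, threshold=0.3))  # (0.75, 0.25)
```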

Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks

Scientific Reports ◽

10.1038/s41598-021-88027-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Aditi S. Krishnapriyan ◽

Joseph Montoya ◽

Maciej Haranczyk ◽

Jens Hummelshøj ◽

Dmitriy Morozov

Keyword(s):

Machine Learning ◽

Language Processing ◽

Metal Organic Framework ◽

Persistent Homology ◽

Level Structure ◽

Chemical Information ◽

Material System ◽

Word Embeddings ◽

Structure Property ◽

Metal Organic

Machine learning has emerged as a powerful approach in materials discovery. A major challenge is selecting features that create interpretable representations of materials that are useful across multiple prediction tasks. We introduce an end-to-end machine learning model that automatically generates descriptors capturing a complex representation of a material's structure and chemistry. This approach builds on computational topology techniques (namely, persistent homology) and word embeddings from natural language processing. It automatically encapsulates geometric and chemical information directly from the material system. We demonstrate our approach on multiple nanoporous metal–organic framework datasets by predicting methane and carbon dioxide adsorption across different conditions. Compared with models constructed from commonly used, manually curated features, our results show considerable improvement in both accuracy and transferability across targets, consistently achieving an average 25–30% decrease in root-mean-square deviation and an average 40–50% increase in R² scores. A key advantage of our approach is interpretability: our model identifies the pores that correlate best with adsorption at different pressures, which contributes to understanding atomic-level structure–property relationships for materials design.
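To give some intuition for persistent homology, the sketch below computes only its simplest case: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration on a point cloud, using union-find over edges sorted by length. It is not the authors' descriptor pipeline; pore-like features generally correspond to higher-dimensional homology (loops and cavities), which dedicated topology libraries compute and which this sketch does not attempt.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of a Vietoris-Rips filtration.

    Every point is a component born at scale 0. When an edge of length r first
    joins two components, one of them dies at r. One component never dies.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for r, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            pairs.append((0.0, r))
    if points:
        pairs.append((0.0, math.inf))
    return pairs


print(h0_persistence([(0.0,), (1.0,), (3.0,)]))  # [(0.0, 1.0), (0.0, 2.0), (0.0, inf)]
```

Long-lived pairs indicate well-separated clusters; short-lived pairs reflect points that merge almost immediately.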

Development of algorithm for classification smoking status from unstructured bilingual electronic health records based on natural language processing (Preprint)

10.2196/preprints.26978 ◽

2021 ◽

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Svm Classifier ◽

Keyword Extraction ◽

Health Records ◽

Clinical Notes ◽

Electronic Health

BACKGROUND Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed automatically classifying smoking status from unstructured bilingual electronic health records (EHRs). OBJECTIVE We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language processing (NLP). METHODS Using acronym replacement and the Python package Soynlp, we normalized 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. SPPMI (Shifted Positive Pointwise Mutual Information) was then used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status were identified. RESULTS Compared with other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used to classify the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared with unigram and bigram bag-of-words features. CONCLUSIONS Our study shows the potential of SPPMI for classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.
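The SPPMI vectorization and cosine-similarity keyword search described above can be sketched as follows, using the standard definition SPPMI(w, c) = max(PMI(w, c) - log k, 0). The window size, shift k, whitespace tokenization, and toy English corpus are all assumptions; the study's Soynlp normalization and bilingual tokenization are not reproduced.

```python
import math
from collections import Counter


def sppmi_vectors(sentences, window=2, k=1):
    """Word vectors from Shifted Positive PMI over co-occurrence counts.

    SPPMI(w, c) = max(PMI(w, c) - log k, 0), where
    PMI(w, c) = log(#(w, c) * N / (#(w) * #(c))) and N is the total pair count.
    """
    pairs = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs[(w, tokens[j])] += 1
    total = sum(pairs.values())
    word_counts, context_counts = Counter(), Counter()
    for (w, c), n in pairs.items():
        word_counts[w] += n
        context_counts[c] += n
    contexts = sorted(context_counts)
    vectors = {}
    for w in word_counts:
        row = []
        for c in contexts:
            n = pairs.get((w, c), 0)
            # Unseen pairs have PMI of -inf, which the positive clipping maps to 0.
            pmi = math.log(n * total / (word_counts[w] * context_counts[c])) if n else -math.inf
            row.append(max(pmi - math.log(k), 0.0))
        vectors[w] = row
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, seed, top=3):
    """Rank other words by cosine similarity to a seed keyword."""
    others = [w for w in vectors if w != seed]
    return sorted(others, key=lambda w: cosine(vectors[seed], vectors[w]), reverse=True)[:top]


# Toy tokenized corpus (hypothetical).
corpus = [["patient", "smoker", "daily"], ["patient", "smokes", "daily"], ["denies", "alcohol", "use"]]
vectors = sppmi_vectors(corpus, window=1)
print(nearest(vectors, "smoker", top=1))  # ['smokes']
```

Words that share contexts ("smoker" and "smokes" both appear between "patient" and "daily") end up with similar SPPMI rows, which is what lets a seed keyword pull in related expressions.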

Predicting Onset of Dementia Using Clinical Notes and Machine Learning: Case-Control Study (Preprint)

10.2196/preprints.17819 ◽

2020 ◽

Author(s):

Christopher A Hane ◽

Vijay S Nori ◽

William H Crown ◽

Darshak M Sanghavi ◽

Paul Bleicher

Keyword(s):

Machine Learning ◽

Language Processing ◽

Disease Onset ◽

Area Under The Curve ◽

Learning Models ◽

Term Care ◽

Clinical Notes ◽

Patients At Risk ◽

Hospital Systems ◽

Machine Learning Models

BACKGROUND Clinical trials need efficient tools to assist in recruiting patients at risk of Alzheimer disease and related dementias (ADRD). Early detection can also help patients with financial planning for long-term care. Clinical notes are an important source of information that is underutilized in machine learning models because of the cost of collection and the complexity of analysis. OBJECTIVE This study aimed to investigate the use of deidentified clinical notes from multiple hospital systems collected over 10 years to augment retrospective machine learning models of the risk of developing ADRD. METHODS We used 2 years of data to predict the future outcome of ADRD onset. Clinical notes were provided in a deidentified format with specific terms and sentiments. Terms in clinical notes were embedded into a 100-dimensional vector space to identify clusters of related terms and abbreviations that differ across hospital systems and individual clinicians. RESULTS When clinical notes were used, the area under the curve (AUC) improved from 0.85 to 0.94, and the positive predictive value (PPV) increased from 45.07% (25,245/56,018) to 68.32% (14,153/20,717) in the model at disease onset. Models with clinical notes improved in both AUC and PPV in years 3-6, when note volume was largest; results were mixed in years 7 and 8, which had the smallest cohorts. CONCLUSIONS Although clinical notes helped in the short term, the presence of ADRD symptomatic terms years earlier than onset adds evidence to other studies that clinicians undercode diagnoses of ADRD. Deidentified clinical notes increase the accuracy of risk models. Clinical notes collected across multiple hospital systems via natural language processing can be merged using postprocessing techniques to improve model accuracy.
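The PPV figures above are the fraction of flagged patients who truly developed ADRD, PPV = TP / (TP + FP). The short check below recomputes the two reported percentages from the counts given in parentheses, assuming (as the PPV definition implies) that each denominator is the number of patients the model flagged as positive.

```python
def ppv(true_positives, predicted_positives):
    """Positive predictive value: TP / (TP + FP), i.e. TP over all predicted positives."""
    return true_positives / predicted_positives


without_notes = ppv(25_245, 56_018)
with_notes = ppv(14_153, 20_717)
print(f"{without_notes:.2%} -> {with_notes:.2%}")  # 45.07% -> 68.32%
```

The rise in PPV comes with a much smaller flagged group (20,717 vs 56,018 patients), so fewer false positives would need follow-up in a recruitment setting.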

A Time-Updated, Parsimonious Model to Predict AKI in Hospitalized Children

Journal of the American Society of Nephrology ◽

10.1681/asn.2019070745 ◽

2020 ◽

Vol 31 (6) ◽

pp. 1348-1357 ◽

Author(s):

Ibrahim Sandokji ◽

Yu Yamamoto ◽

Aditya Biswas ◽

Tanima Arora ◽

Ugochukwu Ugwuowo ◽

...

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Receiver Operating Characteristic Curve ◽

Operating Characteristic ◽

Characteristic Curve ◽

External Validation ◽

Health Record ◽

Hospitalized Children ◽

Operating Characteristic Curve ◽

Electronic Health

Background: Timely prediction of acute kidney injury (AKI) in children can allow for targeted interventions, but the wealth of data in the electronic health record poses unique modeling challenges. Methods: We retrospectively reviewed the electronic medical records of all children younger than 18 years who had at least two creatinine values measured during a hospital admission from January 2014 through January 2018. We divided the study population into derivation, internal validation, and external validation cohorts, and used five feature selection techniques to select 10 of 720 potentially predictive variables from the electronic health records. Model performance was assessed by the area under the receiver operating characteristic curve in the validation cohorts. The primary outcome was development of AKI (per the Kidney Disease: Improving Global Outcomes creatinine definition) within a moving 48-hour window. Secondary outcomes included severe AKI (stage 2 or 3), inpatient mortality, and length of stay. Results: Among 8473 encounters studied, AKI occurred in 516 (10.2%), 207 (9%), and 27 (2.5%) encounters in the derivation, internal validation, and external validation cohorts, respectively. The highest-performing model used a machine learning-based genetic algorithm, with an area under the receiver operating characteristic curve in the internal validation cohort of 0.76 (95% confidence interval [CI], 0.72 to 0.79) for AKI, 0.79 (95% CI, 0.74 to 0.83) for severe AKI, and 0.81 (95% CI, 0.77 to 0.86) for neonatal AKI. To translate this prediction model into a clinical risk-stratification tool, we identified high- and low-risk threshold points. Conclusions: Using various machine learning algorithms, we identified and validated a time-updated prediction model of 10 readily available electronic health record variables to accurately predict imminent AKI in hospitalized children.
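The outcome label here relies on the KDIGO creatinine definition. The sketch below is a simplified, assumed reading of its two creatinine criteria (a rise of at least 0.3 mg/dL within 48 hours, or a rise to at least 1.5 times an earlier value within 7 days); it ignores urine output, staging, and any pediatric-specific baseline rules, and it is not the study's labeling code. Measurement times and values are hypothetical.

```python
def kdigo_creatinine_aki(measurements, eps=1e-9):
    """Return the time (hours) of the first creatinine meeting a simplified KDIGO
    creatinine criterion, or None.

    measurements: list of (hours_since_admission, serum_creatinine_mg_dl), any order.
    Each value is compared with every earlier value:
      - absolute rise >= 0.3 mg/dL within 48 hours, or
      - value >= 1.5x an earlier value within the prior 7 days (168 hours).
    `eps` guards against floating-point error at the exact thresholds.
    """
    ordered = sorted(measurements)
    for i, (t_i, scr_i) in enumerate(ordered):
        for t_j, scr_j in ordered[:i]:
            elapsed = t_i - t_j
            if elapsed <= 48 and scr_i - scr_j >= 0.3 - eps:
                return t_i
            if elapsed <= 168 and scr_i >= 1.5 * scr_j - eps:
                return t_i
    return None


print(kdigo_creatinine_aki([(0, 0.4), (24, 0.5), (40, 0.75)]))  # 40 (0.35 mg/dL rise in 40 h)
print(kdigo_creatinine_aki([(0, 0.4), (60, 0.5)]))  # None
```

A prediction model like the one described would be trained to forecast whether such a flag occurs within the next 48 hours, using only data available up to the prediction time.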

Artificial Intelligence in the Intensive Care Unit

Seminars in Respiratory and Critical Care Medicine ◽

10.1055/s-0040-1719037 ◽

2020 ◽

Author(s):

Massimiliano Greco ◽

Pier F. Caruso ◽

Maurizio Cecconi

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Intensive Care ◽

Language Processing ◽

Laboratory Data ◽

Kidney Injury ◽

Support Vector ◽

Learning Models ◽

Electronic Health ◽

Natural Terrain

The spread of electronic health records collecting large amounts of clinical, monitoring, and laboratory data produced by intensive care units (ICUs) provides a natural terrain for the application of artificial intelligence (AI). AI has a broad definition, encompassing computer vision, natural language processing, and machine learning, with the latter being the most commonly employed in ICUs. Machine learning may be divided into supervised learning models (e.g., support vector machines [SVMs] and random forests), unsupervised models (e.g., some neural network [NN] architectures), and reinforcement learning. Supervised models require labeled data, that is, data mapped by human judgment to predefined categories. Unsupervised models, by contrast, can be applied even without labeled data. Machine learning models have been used in the ICU to predict conditions such as acute kidney injury, detect symptoms such as delirium, and propose therapeutic actions (vasopressors and fluids in sepsis). In the future, AI will be used increasingly in the ICU, owing to the growing quality and quantity of available data. Accordingly, the ICU team will benefit from high-accuracy models used for both research and clinical practice. These models will also be the foundation of future decision support systems (DSSs), which will help the ICU team visualize and analyze huge amounts of information. We call for standardizing a core set of data across electronic health record systems, using a common dictionary for data labeling, which could greatly simplify sharing and merging data from different centers.

Predicting Unplanned Readmissions Following a Hip or Knee Arthroplasty: Retrospective Observational Study

JMIR Medical Informatics ◽

10.2196/19761 ◽

2020 ◽

Vol 8 (11) ◽

pp. e19761

Author(s):

Ramin Mohammadi ◽

Sarthak Jain ◽

Amir T Namin ◽

Melissa Scholem Heller ◽

Ramya Palacholla ◽

...

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

High Risk ◽

Knee Arthroplasty ◽

Characteristic Curve ◽

Risk Scores ◽

Unplanned Readmission ◽

Free Text ◽

Discriminative Ability ◽

Clinical Notes

Background Total joint replacements are high-volume and high-cost procedures that should be monitored for cost and quality control. Models that can identify patients at high risk of readmission might help reduce costs by suggesting who should be enrolled in preventive care programs. Previous risk prediction models have relied on structured patient data rather than on clinical notes in electronic health records (EHRs). The former approach requires manual feature extraction by domain experts, which may limit the applicability of these models. Objective This study aims to develop and evaluate a machine learning model for predicting the risk of 30-day readmission following knee and hip arthroplasty procedures. The input data for these models come from raw EHRs. We empirically demonstrate that unstructured free-text notes contain a reasonably predictive signal for this task. Methods We performed a retrospective analysis of data from 7174 patients at Partners Healthcare collected between 2006 and 2016. These data were split into training, validation, and test sets, which were used to build, validate, and test models to predict unplanned readmission within 30 days of hospital discharge. The proposed models made predictions on the basis of clinical notes, obviating the need for manual feature extraction by domain and machine learning experts. The notes that served as model inputs were written by physicians, nurses, pathologists, and others who diagnose and treat patients and may have their own predictions, even if these are not recorded. Results The proposed models output readmission risk scores (propensities) for each patient. The best models (as selected on a development set) yielded an area under the receiver operating characteristic curve of 0.846 (95% CI 0.8275-0.8711) for hip and 0.822 (95% CI 0.8094-0.8622) for knee surgery, indicating reasonable discriminative ability.
Conclusions Machine learning models can predict which patients are at a high risk of readmission within 30 days following hip and knee arthroplasty procedures on the basis of notes in EHRs with reasonable discriminative power. Following further validation and empirical demonstration that the models realize predictive performance above that which clinical judgment may provide, such models may be used to build an automated decision support tool to help caretakers identify at-risk patients.
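Most abstracts in this listing report discrimination as the area under the ROC curve. One standard way to compute it is as the probability that a randomly chosen positive case (here, a readmitted patient) receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The O(n²) sketch below uses hypothetical labels and scores; for large cohorts, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical readmission labels (1 = readmitted within 30 days) and risk scores.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.2]))  # 0.75
```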
