Structuring clinical text with AI: old vs. new natural language processing techniques evaluated on eight common cardiovascular diseases

Author(s):  
Xianghao Zhan ◽  
Marie Humbert-Droz ◽  
Pritam Mukherjee ◽  
Olivier Gevaert

Abstract Mining the structured data in electronic health records (EHRs) enables many clinical applications, while the information in free-text clinical notes often remains untapped. Free-text notes are unstructured data that are harder to use in machine learning, while structured diagnostic codes can be missing or even erroneous. To improve the quality of diagnostic codes, this work extracts structured diagnostic codes from unstructured notes concerning cardiovascular diseases. Five old and new word embedding methods were used to vectorize over 5 million progress notes from the Stanford EHR, and logistic regression was used to predict eight ICD-10 codes of common cardiovascular diseases. The models were interpreted through the words most important to the predictions and through analyses of false positive cases. Trained on Stanford notes, model transferability was tested on the prediction of the corresponding ICD-9 codes for MIMIC-III discharge summaries. The word embeddings and logistic regression showed good performance in diagnostic code extraction, with TF-IDF as the best word embedding model, yielding AUROC ranging from 0.9499 to 0.9915 and AUPRC ranging from 0.2956 to 0.8072. The models also showed transferability when tested on the MIMIC-III data set, with AUROC ranging from 0.7952 to 0.9790 and AUPRC ranging from 0.2353 to 0.8084. Model interpretability was demonstrated by important words with clinical meanings matching each disease. This study shows the feasibility of accurately extracting structured diagnostic codes, imputing missing codes, and correcting erroneous codes from free-text clinical notes with models that are interpretable for clinicians, which helps improve the data quality of diagnostic codes for information retrieval and downstream machine-learning applications.
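
As a rough illustration of the pipeline this abstract describes (TF-IDF vectorization of notes followed by a per-code logistic regression, evaluated with AUROC and AUPRC), here is a minimal scikit-learn sketch on synthetic notes; the texts, the label standing in for one ICD-10 code, and all parameters are hypothetical, not the authors' setup.

```python
# Minimal sketch (synthetic notes): TF-IDF features, one binary logistic
# regression per diagnostic code, AUROC/AUPRC evaluation, and a peek at the
# highest-weighted terms for interpretability.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

notes = [
    "patient with atrial fibrillation on anticoagulation",
    "chest pain, troponin negative, no acute coronary syndrome",
    "chronic systolic heart failure, reduced ejection fraction",
    "routine follow up, no cardiovascular complaints",
] * 50                               # toy stand-in for progress notes
labels = [1, 0, 0, 0] * 50           # toy stand-in for one ICD-10 code, e.g. I48

X_train, X_test, y_train, y_test = train_test_split(
    notes, labels, test_size=0.25, random_state=0, stratify=labels)

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(vectorizer.fit_transform(X_train), y_train)

scores = clf.predict_proba(vectorizer.transform(X_test))[:, 1]
print("AUROC:", roc_auc_score(y_test, scores))
print("AUPRC:", average_precision_score(y_test, scores))

# Words with the largest positive coefficients are the model's "important words".
terms = vectorizer.get_feature_names_out()
top = np.argsort(clf.coef_[0])[-5:]
print("highest-weight terms:", [terms[i] for i in top])
```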

2017 ◽  
Vol 1 (S1) ◽  
pp. 12-12
Author(s):  
Jianyin Shao ◽  
Ram Gouripeddi ◽  
Julio C. Facelli

OBJECTIVES/SPECIFIC AIMS: This poster presents a detailed characterization of the distribution of semantic concepts used in the text describing eligibility criteria of clinical trials reported to ClinicalTrials.gov and in patient notes from MIMIC-III. The final goal of this study is to find a minimal set of semantic concepts that can describe clinical trials and patients for efficient computational matching of clinical trial descriptions to potential participants at large scale. METHODS/STUDY POPULATION: We downloaded the free text describing the eligibility criteria of all clinical trials reported to ClinicalTrials.gov as of July 28, 2015 (~195,000 trials) and ~2,000,000 clinical notes from MIMIC-III. Using MetaMap 2014 we extracted UMLS concepts (CUIs) from the collected text. We calculated the frequency with which the semantic concepts appear in the texts describing the clinical trial eligibility criteria and the patient notes. RESULTS/ANTICIPATED RESULTS: The results show a classical power-law distribution, Y = 210X^(−2.043), R² = 0.9599, for clinical trial eligibility criteria and Y = 513X^(−2.684), R² = 0.9477 for MIMIC patient notes, where Y represents the number of documents in which a concept appears and X is the rank of the concept when ordered from most to least frequent. From this distribution, it is possible to see that, of the over 100,000 concepts in UMLS, only ~60,000 and ~50,000 concepts appear in fewer than 10 clinical trial eligibility descriptions and MIMIC-III patient clinical notes, respectively. This indicates that it would be possible to describe clinical trials and patient notes with a relatively small number of concepts, making the search space for matching patients to clinical trials a relatively small sub-space of the overall UMLS search space. DISCUSSION/SIGNIFICANCE OF IMPACT: Our results, showing that the concepts used to describe clinical trial eligibility criteria and patient clinical notes follow a power-law distribution, can lead to tractable computational approaches to automatically match patients to clinical trials at large scale by considerably reducing the search space. While automatic patient matching is not a panacea for improving clinical trial recruitment, better low-cost computational preselection processes can allow the limited human resources assigned to patient recruitment to be redirected to the most promising targets for recruitment.
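
To make the frequency-rank fit concrete, here is a minimal sketch of fitting Y = aX^b by least squares on the log-log scale; the concept counts below are randomly generated, not the ClinicalTrials.gov or MIMIC-III values.

```python
# Minimal sketch (synthetic counts): fit Y = a * X^b by linear regression on
# log-transformed rank (X) and document frequency (Y), reporting a log-scale R^2.
import numpy as np

doc_freq = np.sort(np.random.default_rng(0).zipf(2.0, size=5000))[::-1]  # fake concept document counts
rank = np.arange(1, len(doc_freq) + 1)

slope, intercept = np.polyfit(np.log(rank), np.log(doc_freq), 1)
a, b = np.exp(intercept), slope
pred = a * rank ** b
ss_res = np.sum((np.log(doc_freq) - np.log(pred)) ** 2)
ss_tot = np.sum((np.log(doc_freq) - np.log(doc_freq).mean()) ** 2)
print(f"Y = {a:.0f} * X^({b:.3f}), R^2 = {1 - ss_res / ss_tot:.4f}")
```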


10.2196/19761 ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. e19761
Author(s):  
Ramin Mohammadi ◽  
Sarthak Jain ◽  
Amir T Namin ◽  
Melissa Scholem Heller ◽  
Ramya Palacholla ◽  
...  

Background Total joint replacements are high-volume and high-cost procedures that should be monitored for cost and quality control. Models that can identify patients at high risk of readmission might help reduce costs by suggesting who should be enrolled in preventive care programs. Previous models for risk prediction have relied on structured data of patients rather than clinical notes in electronic health records (EHRs). The former approach requires manual feature extraction by domain experts, which may limit the applicability of these models. Objective This study aims to develop and evaluate a machine learning model for predicting the risk of 30-day readmission following knee and hip arthroplasty procedures. The input data for these models come from raw EHRs. We empirically demonstrate that unstructured free-text notes contain a reasonably predictive signal for this task. Methods We performed a retrospective analysis of data from 7174 patients at Partners Healthcare collected between 2006 and 2016. These data were split into train, validation, and test sets. These data sets were used to build, validate, and test models to predict unplanned readmission within 30 days of hospital discharge. The proposed models made predictions on the basis of clinical notes, obviating the need for performing manual feature extraction by domain and machine learning experts. The notes that served as model inputs were written by physicians, nurses, pathologists, and others who diagnose and treat patients and may have their own predictions, even if these are not recorded. Results The proposed models output readmission risk scores (propensities) for each patient. The best models (as selected on a development set) yielded an area under the receiver operating characteristic curve of 0.846 (95% CI 82.75-87.11) for hip and 0.822 (95% CI 80.94-86.22) for knee surgery, indicating reasonable discriminative ability. Conclusions Machine learning models can predict which patients are at a high risk of readmission within 30 days following hip and knee arthroplasty procedures on the basis of notes in EHRs with reasonable discriminative power. Following further validation and empirical demonstration that the models realize predictive performance above that which clinical judgment may provide, such models may be used to build an automated decision support tool to help caretakers identify at-risk patients.
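
As a side note on how the reported discrimination metric might be computed, here is a minimal sketch of an AUROC with a percentile-bootstrap 95% CI; the labels and risk scores are synthetic placeholders, not the Partners Healthcare data.

```python
# Minimal sketch (synthetic data): AUROC with a percentile bootstrap 95% CI,
# as commonly reported for readmission risk models.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                          # 0 = no readmission, 1 = readmitted
y_score = np.clip(y_true * 0.3 + rng.random(1000) * 0.7, 0, 1)  # fake risk propensities

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) < 2:                          # resample must contain both classes
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))

auc = roc_auc_score(y_true, y_score)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUROC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```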


2019 ◽  
Author(s):  
Zachary N. Flamholz ◽  
Lyle H. Ungar ◽  
Gary E. Weissman

Abstract Rationale Word embeddings are used to create vector representations of text data, but not all embeddings appropriately capture clinical information, are free of protected health information, and are computationally accessible to most researchers. Methods We trained word embeddings on published case reports because their language mimics that of clinical notes, the manuscripts are already de-identified by virtue of being published, and the corpus is much smaller than the large, publicly available datasets commonly used for this purpose. We tested the performance of these embeddings across five clinically relevant tasks and compared the results to embeddings trained on a large Wikipedia corpus, on all publicly available manuscripts, and on notes from the MIMIC-III database, using fastText, GloVe, and word2vec and using different dimensions. Tasks included clinical applications of lexicographic coverage, semantic similarity, clustering purity, linguistic regularity, and mortality prediction. Results The embeddings trained on the published case reports performed as well as, if not better than, those trained on other corpora on most tasks. The embeddings trained on all published manuscripts had the most consistent performance across all tasks but required a corpus with 100 times as many tokens as the corpus comprising only case reports. Embeddings trained on the MIMIC-III dataset had marginally better scores on the clustering tasks, which were also based on clinical notes from the MIMIC-III dataset. Embeddings trained on the Wikipedia corpus, although containing almost twice as many tokens as all available published manuscripts, performed poorly compared to those trained on medical and clinical corpora. Conclusion Word embeddings trained on freely available published case reports performed well for most clinical tasks, are free of protected health information, and are small compared to commonly used embeddings trained on larger clinical and non-clinical corpora. Choosing the optimal corpus, dimension size, and embedding model for a given task involves tradeoffs in privacy, reproducibility, performance, and computational resources.
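
A minimal sketch of training word embeddings on a small tokenized corpus with gensim (fastText shown; word2vec is analogous via gensim.models.Word2Vec); the corpus, dimension, and hyperparameters are placeholders, not those from the study.

```python
# Minimal sketch (toy corpus): train fastText embeddings with gensim and query
# nearest neighbours; swap in gensim.models.Word2Vec for word2vec embeddings.
from gensim.models import FastText

corpus = [
    ["patient", "admitted", "with", "acute", "heart", "failure"],
    ["chest", "pain", "resolved", "after", "nitroglycerin"],
    ["echocardiogram", "showed", "reduced", "ejection", "fraction"],
] * 100                                  # tiny stand-in for case-report text

model = FastText(
    sentences=corpus,
    vector_size=100,    # embedding dimension
    window=5,
    min_count=1,
    epochs=10,
)
print(model.wv.most_similar("heart", topn=3))
```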


2022 ◽  
Vol 12 (1) ◽  
pp. 514
Author(s):  
Raheel Nawaz ◽  
Quanbin Sun ◽  
Matthew Shardlow ◽  
Georgios Kontonatsios ◽  
Naif R. Aljohani ◽  
...  

Students’ evaluation of teaching, for instance through feedback surveys, constitutes an integral mechanism for quality assurance and enhancement of teaching and learning in higher education. These surveys usually comprise both Likert-scale and free-text responses. Since the discrete Likert-scale responses are easy to analyze, they feature more prominently in survey analyses. However, the free-text responses often contain richer, more detailed, and more nuanced information with actionable insights. Mining these insights is more challenging, as it requires a higher degree of processing by human experts, making the process time-consuming and resource-intensive. Consequently, free-text analyses are often restricted in scale, scope, and impact. To address these issues, we propose a novel automated analysis framework for extracting actionable information from free-text responses to open-ended questions in student feedback questionnaires. Leveraging state-of-the-art supervised machine learning techniques and unsupervised clustering methods, we applied our framework in a case study to analyze a large-scale dataset of 4400 open-ended responses to the National Student Survey (NSS) at a UK university. These analyses then led to the identification, design, implementation, and evaluation of a series of teaching and learning interventions over a two-year period. The highly encouraging results demonstrate our approach’s validity and broad (national and international) application potential, covering tertiary education, commercial training, and apprenticeship programs, among other settings where textual feedback is collected to enhance the quality of teaching and learning.
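
A minimal sketch of the unsupervised strand of such a framework (clustering free-text responses with TF-IDF and k-means, then inspecting the top terms per cluster); the responses and cluster count are invented for illustration.

```python
# Minimal sketch (toy responses): cluster free-text survey comments with
# TF-IDF + k-means and print the top terms of each cluster.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "lectures were engaging and well organised",
    "feedback on assignments arrived too late",
    "library resources were excellent",
    "more timely feedback on coursework would help",
    "great lecturer, clear explanations",
    "study spaces in the library are too crowded",
] * 20

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(responses)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for c in range(3):
    top = np.argsort(km.cluster_centers_[c])[-4:][::-1]
    print(f"cluster {c}:", ", ".join(terms[i] for i in top))
```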


10.2196/32662 ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. e32662
Author(s):  
Imjin Ahn ◽  
Hansle Gwon ◽  
Heejun Kang ◽  
Yunha Kim ◽  
Hyeram Seo ◽  
...  

Background Effective resource management in hospitals can improve the quality of medical services by reducing labor-intensive burdens on staff, decreasing inpatient waiting time, and securing the optimal treatment time. The use of hospital processes requires effective bed management; a stay in the hospital that is longer than the optimal treatment time hinders bed management. Therefore, predicting a patient’s hospitalization period may support judicious decisions regarding bed management. Objective First, this study aims to develop a machine learning (ML)–based predictive model for predicting the discharge probability of inpatients with cardiovascular diseases (CVDs). Second, we aim to assess the outcomes of the predictive model and explain the primary risk factors of inpatients for patient-specific care. Finally, we aim to evaluate whether our ML-based predictive model helps manage bed scheduling efficiently and detects long-term inpatients in advance, so as to improve the use of hospital processes and enhance the quality of medical services. Methods We set up the cohort criteria and extracted the data from CardioNet, a manually curated database that specializes in CVDs. We processed the data to create a suitable data set by reindexing the date index, integrating the present features with past features from the previous 3 years, and imputing missing values. Subsequently, we trained the ML-based predictive models and evaluated them to select the final model. Finally, we predicted the discharge probability within 3 days and explained the outcomes of the model by identifying, quantifying, and visualizing its features. Results We experimented with 5 ML-based models using 5-fold cross-validation. Extreme gradient boosting, which was selected as the final model, achieved an average area under the receiver operating characteristic curve of 0.865, higher than that of the other models (ie, logistic regression, random forest, support vector machine, and multilayer perceptron). Furthermore, we performed feature reduction, represented the feature importance, and assessed prediction outcomes. One of the outcomes, the individual explainer, provides a discharge score during hospitalization and a daily feature influence score to the medical team and patients. Finally, we visualized simulated bed management based on these outcomes. Conclusions In this study, we propose an individual explainer based on an ML-based predictive model, which provides the discharge probability and the relative contributions of individual features. Our model can assist medical teams and patients in identifying individual and common risk factors in CVDs and can support hospital administrators in improving the management of hospital beds and other resources.
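
A minimal sketch of the model-evaluation step described (a gradient-boosted classifier with 5-fold cross-validated AUROC and global feature importances), assuming the xgboost Python package as the extreme gradient boosting implementation; the features and labels are synthetic, not the CardioNet data.

```python
# Minimal sketch (synthetic data): XGBoost classifier for "discharged within
# 3 days" with 5-fold cross-validated AUROC and global feature importances.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(30, 90, n),     # age
    rng.integers(0, 30, n),      # days since admission
    rng.normal(1.0, 0.3, n),     # creatinine
    rng.integers(0, 2, n),       # prior cardiovascular admission
]).astype(float)
y = ((X[:, 1] < 7) ^ (rng.random(n) < 0.1)).astype(int)   # noisy "discharged soon" label

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
print("mean AUROC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())

model.fit(X, y)
print("feature importances:", model.feature_importances_.round(3))
```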


Prediction is a conjecture about something that may happen. A prediction need not be based on previous knowledge of or experience with the unknown future event of interest, but the ability to foresee and make the right decisions is a necessity for living better. Every person makes predictions, but the quality of those predictions differs, and that quality differentiates successful people from unsuccessful ones. To automate the prediction process and make quality predictions available to everyone, machines are trained to make predictions; this is the domain of machine learning and, more recently, deep learning algorithms. Health care, weather forecasting, natural calamities, and crime prediction are some of the application areas of prediction. The researchers have applied predictive modeling to see whether a model can predict the employability of a candidate in a recruitment process. Organizations use human expertise to identify skilled candidates for employment based on various factors, and these organizations are now trying to migrate to automated systems by harnessing the benefits of the exponential growth in machine learning and deep learning. This investigation presents the development of a model to predict employability using logistic regression. A set of candidates was tested with the proposed model, and the results are discussed in this paper.
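
A minimal sketch of an employability classifier of the kind described (logistic regression over a few candidate attributes, outputting a probability per candidate); the features, their encoding, and the data are entirely hypothetical.

```python
# Minimal sketch (hypothetical candidate features): logistic regression that
# outputs an employability probability for each candidate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(5.0, 10.0, n),   # grade point average
    rng.integers(0, 2, n),       # internship completed (0/1)
    rng.integers(0, 100, n),     # aptitude test score
])
y = ((0.4 * X[:, 0] + 1.5 * X[:, 1] + 0.03 * X[:, 2] + rng.normal(0, 1, n)) > 6).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
print("P(employable) for first 5 candidates:", model.predict_proba(X[:5])[:, 1].round(2))
```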


2019 ◽  
Author(s):  
Michelle Odlum ◽  
Omar Sims ◽  
Sunmoo Yoon

BACKGROUND As people living with HIV age, it becomes increasingly important to understand aging-related outcomes. The analysis of electronic health record (EHR) data can further the understanding of such outcomes to support the development of HIV aging phenotypes and improved overall health. OBJECTIVE For further insight, we evaluated the performance of two machine learning models, deep learning and logistic regression, on electronic health record data to identify predictors of medical resource utilization, represented by Charlson comorbidity scores. Diagnostic codes comprising the factors of individual characteristics, chronic conditions, treatment, and high-risk behaviors served as predictors. METHODS Diagnostic codes (ICD-9/10) were extracted for HIV-infected (N=786) and uninfected (N=100,000) patients. A data mining process was applied to build comorbidity prediction models with two machine learning algorithms: deep learning and logistic regression. Final models were based on the strength of the relationship between the outcome (Charlson score: high (>5) or low (≤5)) and the predictors (diagnostic codes). RESULTS Mean Charlson scores were 7.45±4.01 for HIV-infected and 3.18±3.3 for uninfected patients. Top diagnostic codes were chemotherapy, hypertension, heart failure, and acute kidney disease for HIV-infected patients, and substance abuse, length of hospital stay, and chemotherapy for uninfected patients. Deep learning model predictors for HIV-infected patients were age (16.16%), chemotherapy (13.17%), noncompliance with treatment/regimen (11.83%), and hypertension (10.52%); for uninfected patients, they were age (91.39%) and substance abuse (4.99%). Logistic regression predictors for HIV-infected patients were chemotherapy (30.3%; OR: 48.7), age (26.6%; OR: 1.03), malnutrition (15.8%; OR: 4.58), and heart failure (10.8%; OR: 4.18); for uninfected patients, they were age (88.7%; OR: 0.89) and length of hospital stay (9.74%; OR: 0.97). CONCLUSIONS Differences by HIV status were observed in medical resource utilization and in the predictive models. These results contribute to the development of narrower HIV and aging phenotypes with greater clinical validity to improve interventions for optimal aging-related outcomes.
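
A minimal sketch of the modelling setup described (diagnostic codes as binary predictors of a high vs low Charlson score), comparing logistic regression with a small multilayer perceptron as a stand-in for the deep learning model; the codes, cohort, and labels are invented.

```python
# Minimal sketch (synthetic data): one-hot encode diagnostic codes per patient
# and predict high (>5) vs low (<=5) Charlson comorbidity score with logistic
# regression and a small multilayer perceptron.
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
codes = ["I10", "I50.9", "N17.9", "E11.9", "Z51.11", "F10.10"]   # example ICD-10 codes
patients = [list(rng.choice(codes, size=rng.integers(1, 5), replace=False))
            for _ in range(800)]
X = MultiLabelBinarizer().fit_transform(patients)
y = (X.sum(axis=1) + rng.integers(0, 2, 800) > 3).astype(int)    # fake high/low Charlson label

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("MLP", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))]:
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUROC {auc:.3f}")
```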


2019 ◽  
Vol 27 (2) ◽  
pp. 294-300 ◽  
Author(s):  
Alon Geva ◽  
Steven H Abman ◽  
Shannon F Manzi ◽  
Dunbar D Ivy ◽  
Mary P Mullen ◽  
...  

Abstract Objective Real-world data (RWD) are increasingly used for pharmacoepidemiology and regulatory innovation. Our objective was to compare adverse drug event (ADE) rates determined from two RWD sources, electronic health records and administrative claims data, among children treated with drugs for pulmonary hypertension. Materials and Methods Textual mentions of medications and signs/symptoms that may represent ADEs were identified in clinical notes using natural language processing. Diagnostic codes for the same signs/symptoms were identified in our electronic data warehouse for the patients with textual evidence of taking pulmonary hypertension-targeted drugs. We compared rates of ADEs identified in clinical notes to those identified from diagnostic code data. In addition, we compared putative ADE rates from clinical notes to those from a healthcare claims dataset from a large, national insurer. Results Analysis of clinical notes identified up to 7-fold higher ADE rates than those ascertained from diagnostic codes. However, certain ADEs (eg, hearing loss) were more often identified in diagnostic code data. Similar results were found when ADE rates ascertained from clinical notes and national claims data were compared. Discussion While administrative claims and clinical notes are both increasingly used for RWD-based pharmacovigilance, ADE rates substantially differ depending on data source. Conclusion Pharmacovigilance based on RWD may lead to discrepant results depending on the data source analyzed. Further work is needed to confirm the validity of identified ADEs, to distinguish them from disease effects, and to understand tradeoffs in sensitivity and specificity between data sources.
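
A minimal sketch of the kind of rate comparison the paper describes (ADE rates ascertained from notes vs from diagnostic codes for the same cohort, with a per-event ratio); the event names and counts are made up.

```python
# Minimal sketch (made-up counts): compare ADE rates ascertained from clinical
# notes vs from diagnostic codes and report the notes-to-codes ratio per event.
import pandas as pd

n_patients = 1500
counts = pd.DataFrame({
    "notes": {"headache": 420, "flushing": 310, "hearing loss": 12, "edema": 255},
    "codes": {"headache": 60,  "flushing": 25,  "hearing loss": 30, "edema": 90},
})
rates = counts / n_patients
rates["notes_to_codes_ratio"] = rates["notes"] / rates["codes"]
print(rates.round(3))
```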


2012 ◽  
Vol 21 (04) ◽  
pp. 1240015
Author(s):  
FEDOR ZHDANOV ◽  
YURI KALNISHKAN

Multi-class classification is one of the most important tasks in machine learning. In this paper we consider two online multi-class classification problems: classification by a linear model and classification by a kernelized model. The quality of predictions is measured by the Brier loss function. We obtain two computationally efficient algorithms for these problems by applying the Aggregating Algorithm to certain pools of experts, and we prove theoretical guarantees on the losses of these algorithms. We kernelize one of the algorithms and prove theoretical guarantees on its loss. We perform experiments and compare our algorithms with logistic regression.
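
For reference, one standard formulation of the Brier loss in this multi-class setting (the paper's exact notation may differ): for a finite outcome set Ω, a forecast γ assigning probability γ{o} to each outcome o, and an observed outcome y,

```latex
\lambda(y,\gamma) \;=\; \sum_{o \in \Omega} \bigl(\gamma\{o\} - \mathbb{1}\{y = o\}\bigr)^{2}
```

where 1{y = o} equals 1 when the true outcome is o and 0 otherwise; the loss is 0 for a confident correct forecast and at most 2 in general.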


2018 ◽  
Vol 39 (2) ◽  
pp. 545-578 ◽  
Author(s):  
Raghu Bollapragada ◽  
Richard H Byrd ◽  
Jorge Nocedal

Abstract The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact). We provide a complexity analysis for this method based on the properties of the CG iteration and the quality of the Hessian approximation, and compare it with a method that employs a stochastic gradient iteration instead of the CG method. We report preliminary numerical results that illustrate the performance of inexact subsampled Newton methods on machine learning applications based on logistic regression.
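
As a rough, hedged illustration of the second setting (exact gradient, subsampled Hessian, inexact CG solve of the Newton system), here is a minimal NumPy/SciPy sketch for binary logistic regression; the data, subsample size, CG iteration cap, and unit step length are arbitrary choices, not the paper's algorithmic parameters.

```python
# Minimal sketch (synthetic data): one loop of an inexact subsampled Newton-CG
# method for binary logistic regression. The gradient is exact, the Hessian is
# estimated from a random subsample, and the Newton system is solved
# approximately with a few CG iterations.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, d = 5000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.where(rng.random(n) < 1 / (1 + np.exp(-X @ w_true)), 1.0, -1.0)   # labels in {-1, +1}

def grad(w):
    """Exact gradient of the average logistic loss."""
    s = 1 / (1 + np.exp(y * (X @ w)))          # sigma(-y_i * x_i^T w)
    return -(X.T @ (y * s)) / n

def hess_vec(w, v, sample_idx):
    """Subsampled Hessian-vector product using only the rows in sample_idx."""
    Xs = X[sample_idx]
    p = 1 / (1 + np.exp(-(Xs @ w)))            # sigma(x_i^T w)
    dvec = p * (1 - p)
    return Xs.T @ (dvec * (Xs @ v)) / len(sample_idx)

w = np.zeros(d)
for it in range(10):
    g = grad(w)
    idx = rng.choice(n, size=500, replace=False)                         # Hessian subsample
    H = LinearOperator((d, d), matvec=lambda v: hess_vec(w, v, idx), dtype=float)
    p, _ = cg(H, -g, maxiter=10)                                         # inexact CG solve
    w = w + p                                                            # unit step for simplicity
    print(f"iter {it}: ||grad|| = {np.linalg.norm(g):.4f}")
```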

