Discovering the Predictive Value of Clinical Notes: Machine Learning Analysis with Text Representation

2020 ◽  
Vol 10 (12) ◽  
pp. 2869-2875
Author(s):  
Kareen Teo ◽  
Ching Wai Yong ◽  
Joon Huang Chuah ◽  
Belinda Pingguan Murphy ◽  
Khin Wee Lai

Hospital readmission shortly after discharge is threatening to plague the quality of inpatient care. Readmission is a severe episode that leads to increased medical care costs. Federal regulations and early readmission penalties have created an incentive for healthcare facilities to reduce their readmission rates by identifying patients at a high risk of readmission. Scientists have developed prediction models by using rule-based assessment scores and traditional statistical methods, and most have focused on structured patient records. Recently, a few researchers have utilized unstructured clinical notes. However, they achieved only moderate prediction accuracy, and their predictions were limited to a single diagnosis subpopulation and relied on extensive feature engineering. This study proposes the use of machine learning to learn deep representations of patient notes for the identification of patients at high risk of readmission in a hospital-wide population. We describe and train several predictive models (standard machine learning and neural network), several of which have not previously been applied to this task. Results show that complex deep learning models significantly outperform (P < 0.001) conventionally applied simple models in terms of discrimination ability. We also demonstrate a simple feature evaluation using a standard model, which allows the determination of potential clinical conditions/procedures for targeting. Unlike modeling with structured patient information, whose structure varies considerably when different templates or databases are adopted, this study shows that the machine learning approach can be applied to prognosticate readmission with clinical free text in various healthcare settings. Using minimal feature engineering, the trained models perform comparably to or better than other predictive models established in previous literature.

10.2196/19761 ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. e19761
Author(s):  
Ramin Mohammadi ◽  
Sarthak Jain ◽  
Amir T Namin ◽  
Melissa Scholem Heller ◽  
Ramya Palacholla ◽  
...  

Background Total joint replacements are high-volume and high-cost procedures that should be monitored for cost and quality control. Models that can identify patients at high risk of readmission might help reduce costs by suggesting who should be enrolled in preventive care programs. Previous models for risk prediction have relied on structured data of patients rather than clinical notes in electronic health records (EHRs). The former approach requires manual feature extraction by domain experts, which may limit the applicability of these models.
Objective This study aims to develop and evaluate a machine learning model for predicting the risk of 30-day readmission following knee and hip arthroplasty procedures. The input data for these models come from raw EHRs. We empirically demonstrate that unstructured free-text notes contain a reasonably predictive signal for this task.
Methods We performed a retrospective analysis of data from 7174 patients at Partners Healthcare collected between 2006 and 2016. These data were split into train, validation, and test sets, which were used to build, validate, and test models to predict unplanned readmission within 30 days of hospital discharge. The proposed models made predictions on the basis of clinical notes, obviating the need for manual feature extraction by domain and machine learning experts. The notes that served as model inputs were written by physicians, nurses, pathologists, and others who diagnose and treat patients and may have their own predictions, even if these are not recorded.
Results The proposed models output readmission risk scores (propensities) for each patient. The best models (as selected on a development set) yielded an area under the receiver operating characteristic curve of 0.846 (95% CI 0.8275-0.8711) for hip and 0.822 (95% CI 0.8094-0.8622) for knee surgery, indicating reasonable discriminative ability.
Conclusions Machine learning models can predict which patients are at a high risk of readmission within 30 days following hip and knee arthroplasty procedures on the basis of notes in EHRs with reasonable discriminative power. Following further validation and empirical demonstration that the models realize predictive performance above that which clinical judgment may provide, such models may be used to build an automated decision support tool to help caretakers identify at-risk patients.
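
The discrimination figures quoted in the abstract above are areas under the receiver operating characteristic curve (AUROC). As a minimal sketch of what that number measures, and not the authors' evaluation code, the AUROC can be computed as the fraction of (readmitted, not readmitted) patient pairs in which the readmitted patient receives the higher risk score, with ties counted as one half. The function name `auroc` and the toy labels and scores below are illustrative assumptions.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    labels: 1 for readmitted, 0 for not readmitted (hypothetical encoding).
    scores: model risk scores, higher meaning higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a rank-based formulation gives the same value more efficiently on large cohorts.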


Author(s):  
Yingjun Shen ◽  
Zhe Song ◽  
Andrew Kusiak

Abstract Wind farms need prediction models for predictive maintenance. There is a need to predict values of non-observable parameters beyond the ranges reflected in available data. A prediction model developed for one machine may not perform well on another similar machine. This is usually due to a lack of generalizability of data-driven models. To increase the generalizability of predictive models, this research integrates data mining with first-principle knowledge. Physics-based principles are combined with machine learning algorithms through feature engineering, strong rules, and divide-and-conquer. The proposed synergy concept is illustrated with wind turbine blade icing prediction and achieves significant prediction accuracy across different turbines. The proposed process is widely accepted by wind energy predictive maintenance practitioners because of its simplicity and efficiency. Furthermore, the testing scores of the KNN, CART, and DNN algorithms are increased by 44.78%, 32.72%, and 9.13%, respectively, with our proposed process. We demonstrated the importance of embedding physical principles within the machine learning process and also highlighted that the need for more complex machine learning algorithms in industrial big data mining is often much lower than in other applications, making it essential to incorporate physics and follow the “Less is More” philosophy.


2021 ◽  
Vol 1 ◽  
Author(s):  
Attayeb Mohsen ◽  
Lokesh P. Tripathi ◽  
Kenji Mizuguchi

Machine learning techniques are increasingly being used in the analysis of clinical and omics data. This increase is primarily due to advancements in artificial intelligence (AI) and the build-up of health-related big data. In this paper, we aimed to estimate the likelihood of adverse drug reactions or events (ADRs) in the course of drug discovery using various machine learning methods. We also describe a novel machine learning-based framework for predicting the likelihood of ADRs. Our framework combines two distinct datasets, drug-induced gene expression profiles from Open TG–GATEs (Toxicogenomics Project–Genomics Assisted Toxicity Evaluation Systems) and ADR occurrence information from the FAERS (FDA [Food and Drug Administration] Adverse Events Reporting System) database, and can be applied to many different ADRs. It incorporates data filtering and cleaning as well as feature selection and hyperparameter fine-tuning. Using this framework with Deep Neural Networks (DNN), we built a total of 14 predictive models with a mean validation accuracy of 89.4%, indicating that our approach successfully and consistently predicted ADRs for a wide range of drugs. As case studies, we investigated the performance of our prediction models in the context of Duodenal ulcer and Hepatitis fulminant, highlighting mechanistic insights into those ADRs. We have generated predictive models to help assess the likelihood of ADRs when testing novel pharmaceutical compounds. We believe that our findings offer a promising approach for ADR prediction and will be useful for researchers in drug discovery.


Author(s):  
Hui Li ◽  
Juyang Jiao ◽  
Shutao Zhang ◽  
Haozheng Tang ◽  
Xinhua Qu ◽  
...  

Abstract The purpose of this study was to develop a predictive model for length of stay (LOS) after total knee arthroplasty (TKA). Between 2013 and 2014, 1,826 patients who underwent TKA at a single Singapore center were enrolled in the study after qualification. Demographics of patients with normal and prolonged LOS were analyzed. The risk variables that could affect LOS were identified by univariate analysis. Predictive models for LOS after TKA based on logistic regression or machine learning were constructed and compared. The univariate analysis showed that age, American Society of Anesthesiologists level, diabetes, ischemic heart disease, congestive heart failure, general anesthesia, and operation duration were risk factors that could affect LOS (p < 0.05). Compared with the logistic regression models, the machine learning model with all variables was the best model for predicting LOS after TKA, with an area under the receiver operating characteristic curve of 0.738. Machine learning algorithms improved the predictive performance of LOS prediction models for TKA patients.


PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0217639 ◽  
Author(s):  
Jun Su Jung ◽  
Sung Jin Park ◽  
Eun Young Kim ◽  
Kyoung-Sae Na ◽  
Young Jae Kim ◽  
...  

2021 ◽  
Vol 11 (11) ◽  
pp. 1144
Author(s):  
Sheng-Der Hsu ◽  
En Chao ◽  
Sy-Jou Chen ◽  
Dueng-Yuan Hueng ◽  
Hsiang-Yun Lan ◽  
...  

Traumatic brain injury (TBI) can lead to severe adverse clinical outcomes, including death and disability. Early prediction of in-hospital mortality in high-risk populations using machine learning may enable early treatment and potentially reduce mortality. However, there is limited information on in-hospital mortality prediction models for TBI patients admitted to emergency departments. The aim of this study was to create a model that successfully predicts, from clinical measures and demographics, in-hospital mortality in a sample of TBI patients admitted to the emergency department. Of the 4881 TBI patients who were screened at the emergency department of a high-level first aid duty hospital in northern Taiwan from January 2008 to June 2018, 3331 were assigned in triage to Level I or Level II using the Taiwan Triage and Acuity Scale. The most significant predictors of in-hospital mortality in TBI patients were the scores on the Glasgow coma scale, the injury severity scale, and systolic blood pressure at emergency department admission. This study demonstrated the effective cutoff values for clinical measures when using machine learning to predict in-hospital mortality of patients with TBI. The prediction model has the potential to further accelerate the development of innovative care-delivery protocols for high-risk patients.
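
The abstract above does not say how its cutoff values were derived. One common way to pick a cutoff for a single clinical measure, shown here only as a hedged sketch and not as the authors' method, is to scan candidate thresholds and keep the one that maximizes Youden's J (sensitivity + specificity - 1). The function name `best_cutoff`, the `low_is_risky` flag, and the toy Glasgow coma scale values are all illustrative assumptions.

```python
def best_cutoff(values, died, low_is_risky=True):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    values: one clinical measurement per patient (e.g. a coma score).
    died:   1 if the patient died in hospital, 0 otherwise.
    low_is_risky: if True, patients with value <= cutoff are flagged as high risk;
                  otherwise patients with value >= cutoff are flagged.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcomes present to compute J")

    best = (None, -2.0)  # J is always >= -1, so any real cutoff replaces this
    for cutoff in sorted(set(values)):
        if low_is_risky:
            flagged = [v <= cutoff for v in values]
        else:
            flagged = [v >= cutoff for v in values]
        true_pos = sum(1 for f, d in zip(flagged, died) if f and d == 1)
        true_neg = sum(1 for f, d in zip(flagged, died) if not f and d == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best[1]:
            best = (cutoff, j)
    return best


# Hypothetical toy data, not taken from the study.
gcs = [3, 4, 6, 8, 9, 12, 14, 15, 15, 15]
died = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cutoff, j = best_cutoff(gcs, died)
print(cutoff, round(j, 3))
```

A cutoff chosen this way is tuned to the sample it was computed on, so it would normally be checked on held-out patients before being reported.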


2020 ◽  
Vol 6 ◽  
pp. e296
Author(s):  
James P. Bagrow

Non-experts have long made important contributions to machine learning (ML) by contributing training data, and recent work has shown that non-experts can also help with feature engineering by suggesting novel predictive features. However, non-experts have only contributed features to prediction tasks already posed by experienced ML practitioners. Here we study how non-experts can design prediction tasks themselves, what types of tasks non-experts will design, and whether predictive models can be automatically trained on data sourced for their tasks. We use a crowdsourcing platform where non-experts design predictive tasks that are then categorized and ranked by the crowd. Crowdsourced data are collected for top-ranked tasks and predictive models are then trained and evaluated automatically using those data. We show that individuals without ML experience can collectively construct useful datasets and that predictive models can be learned on these datasets, but challenges remain. The prediction tasks designed by non-experts covered a broad range of domains, from politics and current events to health behavior, demographics, and more. Proper instructions are crucial for non-experts, so we also conducted a randomized trial to understand how different instructions may influence the types of prediction tasks being proposed. In general, a better understanding of how non-experts can contribute to ML can further leverage advances in automatic machine learning and has important implications as ML continues to drive workplace automation.


2021 ◽  
Author(s):  
Xianghao Zhan ◽  
Marie Humbert-Droz ◽  
Pritam Mukherjee ◽  
Olivier Gevaert

Abstract Mining the structured data in electronic health records (EHRs) enables many clinical applications, while the information in free-text clinical notes often remains untapped. Free-text notes are unstructured data that are harder to use in machine learning, while structured diagnostic codes can be missing or even erroneous. To improve the quality of diagnostic codes, this work extracts structured diagnostic codes from unstructured notes concerning cardiovascular diseases. Five old and new word embeddings were used to vectorize over 5 million progress notes from the Stanford EHR, and logistic regression was used to predict eight ICD-10 codes of common cardiovascular diseases. The models were interpreted through the important words in predictions and analyses of false positive cases. The transferability of the models trained on Stanford notes was tested in the prediction of the corresponding ICD-9 codes of the MIMIC-III discharge summaries. The word embeddings and logistic regression showed good performance in diagnostic code extraction, with TF-IDF as the best word embedding model, showing AUROC ranging from 0.9499 to 0.9915 and AUPRC ranging from 0.2956 to 0.8072. The models also showed transferability when tested on the MIMIC-III data set, with AUROC ranging from 0.7952 to 0.9790 and AUPRC ranging from 0.2353 to 0.8084. Model interpretability was shown by the important words, whose clinical meanings matched each disease. This study shows the feasibility of accurately extracting structured diagnostic codes, imputing missing codes, and correcting erroneous codes from free-text clinical notes with models that are interpretable for clinicians, which helps improve the data quality of diagnostic codes for information retrieval and downstream machine-learning applications.
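
The abstract above reports TF-IDF as its best-performing text representation. As a minimal sketch of the idea, and not the authors' pipeline, each note can be turned into a sparse vector in which a term's weight is its relative frequency in the note multiplied by the log of the inverse fraction of notes that contain it. The whitespace tokenizer, the unsmoothed IDF, the function name `tfidf`, and the toy notes are assumptions made for illustration; library implementations typically add smoothing and normalization.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using plain, unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Inverse document frequency: terms found in fewer documents get larger weights.
    idf = {term: math.log(n_docs / count) for term, count in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in term_counts.items()})
    return vectors


# Hypothetical toy notes, not taken from any real record.
notes = [
    "patient reports chest pain",
    "patient denies chest pain",
    "atrial fibrillation noted",
]
vectors = tfidf(notes)
print(vectors[0])
```

In this toy corpus, "reports" gets a larger weight than "patient" in the first note because it occurs in only one note, which is the property that lets a linear classifier such as logistic regression pick out discriminative words.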


2021 ◽  
Author(s):  
Sohrat Baki ◽  
Cenk Temizel ◽  
Serkan Dursun

Abstract Unconventional reservoirs, mainly shale oil and natural gas, will continue to significantly help meet the ever-growing energy demands of global markets. Being complex in nature and having ultra-tight producing zones, unconventionals depend on effective well completion and stimulation treatments in order to be successful and economical. Within the last decade, thousands of unconventional wells have been drilled, completed, and produced in North America. The scope of this work is to explore the primary impact of completion parameters such as lateral length, frac type, number of stages, and proppant and fluid volume on the production performance of wells in unconventional fields. The key attributes in completion, stimulation, and production for the wells were considered in a machine learning workflow for building predictive models. Predictive models based on Neural Networks, Support Vector Machines, or Decision Tree Based ensemble models serve as a mapping function from completion parameters to production for each well in the field. The completion parameters were analyzed in the workflow with respect to feature engineering and interpretation. This analysis resulted in key performance indicators for the region. Then the optimum values for the best production performing completions were identified for each well. The predictive models in the workflow were compared in terms of accuracy, and the best model was used to understand the impact of completion parameters on production rates. This study outlines an overall machine learning workflow, from feature engineering to interpretation of the machine learning models, to quantify the effects of completion parameters on the production rate of wells in unconventional fields.

