On Missingness Features in Machine Learning Models for Critical Care: Observational Study

10.2196/25022 ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. e25022
Author(s):  
Janmajay Singh ◽  
Masahiro Sato ◽  
Tomoko Ohkuma

Background Missing data in electronic health records is inevitable and considered to be nonrandom. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient’s health and advocate for their inclusion in clinical prediction models. However, their effectiveness has not been comprehensively evaluated. Objective The goal of the research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and to explore the robustness of these features across patient subgroups and task settings. Methods A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay, and sepsis outcomes were chosen. The latter dataset was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and classify or predict labels of interest. Models were evaluated on various criteria and across population subgroups, assessing discriminative ability and calibration. Results Generally improved model performance in retrospective tasks was observed when missingness features were included. The extent of improvement depended on the outcome of interest (area under the receiver operating characteristic curve [AUROC] improved by 1.2% to 7.7%) and even the patient subgroup. However, missingness features did not display utility in a simulated prospective setting, being outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This was despite enabling earlier detection of disease (true positives), because including these features also led to a concomitant rise in false positive detections. Conclusions This study comprehensively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks like length of stay prediction, where they present the greatest benefit. While missingness features, representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models producing frequent predictions needs to be explored further.
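Missingness features of the kind this abstract describes are commonly realized as a per-variable binary observation mask plus a time-since-last-measurement channel, fed to the sequence model alongside last-observation-carried-forward values. A minimal sketch under those assumptions (the function and variable names are illustrative, not taken from the paper):

```python
def missingness_features(series, times):
    """For one clinical variable sampled at `times`, return
    (LOCF-imputed values, observation mask, time since last measurement)."""
    filled, mask, delta = [], [], []
    last_val, last_t = None, None
    for v, t in zip(series, times):
        observed = v is not None
        mask.append(1 if observed else 0)
        if observed:
            last_val, last_t = v, t
        filled.append(last_val)  # last observation carried forward (None before first obs)
        delta.append(0.0 if last_t is None else t - last_t)  # time since last measurement
    return filled, mask, delta

# e.g. a lab measured at hours 0 and 3, missing at hours 1 and 2:
vals, m, d = missingness_features([7.4, None, None, 7.1], [0, 1, 2, 3])
# vals=[7.4, 7.4, 7.4, 7.1], m=[1, 0, 0, 1], d=[0, 1, 2, 0]
```

The mask and delta channels are what lets a recurrent model distinguish "value imputed because nobody ordered the test" from "value genuinely stable", which is the informative-missingness signal the study evaluates.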


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Suparno Datta ◽  
Jan Philipp Sachs ◽  
Harry Freitas da Cruz ◽  
Tom Martensen ◽  
Philipp Bode ◽  
...  

Abstract Objectives The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. Materials and Methods FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER’s capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models. Results Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case. Conclusion FIBER is an open-source Python library for extracting information from star schema clinical data warehouses; it reduces time-to-modeling and helps streamline the clinical modeling process.
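FIBER's actual API is not shown in the abstract, but the core transformation such a library automates, turning long-format star-schema fact rows into one modeling-ready row per cohort patient, can be sketched generically (the function, column names, and data are illustrative assumptions, not FIBER's interface):

```python
def pivot_to_cohort(rows, patients):
    """Pivot long-format (patient_id, feature, value) fact rows, as stored in a
    star-schema fact table, into one wide record per cohort patient."""
    table = {p: {} for p in patients}
    for pid, feat, val in rows:
        if pid in table:          # keep only members of the requested cohort
            table[pid][feat] = val  # for repeated facts, the last value wins
    return table

facts = [(1, "creatinine", 1.2), (2, "creatinine", 0.9),
         (1, "age", 67), (3, "age", 80)]
cohort = pivot_to_cohort(facts, patients={1, 2})
# cohort == {1: {"creatinine": 1.2, "age": 67}, 2: {"creatinine": 0.9}}
```

In practice a library like FIBER wraps this pivot, the cohort-selection SQL, and the feature aggregation behind one call, which is where the reported reduction in time-to-modeling comes from.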


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jiaxin Fan ◽  
Mengying Chen ◽  
Jian Luo ◽  
Shusen Yang ◽  
Jinming Shi ◽  
...  

Abstract Background Screening carotid B-mode ultrasonography is a frequently used method to detect subjects with carotid atherosclerosis (CAS). Because most CAS patients progress asymptomatically, early identification is challenging for clinicians, and undetected CAS may trigger ischemic stroke. Recently, machine learning has shown a strong ability to classify data and a potential for prediction in the medical field. The combined use of machine learning and the electronic health records of patients could provide clinicians with a more convenient and precise method to identify asymptomatic CAS. Methods This was a retrospective cohort study using routine clinical data of medical check-up subjects from April 19, 2010 to November 15, 2019. Six machine learning models (logistic regression [LR], random forest [RF], decision tree [DT], eXtreme Gradient Boosting [XGB], Gaussian Naïve Bayes [GNB], and K-Nearest Neighbour [KNN]) were used to predict asymptomatic CAS, and their predictive performance was compared in terms of the area under the receiver operating characteristic curve (AUCROC), accuracy (ACC), and F1 score (F1). Results Of the 18,441 subjects, 6553 were diagnosed with asymptomatic CAS. Compared to DT (AUCROC 0.628, ACC 65.4%, and F1 52.5%), the other five models improved prediction: KNN +7.6% (0.704, 68.8%, and 50.9%, respectively), GNB +12.5% (0.753, 67.0%, and 46.8%, respectively), XGB +16.0% (0.788, 73.4%, and 55.7%, respectively), RF +16.6% (0.794, 74.5%, and 56.8%, respectively), and LR +18.1% (0.809, 74.7%, and 59.9%, respectively). The highest-achieving model, LR, predicted 1045/1966 cases (sensitivity 53.2%) and 3088/3566 non-cases (specificity 86.6%). A tenfold cross-validation scheme further verified the predictive ability of the LR model. Conclusions Among the machine learning models, LR showed optimal performance in predicting asymptomatic CAS. Our findings set the stage for an early automatic alarming system, allowing a more precise allocation of CAS prevention measures to the individuals most likely to benefit.
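The three comparison metrics this study reports (AUCROC, accuracy, F1) can all be computed directly from labels and model scores. A self-contained sketch, using the rank-based definition of AUCROC (the data here are toy values, not the study's):

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, y_pred):
    """Harmonic mean of precision and recall, from hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)

y = [1, 1, 0, 0]
print(auroc(y, [0.9, 0.4, 0.6, 0.2]))  # 0.75
print(f1(y, [1, 0, 1, 0]))             # 0.5
```

Note the contrast the results illustrate: AUCROC ranks models by score ordering, while F1 depends on the chosen decision threshold, which is why KNN can beat DT on AUCROC yet score lower on F1.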


2020 ◽  
Author(s):  
Xi Yang ◽  
Qian Li ◽  
Yonghui Wu ◽  
Jiang Bian ◽  
Tianchen Lyu ◽  
...  

Abstract Alzheimer’s disease (AD) and AD-related dementias (ADRD) are a class of neurodegenerative diseases affecting about 5.7 million Americans. There is no cure for AD/ADRD. Current interventions have modest effects and focus on attenuating cognitive impairment. Detection of patients at high risk of AD/ADRD is crucial for timely interventions to modify risk factors and prevent cognitive decline and dementia, and thus to enhance quality of life and reduce health care costs. This study investigates both knowledge-driven (where domain experts identify useful features) and data-driven (where machine learning models select useful features among all available data elements) approaches for early AD/ADRD prediction using real-world electronic health records (EHR) data from the University of Florida (UF) Health system. We identified a cohort of 59,799 patients and examined four widely used machine learning algorithms following a standard case-control design. We also examined early prediction of AD/ADRD using patient information from 0, 1, 3, and 5 years before the disease onset date. The experimental results showed that models based on Gradient Boosting Trees (GBT) achieved the best performance for the data-driven approach and Random Forests (RF) achieved the best performance for the knowledge-driven approach. Among all models, GBT using a data-driven approach achieved the best area under the curve (AUC) scores of 0.7976, 0.7192, 0.6985, and 0.6798 for 0-, 1-, 3-, and 5-year prediction, respectively. We also examined the top features identified by the machine learning models and compared them with the knowledge-driven features identified by domain experts. Our study demonstrated the feasibility of using electronic health records for the early prediction of AD/ADRD and discovered potential challenges for future investigations.
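The 0/1/3/5-year settings amount to truncating each patient's record at a cutoff before the onset date and training only on what remains. A minimal sketch of that windowing (the function name, event format, and codes are illustrative assumptions, not the study's pipeline):

```python
from datetime import date, timedelta

def history_with_leadtime(events, onset, lead_years):
    """Keep only EHR events recorded at least `lead_years` before the
    disease onset date -- the 0/1/3/5-year early-prediction windows."""
    cutoff = onset - timedelta(days=round(365.25 * lead_years))
    return [(d, code) for d, code in events if d <= cutoff]

events = [(date(2014, 5, 1), "E11.9"), (date(2018, 3, 1), "I10")]
onset = date(2019, 3, 1)
# the 0-year window keeps both events; the 3-year window keeps only the 2014 one
```

Larger lead times discard the most recent (and usually most predictive) data, which is consistent with the AUC declining from 0.7976 at 0 years to 0.6798 at 5 years.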


Information ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 386
Author(s):  
Sheikh S. Abdullah ◽  
Neda Rostamzadeh ◽  
Kamran Sedig ◽  
Amit X. Garg ◽  
Eric McArthur

Acute kidney injury (AKI) is a common complication in hospitalized patients and can result in increased hospital stay, health-related costs, mortality, and morbidity. A number of recent studies have shown that AKI is predictable and avoidable if early risk factors can be identified by analyzing Electronic Health Records (EHRs). In this study, we employ machine learning techniques to identify older patients who are at risk of readmission with AKI to the hospital or emergency department within 90 days after discharge. Records of one million patients who visited the hospital or emergency department in Ontario between 2014 and 2016 are included in this study. The predictor variables include patient demographics, comorbid conditions, medications, and diagnosis codes. We developed 31 prediction models based on different combinations of two sampling techniques, three ensemble methods, and eight classifiers. These models were evaluated through 10-fold cross-validation and compared based on the AUROC metric. The performances of these models were consistent, and the AUROC for predicting AKI ranged between 0.61 and 0.88 across the 31 models. In general, the performances of ensemble-based methods were higher than that of cost-sensitive logistic regression. We also validated the features that are most relevant in predicting AKI with a healthcare expert to improve the performance and reliability of the models. This study predicts the risk of AKI for a patient after discharge, which gives healthcare providers enough time to intervene before the onset of AKI.
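The abstract does not name its two sampling techniques, but a standard choice for imbalanced outcomes like 90-day AKI readmission is random undersampling of the majority class. A minimal sketch under that assumption (not the paper's exact pipeline):

```python
import random

def undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes are the
    same size -- a simple rebalancing step applied before training."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]

X_bal, y_bal = undersample(list(range(6)), [0, 0, 0, 0, 1, 1])
# y_bal now contains two 0s and two 1s
```

Rebalancing like this trades data volume for class balance; the alternative the study mentions, cost-sensitive logistic regression, instead keeps all rows and reweights the loss.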


2020 ◽  
Vol 38 (4_suppl) ◽  
pp. 679-679
Author(s):  
Limor Appelbaum ◽  
Jose Pablo Cambronero ◽  
Karla Pollick ◽  
George Silva ◽  
Jennifer P. Stevens ◽  
...  

679 Background: Pancreatic Adenocarcinoma (PDAC) is often diagnosed at an advanced stage. We sought to develop a model for early PDAC prediction in the general population, using electronic health records (EHRs) and machine learning. Methods: We used three EHR datasets from Beth-Israel Deaconess Medical Center (BIDMC) and Partners Healthcare (PHC): 1. “BIDMC-Development-Data” (BIDMC-DD) for model development, using a feed-forward neural network (NN) and L2-regularized logistic regression, randomly split (80:20) into training and test groups. We tuned hyperparameters using cross-validation on the training split and report performance on the test split. 2. “BIDMC-Large-Data” (BIDMC-LD) to re-fit and calibrate models. 3. “PHC-Data” for external validation. We evaluated models using the area under the receiver operating characteristic curve (AUC) and computed 95% CIs using the empirical bootstrap over test data. PDAC patients were selected using ICD-9/-10 codes and validated with tumor registries. In contrast to prior work, we did not predefine feature sets based on known clinical correlates and instead employed data-driven feature selection, specifically importance-based feature pruning, regularization, and manual validation, to identify diagnostic-based features. Results: BIDMC-DD included demographics, diagnoses, labs, and medications for 1018 patients (cases = 509; age-sex paired controls). BIDMC-LD included diagnoses for 547,917 patients (cases = 509), and PHC-Data included diagnoses for 160,593 patients (cases = 408). We compared our approach to adapted and re-fitted published baselines. With a 365-day lead time, the NN obtained a BIDMC-DD test AUC of 0.84 (CI 0.79 - 0.90) versus the previous best baseline AUC of 0.70 (CI 0.62 - 0.78). We also validated using BIDMC-DD’s test cancer patients and BIDMC-LD controls; the AUC was 0.71 (CI 0.67 - 0.76) at the 365-day cutoff. The NN’s external validation AUC on PHC-Data was 0.71 (CI 0.63 - 0.79), outperforming an existing model’s AUC of 0.61 (CI 0.52 - 0.70) (Baecker et al, 2019). Conclusions: Models based on data-driven feature selection outperform models that use predefined sets of known clinical correlates and can help in early prediction of PDAC development.
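The empirical bootstrap used here for the 95% CIs resamples the evaluation set with replacement, recomputes the statistic each time, and takes percentile bounds. A generic sketch (the statistic is a simple mean for illustration; in the study it would be the AUC over resampled label/score pairs):

```python
import random

def bootstrap_ci(samples, stat, n_boot=1000, alpha=0.05, seed=0):
    """Empirical bootstrap CI: resample `samples` with replacement,
    recompute `stat` on each resample, return the alpha/2 and
    1 - alpha/2 quantiles of the resulting distribution."""
    rng = random.Random(seed)
    n = len(samples)
    stats = sorted(
        stat([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

lo, hi = bootstrap_ci(list(range(20)), stat=lambda xs: sum(xs) / len(xs))
# lo and hi bracket the sample mean of 9.5
```

Bootstrapping over the test set (rather than retraining) is what makes intervals like "0.84 (CI 0.79 - 0.90)" cheap to compute for a fixed fitted model.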

