Initial Validation of a Machine Learning-Derived Prognostic Test (KidneyIntelX) Integrating Biomarkers and Electronic Health Record Data To Predict Longitudinal Kidney Outcomes

Kidney360, 2020, Vol 1 (8), pp. 731-739
Author(s): Kinsuk Chauhan, Girish N. Nadkarni, Fergus Fleming, James McCullough, Cijiang J. He, ...

Background: Individuals with type 2 diabetes (T2D) or the apolipoprotein L1 high-risk (APOL1-HR) genotypes are at increased risk of rapid kidney function decline (RKFD) and kidney failure. We hypothesized that a prognostic test using machine learning to integrate blood biomarkers and longitudinal electronic health record (EHR) data would improve risk stratification. Methods: We selected two cohorts from the Mount Sinai BioMe Biobank: T2D (n=871) and African ancestry with APOL1-HR (n=498). We measured plasma tumor necrosis factor receptors (TNFR) 1 and 2 and kidney injury molecule-1 (KIM-1) and used random forest algorithms to integrate biomarker and EHR data into a risk score for a composite outcome: RKFD (eGFR decline of ≥5 ml/min per year), 40% sustained eGFR decline, or kidney failure. We compared performance with a validated clinical model and applied thresholds to assess the utility of the prognostic test (KidneyIntelX) for accurately stratifying patients into risk categories. Results: Overall, 23% of those with T2D and 18% of those with APOL1-HR experienced the composite kidney end point over a median follow-up of 4.6 and 5.9 years, respectively. The area under the receiver operating characteristic curve (AUC) of KidneyIntelX was 0.77 (95% CI, 0.75 to 0.79) in T2D and 0.80 (95% CI, 0.77 to 0.83) in APOL1-HR, outperforming the clinical models (AUC, 0.66 [95% CI, 0.65 to 0.67] and 0.72 [95% CI, 0.71 to 0.73], respectively; P<0.001). In the high-risk stratum (top 15%), the positive predictive values for KidneyIntelX were 62% and 62% versus 46% and 39% for the clinical models (P<0.01) in T2D and APOL1-HR, respectively. In the low-risk stratum (bottom 50%), the negative predictive values for KidneyIntelX were 92% in T2D and 96% in APOL1-HR versus 85% and 93% for the clinical models, respectively (P=0.76 and 0.93, respectively). Conclusions: In patients with T2D or APOL1-HR, a prognostic test (KidneyIntelX) integrating biomarker levels with longitudinal EHR data significantly improved prediction of a composite kidney end point of RKFD, 40% sustained decline in eGFR, or kidney failure over validated clinical models.
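As a rough illustration of the modeling approach described in this abstract (a random forest over plasma biomarkers plus EHR variables, evaluated by AUC), the sketch below wires up that pipeline with scikit-learn. The feature set and all data are synthetic placeholders, not the actual KidneyIntelX inputs or training procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 871  # size of the T2D cohort in the abstract; the values below are random placeholders

# Hypothetical biomarker + EHR feature matrix (TNFR1, TNFR2, KIM-1, eGFR, age)
X = np.column_stack([
    rng.lognormal(1.0, 0.4, n),   # TNFR1
    rng.lognormal(1.2, 0.4, n),   # TNFR2
    rng.lognormal(0.2, 0.5, n),   # KIM-1
    rng.normal(70, 20, n),        # baseline eGFR from the EHR
    rng.normal(60, 12, n),        # age
])
y = rng.binomial(1, 0.23, n)      # composite kidney end point (~23% event rate)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# With random labels the printed AUC is ~0.5; only the pipeline shape is meaningful here.
print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))
```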

2020, Vol 2
Author(s): Aixia Guo, Randi E. Foraker, Robert M. MacGregor, Faraz M. Masood, Brian P. Cupps, ...

Objective: Although many clinical metrics are associated with proximity to decompensation in heart failure (HF), none are individually accurate enough to risk-stratify HF patients on a patient-by-patient basis. The dire consequences of this inaccuracy in risk stratification have profoundly lowered the clinical threshold for application of high-risk surgical intervention, such as ventricular assist device placement. Machine learning can detect non-intuitive patterns across patient features, combining their individual predictive capability in novel ways. A machine learning-based clinical tool to identify proximity to catastrophic HF deterioration on a patient-specific basis would enable more efficient direction of high-risk surgical intervention to those patients who have the most to gain from it, while sparing others. Synthetic electronic health record (EHR) data are statistically indistinguishable from the original protected health information and can be analyzed as if they were original data, but without any privacy concerns. We demonstrate that synthetic EHR data can be easily accessed and analyzed and are amenable to machine learning analyses. Methods: We developed synthetic data from EHR data of 26,575 HF patients admitted to a single institution during the decade ending on 12/31/2018. Twenty-seven clinically relevant features were synthesized and used in supervised deep learning and machine learning algorithms (i.e., deep neural networks [DNN], random forest [RF], and logistic regression [LR]) to explore their ability to predict 1-year mortality, using five-fold cross-validation. We conducted analyses leveraging features recorded before or at, and after or at, the time of HF diagnosis. Results: The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the three models: the mean AUC was 0.80 for DNN, 0.72 for RF, and 0.74 for LR. Age, creatinine, body mass index, and blood pressure levels were especially important features in predicting death within 1 year among HF patients. Conclusions: Machine learning models have considerable potential to improve accuracy in mortality prediction, such that high-risk surgical intervention can be applied only in those patients who stand to benefit from it. Access to EHR-based synthetic data derivatives eliminates the risk of exposing EHR data, speeds time-to-insight, and facilitates data sharing. As more clinical, imaging, and contractile features with proven predictive capability are added to these models, the development of a clinical tool to assist in timing of intervention in surgical candidates may be possible.
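The study compares three model families using five-fold cross-validated AUC; a minimal sketch of that comparison is below, using scikit-learn stand-ins (an MLP for the deep neural network) on synthetic data rather than the authors' synthetic EHR derivative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 27 features per patient, ~15% 1-year mortality.
X, y = make_classification(n_samples=5000, n_features=27, n_informative=10,
                           weights=[0.85], random_state=0)

models = {
    "DNN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")  # five-fold cross-validated AUC
    print(f"{name}: mean AUC = {aucs.mean():.2f}")
```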


Circulation, 2020, Vol 142 (Suppl_3)
Author(s): Mark Sonderman, Eric Farber-Eger, Aaron W Aday, Matthew S Freiberg, Joshua A Beckman, ...

Introduction: Peripheral arterial disease (PAD) is a common and underdiagnosed disease associated with significant morbidity and increased risk of major adverse cardiovascular events. Targeted screening of individuals at high risk for PAD could facilitate early diagnosis and allow prompt initiation of interventions aimed at reducing cardiovascular and limb events. However, no widely accepted PAD risk stratification tools exist. Hypothesis: We hypothesized that machine learning algorithms can identify patients at high risk for PAD, defined by an ankle-brachial index (ABI) <0.9, from electronic health record (EHR) data. Methods: Using data from the Vanderbilt University Medical Center EHR, ABIs were extracted for 8,093 patients not previously diagnosed with PAD at the time of initial testing. A total of 76 patient characteristics, including demographics, vital signs, lab values, diagnoses, and medications, were analyzed using both a random forest and least absolute shrinkage and selection operator (LASSO) regression to identify the features most predictive of ABI <0.9. The most significant features were used to build a logistic regression-based predictor that was validated in a separate group of individuals with ABI data. Results: The machine learning models identified several features independently correlated with PAD (age, BMI, SBP, DBP, pulse pressure, anti-hypertensive medication, diabetes medication, smoking, and statin use). The test statistic produced by the logistic regression model was correlated with PAD status in our validation set. At a chosen threshold, the specificity was 0.92 and the positive predictive value was 0.73 in this high-risk population. Conclusions: Machine learning can be applied to build unbiased models that identify individuals at risk for PAD using easily accessible information from the EHR. This model could be implemented either as a high-risk flag within the medical record or as an online calculator available to clinicians.
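A hedged sketch of the two-stage design described here: L1-penalized (LASSO) logistic regression for feature selection, then a logistic model on the retained features, evaluated at a fixed probability threshold for specificity and positive predictive value. The data, regularization strength, and threshold are illustrative, not the Vanderbilt model.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 8,093 patients, 76 candidate features, ABI < 0.9 as the positive class.
X, y = make_classification(n_samples=8093, n_features=76, n_informative=9,
                           weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Stage 1: LASSO (L1) logistic regression selects the most predictive features.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
selector = SelectFromModel(lasso).fit(X_tr, y_tr)

# Stage 2: plain logistic regression on the selected features.
clf = LogisticRegression(max_iter=1000).fit(selector.transform(X_tr), y_tr)

# Evaluate specificity and PPV at an illustrative decision threshold.
prob = clf.predict_proba(selector.transform(X_te))[:, 1]
pred = (prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp))
```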


2020
Author(s): Lili Chan, Girish N. Nadkarni, Fergus Fleming, James R. McCullough, Patti Connolly, ...

Abstract Importance: Diabetic kidney disease (DKD) is the leading cause of kidney failure in the United States, and predicting progression is necessary for improving outcomes. Objective: To develop and validate a machine-learned prognostic risk score (KidneyIntelX™) combining data from electronic health records (EHR) and circulating biomarkers to predict DKD progression. Design: Observational cohort study. Setting: Two EHR-linked biobanks: the Mount Sinai BioMe Biobank and the Penn Medicine Biobank. Participants: Patients with prevalent DKD (G3a-G3b with all grades of albuminuria (A1-A3), and G1 and G2 with A2-A3 level albuminuria) and banked plasma. Main outcomes and measures: Plasma biomarkers soluble tumor necrosis factor receptor 1/2 (sTNFR1, sTNFR2) and kidney injury molecule-1 (KIM-1) were measured at baseline. Patients were divided into derivation (60%) and validation (40%) sets. The composite primary end point was progressive decline in kidney function, defined as any of the following: rapid kidney function decline (RKFD) (estimated glomerular filtration rate (eGFR) decline of ≥5 ml/min/1.73 m2/year), ≥40% sustained decline, or kidney failure within 5 years. A machine learning model (random forest) was trained and performance was assessed using standard metrics. Results: In 1146 patients with DKD, the median age was 63 years, 51% were female, median baseline eGFR was 54 ml/min/1.73 m2, median urine albumin-to-creatinine ratio (uACR) was 61 mg/g, and median follow-up was 4.3 years. 241 patients (21%) experienced progressive decline in kidney function. On 10-fold cross-validation in the derivation set (n=686), the risk model had an area under the curve (AUC) of 0.77 (95% CI 0.74-0.79). In validation (n=460), the AUC was 0.77 (95% CI 0.76-0.79). By comparison, the AUC for an optimized clinical model was 0.62 (95% CI 0.61-0.63) in derivation and 0.61 (95% CI 0.60-0.63) in validation. Using cutoffs from derivation, KidneyIntelX stratified 46%, 37%, and 16.5% of the validation cohort into low-, intermediate-, and high-risk groups, with a positive predictive value (PPV) of 62% in the high-risk group (vs. PPV of 37% for the clinical model and 40% for KDIGO; p < 0.001) and a negative predictive value (NPV) of 91% in the low-risk group. The net reclassification index for events into the high-risk group was 41% (p<0.05). Conclusions and Relevance: A machine-learned model combining plasma biomarkers and EHR data improved prediction of progressive decline in kidney function within 5 years over KDIGO and standard clinical models in patients with early DKD.
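The key operational step in this abstract is converting a random forest risk score into low-, intermediate-, and high-risk strata using cutoffs fixed in the derivation set, then checking PPV and NPV in the validation set. The sketch below shows one plausible way to do that with scikit-learn; the cohort, features, and cutpoint percentiles are placeholders rather than the actual KidneyIntelX thresholds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in for the DKD cohort: ~21% of patients reach the composite end point.
X, y = make_classification(n_samples=1146, n_features=20, n_informative=6,
                           weights=[0.79], random_state=1)
X_der, X_val, y_der, y_val = train_test_split(X, y, test_size=0.4, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=400, random_state=1)

# Cutoffs are fixed on cross-validated derivation-set scores only (bottom ~46% = low risk,
# top ~16.5% = high risk, mirroring the strata sizes reported in the abstract).
der_scores = cross_val_predict(rf, X_der, y_der, cv=10, method="predict_proba")[:, 1]
low_cut, high_cut = np.quantile(der_scores, [0.46, 0.835])

# Fit on the full derivation set, then apply the frozen cutoffs to the validation set.
rf.fit(X_der, y_der)
val_scores = rf.predict_proba(X_val)[:, 1]
high, low = val_scores >= high_cut, val_scores < low_cut
print("PPV in high-risk stratum:", y_val[high].mean())
print("NPV in low-risk stratum:", 1 - y_val[low].mean())
```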


SLEEP, 2021, Vol 44 (Supplement_2), pp. A166-A166
Author(s): Nathan Guess, Henry Fischbach, Andy Ni, Allen Firestone

Abstract Introduction: The STOP-Bang Questionnaire is a validated instrument for assessing an individual's risk of obstructive sleep apnea (OSA). The prevalence of OSA is estimated at 20% in the US, with only 20% of those individuals properly diagnosed. Dentists are being asked to screen patients and refer those at high risk for OSA for definitive diagnosis and treatment. The aim of this study was to determine whether patients in a dental school student clinic who were identified as high risk for OSA were referred for evaluation of OSA. Methods: All new patients over the age of 18 admitted to The Ohio State University College of Dentistry complete an "Adult Medical History Form". Included in this study were 21,312 patients admitted between July 2017 and March 2020. Data were extracted from the history form to determine the STOP-Bang score for all patients: age, sex, BMI, and self-reported snoring, stopped breathing/choking/gasping while sleeping, high blood pressure, neck size over 17" (males) or 16" (females), and tiredness. Each positive response scores one point, for a maximum of 8 points. Additionally, any previous diagnosis of sleep apnea and the patient's history of referrals were extracted from the health record. According to clinic policy, patients with no previous OSA diagnosis noted in the health history who scored 5 or more on the STOP-Bang Questionnaire should receive a referral for an OSA evaluation. Notes and referral forms were reviewed to determine whether the appropriate referrals occurred for high-risk patients without a previous diagnosis. Results: Of the 21,312 patients screened, 1098 (5.2%) screened high risk for OSA, of whom 398 had no previous diagnosis of OSA. Of these 398 patients, none (0%) was referred for further evaluation for OSA. Conclusion: The rate of appropriate referrals from a student dental clinic with an electronic health record was unacceptably low. Continued education and changes to the electronic health record are needed to ensure that those at high risk for OSA are appropriately referred and managed.
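The screening rule here is simple enough to express directly: sum the eight binary STOP-Bang items from the health-history form and refer any patient scoring 5 or more who has no prior OSA diagnosis. A minimal sketch follows; the dictionary field names are hypothetical, not the actual Ohio State health-record schema, and the BMI cutoff of 35 kg/m² is the standard STOP-Bang value assumed here.

```python
def stop_bang_score(p: dict) -> int:
    """Count positive STOP-Bang items from a (hypothetical) health-history record."""
    items = [
        p["snoring"],
        p["tiredness"],
        p["observed_apnea"],                             # stopped breathing/choking/gasping asleep
        p["high_blood_pressure"],
        p["bmi"] > 35,                                   # standard STOP-Bang BMI cutoff (assumed)
        p["age"] > 50,
        p["neck_in"] > (17 if p["sex"] == "M" else 16),  # neck size thresholds per the abstract
        p["sex"] == "M",
    ]
    return sum(bool(item) for item in items)

def needs_osa_referral(p: dict) -> bool:
    """Clinic policy per the abstract: score >= 5 and no prior OSA diagnosis."""
    return stop_bang_score(p) >= 5 and not p["prior_osa_diagnosis"]

example = {"snoring": True, "tiredness": True, "observed_apnea": True,
           "high_blood_pressure": True, "bmi": 38, "age": 62, "sex": "M",
           "neck_in": 17.5, "prior_osa_diagnosis": False}
print(stop_bang_score(example), needs_osa_referral(example))  # 8 True
```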


Author(s): Jeffrey G Klann, Griffin M Weber, Hossein Estiri, Bertrand Moal, Paul Avillach, ...

Abstract Introduction: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data. Objective: We sought to develop and validate a computable phenotype for COVID-19 severity. Methods: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of six code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and compared selected predictors of severity with the 4CE phenotype at one site. Results: The full 4CE severity phenotype had a pooled sensitivity of 0.73 and specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity was highly variable (up to 0.65 across sites). At one pilot site, the expert-derived phenotype had a mean AUC of 0.903 (95% CI: 0.886, 0.921), compared with an AUC of 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies for ICU admission, with precision and recall as low as 49% compared with chart review. Discussion: We developed a severity phenotype using six code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions. Conclusion: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
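A computable phenotype of the kind described (flag a hospitalization as severe if any recorded code falls into one of a small set of code classes, then validate the flag against ICU admission or death) can be sketched as below. Only two illustrative classes are shown, and the specific codes are examples rather than the actual 4CE class definitions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Two illustrative code classes (of the six described); the codes are examples only.
SEVERITY_CLASSES: Dict[str, Set[str]] = {
    "mechanical_ventilation": {"5A1935Z", "5A1945Z", "5A1955Z"},  # ICD-10-PCS ventilation codes
    "vasopressors": {"norepinephrine", "vasopressin"},            # medication codes
}

def severe_phenotype(patient_codes: Iterable[str]) -> bool:
    """Flag a hospitalization as severe if any code falls into a severity class."""
    codes = set(patient_codes)
    return any(codes & class_codes for class_codes in SEVERITY_CLASSES.values())

def sensitivity_specificity(flags: List[bool], outcomes: List[bool]) -> Tuple[float, float]:
    """Validate phenotype flags against the ICU admission and/or death outcome."""
    tp = sum(f and o for f, o in zip(flags, outcomes))
    tn = sum(not f and not o for f, o in zip(flags, outcomes))
    fp = sum(f and not o for f, o in zip(flags, outcomes))
    fn = sum(not f and o for f, o in zip(flags, outcomes))
    return tp / (tp + fn), tn / (tn + fp)

patients = [["5A1945Z"], ["J12.82"], ["norepinephrine"], ["I10"]]
flags = [severe_phenotype(codes) for codes in patients]
print(flags)                                                        # [True, False, True, False]
print(sensitivity_specificity(flags, [True, False, False, True]))  # (0.5, 0.5)
```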


2021, pp. 518-526
Author(s): Jennifer H. LeLaurin, Matthew J. Gurka, Xiaofei Chi, Ji-Hyun Lee, Jaclyn Hall, ...

PURPOSE Patients with cancer who use tobacco experience reduced treatment effectiveness, increased risk of recurrence and mortality, and diminished quality of life. Accurate tobacco use documentation for patients with cancer is necessary for appropriate clinical decision making and cancer outcomes research. Our aim was to assess agreement between electronic health record (EHR) smoking status data and cancer registry data. MATERIALS AND METHODS We identified all patients with cancer seen at University of Florida Health from 2015 to 2018. Structured EHR smoking status was compared with the tumor registry smoking status for each patient. Sensitivity, specificity, positive predictive values, negative predictive values, and kappa statistics were calculated. We used logistic regression to determine whether patient characteristics were associated with the odds of agreement in smoking status between EHR and registry data. RESULTS We analyzed 11,110 patient records. EHR smoking status was documented for nearly all (98%) patients. The overall kappa (0.78; 95% CI, 0.77 to 0.79) indicated moderate agreement between the registry and EHR. The sensitivity was 0.82 (95% CI, 0.81 to 0.84), and the specificity was 0.97 (95% CI, 0.96 to 0.97). The logistic regression results indicated that agreement was more likely among patients who were older and female and when the EHR documentation occurred closer to the date of cancer diagnosis. CONCLUSION Although documentation of smoking status for patients with cancer is standard practice, we found only moderate agreement between EHR and tumor registry data. Interventions and research using EHR data should prioritize ensuring the validity of smoking status data. Multilevel strategies are needed to achieve consistent and accurate documentation of smoking status in cancer care.
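The agreement metrics reported here (sensitivity, specificity, and kappa between EHR smoking status and the tumor registry) can be computed as in the following sketch, shown on toy vectors rather than the study data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

registry = np.array([1, 1, 0, 0, 1, 0, 0, 1, 0, 0])  # tumor registry smoking status (reference)
ehr      = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 1])  # structured EHR smoking status

# confusion_matrix(y_true, y_pred) returns [[tn, fp], [fn, tp]] for binary labels.
tn, fp, fn, tp = confusion_matrix(registry, ehr).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("kappa:", cohen_kappa_score(registry, ehr))
```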


2021, Vol 39 (15_suppl), pp. 1511-1511
Author(s): Dylan J. Peterson, Nicolai P. Ostberg, Douglas W. Blayney, James D. Brooks, Tina Hernandez-Boussard

Background: Acute care use is one of the largest drivers of cancer care costs. OP-35: Admissions and Emergency Department Visits for Patients Receiving Outpatient Chemotherapy is a CMS quality measure that will affect reimbursement based on unplanned inpatient admissions (IP) and emergency department (ED) visits. Targeted measures can reduce preventable acute care use, but identifying which patients might benefit remains challenging. Prior predictive models have made use of only a limited subset of the data available in the electronic health record (EHR). We hypothesized that dense, structured EHR data could be used to train machine learning algorithms to predict the risk of preventable ED and IP visits. Methods: Patients treated at Stanford Health Care and affiliated community care sites between 2013 and 2015 who met inclusion criteria for OP-35 were selected from our EHR. Preventable ED or IP visits were identified using OP-35 criteria. Demographic, diagnosis, procedure, medication, laboratory, vital sign, and healthcare utilization data generated prior to chemotherapy treatment were obtained. A random split of 80% of the cohort was used to train a logistic regression model with least absolute shrinkage and selection operator (LASSO) regularization to predict risk of acute care events within the first 180 days of chemotherapy. The remaining 20% were used to measure model performance by the area under the receiver operating characteristic curve (AUROC). Results: 8,439 patients were included, of whom 35% had one or more preventable events within 180 days of starting chemotherapy. Our LASSO model classified patients at risk for preventable ED or IP visits with an AUROC of 0.783 (95% CI: 0.761-0.806). Model performance was better for identifying risk for IP visits than for ED visits. LASSO selected 125 of 760 possible features for classifying patients, including prior acute care visits, cancer stage, race, laboratory values, and a diagnosis of depression. Conclusions: Machine learning models trained on a large number of routinely collected clinical variables can identify patients at risk for acute care events with promising accuracy. These models have the potential to improve cancer care outcomes, patient experience, and costs by allowing for targeted preventive interventions. Future work will include prospective and external validation in other healthcare systems.
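A minimal sketch of the modeling setup described above: an 80/20 split, an L1-regularized (LASSO) logistic regression trained on the 80%, and AUROC plus the count of retained features measured on the held-out 20%. The data and regularization strength are placeholders, not the Stanford cohort or the tuned model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 8,439 patients, 760 candidate features, ~35% with a preventable event.
X, y = make_classification(n_samples=8439, n_features=760, n_informative=40,
                           weights=[0.65], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# L1 (LASSO) regularization drives most coefficients to zero, acting as feature selection.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l1", solver="liblinear", C=0.05))
model.fit(X_tr, y_tr)

coefs = model.named_steps["logisticregression"].coef_.ravel()
print("features retained:", int(np.sum(coefs != 0)), "of", coefs.size)
print("AUROC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```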


Author(s): Emily Kogan, Kathryn Twyman, Jesse Heap, Dejan Milentijevic, Jennifer H. Lin, ...

Abstract Background: Stroke severity is an important predictor of patient outcomes and is commonly measured with National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data. Methods: NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consists of 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) was held out for independent validation of model performance, and the remaining patients (n = 6116, 86%) were used for training. Several machine learning models were evaluated, with parameters optimized using cross-validation on the training set. The model with optimal performance, a random forest model, was ultimately evaluated on the holdout set. Results: Leveraging machine learning, we identified the main factors in electronic health record data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores with the NLP-extracted NIHSS scores on the holdout data set yielded an R2 (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5. Conclusions: Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.
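Because the imputation target here is a continuous NIHSS score (range 0-42), the evaluation reduces to regression metrics on a holdout set. The sketch below mirrors that design with a random forest regressor and R², Pearson r, and RMSE computed on synthetic placeholder data, not the Optum cohort.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 7,149 patients; targets rescaled onto the 0-42 NIHSS range.
X, y = make_regression(n_samples=7149, n_features=30, noise=20.0, random_state=0)
y = (y - y.min()) / (y.max() - y.min()) * 42

# ~14% holdout for independent validation, as in the study design.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.14, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)

r, _ = pearsonr(y_te, pred)
print("R2:", r2_score(y_te, pred))
print("Pearson r:", r)
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
```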

