Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict rapid progression of diabetic kidney disease

Author(s):  
Lili Chan ◽  
Girish N. Nadkarni ◽  
Fergus Fleming ◽  
James R. McCullough ◽  
Patti Connolly ◽  
...  

ABSTRACT Importance: Diabetic kidney disease (DKD) is the leading cause of kidney failure in the United States, and predicting progression is necessary for improving outcomes. Objective: To develop and validate a machine-learned, prognostic risk score (KidneyIntelX™) combining data from electronic health records (EHR) and circulating biomarkers to predict DKD progression. Design: Observational cohort study. Setting: Two EHR-linked biobanks: Mount Sinai BioMe Biobank and the Penn Medicine Biobank. Participants: Patients with prevalent DKD (G3a-G3b with all grades of albuminuria (A1-A3) and G1 & G2 with A2-A3 level albuminuria) and banked plasma. Main outcomes and measures: Plasma biomarkers soluble tumor necrosis factor receptor 1/2 (sTNFR1, sTNFR2) and kidney injury molecule-1 (KIM-1) were measured at baseline. Patients were divided into derivation (60%) and validation (40%) sets. The composite primary end point was progressive decline in kidney function, defined as any of the following: rapid kidney function decline (RKFD) (estimated glomerular filtration rate (eGFR) decline of ≥5 ml/min/1.73 m2/year), ≥40% sustained decline, or kidney failure within 5 years. A machine learning model (random forest) was trained and its performance assessed using standard metrics. Results: In 1146 patients with DKD, the median age was 63 years, 51% were female, median baseline eGFR was 54 ml/min/1.73 m2, urine albumin to creatinine ratio (uACR) was 61 mg/g, and follow-up was 4.3 years. 241 patients (21%) experienced progressive decline in kidney function. On 10-fold cross validation in the derivation set (n=686), the risk model had an area under the curve (AUC) of 0.77 (95% CI 0.74-0.79). In validation (n=460), the AUC was 0.77 (95% CI 0.76-0.79). By comparison, the AUC for an optimized clinical model was 0.62 (95% CI 0.61-0.63) in derivation and 0.61 (95% CI 0.60-0.63) in validation.
Using cutoffs from derivation, KidneyIntelX stratified 46%, 37% and 16.5% of the validation cohort into low-, intermediate- and high-risk groups, with a positive predictive value (PPV) of 62% (vs. PPV of 37% for the clinical model and 40% for KDIGO; p < 0.001) in the high-risk group and a negative predictive value (NPV) of 91% in the low-risk group. The net reclassification index for events into the high-risk group was 41% (p<0.05). Conclusions and Relevance: A machine-learned model combining plasma biomarkers and EHR data improved prediction of progressive decline in kidney function within 5 years over KDIGO and standard clinical models in patients with early DKD.
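The composite end point described in this abstract can be written as a simple rule. The following is a minimal sketch, not the authors' implementation: the `Followup` record and its field names are hypothetical, and it assumes the annual eGFR slope and the sustained percentage decline have already been computed upstream, since the abstract does not say how they were derived.

```python
from dataclasses import dataclass


@dataclass
class Followup:
    """Hypothetical per-patient summary of follow-up (names are assumptions)."""
    egfr_slope: float             # ml/min/1.73 m2 per year; negative means decline
    sustained_decline_pct: float  # sustained percentage drop from baseline eGFR
    kidney_failure: bool          # kidney failure within 5 years


def progressive_decline(p: Followup) -> bool:
    """Return True if any component of the composite end point is met."""
    rapid = -p.egfr_slope >= 5.0               # RKFD: decline of >=5 per year
    sustained = p.sustained_decline_pct >= 40.0
    return rapid or sustained or p.kidney_failure


# Invented example values, for illustration only.
print(progressive_decline(Followup(-6.2, 18.0, False)))
print(progressive_decline(Followup(-1.5, 12.0, False)))
```

Keeping the three components as separate named conditions makes it easy to report which part of the composite each patient met.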

Diabetologia ◽  
2021 ◽  
Author(s):  
Lili Chan ◽  
Girish N. Nadkarni ◽  
Fergus Fleming ◽  
James R. McCullough ◽  
Patricia Connolly ◽  
...  

Abstract Aim: Predicting progression in diabetic kidney disease (DKD) is critical to improving outcomes. We sought to develop/validate a machine-learned, prognostic risk score (KidneyIntelX™) combining electronic health records (EHR) and biomarkers. Methods: This is an observational cohort study of patients with prevalent DKD/banked plasma from two EHR-linked biobanks. A random forest model was trained, and performance (AUC, positive and negative predictive values [PPV/NPV], and net reclassification index [NRI]) was compared with that of a clinical model and Kidney Disease: Improving Global Outcomes (KDIGO) categories for predicting a composite outcome of eGFR decline of ≥5 ml/min per year, ≥40% sustained decline, or kidney failure within 5 years. Results: In 1146 patients, the median age was 63 years, 51% were female, the baseline eGFR was 54 ml min−1 [1.73 m]−2, the urine albumin to creatinine ratio (uACR) was 6.9 mg/mmol, follow-up was 4.3 years and 21% had the composite endpoint. On cross-validation in derivation (n = 686), KidneyIntelX had an AUC of 0.77 (95% CI 0.74, 0.79). In validation (n = 460), the AUC was 0.77 (95% CI 0.76, 0.79). By comparison, the AUC for the clinical model was 0.62 (95% CI 0.61, 0.63) in derivation and 0.61 (95% CI 0.60, 0.63) in validation. Using derivation cut-offs, KidneyIntelX stratified 46%, 37% and 17% of the validation cohort into low-, intermediate- and high-risk groups for the composite kidney endpoint, respectively. The PPV for progressive decline in kidney function in the high-risk group was 61% for KidneyIntelX vs 40% for the highest risk strata by KDIGO categorisation (p < 0.001). Only 10% of those scored as low risk by KidneyIntelX experienced progression (i.e., NPV of 90%). The NRIevent for the high-risk group was 41% (p < 0.05). Conclusions: KidneyIntelX improved prediction of kidney outcomes over KDIGO and clinical models in individuals with early stages of DKD.
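To make the PPV and NPV figures concrete, the sketch below shows how such values follow from risk scores, two cut-offs and observed outcomes. It is an illustration under stated assumptions: the abstract does not report the numeric cut-offs, so `low_cut` and `high_cut` are hypothetical, the toy data are invented, and the functions assume both the low- and high-risk groups are non-empty.

```python
def stratify(score, low_cut, high_cut):
    """Assign a risk group from two cut-offs (cut-off values are hypothetical)."""
    if score < low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "intermediate"


def ppv_npv(scores, events, low_cut, high_cut):
    """Return (PPV in the high-risk group, NPV in the low-risk group)."""
    groups = [stratify(s, low_cut, high_cut) for s in scores]
    high = [e for g, e in zip(groups, events) if g == "high"]
    low = [e for g, e in zip(groups, events) if g == "low"]
    ppv = sum(high) / len(high)             # share of high-risk patients with an event
    npv = (len(low) - sum(low)) / len(low)  # share of low-risk patients without one
    return ppv, npv


# Toy data, invented for illustration only (1 = reached the composite endpoint).
scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95]
events = [0, 0, 1, 0, 1, 1, 1, 0]
ppv, npv = ppv_npv(scores, events, low_cut=0.4, high_cut=0.7)
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
```

In the study design described above, the cut-offs would be fixed on the derivation set and then applied unchanged to the validation set, which is why they are passed in as arguments rather than computed from the data being scored.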


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Yejin Mok ◽  
Shoshana Ballew ◽  
Richard Stacey ◽  
Joseph Rossi ◽  
Silvia Koton ◽  
...  

Background: The AHA/ACC 2018 Cholesterol Guideline categorizes ASCVD patients into very high-risk vs. high-risk to guide intensive therapy. This categorization is based on clinical conditions, including reduced kidney function, but does not take into account albuminuria, the other kidney measure often available in clinical practice. Methods: We studied 838 participants with major ASCVD (myocardial infarction, ischemic stroke, or symptomatic peripheral artery disease) from the ARIC study at baseline (1996-98). We compared urine albumin-to-creatinine ratio (ACR) and the eight high-risk conditions in the AHA/ACC Guideline (age 65+, reduced kidney function, diabetes, etc.) regarding their associations with a composite outcome of all-cause mortality, myocardial infarction, ischemic stroke, and heart failure. We also evaluated risk classification by adding ACR to the eight high-risk conditions. Results: During a median follow-up of 8 years, 724 (86%) participants experienced the composite outcome. ACR ≥30 mg/g was associated with the composite outcome (adjusted hazard ratio [aHR] 1.45 [95% CI 1.20, 1.75]) beyond the eight high-risk conditions (aHR of these conditions ranged from 0.96 to 2.46). The addition of ACR improved the c-statistic by 0.011 (95% CI 0.003-0.019), from 0.661 to 0.672. ACR reclassified 4.6% of the high-risk group to very high-risk and 11.2% of the very high-risk group to extremely very high-risk, with reasonable calibration (Figure). Even ACR ≥10 mg/g showed a significant aHR of 1.38 (1.17, 1.63) and reclassified 13.4% of the high-risk and 18.1% of the very high-risk group to a higher risk category. Of our patients with ASCVD, 77% had diabetes, hypertension, or low kidney function, clinical conditions in which ACR assessment is recommended. Conclusions: In ASCVD, albuminuria was a strong predictor of major adverse cardiovascular outcomes and improved risk prediction. Clinicians should pay attention to albuminuria, in addition to eGFR, when managing ASCVD patients.


2020 ◽  
Vol 21 (Supplement_1) ◽  
Author(s):  
D M Adamczak ◽  
M Bednarski ◽  
A Rogala ◽  
M Antoniak ◽  
T Kiebalo ◽  
...  

Abstract BACKGROUND Hypertrophic cardiomyopathy (HCM) is a heart disease characterized by hypertrophy of the left ventricular myocardium. The disease is the most common cause of sudden cardiac death (SCD) in young people and competitive athletes due to fatal ventricular arrhythmias; in most patients, however, HCM has a benign course. Therefore, it is of the utmost importance to properly evaluate patients and identify those who would benefit from implantation of an implantable cardioverter-defibrillator (ICD). The HCM SCD-Risk Calculator is a useful tool for estimating the 5-year risk of SCD. Parameters included in the model at evaluation are: age, maximum left ventricular wall thickness, left atrial dimension, maximum gradient in left ventricular outflow tract, family history of SCD, non-sustained ventricular tachycardia and unexplained syncope. Patients’ risk of SCD is classified as low (<4%), intermediate (4-<6%) or high (≥6%). Those in the high-risk group should have an ICD implanted. Implantation can also be considered in the intermediate-risk group. However, the calculator still needs improvement, and machine learning (ML) has the potential to fulfill this task. An ML algorithm creates a model for solving a specific problem without explicit programming; instead, it relies only on available data, discovering patterns and relations. METHODS 252 HCM patients (aged 20-88 years, 49.6% men) treated in our Department from 2005 to 2018 were enrolled. The follow-up lasted 0-13 years (average: 3.8 years). SCD was defined as sudden cardiac arrest (SCA) or an appropriate ICD intervention. All parameters from the HCM SCD-Risk Calculator were obtained and the risk of SCD was calculated for all patients at the first echocardiographic evaluation. An ML model with variables from the HCM SCD-Risk Calculator was created, and the two methods were compared. RESULTS 20 patients reached an SCD end-point: 1 patient died due to SCA and 19 had an appropriate ICD intervention.
Among them, there were 6, 7 and 7 patients in the low-, intermediate- and high-risk groups of SCD, respectively. The 1 patient who died had a low risk. The ML model correctly assessed the SCD event in only 1 patient. According to ML, a risk of SCD ≤2.07% was a negative predictor. CONCLUSIONS The study did not show an advantage of ML over the HCM SCD-Risk Calculator. Because of the characteristics of the dataset (approximately the same number of features and observations), the selection of machine learning algorithms was limited. The best results (evaluated using leave-one-out cross-validation, LOOCV) were achieved with a decision tree. We expect that a bigger dataset would allow model performance to improve, because the current setup requires strong regularization.
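The three risk categories quoted in this abstract (low below 4%, intermediate from 4% to below 6%, high at 6% or above) map directly onto a small function. This is a minimal sketch of the categorisation step only: the function name is hypothetical, and it takes an already computed 5-year risk as input rather than reproducing the HCM SCD-Risk Calculator formula, which the abstract does not give.

```python
def hcm_scd_risk_category(five_year_risk_pct: float) -> str:
    """Classify a 5-year SCD risk (in percent) using the thresholds in the abstract."""
    if not 0.0 <= five_year_risk_pct <= 100.0:
        raise ValueError("risk must be a percentage between 0 and 100")
    if five_year_risk_pct < 4.0:
        return "low"
    if five_year_risk_pct < 6.0:
        return "intermediate"
    return "high"


# Example inputs chosen to sit on either side of each threshold.
for risk in (2.07, 3.9, 4.0, 5.9, 6.0):
    print(risk, hcm_scd_risk_category(risk))
```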


2019 ◽  
Vol 5 (suppl) ◽  
pp. 13-13
Author(s):  
Po-Jung SU ◽  
Yu-Ann Fang ◽  
Yung-Chun Chang ◽  
Yung-Chia Kuo ◽  
Yung-Chang Lin

Background: For patients with de novo metastatic prostate cancer (mPC), prognosis can differ widely. Some of these patients respond very well to hormone therapy with durable survival, but others do not. If patients with a poor prognosis could be identified as high risk at diagnosis and given aggressive upfront chemotherapy or novel hormonal therapy, they might achieve better treatment outcomes. Methods: We used data on prostate cancer patients from 2000 to 2016 in the Chang Gung Research Database. There were 799 de novo mPC patients with castration. We predicted the probability that these patients would progress to metastatic castration-resistant prostate cancer (mCRPC) within 1 year in order to identify the high-risk group. We then identified the best features for prediction from the best classifier using Recursive Feature Elimination. Results: De novo mPC patients who progressed to mCRPC within 1 year had a significantly worse median overall survival (mOS) than those who progressed to mCRPC beyond 1 year (21.9 months vs 80.7 months; adjusted hazard ratio [aHR]: 6.43, P<0.001). Among all predictive models for high-risk patients, XGBoost had the best overall performance (AUC=0.7000, Accuracy=0.7143). We then excluded features with more than 50% missing data and put all other features in the model (AUC=0.7042, Accuracy=0.7239). The best performance, however, was obtained with only 11 features selected by Recursive Feature Elimination: age, time from diagnosis to castration, nadir PSA, hemoglobin, eosinophil/white blood cell ratio, alkaline phosphatase, alanine transaminase, blood urea nitrogen, creatinine, prothrombin time, and secondary primary cancer (AUC=0.7131, Accuracy=0.7267).
Conclusions: The predictive model had better predictive accuracy and shorter manuscript time with fewer features selected by Recursive Feature Elimination. With this XGBoost model, we can predict the high-risk group among de novo mPC patients and make better clinical decisions about treatment.


Kidney360 ◽  
2020 ◽  
Vol 1 (8) ◽  
pp. 731-739 ◽  
Author(s):  
Kinsuk Chauhan ◽  
Girish N. Nadkarni ◽  
Fergus Fleming ◽  
James McCullough ◽  
Cijiang J. He ◽  
...  

Background: Individuals with type 2 diabetes (T2D) or the apolipoprotein L1 high-risk (APOL1-HR) genotypes are at increased risk of rapid kidney function decline (RKFD) and kidney failure. We hypothesized that a prognostic test using machine learning integrating blood biomarkers and longitudinal electronic health record (EHR) data would improve risk stratification. Methods: We selected two cohorts from the Mount Sinai BioMe Biobank: T2D (n=871) and African ancestry with APOL1-HR (n=498). We measured plasma tumor necrosis factor receptors (TNFR) 1 and 2 and kidney injury molecule-1 (KIM-1) and used random forest algorithms to integrate biomarker and EHR data to generate a risk score for a composite outcome: RKFD (eGFR decline of ≥5 ml/min per year), or 40% sustained eGFR decline, or kidney failure. We compared performance to a validated clinical model and applied thresholds to assess the utility of the prognostic test (KidneyIntelX) to accurately stratify patients into risk categories. Results: Overall, 23% of those with T2D and 18% of those with APOL1-HR experienced the composite kidney end point over a median follow-up of 4.6 and 5.9 years, respectively. The area under the receiver operator characteristic curve (AUC) of KidneyIntelX was 0.77 (95% CI, 0.75 to 0.79) in T2D, and 0.80 (95% CI, 0.77 to 0.83) in APOL1-HR, outperforming the clinical models (AUC, 0.66 [95% CI, 0.65 to 0.67] and 0.72 [95% CI, 0.71 to 0.73], respectively; P<0.001). The positive predictive values for KidneyIntelX were 62% and 62% versus 46% and 39% for the clinical models (P<0.01) in the high-risk (top 15%) stratum for T2D and APOL1-HR, respectively.
The negative predictive values for KidneyIntelX were 92% in T2D and 96% for APOL1-HR versus 85% and 93% for the clinical model, respectively (P=0.76 and 0.93, respectively), in the low-risk stratum (bottom 50%). Conclusions: In patients with T2D or APOL1-HR, a prognostic test (KidneyIntelX) integrating biomarker levels with longitudinal EHR data significantly improved prediction of a composite kidney end point of RKFD, 40% decline in eGFR, or kidney failure over validated clinical models.
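The strata in this abstract are defined by rank (top 15% as high risk, bottom 50% as low risk) rather than by fixed score values. The sketch below shows one way such rank-based labels could be assigned. It is an assumption-laden illustration: the function name, the integer-percentage parameters and the tie-breaking by input order are all hypothetical choices, not details taken from the study.

```python
def percentile_strata(scores, low_pct=50, high_pct=15):
    """Label each score 'low', 'intermediate' or 'high' by its rank.

    The lowest `low_pct` percent of scores are 'low' and the highest
    `high_pct` percent are 'high'. Group sizes are rounded down, and
    tied scores keep their input order (an assumption).
    """
    n = len(scores)
    n_low = n * low_pct // 100
    n_high = n * high_pct // 100
    # Indices sorted by ascending score; Python's sort is stable.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = ["intermediate"] * n
    for i in order[:n_low]:
        labels[i] = "low"
    for i in order[n - n_high:]:
        labels[i] = "high"
    return labels


# Invented scores for 20 hypothetical patients.
example_scores = [i / 20 for i in range(20)]
example_labels = percentile_strata(example_scores)
print(example_labels.count("low"), example_labels.count("intermediate"),
      example_labels.count("high"))
```

Rank-based strata always contain a fixed share of the cohort, so PPV and NPV comparisons between two models are made on groups of equal size.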


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Pablo Jose Antunez Muiños ◽  
Diego López Otero ◽  
Ignacio J. Amat-Santos ◽  
Javier López País ◽  
Alvaro Aparisi ◽  
...  

Abstract Deterioration is sometimes unexpected in SARS-CoV2 infection. The aim of our study is to establish laboratory predictors of mortality in COVID-19 disease which can help to identify high risk patients. All patients admitted to hospital due to Covid-19 disease were included. Laboratory biomarkers that added significant predictive value for mortality to the clinical model were included. Cut-off points were established, and finally a risk score was built. 893 patients were included. Median age was 68.2 ± 15.2 years. 87 (9.7%) were admitted to the Intensive Care Unit (ICU) and 72 (8.1%) needed mechanical ventilation support. 171 (19.1%) patients died. A Covid-19 Lab score ranging from 0 to 30 points was calculated on the basis of a multivariate logistic regression model in order to predict mortality with a weighted score that included haemoglobin, erythrocytes, leukocytes, neutrophils, lymphocytes, creatinine, C-reactive protein, interleukin-6, procalcitonin, lactate dehydrogenase (LDH), and D-dimer. Three groups were established: the low mortality risk group scored under 12 points, the moderate risk group 12 to 18 points, and the high risk group 19 or more points. With the low risk group as reference, moderate and high risk patients showed mortality OR 4.75 (95% CI 2.60–8.68) and 23.86 (95% CI 13.61–41.84), respectively. The C-statistic was 0.85 (0.82–0.88) and the Hosmer–Lemeshow p-value 0.63. The Covid-19 Lab score can very easily predict mortality in patients at any moment during admission secondary to SARS-CoV2 infection. It is a simple and dynamic score, and it can be very easily replicated. It could help physicians to identify high risk patients and foresee clinical deterioration.
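The grouping rule for the Covid-19 Lab score (0 to 30 points; under 12 low, 12 to 18 moderate, 19 or more high) can be sketched as follows. Only the thresholds come from the abstract. The function name is hypothetical, the score is assumed to be a whole number of points, and the per-biomarker weights that produce the score are not given in the abstract, so the total is taken as an input.

```python
def covid_lab_risk_group(score: int) -> str:
    """Map a Covid-19 Lab score (0-30 points) to its mortality risk group."""
    if not 0 <= score <= 30:
        raise ValueError("score must be between 0 and 30 points")
    if score < 12:
        return "low"
    if score <= 18:
        return "moderate"
    return "high"


# Example totals on either side of each cut-off point.
for points in (0, 11, 12, 18, 19, 30):
    print(points, covid_lab_risk_group(points))
```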


2020 ◽  
Author(s):  
Pablo J. Antunez Muiños ◽  
Diego López Otero ◽  
Ignacio J. Amat-Santos ◽  
Javier López Pais ◽  
Alvaro Aparisi ◽  
...  

Abstract Purpose: Deterioration is sometimes unexpected in SARS-CoV2 infection. The aim of our study is to establish laboratory predictors of mortality in COVID-19 disease which can help to identify high risk patients. Methods: All patients admitted to hospital due to Covid-19 disease were included. Laboratory biomarkers that added significant predictive value for mortality to the clinical model were included. Cut-off points were established, and finally a risk score was built. Results: 893 patients were included. Median age was 68.2 years (95% CI 53.0-83.4). 87 (9.7%) were admitted to the Intensive Care Unit (ICU) and 72 (8.1%) also needed mechanical ventilation support. 171 (19.1%) patients died. A Covid-19 Lab score ranging from 0 to 30 points was calculated on the basis of a multivariate logistic regression model in order to predict mortality with a weighted score that included haemoglobin, erythrocytes, leukocytes, neutrophils, lymphocytes, creatinine, C-reactive protein, interleukin-6, procalcitonin, lactate dehydrogenase (LDH), and D-dimer. Three groups were established: the low mortality risk group scored under 12 points, the moderate risk group 12 to 18 points, and the high risk group 19 or more points. With the low risk group as reference, moderate and high risk patients showed mortality OR 4.75 (95% CI 2.60-8.68) and 23.86 (95% CI 13.61-41.84), respectively. The C-statistic was 0.85 (0.82-0.88) and the Hosmer-Lemeshow p-value 0.63. Conclusion: The Covid-19 Lab score can very easily predict mortality in patients at any moment during admission secondary to SARS-CoV2 infection. It is a simple and dynamic score, and it can be very easily replicated. It could help physicians to identify high risk patients and foresee clinical deterioration.

