random survival forests
Recently Published Documents


Total documents: 49 (five years: 20)
H-index: 12 (five years: 2)

Demography ◽  
2021 ◽  
Author(s):  
Bruno Arpino ◽  
Marco Le Moglie ◽  
Letizia Mencarini

Abstract This study contributes to the literature on union dissolution by adopting a machine learning (ML) approach, specifically Random Survival Forests (RSF). We used RSF to analyze data on 2,038 married or cohabiting couples who participated in the German Socio-Economic Panel Survey, and found that RSF had considerably better predictive accuracy than conventional regression models. The man's and the woman's life satisfaction and the woman's percentage of housework were the most important predictors of union dissolution; several other variables (e.g., woman's working hours, being married) also showed substantial predictive power. RSF was able to detect complex patterns of association, and some predictors examined in previous studies showed marginal or null predictive power. Finally, while we found that some personality traits were strongly predictive of union dissolution, no interactions between those traits were evident, possibly reflecting assortative mating by personality traits. From a methodological point of view, the study demonstrates the potential benefits of ML techniques for the analysis of union dissolution and for demographic research in general. Key features of ML include the ability to handle a large number of predictors, the automatic detection of nonlinearities and nonadditivities between predictors and the outcome, generally superior predictive accuracy, and robustness against multicollinearity.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Kaci L Pickett ◽  
Krithika Suresh ◽  
Kristen R Campbell ◽  
Scott Davis ◽  
Elizabeth Juarez-Colunga

Abstract
Background: Risk prediction models for time-to-event outcomes play a vital role in personalized decision-making. A patient’s biomarker values, such as medical lab results, are often measured over time, but traditional prediction models ignore their longitudinal nature and use only baseline information. Dynamic prediction incorporates longitudinal information to produce updated survival predictions during follow-up. Existing methods for dynamic prediction include joint modeling, which often suffers from computational complexity and poor performance under misspecification, and landmarking, which has a straightforward implementation but typically relies on a proportional hazards model. Random survival forests (RSF), a machine learning algorithm for time-to-event outcomes, can capture complex relationships between the predictors and survival without requiring prior specification and has been shown to have superior predictive performance.
Methods: We propose an alternative approach for dynamic prediction using random survival forests in a landmarking framework. With a simulation study, we compared the predictive performance of our proposed method with Cox landmarking and joint modeling in situations where the proportional hazards assumption does not hold and the longitudinal marker(s) have a complex relationship with the survival outcome. We illustrated the use of the RSF landmark approach in two clinical applications to assess the performance of various RSF model building decisions and to demonstrate its use in obtaining dynamic predictions.
Results: In simulation studies, RSF landmarking outperformed joint modeling and Cox landmarking when a complex relationship between the survival and longitudinal marker processes was present. It was also useful in application when there were several predictors for which the clinical relevance was unknown and multiple longitudinal biomarkers were present. Individualized dynamic predictions can be obtained from this method, and the variable importance metric is useful for examining the changing predictive power of variables over time. In addition, RSF landmarking is easily implementable in standard software and, using suggested specifications, requires less computation time than joint modeling.
Conclusions: RSF landmarking is a nonparametric, machine learning alternative to current methods for obtaining dynamic predictions when there are complex or unknown relationships present. It requires little upfront decision-making and offers comparable predictive performance with preferable computational speed.
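The landmarking step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Subject` and `LandmarkRow` layouts and the `build_landmark_dataset` function are hypothetical names introduced here, and two conventions are assumed rather than taken from the abstract, namely that the most recent marker value at or before the landmark is carried forward and that follow-up is administratively censored at the end of the prediction window. The resulting rows are what a survival learner such as an RSF would then be fitted on; the fitting itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    """One subject's follow-up (hypothetical layout, for illustration only)."""
    event_time: float                   # time of event or censoring
    event: bool                         # True if the event was observed
    marker: List[Tuple[float, float]]   # (measurement time, value) pairs


@dataclass
class LandmarkRow:
    last_marker: float  # most recent marker value at or before the landmark
    time: float         # follow-up time measured from the landmark
    event: bool         # True if the event occurred within the window


def build_landmark_dataset(subjects: List[Subject],
                           landmark: float,
                           horizon: float) -> List[LandmarkRow]:
    """Build the training rows for one landmark time.

    Assumptions (not from the source): last observation carried forward,
    and administrative censoring at `horizon` time units after the landmark.
    """
    rows = []
    for s in subjects:
        # Keep only subjects still at risk at the landmark time.
        if s.event_time <= landmark:
            continue
        # Use only marker history available at the landmark.
        past = [(t, v) for t, v in s.marker if t <= landmark]
        if not past:
            continue  # no measurement to carry forward
        _, last_value = max(past, key=lambda tv: tv[0])
        residual = s.event_time - landmark
        if residual > horizon:
            # Still event-free at the end of the window: censor there.
            rows.append(LandmarkRow(last_value, horizon, False))
        else:
            rows.append(LandmarkRow(last_value, residual, s.event))
    return rows
```

Repeating this for a grid of landmark times gives one dataset per landmark, which is how updated predictions during follow-up are typically obtained in a landmarking framework.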


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Madiha Naseem ◽  
Shu Cao ◽  
Dongyun Yang ◽  
Joshua Millstein ◽  
Alberto Puccini ◽  
...  

Abstract KRAS status serves as a predictive biomarker of response to treatment in metastatic colorectal cancer (mCRC). We hypothesize that complex interactions between multiple pathways contribute to prognostic differences between KRAS wild-type and KRAS mutant patients with mCRC, and aim to identify polymorphisms predictive of clinical outcomes in this subpopulation. Most pathway association studies are limited in assessing gene–gene interactions and are restricted to an individual pathway. In this study, we use a random survival forests (RSF) method for identifying predictive markers of overall survival (OS) and progression-free survival (PFS) in mCRC patients treated with FOLFIRI/bevacizumab. A total of 486 mCRC patients treated with FOLFIRI/bevacizumab from two randomized phase III trials, TRIBE and FIRE-3, were included in the current study. Two RSF approaches were used, namely variable importance and minimal depth. We discovered that Wnt/β-catenin and tumor associated macrophage pathway SNPs are strong predictors of OS and PFS in mCRC patients treated with FOLFIRI/bevacizumab independent of KRAS status, whereas a SNP in the sex-differentiation pathway gene, DMRT1, is strongly predictive of OS and PFS in KRAS mutant mCRC patients. Our results highlight RSF as a useful method for identifying predictive SNPs in multiple pathways.


Cancers ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2442
Author(s):  
Moniek van Zutphen ◽  
Fränzel J. B. van Duijnhoven ◽  
Evertine Wesselink ◽  
Ruud W. M. Schrauwen ◽  
Ewout A. Kouwenhoven ◽  
...  

Current lifestyle recommendations for cancer survivors are the same as those for the general public to decrease their risk of cancer. However, it is unclear which lifestyle behaviors are most important for prognosis. We aimed to identify which lifestyle behaviors were most important regarding colorectal cancer (CRC) recurrence and all-cause mortality with a data-driven method. The study consisted of 1180 newly diagnosed stage I–III CRC patients from a prospective cohort study. Lifestyle behaviors included in the current recommendations, as well as additional lifestyle behaviors related to diet, physical activity, adiposity, alcohol use, and smoking were assessed six months after diagnosis. These behaviors were simultaneously analyzed as potential predictors of recurrence or all-cause mortality with Random Survival Forests (RSFs). We observed 148 recurrences during 2.6-year median follow-up and 152 deaths during 4.8-year median follow-up. Higher intakes of sugary drinks were associated with increased recurrence risk. For all-cause mortality, fruit and vegetable, liquid fat and oil, and animal protein intake were identified as the most important lifestyle behaviors. These behaviors showed non-linear associations with all-cause mortality. Our exploratory RSF findings give new ideas on potential associations between certain lifestyle behaviors and CRC prognosis that still need to be confirmed in other cohorts of CRC survivors.


2021 ◽  
Vol 10 (Supplement_1) ◽  
Author(s):  
Y Chen ◽  
J Zhou ◽  
S Lee ◽  
T Liu ◽  
W Wu ◽  
...  

Abstract
Funding Acknowledgements: Type of funding sources: None.
Objective: An electronic frailty index is a useful surrogate for predicting mortality in patients undergoing transcatheter aortic valve implantation (TAVI), but its practical use is limited because determining frailty status and developing the index from electronic health records takes a long time. We identified significant mortality risk predictors and tested the hypothesis that an electronic frailty index incorporating ECG measurements and laboratory examinations, built with a machine learning survival analysis approach, can improve TAVI mortality prediction.
Design: A territory-wide observational study involving 450 patients (49.11% female, 22 deaths) who underwent TAVI and were admitted to public hospitals in Hong Kong.
Methods: Demographics (age at TAVI presentation, gender, severity of TR, AR, MR, PR, INR at TAVI presentation), comorbidities before TAVI presentation, ECG measurements, and CBC and LRFT laboratory examinations were analyzed. Cox regression and a supervised sequential ensemble learning algorithm, the gradient boosting survival tree (GBST) model, were applied to predict mortality. Significant univariate and multivariate risk predictors of mortality were identified. Importance rankings of variables were obtained with the GBST model and used to build the frailty models. Comparisons were made with baseline models of random survival forests and multivariate Cox regression.
Results: The median age at TAVI presentation was 82.3 years (83.8 years in patients who died and 82.1 years in survivors). INR at TAVI presentation was much higher in patients who died (median: 1.32) than in survivors (median: 1.07). Severe TR (hazard ratio, HR: 8.93, 95% CI: [3.22, 24.78], p value < 0.0001), INR at TAVI presentation (HR: 2.74, 95% CI: [1.84, 4.09], p value < 0.0001), cumulative hospital stays (HR: 1.01, 95% CI: [1.00, 1.01], p value = 0.0008), aspartate transaminase (HR: 1.01, 95% CI: [0.98, 1.002], p value = 0.0002), and bilirubin (HR: 1.02, 95% CI: [1.01, 1.02], p value = 0.0003) were significant mortality risk predictors. The machine learning survival analysis model ranked APTT as the most important predictor, followed by INR at TAVI presentation, severe TR status, cumulative hospital stays, cumulative readmission times, creatinine test, urate test, ALP test, and the ECG measurements QTc and QT. GBST significantly outperformed random survival forests and multivariate Cox regression (precision: 0.91, recall: 0.89, AUC: 0.93, C-index: 0.96, and KS-index: 0.50) for mortality prediction.
Conclusions: An electronic frailty index based on demographics, prior comorbidities, hospitalization characteristics, ECG measurements, and laboratory examinations can efficiently predict mortality in patients undergoing TAVI. The machine learning survival model significantly improves risk prediction performance and the construction of frailty models for tailored interventions in TAVI patients in clinical practice.


2021 ◽  
Author(s):  
Bharath Ambale-Venkatesh ◽  
Thiago Quinaglia ◽  
Mahsima Shabani ◽  
Jaclyn Sesso ◽  
Karan Kapoor ◽  
...  

Abstract
Importance: A predictive model to automatically identify the earliest determinants of both hospital discharge and mortality in hospitalized COVID-19 patients could be of great assistance to caregivers if the predictive information is generated and made available in the immediate hours following admission.
Objective: To identify the most important predictors of hospital discharge and mortality from measurements at admission for hospitalized COVID-19 patients.
Design: Observational cohort study.
Setting: Electronic records from hospitalized patients.
Participants: Patients admitted between March 3rd and August 24th with COVID-19 in Johns Hopkins Health System hospitals.
Exposures: 216 phenotypic variables collected within 48 hours of admission.
Main Outcomes: We used age-stratified (<60 and >=60 years) random survival forests with competing risks to identify the most important predictors of death and discharge. Fine-Gray competing risk regression (FGR) models were then constructed based on the most important RSF-derived covariates.
Results: Of 2212 patients, 1913 were discharged (age 57±19, time-to-discharge 9±11 days) while 279 died (age 75±14, time to death 14±15 days). Patients >= 60 years were nearly 10 times as likely to die within 60 days of admission as those <60. As the pandemic evolved, the rate of hospital discharge increased in both older and younger patients. Incident death and hospital discharge were accurately predicted by measures of respiratory distress, inflammation, infection, renal function, red cell turnover and cardiac stress. FGR models for each of hospital discharge and mortality as outcomes based on these variables performed well in the older (AUC 0.80-0.85 at 60 days) and younger populations (AUC >0.90 at 60 days).
Conclusions and Relevance: We identified markers collected within 2 days of admission that predict hospital discharge and mortality in COVID-19 patients and provide prediction models that may be used to guide patient care. Our proposed model suggests that hospital discharge and mortality can be forecasted with high accuracy based on 8-10 variables at this stage of the COVID-19 pandemic. Our findings also point to several specific pathways that could be the focus of future investigations directed at reducing mortality and expediting hospital discharge among COVID-19 patients. Probability of hospital discharge increased over the course of the pandemic.
Key Points
Question: Can we predict the likelihood of hospital discharge as well as mortality from data obtained in the first 48 hours from admission in hospitalized COVID-19 patients?
Findings: Models based on extensive phenotyping mined directly from electronic medical records, followed by variable selection and accounting for the competing events of hospital death versus discharge, predicted both death and discharge with areas under the receiver operating characteristic curve of >0.80.
Meaning: Hospital discharge and mortality can be forecasted with high accuracy based on just 8-10 variables, and the probability of hospital discharge increased over the course of the pandemic.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Zhucheng Zhan ◽  
Zheng Jing ◽  
Bing He ◽  
Noshad Hosseini ◽  
Maria Westerhoff ◽  
...  

Abstract Pathological images are easily accessible data with potential as prognostic biomarkers. Moreover, integrating heterogeneous data types from multiple modalities, such as pathological images and gene expression data, is invaluable for predicting cancer patient survival. However, the analytical challenges are significant. Here, we take the hepatocellular carcinoma (HCC) pathological image features extracted by CellProfiler and use them as the input for Cox-nnet, a neural network-based prognosis prediction model. We compare this model with the conventional Cox proportional hazards (Cox-PH) model, CoxBoost, Random Survival Forests and DeepSurv, using C-index and log-rank P-values. The results show that, on pathological image features, Cox-nnet is significantly more accurate than the Cox-PH and Random Survival Forests models and comparable with the CoxBoost and DeepSurv models. Further, to integrate pathological image and gene expression data of the same patients, we construct a novel two-stage Cox-nnet model and compare it with another complex neural-network model called PAGE-Net. The two-stage Cox-nnet model combining histopathology image and transcriptomic RNA-seq data achieves much better prognosis prediction, with a median C-index of 0.75 and log-rank P-value of 6e−7 in the testing datasets, compared to PAGE-Net (median C-index of 0.68 and log-rank P-value of 0.03). Imaging features provide predictive information beyond gene expression features, as the combined model is more accurate than the model with gene expression alone (median C-index 0.70). Pathological image features are correlated with gene expression, as genes correlated with top imaging features have known associations with HCC patient survival and morphogenesis of liver tissue. This work proposes two-stage Cox-nnet, a new class of biologically relevant and interpretable models, to integrate multiple types of heterogeneous data for survival prediction.
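Several of the abstracts on this page compare models by C-index. As a reading aid, the sketch below shows one common way to compute Harrell's concordance index for right-censored data. It is an illustration under stated assumptions, not the evaluation code of any listed study: the function name `concordance_index` is hypothetical, pairs with tied event times are simply skipped, and tied risk scores are counted as half concordant.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's C-index (simplified sketch).

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    the model assigns subject i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumption: ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for the median C-index values of 0.68 to 0.75 quoted above.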


2021 ◽  
Vol 47 (1) ◽  
pp. e4
Author(s):  
Saqib Rahman ◽  
Robert Walker ◽  
Nick Maynard ◽  
Nigel Trudgill ◽  
Tom Crosby ◽  
...  
