survival forests

Demography ◽  
2021 ◽  
Author(s):  
Bruno Arpino ◽  
Marco Le Moglie ◽  
Letizia Mencarini

Abstract This study contributes to the literature on union dissolution by adopting a machine learning (ML) approach, specifically Random Survival Forests (RSF). We used RSF to analyze data on 2,038 married or cohabiting couples who participated in the German Socio-Economic Panel Survey, and found that RSF had considerably better predictive accuracy than conventional regression models. The man's and the woman's life satisfaction and the woman's percentage of housework were the most important predictors of union dissolution; several other variables (e.g., woman's working hours, being married) also showed substantial predictive power. RSF was able to detect complex patterns of association, and some predictors examined in previous studies showed marginal or null predictive power. Finally, while we found that some personality traits were strongly predictive of union dissolution, no interactions between those traits were evident, possibly reflecting assortative mating by personality traits. From a methodological point of view, the study demonstrates the potential benefits of ML techniques for the analysis of union dissolution and for demographic research in general. Key features of ML include the ability to handle a large number of predictors, the automatic detection of nonlinearities and nonadditivities between predictors and the outcome, generally superior predictive accuracy, and robustness against multicollinearity.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Kaci L Pickett ◽  
Krithika Suresh ◽  
Kristen R Campbell ◽  
Scott Davis ◽  
Elizabeth Juarez-Colunga

Abstract Background Risk prediction models for time-to-event outcomes play a vital role in personalized decision-making. A patient’s biomarker values, such as medical lab results, are often measured over time but traditional prediction models ignore their longitudinal nature, using only baseline information. Dynamic prediction incorporates longitudinal information to produce updated survival predictions during follow-up. Existing methods for dynamic prediction include joint modeling, which often suffers from computational complexity and poor performance under misspecification, and landmarking, which has a straightforward implementation but typically relies on a proportional hazards model. Random survival forests (RSF), a machine learning algorithm for time-to-event outcomes, can capture complex relationships between the predictors and survival without requiring prior specification and has been shown to have superior predictive performance. Methods We propose an alternative approach for dynamic prediction using random survival forests in a landmarking framework. With a simulation study, we compared the predictive performance of our proposed method with Cox landmarking and joint modeling in situations where the proportional hazards assumption does not hold and the longitudinal marker(s) have a complex relationship with the survival outcome. We illustrated the use of the RSF landmark approach in two clinical applications to assess the performance of various RSF model building decisions and to demonstrate its use in obtaining dynamic predictions. Results In simulation studies, RSF landmarking outperformed joint modeling and Cox landmarking when a complex relationship between the survival and longitudinal marker processes was present. It was also useful in application when there were several predictors for which the clinical relevance was unknown and multiple longitudinal biomarkers were present. 
Individualized dynamic predictions can be obtained from this method, and the variable importance metric is useful for examining the changing predictive power of variables over time. In addition, RSF landmarking is easily implementable in standard software and, with the suggested specifications, requires less computation time than joint modeling. Conclusions RSF landmarking is a nonparametric, machine learning alternative to current methods for obtaining dynamic predictions when there are complex or unknown relationships present. It requires little upfront decision-making, has comparable predictive performance, and offers preferable computational speed.
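The landmarking step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`time`, `event`, `marker`), the function name `landmark_dataset`, and the choice to summarize the biomarker by its last observed value are all assumptions made for illustration. The resulting rows are what a survival model such as an RSF would then be fitted to.

```python
from typing import Dict, List, Sequence


def landmark_dataset(
    subjects: Sequence[Dict],
    landmark: float,
    horizon: float,
) -> List[Dict[str, float]]:
    """Build the analysis rows for one landmark time.

    Assumed (hypothetical) record layout per subject:
      "time":   follow-up time
      "event":  1 if the event was observed, 0 if censored
      "marker": list of (measurement_time, value) pairs
    """
    rows = []
    for s in subjects:
        # Only subjects still at risk at the landmark time contribute.
        if s["time"] <= landmark:
            continue
        # Use only marker history available at or before the landmark.
        history = [(t, v) for t, v in s["marker"] if t <= landmark]
        if not history:
            continue
        last_value = max(history, key=lambda tv: tv[0])[1]
        # Reset the clock at the landmark and censor at the horizon.
        residual = s["time"] - landmark
        if residual > horizon:
            rows.append({"marker": last_value, "time": horizon, "event": 0})
        else:
            rows.append(
                {"marker": last_value, "time": residual, "event": s["event"]}
            )
    return rows
```

Repeating this for several landmark times gives one data set per landmark, which is how updated predictions during follow-up are obtained in a landmarking framework.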


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yingxin Liu ◽  
Shiyu Zhou ◽  
Hongxia Wei ◽  
Shengli An

Abstract Background As a popular method in the machine learning field, the forests approach is an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forests method, but it has drawbacks, such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) is known to reduce this selection bias via a two-step split procedure that implements hypothesis tests and separates variable selection from splitting, but it is computationally expensive. The recently proposed random forests with maximally selected rank statistics (MSR-RF) seems to be a great improvement on RSF and CIF. Methods In this paper we used a simulation study and a real data application to compare the prediction and variable selection performance of three survival forests methods: RSF, CIF and MSR-RF. To evaluate variable selection performance, we combined all simulations to calculate how frequently the correct variables ranked at the top of the variable importance measures, where a higher frequency means better selection ability. We used the Integrated Brier Score (IBS) and c-index to measure the prediction accuracy of all three methods. The smaller the IBS value, the better the prediction. Results Simulations show that the three forests methods differ slightly in prediction performance. MSR-RF and RSF might perform better than CIF when there are only continuous or binary variables in the datasets. For variable selection performance, when there are multiple categorical variables in the datasets, the selection frequency of RSF seems to be the lowest in most cases. MSR-RF and CIF have higher selection rates, and CIF performs especially well with the interaction term. The fact that the degree of correlation among the variables has little effect on the selection frequency indicates that the three forest methods can handle correlated data. 
When there are only continuous variables in the datasets, MSR-RF performs better. When there are only binary variables in the datasets, RSF and MSR-RF have more advantages than CIF. When the variable dimension increases, MSR-RF and RSF seem to be more robust than CIF. Conclusions All three methods show advantages in prediction and variable selection performance under different situations. The recently proposed MSR-RF methodology has practical value and is well worth popularizing. It is important to identify the appropriate method in real use according to the research aim and the nature of the covariates.
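For readers unfamiliar with the c-index named in the abstract above, the following is a minimal sketch of Harrell's concordance index for right-censored data. The function name and argument layout are assumptions for illustration, pairs with tied event times are simply skipped, and the IBS is not covered here because it additionally needs censoring weights.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's c-index: share of comparable pairs ordered correctly.

    A pair (i, j) is comparable when subject i has an observed event and
    a strictly shorter follow-up time than subject j. The pair is
    concordant when the model gives i the higher risk score; tied scores
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a larger c-index is better, whereas a smaller IBS is better.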


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Madiha Naseem ◽  
Shu Cao ◽  
Dongyun Yang ◽  
Joshua Millstein ◽  
Alberto Puccini ◽  
...  

Abstract KRAS status serves as a predictive biomarker of response to treatment in metastatic colorectal cancer (mCRC). We hypothesize that complex interactions between multiple pathways contribute to prognostic differences between KRAS wild-type and KRAS mutant patients with mCRC, and aim to identify polymorphisms predictive of clinical outcomes in this subpopulation. Most pathway association studies are limited in assessing gene–gene interactions and are restricted to an individual pathway. In this study, we use a random survival forests (RSF) method for identifying predictive markers of overall survival (OS) and progression-free survival (PFS) in mCRC patients treated with FOLFIRI/bevacizumab. A total of 486 mCRC patients treated with FOLFIRI/bevacizumab from two randomized phase III trials, TRIBE and FIRE-3, were included in the current study. Two RSF approaches were used, namely variable importance and minimal depth. We discovered that Wnt/β-catenin and tumor associated macrophage pathway SNPs are strong predictors of OS and PFS in mCRC patients treated with FOLFIRI/bevacizumab independent of KRAS status, whereas a SNP in the sex-differentiation pathway gene, DMRT1, is strongly predictive of OS and PFS in KRAS mutant mCRC patients. Our results highlight RSF as a useful method for identifying predictive SNPs in multiple pathways.


Cancers ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2442
Author(s):  
Moniek van Zutphen ◽  
Fränzel J. B. van Duijnhoven ◽  
Evertine Wesselink ◽  
Ruud W. M. Schrauwen ◽  
Ewout A. Kouwenhoven ◽  
...  

Current lifestyle recommendations for cancer survivors are the same as those for the general public to decrease their risk of cancer. However, it is unclear which lifestyle behaviors are most important for prognosis. We aimed to identify which lifestyle behaviors were most important regarding colorectal cancer (CRC) recurrence and all-cause mortality with a data-driven method. The study consisted of 1180 newly diagnosed stage I–III CRC patients from a prospective cohort study. Lifestyle behaviors included in the current recommendations, as well as additional lifestyle behaviors related to diet, physical activity, adiposity, alcohol use, and smoking were assessed six months after diagnosis. These behaviors were simultaneously analyzed as potential predictors of recurrence or all-cause mortality with Random Survival Forests (RSFs). We observed 148 recurrences during 2.6-year median follow-up and 152 deaths during 4.8-year median follow-up. Higher intakes of sugary drinks were associated with increased recurrence risk. For all-cause mortality, fruit and vegetable, liquid fat and oil, and animal protein intake were identified as the most important lifestyle behaviors. These behaviors showed non-linear associations with all-cause mortality. Our exploratory RSF findings give new ideas on potential associations between certain lifestyle behaviors and CRC prognosis that still need to be confirmed in other cohorts of CRC survivors.


2021 ◽  
Author(s):  
Yingxin Liu ◽  
Shiyu Zhou ◽  
Hongxia Wei ◽  
Shengli An

Abstract Background As a popular method in the machine learning field, the forests approach is an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forests method, but it has drawbacks, such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) is known to reduce this selection bias via a two-step split procedure that implements hypothesis tests and separates variable selection from splitting, but it is computationally expensive. The recently proposed random forests with maximally selected rank statistics (MSR-RF) seems to be a great improvement on RSF and CIF. Methods In this paper we used a simulation study and a real data application to compare the prediction and variable selection performance of three survival forests methods: RSF, CIF and MSR-RF. To evaluate variable selection performance, we combined all simulations to calculate how frequently the correct variables ranked at the top by variable importance measure, where a higher frequency means better selection ability. We used the Integrated Brier Score (IBS) to measure the prediction accuracy of all three methods. The smaller the IBS value, the better the prediction. Results 1. Simulations show that the three forests methods differ slightly in prediction performance. Real data results show that all three forest methods have advantages in different scenarios. 2. For variable selection performance: 1) MSR-RF and CIF have a higher selection frequency than RSF when there are multiple categorical variables in the simulation datasets, and CIF performs especially well with the interaction term. 2) Forests methods seem to be suitable for processing correlated data, as the selection frequency fluctuates only slightly when the degree of correlation changes. 3) RSF and MSR-RF outperform CIF when all covariates are binary. 
MSR-RF outperforms RSF and CIF when all covariates are continuous. 4) MSR-RF remains relatively robust when the variable dimension increases, while CIF performs poorly. Conclusions All three forests methods have their respective advantages in different situations. It is important to choose the appropriate method based on the covariates in practice.


2021 ◽  
Author(s):  
Bharath Ambale-Venkatesh ◽  
Thiago Quinaglia ◽  
Mahsima Shabani ◽  
Jaclyn Sesso ◽  
Karan Kapoor ◽  
...  

Abstract Importance A predictive model to automatically identify the earliest determinants of both hospital discharge and mortality in hospitalized COVID-19 patients could be of great assistance to caregivers if the predictive information is generated and made available in the immediate hours following admission. Objective To identify the most important predictors of hospital discharge and mortality from measurements at admission for hospitalized COVID-19 patients. Design Observational cohort study. Setting Electronic records from hospitalized patients. Participants Patients admitted between March 3rd and August 24th with COVID-19 in Johns Hopkins Health System hospitals. Exposures 216 phenotypic variables collected within 48 hours of admission. Main Outcomes We used age-stratified (<60 and >=60 years) random survival forests with competing risks to identify the most important predictors of death and discharge. Fine-Gray competing risk regression (FGR) models were then constructed based on the most important RSF-derived covariates. Results Of 2212 patients, 1913 were discharged (age 57±19, time-to-discharge 9±11 days) while 279 died (age 75±14, time to death 14±15 days). Patients >= 60 years were nearly 10 times as likely to die within 60 days of admission as those <60. As the pandemic evolved, the rate of hospital discharge increased in both older and younger patients. Incident death and hospital discharge were accurately predicted by measures of respiratory distress, inflammation, infection, renal function, red cell turnover and cardiac stress. FGR models for each of hospital discharge and mortality as outcomes based on these variables performed well in the older (AUC 0.80-0.85 at 60 days) and younger populations (AUC >0.90 at 60 days). Conclusions and Relevance We identified markers collected within 2 days of admission that predict hospital discharge and mortality in COVID-19 patients and provide prediction models that may be used to guide patient care. 
Our proposed model suggests that hospital discharge and mortality can be forecasted with high accuracy based on 8-10 variables at this stage of the COVID-19 pandemic. Our findings also point to several specific pathways that could be the focus of future investigations directed at reducing mortality and expediting hospital discharge among COVID-19 patients. Probability of hospital discharge increased over the course of the pandemic. Key Points Question Can we predict the likelihood of hospital discharge as well as mortality from data obtained in the first 48 hours from admission in hospitalized COVID-19 patients? Findings Models based on extensive phenotyping mined directly from electronic medical records, followed by variable selection and accounting for the competing events of hospital death versus discharge, predicted both death and discharge with area under the receiver operating characteristic curves of >0.80. Meaning Hospital discharge and mortality can be forecasted with high accuracy based on just 8-10 variables, and the probability of hospital discharge increased over the course of the pandemic.
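To make concrete what accounting for the competing events of death versus discharge means, here is a minimal sketch of the nonparametric cumulative incidence (Aalen-Johansen) estimator. It is a swapped-in illustration, not the random survival forest or Fine-Gray model used in the study; the function name and the coding of `causes` (0 for censored, positive integers for event types) are assumptions.

```python
from typing import List, Sequence, Tuple


def cumulative_incidence(
    times: Sequence[float],
    causes: Sequence[int],
    cause: int,
) -> List[Tuple[float, float]]:
    """Cumulative incidence of one event type under competing risks.

    causes[i] is 0 for a censored subject, otherwise the event type.
    Returns (time, cumulative incidence) at each time where any event
    of any type occurred.
    """
    surv = 1.0  # probability of no event of any type before current time
    cif = 0.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        d_all = sum(1 for x, c in zip(times, causes) if x == t and c != 0)
        d_k = sum(1 for x, c in zip(times, causes) if x == t and c == cause)
        if d_all == 0:
            continue  # only censoring at this time
        # Probability of reaching t event-free, then failing from `cause`.
        cif += surv * d_k / at_risk
        # Any event type removes a subject from the event-free state.
        surv *= 1.0 - d_all / at_risk
        curve.append((t, cif))
    return curve
```

Because every event type reduces the event-free probability, the cumulative incidences of all causes sum to one minus the overall event-free probability, which treating the competing event as plain censoring would not guarantee.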


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Zhucheng Zhan ◽  
Zheng Jing ◽  
Bing He ◽  
Noshad Hosseini ◽  
Maria Westerhoff ◽  
...  

Abstract Pathological images are easily accessible data with the potential of prognostic biomarkers. Moreover, integration of heterogeneous data types from multiple modalities, such as pathological image and gene expression data, is invaluable to help predict cancer patient survival. However, the analytical challenges are significant. Here, we take the hepatocellular carcinoma (HCC) pathological image features extracted by CellProfiler, and apply them as the input for Cox-nnet, a neural network-based prognosis prediction model. We compare this model with the conventional Cox proportional hazards (Cox-PH) model, CoxBoost, Random Survival Forests and DeepSurv, using C-index and log-rank P-values. The results show that Cox-nnet is significantly more accurate than Cox-PH and Random Survival Forests models and comparable with CoxBoost and DeepSurv models, on pathological image features. Further, to integrate pathological image and gene expression data of the same patients, we innovatively construct a two-stage Cox-nnet model, and compare it with another complex neural-network model called PAGE-Net. The two-stage Cox-nnet complex model combining histopathology image and transcriptomic RNA-seq data achieves much better prognosis prediction, with a median C-index of 0.75 and log-rank P-value of 6e−7 in the testing datasets, compared to PAGE-Net (median C-index of 0.68 and log-rank P-value of 0.03). Imaging features present additional predictive information to gene expression features, as the combined model is more accurate than the model with gene expression alone (median C-index 0.70). Pathological image features are correlated with gene expression, as genes correlated to top imaging features present known associations with HCC patient survival and morphogenesis of liver tissue. This work proposes two-stage Cox-nnet, a new class of biologically relevant and interpretable models, to integrate multiple types of heterogeneous data for survival prediction.

