Faculty Opinions recommendation of Assessment of survival prediction models based on microarray data.

Author(s):  
Ewout Steyerberg
2007 ◽  
Vol 23 (14) ◽  
pp. 1768-1774 ◽  
Author(s):  
M. Schumacher ◽  
H. Binder ◽  
T. Gerds

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Li-Hsin Cheng ◽  
Te-Cheng Hsu ◽  
Che Lin

Abstract: Breast cancer is a heterogeneous disease. To guide proper treatment decisions for each patient, robust prognostic biomarkers, which allow reliable prognosis prediction, are necessary. Gene feature selection based on microarray data is an approach to discover potential biomarkers systematically. However, standard purely statistical feature selection approaches often fail to incorporate prior biological knowledge and select genes that lack biological insight. Moreover, because microarray data combine high dimensionality with small sample sizes, selecting robust gene features is an intrinsically challenging problem. We hence combined systems biology feature selection with ensemble learning in this study, aiming to select genes with biological insight and robust prognostic predictive power. To capture breast cancer's complex molecular processes, we also adopted a multi-gene approach to predict prognosis status using deep learning classifiers. We found that all ensemble approaches could improve feature selection robustness, with the hybrid ensemble approach giving the most robust result. Among all prognosis prediction models, the bimodal deep neural network (DNN) achieved the highest test performance, which was further verified by survival analysis. In summary, this study demonstrated the potential of combining ensemble learning and bimodal DNN in guiding precision medicine.


BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Zhihao Lv ◽  
Yuqi Liang ◽  
Huaxi Liu ◽  
Delong Mo

Abstract Background: It remains controversial whether patients with stage II colon cancer benefit from chemotherapy after radical surgery. This study aims to assess the actual effectiveness of chemotherapy in patients with stage II colon cancer undergoing radical surgery and to construct survival prediction models to predict the survival benefits of chemotherapy. Methods: Data for stage II colon cancer patients with radical surgery were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. Propensity score matching (1:1) was performed according to whether or not patients received chemotherapy. Competing risk regression models were used to assess colon cancer cause-specific death (CSD) and non-colon cancer cause-specific death (NCSD). Survival prediction nomograms were constructed to predict overall survival (OS) and colon cancer cause-specific survival (CSS). The predictive abilities of the constructed models were evaluated by concordance indexes (C-indexes) and calibration curves. Results: A total of 25,110 patients were identified, of whom 21.7% received chemotherapy and 78.3% did not. A total of 10,916 patients remained after propensity score matching. The estimated 3-year overall survival rate with chemotherapy was 0.7% higher than without chemotherapy. The estimated 5-year and 10-year overall survival rates without chemotherapy were 1.3% and 2.1% higher than with chemotherapy, respectively. Survival prediction models showed good discrimination (C-indexes between 0.582 and 0.757) and excellent calibration. Conclusions: Chemotherapy improves short-term (43 months) survival in stage II colon cancer patients who received radical surgery. Survival prediction models can be used to predict OS and CSS of patients receiving chemotherapy as well as OS and CSS of patients not receiving chemotherapy, and to make individualized treatment recommendations for stage II colon cancer patients who received radical surgery.
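The abstract above reports discrimination as a concordance index. As a rough illustration of what that number measures, here is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' code: the function name and argument layout are assumptions made for this example, and it assumes a higher risk score means a shorter expected survival.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk score orders correctly.

    times       -- observed follow-up times
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output; higher is assumed to mean higher risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time is an observed event (not a censoring time).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the earlier failure had the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a range such as 0.582 to 0.757 spans weak to moderately good discrimination.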


2020 ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E Braat ◽  
...  

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. There is currently a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML relates to unsuitable performance measures and a lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large data set of 62,294 patients from the United States with 97 predictors, selected on clinical/statistical grounds from more than 600, to predict survival from transplantation. The identification of potential risk factors is also of particular interest. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than the Cox models based on the C-index. Neural networks show better performance than both the Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, the PLANN with 1 hidden layer predicts survival probabilities most accurately and is as well calibrated as the Cox model with all variables.
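The Brier score named among the predictive measures above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the function name is hypothetical, and subjects censored before the horizon are simply dropped, whereas survival analyses normally reweight them with inverse-probability-of-censoring weights.

```python
def brier_score_at(horizon, times, events, predicted_survival):
    """Mean squared error between predicted S(horizon) and observed status.

    horizon            -- time point at which predictions are evaluated
    times, events      -- follow-up time and event indicator per subject
    predicted_survival -- predicted probability of being event-free at `horizon`

    Simplification (assumption of this sketch): subjects censored before
    `horizon` are excluded instead of being handled with censoring weights.
    """
    squared_errors = []
    for t, e, s in zip(times, events, predicted_survival):
        if t > horizon:
            observed = 1.0      # still event-free at the horizon
        elif e:
            observed = 0.0      # event occurred on or before the horizon
        else:
            continue            # censored before the horizon: status unknown
        squared_errors.append((observed - s) ** 2)
    if not squared_errors:
        raise ValueError("no subjects with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)
```

Lower is better: a model that always predicts 0.5 scores 0.25, and a perfect model scores 0. The Integrated Brier Score summarizes this quantity over a range of horizons rather than a single time point.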


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Shuhei Kaneko ◽  
Akihiro Hirakawa ◽  
Chikuma Hamada

In the past decade, researchers in oncology have sought to develop survival prediction models using gene expression data. The least absolute shrinkage and selection operator (lasso) has been widely used to select genes that are truly correlated with a patient’s survival. The lasso selects genes for prediction by shrinking a large number of coefficients of the candidate genes towards zero based on a tuning parameter that is often determined by cross-validation (CV). However, this method can fail to identify true positive genes (i.e., it produces false negatives) in certain instances, because the lasso tends to favor the development of a simple prediction model. Here, we attempt to monitor the occurrence of false negatives by developing a method for estimating the number of true positive (TP) genes for a series of values of the tuning parameter, assuming a mixture distribution for the lasso estimates. Using our developed method, we performed a simulation study to examine its precision in estimating the number of TP genes. Additionally, we applied our method to a real gene expression dataset and found that it was able to identify genes correlated with survival that a CV method was unable to detect.
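To make the shrinkage mechanism concrete, the sketch below uses the simplest setting in which the lasso has a closed form: a linear model with an orthonormal design, where each lasso coefficient is the soft-thresholded least-squares coefficient. This is a deliberate simplification of the Cox-model lasso discussed above, and the function names and coefficient values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution for an orthonormal design (assumption of this sketch).

    Minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1 when X'X = I, in which case
    each coefficient is the soft-thresholded least-squares estimate.
    """
    return [soft_threshold(b, lam) for b in ols_coefs]


# Hypothetical least-squares effects for four candidate genes.
ols = [2.0, -1.5, 0.3, -0.1]
for lam in (0.0, 0.2, 0.5):
    selected = [i for i, b in enumerate(lasso_orthonormal(ols, lam)) if b != 0.0]
    print(lam, selected)
```

As the tuning parameter grows, genes with small but nonzero effects drop out of the model first, which is the false-negative behavior the abstract describes.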


2019 ◽  
Vol 35 (14) ◽  
pp. i484-i491
Author(s):  
Jakob Richter ◽  
Katrin Madjar ◽  
Jörg Rahnenführer

Abstract Motivation: Obtaining a reliable prediction model for a specific cancer subgroup or cohort is often difficult due to limited sample size and, in survival analysis, due to potentially high censoring rates. Sometimes similar data from other patient subgroups are available, e.g. from other clinical centers. Simple pooling of all subgroups can decrease the variance of the estimated parameters of the prediction models, but can also increase the bias due to heterogeneity between the cohorts. A promising compromise is to identify those subgroups with a similar relationship between covariates and target variable and then include only these for model building. Results: We propose a subgroup-based weighted likelihood approach for survival prediction with high-dimensional genetic covariates. When predicting survival for a specific subgroup, for every other subgroup an individual weight determines the strength with which its observations enter into model building. MBO (model-based optimization) can be used to quickly find a good prediction model in the presence of a large number of hyperparameters. We use MBO to identify the best model for survival prediction of a specific subgroup by optimizing the weights for additional subgroups for a Cox model. The approach is evaluated on a set of lung cancer cohorts with gene expression measurements. The resulting models have competitive prediction quality, and they reflect the similarity of the corresponding cancer subgroups, with some weights close to 0, some close to 1, and some in between. Availability and implementation: mlrMBO is implemented as an R-package and is freely available at http://github.com/mlr-org/mlrMBO.
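One common way to let per-observation weights enter a Cox model is the weighted partial log-likelihood sketched below. It is an illustration under assumptions, not the paper's code: it handles a single covariate, uses the Breslow form for tied times, and the function name is hypothetical. The exact weighting scheme used by the authors may differ.

```python
import math


def weighted_cox_loglik(beta, times, events, x, weights):
    """Weighted Cox partial log-likelihood for one covariate.

    Each observation i contributes with weight weights[i], both as an event
    and as a member of the risk sets. In a subgroup setting, every observation
    would carry the weight assigned to its subgroup (assumption of this sketch).
    """
    loglik = 0.0
    n = len(times)
    for i in range(n):
        # Censored observations and zero-weight observations add no event term.
        if not events[i] or weights[i] == 0.0:
            continue
        # Weighted risk set: everyone still under observation at times[i].
        risk = sum(
            weights[j] * math.exp(beta * x[j])
            for j in range(n)
            if times[j] >= times[i]
        )
        loglik += weights[i] * (beta * x[i] - math.log(risk))
    return loglik
```

With this form, a subgroup weight of 1 is equivalent to pooling that subgroup fully, and a weight of 0 is equivalent to leaving it out, so intermediate weights interpolate between pooling and subgroup-only fitting.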


2019 ◽  
pp. 1-7 ◽  
Author(s):  
Paul Riviere ◽  
Christopher Tokeshi ◽  
Jiayi Hou ◽  
Vinit Nalawade ◽  
Reith Sarkar ◽  
...  

PURPOSE Treatment decisions about localized prostate cancer depend on accurate estimation of the patient’s life expectancy. Current cancer and noncancer survival models use a limited number of predefined variables, which could restrict their predictive capability. We explored a technique to create more comprehensive survival prediction models using insurance claims data from a large administrative data set. These data contain substantial information about medical diagnoses and procedures, and thus may provide a broader reflection of each patient’s health. METHODS We identified 57,011 Medicare beneficiaries with localized prostate cancer diagnosed between 2004 and 2009. We constructed separate cancer survival and noncancer survival prediction models using a training data set and assessed performance on a test data set. Potential model inputs included clinical and demographic covariates, and 8,971 distinct insurance claim codes describing comorbid diseases, procedures, surgeries, and diagnostic tests. We used a least absolute shrinkage and selection operator technique to identify predictive variables in the final survival models. Each model’s predictive capacity was compared with existing survival models with a metric of explained randomness (ρ²) ranging from 0 to 1, with 1 indicating an ideal prediction. RESULTS Our noncancer survival model included 143 covariates and had improved survival prediction (ρ² = 0.60) compared with the Charlson comorbidity index (ρ² = 0.26) and Elixhauser comorbidity index (ρ² = 0.26). Our cancer-specific survival model included nine covariates, and had similar survival predictions (ρ² = 0.71) to the Memorial Sloan Kettering prediction model (ρ² = 0.68). CONCLUSION Survival prediction models using high-dimensional variable selection techniques applied to claims data show promise, particularly with noncancer survival prediction. After further validation, these analyses could inform clinical decisions for men with prostate cancer.

