SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data

2021 ◽  
Author(s):  
Yunwei Zhang ◽  
Germaine Wong ◽  
Graham Mann ◽  
Samuel Muller ◽  
Jean Yee Hwa Yang

Survival analysis is the branch of statistics that treats time-to-event and survival status jointly as the dependent response. Current comparisons of survival model performance mostly focus on classical clinical data with traditional statistical survival models, with prediction accuracy often the only measure of model performance. Moreover, survival analysis approaches for censored omics data have not been fully studied. The typical workaround is to truncate survival time, define a new status variable, and then perform a binary classification analysis. Here, we develop a benchmarking framework that compares survival models on both clinical and omics datasets, and that covers not only classical statistical survival models but also state-of-the-art machine learning survival models, with multiple performance evaluation measures including model predictability, stability, flexibility, and computational cost. Our comprehensive comparison framework shows that optimality depends on both the dataset and the analysis method. The key result is that there is no one-size-fits-all solution for any of the criteria and any of the methods: some methods with a high C-index suffer from computational exhaustion and instability. Our framework gives researchers insight into how different survival model implementations vary across real-world datasets. We highlight that care is needed when selecting methods and specifically recommend not treating the C-index as the only performance evaluation metric, as alternative metrics measure other aspects of performance.
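
As a minimal sketch of the kind of multi-metric evaluation the abstract argues for (not the paper's actual framework), the following fits a Cox model with the lifelines library on synthetic data and reports both the C-index and a complementary quantity; all column names and data are illustrative assumptions.

# Minimal sketch, assuming the lifelines library and synthetic data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "biomarker": rng.normal(0, 1, n),
})
# Synthetic survival times that depend on the covariates.
hazard = np.exp(0.03 * df["age"] + 0.5 * df["biomarker"])
df["time"] = rng.exponential(1.0 / hazard)
df["event"] = rng.integers(0, 2, n)  # 1 = event observed, 0 = censored

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# C-index: rank agreement between predicted risk and observed times.
# Negate the partial hazard so that higher scores mean longer survival,
# as concordance_index expects.
c_index = concordance_index(df["time"], -cph.predict_partial_hazard(df), df["event"])
print(f"C-index: {c_index:.3f}")

# A complementary view beyond ranking alone: the fitted model's
# partial log-likelihood on the same data.
print(f"log-likelihood: {cph.log_likelihood_:.1f}")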

F1000Research ◽  
2017 ◽  
Vol 5 ◽  
pp. 2676 ◽  
Author(s):  
Sebastian Pölsterl ◽  
Pankaj Gupta ◽  
Lichao Wang ◽  
Sailesh Conjeti ◽  
Amin Katouzian ◽  
...  

Ensemble methods have been successfully applied in a wide range of scenarios, including survival analysis. However, most ensemble models for survival analysis consist of models that all optimize the same loss function and do not fully utilize the diversity in available models. We propose heterogeneous survival ensembles that combine several survival models, each optimizing a different loss during training. We evaluated our proposed technique in the context of the Prostate Cancer DREAM Challenge, where the objective was to predict survival of patients with metastatic, castrate-resistant prostate cancer from patient records of four phase III clinical trials. Results demonstrate that a diverse set of survival models was preferred over a single model and that our heterogeneous ensemble of survival models outperformed all competing methods with respect to predicting the exact time of death.
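
To illustrate the core idea of a heterogeneous survival ensemble (two models trained with different losses, combined on a common scale), here is a minimal sketch using scikit-survival and synthetic data. This is only an illustration under those assumptions, not the authors' implementation; the rank-averaging combination rule and all parameter values are stand-ins.

# Minimal sketch, assuming scikit-survival (sksurv) and synthetic data.
import numpy as np
from scipy.stats import rankdata
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import GradientBoostingSurvivalAnalysis
from sksurv.metrics import concordance_index_censored
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
time = rng.exponential(np.exp(-X[:, 0]))   # times depend on feature 0
event = rng.random(400) < 0.7              # roughly 30% censoring
y = Surv.from_arrays(event=event, time=time)

# Two models optimizing different losses: Cox partial likelihood vs. a
# censoring-weighted least-squares (accelerated failure time) loss.
cox = CoxPHSurvivalAnalysis().fit(X, y)
gbm = GradientBoostingSurvivalAnalysis(loss="ipcwls", n_estimators=100).fit(X, y)

# Rank-normalize so scores from different losses are comparable; the
# ipcwls model predicts on the time scale (higher = longer survival),
# so negate it to obtain a risk ordering.
risk_cox = rankdata(cox.predict(X)) / len(X)
risk_gbm = rankdata(-gbm.predict(X)) / len(X)
ensemble_risk = (risk_cox + risk_gbm) / 2

cindex = concordance_index_censored(event, time, ensemble_risk)[0]
print(f"ensemble C-index: {cindex:.3f}")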


2021 ◽  
Vol 4 (3) ◽  
pp. 251524592110268
Author(s):  
Roberta Rocca ◽  
Tal Yarkoni

Consensus on standards for evaluating models and theories is an integral part of every science. Nonetheless, in psychology, relatively little focus has been placed on defining reliable communal metrics to assess model performance. Evaluation practices are often idiosyncratic and are affected by a number of shortcomings (e.g., failure to assess models’ ability to generalize to unseen data) that make it difficult to discriminate between good and bad models. Drawing inspiration from fields such as machine learning and statistical genetics, we argue in favor of introducing common benchmarks as a means of overcoming the lack of reliable model evaluation criteria currently observed in psychology. We discuss a number of principles benchmarks should satisfy to achieve maximal utility, identify concrete steps the community could take to promote the development of such benchmarks, and address a number of potential pitfalls and concerns that may arise in the course of implementation. We argue that reaching consensus on common evaluation benchmarks will foster cumulative progress in psychology and encourage researchers to place heavier emphasis on the practical utility of scientific models.
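
One of the concrete shortcomings the abstract names is the failure to assess generalization to unseen data. A tiny illustration of why this matters, using scikit-learn with an arbitrary synthetic dataset and model (both are stand-ins, not anything from the paper):

# Minimal sketch, assuming scikit-learn and synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 0] + rng.normal(scale=1.0, size=300)   # weak signal, much noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeRegressor().fit(X_tr, y_tr)

# An unpruned tree memorizes its training data: in-sample R^2 is near
# 1.0, while held-out R^2 reveals how little actually generalizes.
print(f"in-sample R^2:     {model.score(X_tr, y_tr):.2f}")
print(f"out-of-sample R^2: {model.score(X_te, y_te):.2f}")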


2021 ◽  
Vol 11 (19) ◽  
pp. 9243
Author(s):  
Jože Rožanec ◽  
Elena Trajkova ◽  
Klemen Kenda ◽  
Blaž Fortuna ◽  
Dunja Mladenić

While increasing empirical evidence suggests that global time series forecasting models can achieve better forecasting performance than local ones, there is a research void regarding when and why global models fail to provide a good forecast. This paper uses anomaly detection algorithms and explainable artificial intelligence (XAI) to answer when and why a forecast should not be trusted. To address this issue, a dashboard was built to inform the user regarding (i) the relevance of the features for that particular forecast, (ii) which training samples most likely influenced the forecast outcome, (iii) why the forecast is considered an outlier, and (iv) a range of counterfactual examples showing how changes in the feature vector can lead to a different outcome. Moreover, a modular architecture and a methodology were developed to iteratively remove noisy data instances from the training set and thus enhance the overall global time series forecasting model performance. Finally, to test the effectiveness of the proposed approach, it was validated on two publicly available real-world datasets.
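
One ingredient of such a pipeline is flagging inputs that lie far from the training distribution before trusting their forecasts. The sketch below uses scikit-learn's IsolationForest as a generic stand-in for whatever detector the authors used; the data and threshold behavior are illustrative assumptions.

# Minimal sketch, assuming scikit-learn and synthetic feature vectors.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 8))            # feature vectors seen in training
X_new = np.vstack([rng.normal(size=(5, 8)),     # typical inputs
                   rng.normal(6, 1, size=(2, 8))])  # far from the training data

detector = IsolationForest(random_state=0).fit(X_train)
flags = detector.predict(X_new)                 # -1 = outlier, 1 = inlier

for i, flag in enumerate(flags):
    if flag == -1:
        print(f"input {i}: outlier relative to training data; "
              "treat this forecast with caution")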


2021 ◽  
Author(s):  
Xiaokai Yan ◽  
Chiying Xiao ◽  
Kunyan Yue ◽  
Min Chen ◽  
Hang Zhou

Background: Changes in the genome play a crucial role in carcinogenesis, and many biomarkers can serve as effective prognostic indicators in diverse tumors. Although many studies have constructed predictive models for hepatocellular carcinoma (HCC) based on molecular signatures, their performance remains unsatisfactory. To address this shortcoming, we aimed to construct a novel and accurate prognostic model from multi-omics data to guide prognostic assessment of HCC. Methods: The TCGA training set was used to identify crucial biomarkers and construct single-omic prognostic models through differential analysis, univariate Cox, and LASSO/stepwise Cox analysis. The performance of the single-omic models was then evaluated and validated through survival analysis, Harrell's concordance index (C-index), and receiver operating characteristic (ROC) curves in the TCGA test set and external cohorts. In addition, a comprehensive model based on multi-omics data was constructed via multiple Cox analysis, and its performance was evaluated in the TCGA training and test sets. Results: We identified 16 key mRNAs, 20 key lncRNAs, 5 key miRNAs, 5 key CNV genes, and 7 key SNPs significantly associated with the prognosis of HCC, and constructed 5 single-omic models that showed relatively good predictive performance, with C-indices ranging from 0.63 to 0.75 in the TCGA training and test sets. We further validated the mRNA and SNP models in two independent external datasets, where survival analysis showed good discriminating ability (P < 0.05). Moreover, the multi-omics model based on mRNA, lncRNA, miRNA, CNV, and SNP information presented a strong predictive ability, with a C-index over 0.80 and all AUC values at 1, 3, and 5 years above 0.84. Conclusion: We identified many biomarkers that may help elucidate the mechanisms underlying carcinogenesis in HCC, and constructed five single-omic models and an integrated multi-omics model that may provide effective and reliable guidance for prognosis assessment and treatment decision-making.
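
The signature-building step the Methods describe (L1-penalized Cox regression to select a small feature set, then a risk score from the retained coefficients) can be condensed into a short sketch. The version below uses lifelines on synthetic data; the penalizer value, selection threshold, and column names are illustrative assumptions, not those of the study.

# Minimal sketch, assuming the lifelines library and synthetic data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, p = 300, 30
X = pd.DataFrame(rng.normal(size=(n, p)),
                 columns=[f"gene_{i}" for i in range(p)])
hazard = np.exp(0.8 * X["gene_0"] - 0.6 * X["gene_1"])  # two true signals
df = X.assign(time=rng.exponential(1.0 / hazard),
              event=rng.random(n) < 0.7)

# l1_ratio=1.0 makes the penalty pure LASSO, shrinking most
# coefficients to (near) zero.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")

# Keep only features whose coefficients survived the penalty.
selected = cph.params_[cph.params_.abs() > 1e-3]
print("retained features:\n", selected)

# Risk score per patient: linear predictor over the selected features.
risk = X[selected.index] @ selected.values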


PLoS ONE ◽  
2016 ◽  
Vol 11 (8) ◽  
pp. e0161135 ◽  
Author(s):  
Julio Montes-Torres ◽  
José Luis Subirats ◽  
Nuria Ribelles ◽  
Daniel Urda ◽  
Leonardo Franco ◽  
...  

2020 ◽  
Vol 163 (3) ◽  
pp. 1329-1351 ◽  
Author(s):  
Anne Gädeke ◽  
Valentina Krysanova ◽  
Aashutosh Aryal ◽  
Jinfeng Chang ◽  
Manolis Grillakis ◽  
...  

Global Water Models (GWMs), which include Global Hydrological, Land Surface, and Dynamic Global Vegetation Models, present valuable tools for quantifying climate change impacts on hydrological processes in the data-scarce high latitudes. Here we performed a systematic model performance evaluation in six major Pan-Arctic watersheds for different hydrological indicators (monthly and seasonal discharge, extremes, trends (or lack of), and snow water equivalent (SWE)) via a novel Aggregated Performance Index (API) that is based on commonly used statistical evaluation metrics. The machine learning Boruta feature selection algorithm was used to evaluate the explanatory power of the API attributes. Our results show that the majority of the nine GWMs included in the study exhibit considerable difficulties in realistically representing Pan-Arctic hydrological processes. Average API_discharge (monthly and seasonal discharge) over the nine GWMs exceeds 50% only in the Kolyma basin (55%), is as low as 30% in the Yukon basin, and averages 43% over all watersheds. WATERGAP2 and MATSIRO are the highest-performing GWMs (API_discharge > 55%) and ORCHIDEE and JULES-W1 the lowest (API_discharge ≤ 25%) over all watersheds. For the high and low flows, average API_extreme is 35% and 26%, respectively, and over six GWMs API_SWE is 57%. The Boruta algorithm suggests that using different observation-based climate data sets does not influence the total score of the APIs in all watersheds. Ultimately, only satisfactory to good performing GWMs that effectively represent cold-region hydrological processes (including snow-related processes and permafrost) should be included in multi-model climate change impact assessments in Pan-Arctic watersheds.
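
The abstract does not spell out the API formula, so the following is only a hypothetical illustration of the general idea: score each hydrological indicator with a standard metric (Nash-Sutcliffe efficiency below), map the scores onto [0, 1], and average them into a single percentage. All data and the aggregation rule are assumptions for illustration.

# Minimal sketch in plain numpy with synthetic data.
import numpy as np

def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, <= 0 is poor."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(0)
obs_discharge = rng.gamma(2.0, 50.0, size=120)        # 10 years of monthly flow
sim_discharge = obs_discharge + rng.normal(0, 30, 120)
obs_swe = rng.gamma(3.0, 20.0, size=120)              # monthly snow water equivalent
sim_swe = obs_swe + rng.normal(0, 15, 120)

scores = [nse(obs_discharge, sim_discharge), nse(obs_swe, sim_swe)]
# Clip negative NSE values to 0 and average into a percentage.
api = 100 * np.mean([max(s, 0.0) for s in scores])
print(f"aggregated performance index: {api:.0f}%")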

