SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data

2021 ◽  
Author(s):  
Yunwei Zhang ◽  
Germaine Wong ◽  
Graham Mann ◽  
Samuel Muller ◽  
Jean Yee Hwa Yang

Survival analysis is the branch of statistics that treats time-to-event and survival status jointly as the dependent response. Current comparisons of survival model performance mostly focus on classical clinical data with traditional statistical survival models, with prediction accuracy often the only measure of model performance. Moreover, survival analysis approaches for censored omics data have not been fully studied. The typical workaround is to truncate survival time, define a new status variable, and then perform a binary classification analysis. Here, we develop a benchmarking framework that compares survival models on both clinical and omics datasets, and that covers not only classical statistical survival models but also state-of-the-art machine learning survival models, with multiple performance evaluation measures including model predictability, stability, flexibility, and computational cost. Our comprehensive comparison framework shows that optimality depends on both the dataset and the analysis method. The key result is that there is no one-size-fits-all solution for any of the criteria and any of the methods: some methods with a high C-index suffer from computational exhaustion and instability. Our framework gives researchers insight into how different survival model implementations vary across real-world datasets. We highlight that care is needed when selecting methods and specifically recommend not treating the C-index as the only performance evaluation metric, as alternative metrics measure other aspects of performance.
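
As a minimal sketch of the kind of multi-metric evaluation the abstract argues for (not the paper's actual framework), the following fits a Cox model with the lifelines library on synthetic data and reports both the C-index and a complementary quantity; all column names and data are illustrative assumptions.

# Minimal sketch, assuming the lifelines library and synthetic data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "biomarker": rng.normal(0, 1, n),
})
# Synthetic survival times that depend on the covariates.
hazard = np.exp(0.03 * df["age"] + 0.5 * df["biomarker"])
df["time"] = rng.exponential(1.0 / hazard)
df["event"] = rng.integers(0, 2, n)  # 1 = event observed, 0 = censored

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# C-index: rank agreement between predicted risk and observed times.
# Negate the partial hazard so that higher scores mean longer survival,
# as concordance_index expects.
c_index = concordance_index(df["time"], -cph.predict_partial_hazard(df), df["event"])
print(f"C-index: {c_index:.3f}")

# A complementary view beyond ranking alone: the fitted model's
# partial log-likelihood on the same data.
print(f"log-likelihood: {cph.log_likelihood_:.1f}")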

F1000Research ◽  
2017 ◽  
Vol 5 ◽  
pp. 2676 ◽  
Author(s):  
Sebastian Pölsterl ◽  
Pankaj Gupta ◽  
Lichao Wang ◽  
Sailesh Conjeti ◽  
Amin Katouzian ◽  
...  

Ensemble methods have been successfully applied in a wide range of scenarios, including survival analysis. However, most ensemble models for survival analysis consist of models that all optimize the same loss function and do not fully utilize the diversity in available models. We propose heterogeneous survival ensembles that combine several survival models, each optimizing a different loss during training. We evaluated our proposed technique in the context of the Prostate Cancer DREAM Challenge, where the objective was to predict survival of patients with metastatic, castrate-resistant prostate cancer from patient records of four phase III clinical trials. Results demonstrate that a diverse set of survival models was preferred over a single model and that our heterogeneous ensemble of survival models outperformed all competing methods with respect to predicting the exact time of death.
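
To illustrate the core idea of a heterogeneous survival ensemble (two models trained with different losses, combined on a common scale), here is a minimal sketch using scikit-survival and synthetic data. This is only an illustration under those assumptions, not the authors' implementation; the rank-averaging combination rule and all parameter values are stand-ins.

# Minimal sketch, assuming scikit-survival (sksurv) and synthetic data.
import numpy as np
from scipy.stats import rankdata
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import GradientBoostingSurvivalAnalysis
from sksurv.metrics import concordance_index_censored
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
time = rng.exponential(np.exp(-X[:, 0]))   # times depend on feature 0
event = rng.random(400) < 0.7              # roughly 30% censoring
y = Surv.from_arrays(event=event, time=time)

# Two models optimizing different losses: Cox partial likelihood vs. a
# censoring-weighted least-squares (accelerated failure time) loss.
cox = CoxPHSurvivalAnalysis().fit(X, y)
gbm = GradientBoostingSurvivalAnalysis(loss="ipcwls", n_estimators=100).fit(X, y)

# Rank-normalize so scores from different losses are comparable; the
# ipcwls model predicts on the time scale (higher = longer survival),
# so negate it to obtain a risk ordering.
risk_cox = rankdata(cox.predict(X)) / len(X)
risk_gbm = rankdata(-gbm.predict(X)) / len(X)
ensemble_risk = (risk_cox + risk_gbm) / 2

cindex = concordance_index_censored(event, time, ensemble_risk)[0]
print(f"ensemble C-index: {cindex:.3f}")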


2021 ◽  
Vol 4 (3) ◽  
pp. 251524592110268
Author(s):  
Roberta Rocca ◽  
Tal Yarkoni

Consensus on standards for evaluating models and theories is an integral part of every science. Nonetheless, in psychology, relatively little focus has been placed on defining reliable communal metrics to assess model performance. Evaluation practices are often idiosyncratic and are affected by a number of shortcomings (e.g., failure to assess models’ ability to generalize to unseen data) that make it difficult to discriminate between good and bad models. Drawing inspiration from fields such as machine learning and statistical genetics, we argue in favor of introducing common benchmarks as a means of overcoming the lack of reliable model evaluation criteria currently observed in psychology. We discuss a number of principles benchmarks should satisfy to achieve maximal utility, identify concrete steps the community could take to promote the development of such benchmarks, and address a number of potential pitfalls and concerns that may arise in the course of implementation. We argue that reaching consensus on common evaluation benchmarks will foster cumulative progress in psychology and encourage researchers to place heavier emphasis on the practical utility of scientific models.
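
One of the concrete shortcomings the abstract names is the failure to assess generalization to unseen data. A tiny illustration of why this matters, using scikit-learn with an arbitrary synthetic dataset and model (both are stand-ins, not anything from the paper):

# Minimal sketch, assuming scikit-learn and synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 0] + rng.normal(scale=1.0, size=300)   # weak signal, much noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeRegressor().fit(X_tr, y_tr)

# An unpruned tree memorizes its training data: in-sample R^2 is near
# 1.0, while held-out R^2 reveals how little actually generalizes.
print(f"in-sample R^2:     {model.score(X_tr, y_tr):.2f}")
print(f"out-of-sample R^2: {model.score(X_te, y_te):.2f}")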


2021 ◽  
Vol 11 (19) ◽  
pp. 9243
Author(s):  
Jože Rožanec ◽  
Elena Trajkova ◽  
Klemen Kenda ◽  
Blaž Fortuna ◽  
Dunja Mladenić

While increasing empirical evidence suggests that global time series forecasting models can achieve better forecasting performance than local ones, there is a research void regarding when and why global models fail to provide a good forecast. This paper uses anomaly detection algorithms and explainable artificial intelligence (XAI) to answer when and why a forecast should not be trusted. To address this issue, a dashboard was built to inform the user regarding (i) the relevance of the features for that particular forecast, (ii) which training samples most likely influenced the forecast outcome, (iii) why the forecast is considered an outlier, and (iv) a range of counterfactual examples showing how changes in the feature vector can lead to a different outcome. Moreover, a modular architecture and a methodology were developed to iteratively remove noisy data instances from the training set and thus enhance the overall global time series forecasting model performance. Finally, to test the effectiveness of the proposed approach, it was validated on two publicly available real-world datasets.
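
One ingredient of such a pipeline is flagging inputs that lie far from the training distribution before trusting their forecasts. The sketch below uses scikit-learn's IsolationForest as a generic stand-in for whatever detector the authors used; the data and threshold behavior are illustrative assumptions.

# Minimal sketch, assuming scikit-learn and synthetic feature vectors.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 8))            # feature vectors seen in training
X_new = np.vstack([rng.normal(size=(5, 8)),     # typical inputs
                   rng.normal(6, 1, size=(2, 8))])  # far from the training data

detector = IsolationForest(random_state=0).fit(X_train)
flags = detector.predict(X_new)                 # -1 = outlier, 1 = inlier

for i, flag in enumerate(flags):
    if flag == -1:
        print(f"input {i}: outlier relative to training data; "
              "treat this forecast with caution")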


2021 ◽  
Author(s):  
Xiaokai Yan ◽  
Chiying Xiao ◽  
Kunyan Yue ◽  
Min Chen ◽  
Hang Zhou

Background: Changes in the genome play a crucial role in carcinogenesis, and many biomarkers can serve as effective prognostic indicators in diverse tumors. Although many studies have constructed predictive models for hepatocellular carcinoma (HCC) based on molecular signatures, their performance remains unsatisfactory. To address this shortcoming, we aimed to construct a novel and accurate prognostic model from multi-omics data to guide prognostic assessment of HCC. Methods: The TCGA training set was used to identify crucial biomarkers and construct single-omic prognostic models through differential analysis, univariate Cox, and LASSO/stepwise Cox analysis. The performance of the single-omic models was then evaluated and validated through survival analysis, Harrell's concordance index (C-index), and receiver operating characteristic (ROC) curves in the TCGA test set and external cohorts. In addition, a comprehensive model based on multi-omics data was constructed via multiple Cox analysis, and its performance was evaluated in the TCGA training and test sets. Results: We identified 16 key mRNAs, 20 key lncRNAs, 5 key miRNAs, 5 key CNV genes, and 7 key SNPs significantly associated with the prognosis of HCC, and constructed 5 single-omic models that showed relatively good predictive performance, with C-indices ranging from 0.63 to 0.75 in the TCGA training and test sets. We further validated the mRNA and SNP models in two independent external datasets, where survival analysis showed good discriminating ability (P < 0.05). Moreover, the multi-omics model based on mRNA, lncRNA, miRNA, CNV, and SNP information presented a strong predictive ability, with a C-index over 0.80 and all AUC values at 1, 3, and 5 years above 0.84. Conclusion: We identified many biomarkers that may help elucidate the mechanisms underlying carcinogenesis in HCC, and constructed five single-omic models and an integrated multi-omics model that may provide effective and reliable guidance for prognosis assessment and treatment decision-making.
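
The signature-building step the Methods describe (L1-penalized Cox regression to select a small feature set, then a risk score from the retained coefficients) can be condensed into a short sketch. The version below uses lifelines on synthetic data; the penalizer value, selection threshold, and column names are illustrative assumptions, not those of the study.

# Minimal sketch, assuming the lifelines library and synthetic data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, p = 300, 30
X = pd.DataFrame(rng.normal(size=(n, p)),
                 columns=[f"gene_{i}" for i in range(p)])
hazard = np.exp(0.8 * X["gene_0"] - 0.6 * X["gene_1"])  # two true signals
df = X.assign(time=rng.exponential(1.0 / hazard),
              event=rng.random(n) < 0.7)

# l1_ratio=1.0 makes the penalty pure LASSO, shrinking most
# coefficients to (near) zero.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")

# Keep only features whose coefficients survived the penalty.
selected = cph.params_[cph.params_.abs() > 1e-3]
print("retained features:\n", selected)

# Risk score per patient: linear predictor over the selected features.
risk = X[selected.index] @ selected.values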


PLoS ONE ◽  
2016 ◽  
Vol 11 (8) ◽  
pp. e0161135 ◽  
Author(s):  
Julio Montes-Torres ◽  
José Luis Subirats ◽  
Nuria Ribelles ◽  
Daniel Urda ◽  
Leonardo Franco ◽  
...  

2020 ◽  
Vol 163 (3) ◽  
pp. 1329-1351 ◽  
Author(s):  
Anne Gädeke ◽  
Valentina Krysanova ◽  
Aashutosh Aryal ◽  
Jinfeng Chang ◽  
Manolis Grillakis ◽  
...  

Global Water Models (GWMs), which include Global Hydrological, Land Surface, and Dynamic Global Vegetation Models, present valuable tools for quantifying climate change impacts on hydrological processes in the data-scarce high latitudes. Here we performed a systematic model performance evaluation in six major Pan-Arctic watersheds for different hydrological indicators (monthly and seasonal discharge, extremes, trends (or lack of), and snow water equivalent (SWE)) via a novel Aggregated Performance Index (API) that is based on commonly used statistical evaluation metrics. The machine learning Boruta feature selection algorithm was used to evaluate the explanatory power of the API attributes. Our results show that the majority of the nine GWMs included in the study exhibit considerable difficulties in realistically representing Pan-Arctic hydrological processes. Average API_discharge (monthly and seasonal discharge) over the nine GWMs exceeds 50% only in the Kolyma basin (55%), is as low as 30% in the Yukon basin, and averages 43% over all watersheds. WATERGAP2 and MATSIRO are the highest-performing GWMs (API_discharge > 55%) and ORCHIDEE and JULES-W1 the lowest (API_discharge ≤ 25%) over all watersheds. For the high and low flows, average API_extreme is 35% and 26%, respectively, and over six GWMs API_SWE is 57%. The Boruta algorithm suggests that using different observation-based climate data sets does not influence the total score of the APIs in all watersheds. Ultimately, only satisfactory to good performing GWMs that effectively represent cold-region hydrological processes (including snow-related processes and permafrost) should be included in multi-model climate change impact assessments in Pan-Arctic watersheds.
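
The abstract does not spell out the API formula, so the following is only a hypothetical illustration of the general idea: score each hydrological indicator with a standard metric (Nash-Sutcliffe efficiency below), map the scores onto [0, 1], and average them into a single percentage. All data and the aggregation rule are assumptions for illustration.

# Minimal sketch in plain numpy with synthetic data.
import numpy as np

def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, <= 0 is poor."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(0)
obs_discharge = rng.gamma(2.0, 50.0, size=120)        # 10 years of monthly flow
sim_discharge = obs_discharge + rng.normal(0, 30, 120)
obs_swe = rng.gamma(3.0, 20.0, size=120)              # monthly snow water equivalent
sim_swe = obs_swe + rng.normal(0, 15, 120)

scores = [nse(obs_discharge, sim_discharge), nse(obs_swe, sim_swe)]
# Clip negative NSE values to 0 and average into a percentage.
api = 100 * np.mean([max(s, 0.0) for s in scores])
print(f"aggregated performance index: {api:.0f}%")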

