Prediction of cancer survival for cohorts of patients most recently diagnosed using multi-model inference

2020 ◽  
Vol 29 (12) ◽  
pp. 3605-3622
Author(s):  
Camille Maringe ◽  
Aurélien Belot ◽  
Bernard Rachet

Despite a large choice of models, functional forms and types of effects, the selection of excess hazard models for prediction of population cancer survival is not widespread in the literature. We propose multi-model inference based on excess hazard model(s) selected using Akaike information criteria or Bayesian information criteria for prediction and projection of cancer survival. We evaluate the properties of this approach using empirical data of patients diagnosed with breast, colon or lung cancer in 1990–2011. We artificially censor the data on 31 December 2010 and predict five-year survival for the 2010 and 2011 cohorts. We compare these predictions to the observed five-year cohort estimates of cancer survival and contrast them to predictions from an a priori selected simple model, and from the period approach. We illustrate the approach by replicating it for cohorts of patients for which stage at diagnosis and other important prognosis factors are available. We find that model-averaged predictions and projections of survival have close to minimal differences with the Pohar-Perme estimation of survival in many instances, particularly in subgroups of the population. Advantages of information-criterion based model selection include (i) transparent model-building strategy, (ii) accounting for model selection uncertainty, (iii) no a priori assumption for effects, and (iv) projections for patients outside of the sample.
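The abstract above does not spell out how the selected models are combined. A common scheme for multi-model inference is to turn information-criterion values into normalized weights (often called Akaike weights) and average the model predictions with them. The sketch below assumes that scheme; the function names, AIC values and survival predictions are hypothetical illustrations, not taken from the paper.

```python
import math


def information_criterion_weights(ic_values):
    """Convert AIC (or BIC) values into normalized model weights.

    Assumption: weights are proportional to exp(-delta / 2), where delta is
    the difference between a model's criterion value and the smallest one.
    """
    best = min(ic_values)
    raw = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_averaged_prediction(predictions, weights):
    """Weighted average of per-model predictions."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC values and five-year survival predictions for three models.
aic_values = [1002.0, 1000.0, 1006.0]
survival_predictions = [0.62, 0.65, 0.60]

weights = information_criterion_weights(aic_values)
averaged = model_averaged_prediction(survival_predictions, weights)
print(weights, averaged)
```

Subtracting the smallest criterion value before exponentiating leaves the weights unchanged and avoids numerical underflow when the raw values are large.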

2019 ◽  
Vol 37 (2) ◽  
pp. 549-562 ◽  
Author(s):  
Edward Susko ◽  
Andrew J Roger

The information criteria Akaike information criterion (AIC), AICc, and Bayesian information criterion (BIC) are widely used for model selection in phylogenetics; however, their theoretical justification and performance have not been carefully examined in this setting. Here, we investigate these methods under simple and complex phylogenetic models. We show that AIC can give a biased estimate of its intended target, the expected predictive log likelihood (EPLnL) or, equivalently, expected Kullback–Leibler divergence between the estimated model and the true distribution for the data. Reasons for bias include commonly occurring issues such as small edge-lengths or, in mixture models, small weights. The use of partitioned models is another issue that can cause problems with information criteria. We show that for partitioned models, a different BIC correction is required for it to be a valid approximation to a Bayes factor. The commonly used AICc correction is not clearly defined in partitioned models and can actually create a substantial bias when the number of parameters gets large, as is the case with larger trees and partitioned models. Bias-corrected cross-validation corrections are shown to provide better approximations to EPLnL than AIC. We also illustrate how EPLnL, the estimation target of AIC, can sometimes favor an incorrect model and give reasons why selection of incorrectly under-partitioned models might be desirable in partitioned model settings.
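For orientation, the three criteria named in this abstract are usually written in the following textbook forms. The symbols are assumptions introduced here: \hat{L} is the maximized likelihood, k the number of estimated parameters and n the sample size. These are the standard definitions, not the partition-specific corrections the paper proposes.

```latex
\begin{align}
\mathrm{AIC}  &= -2 \ln \hat{L} + 2k \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1} \\
\mathrm{BIC}  &= -2 \ln \hat{L} + k \ln n
\end{align}
```

The AICc correction term grows without bound as k approaches n - 1, which is one way to see why a large parameter count can make the correction dominate.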


Polar Biology ◽  
2021 ◽  
Vol 44 (2) ◽  
pp. 259-273
Author(s):  
Céline Cunen ◽  
Lars Walløe ◽  
Kenji Konishi ◽  
Nils Lid Hjort

Changes in the body condition of Antarctic minke whales (Balaenoptera bonaerensis) have been investigated in a number of studies, but remain contested. Here we provide a new analysis of body condition measurements, with particularly careful attention to the statistical model building and to model selection issues. We analyse body condition data for a large number (4704) of minke whales caught between 1987 and 2005. The data consist of five different variables related to body condition (fat weight, blubber thickness and girth) and a number of temporal, spatial and biological covariates. The body condition variables are analysed using linear mixed-effects models, for which we provide sound biological motivation. Further, we conduct model selection with the focused information criterion (FIC), reflecting the fact that we have a clearly specified research question, which leads us to a clear focus parameter of particular interest. We find that there has been a substantial decline in body condition over the study period (the net declines are estimated at 10% for fat weight, 7% for blubber thickness and 3% for girth). Interestingly, there seem to be some differences in body condition trends between males and females and between different regions of the Antarctic. The decline in body condition could indicate major changes in the Antarctic ecosystem, in particular, increased competition from some larger krill-eating whale species.


Economies ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 49 ◽  
Author(s):  
Waqar Badshah ◽  
Mehmet Bulut

The Bounds test of cointegration uses only unstructured single-path model selection techniques, i.e., information criteria, for model selection. The aim of this paper was twofold. The first aim was to evaluate the performance of five routinely used information criteria {Akaike Information Criterion (AIC), Akaike Information Criterion Corrected (AICC), Schwarz/Bayesian Information Criterion (SIC/BIC), Schwarz/Bayesian Information Criterion Corrected (SICC/BICC), and Hannan and Quinn Information Criterion (HQC)} and three structured approaches (Forward Selection, Backward Elimination, and Stepwise) by assessing their size and power properties at different sample sizes based on Monte Carlo simulations. The second aim was to assess the same procedures on real economic data. The second aim was achieved by evaluating the long-run relationship between three pairs of macroeconomic variables, i.e., Energy Consumption and GDP, Oil Price and GDP, and Broad Money and GDP, for BRICS (Brazil, Russia, India, China and South Africa) countries using the Bounds cointegration test. It was found that information criteria and structured procedures have the same power for a sample size of 50 or greater. However, BICC and Stepwise are better at small sample sizes. In the light of the simulation and real data results, a modified Bounds test with the Stepwise model selection procedure may be used, as it is strongly theoretically supported and avoids noise in the model selection process.


2021 ◽  
Vol 20 (3) ◽  
pp. 450-461
Author(s):  
Stanley L. Sclove

The use of information criteria, especially AIC (Akaike’s information criterion) and BIC (Bayesian information criterion), for choosing an adequate number of principal components is illustrated.


2013 ◽  
Vol 677 ◽  
pp. 357-362
Author(s):  
Natthasurang Yasungnoen ◽  
Patchanok Srisuradetchai

Model selection procedures play an important role in many kinds of research, especially quantitative research. In several areas of science, the analysis of experiments often relies on model selection and often has two fundamental goals associated with the experimental response of interest, including determining the best model. The way to address these goals is to implement a model selection procedure. The objective of this research is to determine whether the final models selected agree or differ substantially across three approaches to model selection: Akaike’s Information Criterion, a p-value criterion, and a stepwise procedure. The results from these three approaches are compared with each other. All candidate models for each design are constructed according to the heredity principle. Actual data from the literature, consisting of 2x3, 3^2 and 3x4 factorial designs, are used to determine the final model. The results show that the P-Value WH and Stepwise methods give the highest percentage of matched models.


Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1202
Author(s):  
Luca Spolladore ◽  
Michela Gelfusa ◽  
Riccardo Rossi ◽  
Andrea Murari

Model selection criteria are widely used to identify the model that best represents the data among a set of potential candidates. Among the different model selection criteria, the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) are the most popular and best understood. In the derivation of these indicators, it was assumed that the model’s dependent variables have already been properly identified and that the entries are not affected by significant uncertainties. These are issues that can become quite serious when investigating complex systems, especially when variables are highly correlated and the measurement uncertainties associated with them are not negligible. More sophisticated versions of these criteria, capable of better detecting spurious relations between variables when non-negligible noise is present, are proposed in this paper. Their derivation starts from a Bayesian statistics framework and adds an a priori Chi-squared probability distribution function of the model, dependent on a specifically defined information theoretic quantity that takes into account the redundancy between the dependent variables. The performance of the proposed versions of these criteria is assessed through a series of systematic simulations, using synthetic data for various classes of functions and noise levels. The results show that the upgraded formulation of the criteria clearly outperforms the traditional ones in most of the cases reported.


2012 ◽  
Vol 52 (No. 4) ◽  
pp. 188-196 ◽  
Author(s):  
Y. Lei ◽  
S. Y Zhang

Forest modellers have long faced the problem of selecting an appropriate mathematical model to describe tree ontogenetic or size-shape empirical relationships for tree species. A common practice is to develop many models (or a model pool) that include different functional forms, and then to select the most appropriate one for a given data set. However, this process may impose subjective restrictions on the functional form. In this process, little attention is paid to the features (e.g. asymptote and inflection point rather than asymptote and nonasymptote) of different functional forms, and to the intrinsic curve of a given data set. In order to find a better way of comparing and selecting the growth models, this paper describes and analyses the characteristics of the Schnute model. This model has both flexibility and versatility that have not been used in forestry. In this study, the Schnute model was applied to different data sets of selected forest species to determine their functional forms. The results indicate that the model shows some desirable properties for the examined data sets, and allows for discerning the different intrinsic curve shapes such as sigmoid, concave and other curve shapes. Since no suitable functional form for a given data set is usually known prior to the comparison of candidate models, it is recommended that the Schnute model be used as the first step to determine an appropriate functional form of the data set under investigation in order to avoid using a functional form a priori.


2006 ◽  
Vol 45 (01) ◽  
pp. 44-50 ◽  
Author(s):  
N. H. Augustin ◽  
W. Sauerbrei ◽  
N. Holländer

Summary Objectives: We illustrate a recently proposed two-step bootstrap model averaging (bootstrap MA) approach to cope with model selection uncertainty. The predictive performance is investigated in an example and in a simulation study. Results are compared to those derived from other model selection methods. Methods: In the framework of the linear regression model we use the two-step bootstrap MA, which consists of a screening step to eliminate covariates thought to have no influence on the response, and a model-averaging step. We also apply the full model, variable selection using backward elimination based on Akaike’s Information Criterion (AIC), the Bayes Information Criterion (BIC) and the bagging approach. The predictive performance is measured by the mean squared error (MSE) and the coverage of confidence intervals for the true response. Results: We obtained similar results for all approaches in the example. In the simulation the MSE was reduced by all approaches in comparison to the full model. The smallest values are obtained for bootstrap MA. Only the bootstrap MA and the full model correctly estimated the nominal coverage. The backward elimination procedures led to substantial underestimation and bagging to an overestimation of the true coverage. The screening step of bootstrap MA eliminates most of the unimportant factors. Conclusion: The new bootstrap MA approach shows promising results for predictive performance. It increases practical usefulness by eliminating unimportant factors in the screening step.


2015 ◽  
Author(s):  
John J. Dziak ◽  
Donna L. Coffman ◽  
Stephanie T. Lanza ◽  
Runze Li

Choosing a model with too few parameters can involve making unrealistically simple assumptions and lead to high bias, poor prediction, and missed opportunities for insight. Such models are not flexible enough to describe the sample or the population well. A model with too many parameters can fit the observed data very well, but be too closely tailored to it. Such models may generalize poorly. Penalized-likelihood information criteria, such as Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Consistent AIC, and the Adjusted BIC, are widely used for model selection. However, different criteria sometimes support different models, leading to uncertainty about which criterion is the most trustworthy. In some simple cases the comparison of two models using information criteria can be viewed as equivalent to a likelihood ratio test, with the different models representing different alpha levels (i.e., different emphases on sensitivity or specificity; Lin & Dayton 1997). This perspective may lead to insights about how to interpret the criteria in less simple situations. For example, AIC or BIC could be preferable, depending on sample size and on the relative importance one assigns to sensitivity versus specificity. Understanding the differences among the criteria may make it easier to compare their results and to use them to make informed decisions.
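The likelihood ratio view mentioned in this abstract can be made concrete in the simplest case. The sketch below assumes two nested models that differ by exactly one parameter, so the likelihood ratio statistic is compared against a chi-squared distribution with one degree of freedom. Under that assumption AIC prefers the larger model when the statistic exceeds 2, and BIC when it exceeds ln(n). The function names are hypothetical, and the setup is an illustration rather than the authors' own calculation.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    Uses the identity P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def implied_alpha_aic():
    """Alpha level implied by AIC when the models differ by one parameter."""
    return chi2_sf_1df(2.0)


def implied_alpha_bic(n):
    """Alpha level implied by BIC for sample size n (one extra parameter)."""
    return chi2_sf_1df(math.log(n))


alpha_aic = implied_alpha_aic()
alpha_bic_100 = implied_alpha_bic(100)
alpha_bic_1000 = implied_alpha_bic(1000)
print(alpha_aic, alpha_bic_100, alpha_bic_1000)
```

In this setting the alpha level implied by AIC does not depend on n, while the one implied by BIC shrinks as n grows, which matches the abstract's point that the preferable criterion depends on sample size and on how sensitivity is weighed against specificity.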

