Assessing the Significance of Model Selection in Ecology

2020 ◽  
Vol 6 (2) ◽  
Author(s):  
Edward Wheatcroft

Model Selection is a key part of many ecological studies, with Akaike’s Information Criterion (AIC) being by far the most commonly used technique for this purpose. Typically, a number of candidate models are defined a priori and ranked according to their expected out-of-sample performance. Model selection, however, only assesses the relative performance of the models and, as pointed out in a recent paper, a large proportion of ecology papers that use model selection do not assess the absolute fit of the ‘best’ model. In this paper, it is argued that assessing the absolute fit of the ‘best’ model alone does not go far enough. This is because a model that appears to perform well under model selection is also likely to appear to perform well under measures of absolute fit, even when there is no predictive value. A model selection permutation test is proposed that assesses the probability that the model selection statistic of the ‘best’ model could have occurred by chance alone, whilst taking account of dependencies between the models. It is argued that this test should always be performed as a part of formal model selection. The test is demonstrated on two real population modelling examples of ibex in northern Italy and wild reindeer in Norway.
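The general idea of a permutation test on the 'best' model's selection statistic can be sketched as follows. This is a minimal illustration only: the abstract does not give the paper's exact statistic or permutation scheme, so the Gaussian least-squares AIC, the simple shuffling of the response, and the function names `gaussian_aic`, `best_aic` and `selection_permutation_test` are all assumptions made here. Because every candidate model is refitted to the same permuted response, the dependence between the models is carried into the null distribution.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC (up to an additive constant) of a least-squares fit with Gaussian errors."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

def best_aic(y, designs):
    """Lowest AIC over all candidate design matrices."""
    return min(gaussian_aic(y, X) for X in designs)

def selection_permutation_test(y, designs, n_perm=999, seed=0):
    """Estimate how often a shuffled response gives a 'best' AIC at least as low."""
    rng = np.random.default_rng(seed)
    observed = best_aic(y, designs)
    # Refit every candidate to each shuffled response (assumed scheme, not the paper's).
    permuted = np.array(
        [best_aic(rng.permutation(y), designs) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(permuted <= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

A small p-value suggests that the winning model's score is unlikely to arise from model selection alone. For serially dependent data such as population time series, plain shuffling may not be appropriate and a different resampling scheme would be needed.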

Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1202
Author(s):  
Luca Spolladore ◽  
Michela Gelfusa ◽  
Riccardo Rossi ◽  
Andrea Murari

Model selection criteria are widely used to identify the model that best represents the data among a set of potential candidates. Among the different model selection criteria, the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) are the most popular and best understood. In the derivation of these indicators, it was assumed that the model’s dependent variables have already been properly identified and that the entries are not affected by significant uncertainties. These are issues that can become quite serious when investigating complex systems, especially when variables are highly correlated and the measurement uncertainties associated with them are not negligible. More sophisticated versions of these criteria, capable of better detecting spurious relations between variables when non-negligible noise is present, are proposed in this paper. They are derived starting from a Bayesian statistics framework and adding an a priori Chi-squared probability distribution function of the model, dependent on a specifically defined information theoretic quantity that takes into account the redundancy between the dependent variables. The performance of the proposed versions of these criteria is assessed through a series of systematic simulations, using synthetic data for various classes of functions and noise levels. The results show that the upgraded formulation of the criteria clearly outperforms the traditional ones in most of the cases reported.
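For reference, the two traditional criteria have the following textbook definitions, where $k$ is the number of estimated parameters, $n$ the number of observations and $\hat{L}$ the maximized likelihood. These symbols are introduced here for illustration. The modified versions proposed in the paper are not given in the abstract and are not reproduced.

```latex
\begin{align}
  \mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
  \mathrm{BIC} &= k\ln n - 2\ln\hat{L}.
\end{align}
```

In both cases the model with the lowest value is preferred. BIC penalizes each additional parameter more heavily than AIC once $\ln n > 2$, that is, for $n \geq 8$.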


Water ◽  
2021 ◽  
Vol 13 (18) ◽  
pp. 2525
Author(s):  
Xiaohan Mei ◽  
Patricia K. Smith

Artificial Neural Networks (ANN) have been widely applied in hydrologic and water quality (H/WQ) modeling in the past three decades. Many studies have demonstrated an ANN’s capability to successfully estimate daily streamflow from meteorological data on the watershed level. One major challenge of ANN streamflow modeling is finding the optimal network structure with good generalization capability while ameliorating model overfitting. This study empirically examines two types of model selection approaches for simulating streamflow time series: the out-of-sample approach using blocked cross-validation (BlockedCV) and an in-sample approach that is based on Akaike’s information criterion (AIC) and Bayesian information criterion (BIC). A three-layer feed-forward neural network using a back-propagation algorithm is utilized to create the streamflow models in this study. The rainfall–streamflow relationships of two adjacent, small watersheds in the San Antonio region in south-central Texas are modeled on a daily time scale. The model selection results of the two approaches are compared, and some commonly used performance measures (PMs) are generated on the stand-alone testing datasets to evaluate the models selected by the two approaches. This study finds that, in general, the out-of-sample and in-sample approaches do not converge to the same model selection results, with AIC and BIC selecting simpler models than BlockedCV. The ANNs were found to have good performance in both study watersheds, with BlockedCV selected models having a Nash–Sutcliffe coefficient of efficiency (NSE) of 0.581 and 0.658, and AIC/BIC selected models having a poorer NSE of 0.574 and 0.310, for the two study watersheds. Overall, the out-of-sample BlockedCV approach selected models with better predictive ability and is preferable for modeling streamflow time series.
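Two ingredients of this comparison can be sketched briefly: the Nash–Sutcliffe efficiency used as a performance measure, and a blocked split that keeps each validation fold contiguous in time. The NSE formula is the standard one. The fold construction is an assumption made here for illustration, since the abstract does not describe the study's exact blocking scheme, and the names `nash_sutcliffe` and `blocked_folds` are hypothetical.

```python
import numpy as np

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance

def blocked_folds(n_samples, n_blocks):
    """Yield (train_idx, test_idx) pairs where each test set is one contiguous block."""
    indices = np.arange(n_samples)
    for block in np.array_split(indices, n_blocks):
        # Train on everything outside the held-out block (assumed scheme).
        yield np.setdiff1d(indices, block), block
```

An NSE of 1 indicates a perfect match, and an NSE of 0 means the model does no better than predicting the observed mean. Keeping blocks contiguous avoids mixing neighbouring days of a streamflow series between training and validation sets.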


2020 ◽  
Vol 29 (12) ◽  
pp. 3605-3622
Author(s):  
Camille Maringe ◽  
Aurélien Belot ◽  
Bernard Rachet

Despite a large choice of models, functional forms and types of effects, the selection of excess hazard models for prediction of population cancer survival is not widespread in the literature. We propose multi-model inference based on excess hazard model(s) selected using Akaike information criteria or Bayesian information criteria for prediction and projection of cancer survival. We evaluate the properties of this approach using empirical data of patients diagnosed with breast, colon or lung cancer in 1990–2011. We artificially censor the data on 31 December 2010 and predict five-year survival for the 2010 and 2011 cohorts. We compare these predictions to the observed five-year cohort estimates of cancer survival and contrast them to predictions from an a priori selected simple model, and from the period approach. We illustrate the approach by replicating it for cohorts of patients for which stage at diagnosis and other important prognosis factors are available. We find that model-averaged predictions and projections of survival have close to minimal differences with the Pohar-Perme estimation of survival in many instances, particularly in subgroups of the population. Advantages of information-criterion based model selection include (i) transparent model-building strategy, (ii) accounting for model selection uncertainty, (iii) no a priori assumption for effects, and (iv) projections for patients outside of the sample.


Metrika ◽  
2021 ◽  
Author(s):  
Andreas Anastasiou ◽  
Piotr Fryzlewicz

We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID’s accuracy as it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, or an information criterion, or SDLL, or a hybrid involving the former two. The hybrid model selection leads to a general method with very good practical performance and minimal parameter choice. In the scenarios tested, ID is at least as accurate as the state-of-the-art methods; most of the time it outperforms them. ID is implemented in the R packages IDetect and breakfast, available from CRAN.


Polar Biology ◽  
2021 ◽  
Vol 44 (2) ◽  
pp. 259-273
Author(s):  
Céline Cunen ◽  
Lars Walløe ◽  
Kenji Konishi ◽  
Nils Lid Hjort

Changes in the body condition of Antarctic minke whales (Balaenoptera bonaerensis) have been investigated in a number of studies, but remain contested. Here we provide a new analysis of body condition measurements, with particularly careful attention to the statistical model building and to model selection issues. We analyse body condition data for a large number (4704) of minke whales caught between 1987 and 2005. The data consist of five different variables related to body condition (fat weight, blubber thickness and girth) and a number of temporal, spatial and biological covariates. The body condition variables are analysed using linear mixed-effects models, for which we provide sound biological motivation. Further, we conduct model selection with the focused information criterion (FIC), reflecting the fact that we have a clearly specified research question, which leads us to a clear focus parameter of particular interest. We find that there has been a substantial decline in body condition over the study period (the net declines are estimated at 10% for fat weight, 7% for blubber thickness and 3% for the girth). Interestingly, there seem to be some differences in body condition trends between males and females and in different regions of the Antarctic. The decline in body condition could indicate major changes in the Antarctic ecosystem, in particular, increased competition from some larger krill-eating whale species.


2014 ◽  
Vol 2014 ◽  
pp. 1-13
Author(s):  
Qichang Xie ◽  
Meng Du

The essential task of risk investment is to select an optimal tracking portfolio among various portfolios. Statistically, this process can be achieved by choosing an optimal restricted linear model. This paper develops a statistical procedure to do this, based on selecting appropriate weights for averaging approximately restricted models. The method of weighted average least squares is adopted to estimate the approximately restricted models under a dependent error setting. The optimal weights are selected by minimizing a k-class generalized information criterion (k-GIC), which is an estimate of the average squared error from the model average fit. This model selection procedure is shown to be asymptotically optimal in the sense of obtaining the lowest possible average squared error. Monte Carlo simulations illustrate that the suggested method has comparable efficiency to some alternative model selection techniques.


Economies ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 49 ◽  
Author(s):  
Waqar Badshah ◽  
Mehmet Bulut

The Bounds test of cointegration uses only unstructured single-path model selection techniques, i.e., information criteria, for model selection. The aim of this paper was twofold: first, to evaluate the performance of five routinely used information criteria {Akaike Information Criterion (AIC), Akaike Information Criterion Corrected (AICC), Schwarz/Bayesian Information Criterion (SIC/BIC), Schwarz/Bayesian Information Criterion Corrected (SICC/BICC), and Hannan and Quinn Information Criterion (HQC)} and three structured approaches (Forward Selection, Backward Elimination, and Stepwise) by assessing their size and power properties at different sample sizes based on Monte Carlo simulations; and second, to assess the same approaches on real economic data. The second aim was achieved by evaluating the long-run relationship between three pairs of macroeconomic variables, i.e., Energy Consumption and GDP, Oil Price and GDP, and Broad Money and GDP for BRICS (Brazil, Russia, India, China and South Africa) countries using the Bounds cointegration test. It was found that information criteria and structured procedures have the same power for a sample size of 50 or greater. However, BICC and Stepwise are better at small sample sizes. In light of the simulation and real data results, a modified Bounds test with a Stepwise model selection procedure may be used, as it is strongly theoretically supported and avoids noise in the model selection process.


1992 ◽  
Vol 24 (2) ◽  
pp. 11-22 ◽  
Author(s):  
Barry K. Goodwin

Recent empirical research and developments in the cattle industry suggest several reasons to suspect structural change in economic relationships determining cattle prices. Standard forecasting models may ignore structural change and may produce biased and misleading forecasts. Vector autoregressive (VAR) models that allow parameters to vary with time are used to forecast quarterly cattle prices. The VAR procedures are flexible in that they allow the identification of structural change that begins at an a priori unknown point and occurs gradually. The results indicate that the lowest RMSE for out-of-sample forecasts of cattle prices is obtained using a gradually switching VAR model. However, differences between the gradually switching VAR model and a univariate ARIMA model are not strongly significant. Impulse response functions indicate that adjustments of cattle prices to new information have become faster in recent years.


2017 ◽  
Author(s):  
Jonathan Greig

Proclus introduces the concept of the unparticipated (ἀμέθεκτον) (P1) among two other terms, the participated (P2) and participant (P3), as the first principle (ἀρχή) of any given series of entities or Forms in his metaphysical structure. For instance, the unparticipated monad (P1), Soul, generates all individual, participated souls (P2), which in turn generate the attribute of life in their respective, participating bodies (P3). Proclus looks at (P2) as an efficient cause of (P3), where (P2) must be the attribute in actuality in relation to the attribute it brings about in (P3). At the outset, this suggests that (P2) is necessary and sufficient for (P3), which then implies a problem for positing (P1): if (P2) is doing the causal legwork for (P3), what role does (P1) play? One of Proclus’ main explanations is that (P1) is responsible for ‘unifying’ the multiple participated entities (P2), so that the commonality of the participated entities (P2) must go back to a separate source (P1). However, one could easily respond that this just amounts to a reversion to a priori Platonist principles for transcendent, separate Forms without providing a real justification for the necessity of (P1) as a cause. In my talk, I wish to elaborate on how Proclus thinks about (P1)’s type of causation in relation to (P2) and (P3), particularly showing why (P2) for Proclus is ultimately insufficient as an efficient cause compared to (P1) as the absolute first cause for a given series. [Early work on a PhD thesis chapter; presentation for the University of Edinburgh, July 16, 2017. Any comments or feedback are welcome!]

