Improving data analysis in herpetology: using Akaike's Information Criterion (AIC) to assess the strength of biological hypotheses

2006 ◽  
Vol 27 (2) ◽  
pp. 169-180 ◽  
Author(s):  
Marc Mazerolle

In ecology, researchers frequently use observational studies to explain a given pattern, such as the number of individuals in a habitat patch, with a large number of explanatory (i.e., independent) variables. To elucidate such relationships, ecologists have long relied on hypothesis testing to include or exclude variables in regression models, although the conclusions often depend on the approach used (e.g., forward, backward, stepwise selection). Although better tools surfaced in the mid-1970s, they remain underutilized in certain fields, particularly in herpetology. This is the case of the Akaike information criterion (AIC), which is markedly superior for model selection (i.e., variable selection) to hypothesis-based approaches. It is simple to compute and easy to understand, but more importantly, for a given data set, it provides a measure of the strength of evidence for each model that represents a plausible biological hypothesis relative to the entire set of models considered. Using this approach, one can then compute a weighted average of the estimate and standard error for any given variable of interest across all the models considered. This procedure, termed model averaging or multimodel inference, yields precise and robust estimates. In this paper, I illustrate the use of the AIC in model selection and inference, as well as the interpretation of results analysed in this framework, with two real herpetological data sets. The AIC and measures derived from it should be routinely adopted by herpetologists.
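The AIC machinery described in this abstract reduces to a few lines of code. The sketch below is illustrative only (the function names are mine, not the paper's): it computes AIC from a model's maximized log-likelihood, converts a set of AIC values into Akaike weights, and uses those weights to model-average a parameter estimate across candidate models.

```python
import math

def aic(log_likelihood, k):
    # AIC = -2 log(L) + 2k, where k is the number of estimated parameters
    return -2.0 * log_likelihood + 2.0 * k

def akaike_weights(aic_values):
    # Delta_i = AIC_i - min(AIC); w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2)
    best = min(aic_values)
    rel_likelihoods = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(rel_likelihoods)
    return [r / total for r in rel_likelihoods]

def model_average(estimates, weights):
    # weighted average of a parameter's estimate across all candidate models
    return sum(b * w for b, w in zip(estimates, weights))
```

The weights sum to one and can be read as the relative strength of evidence for each model; the model-averaged estimate then accounts for model-selection uncertainty instead of conditioning on a single "best" model.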

2003 ◽  
Vol 23 (4) ◽  
pp. 490-498 ◽  
Author(s):  
Federico E. Turkheimer ◽  
Rainer Hinz ◽  
Vincent J. Cunningham

This article deals with the problem of model selection for the mathematical description of tracer kinetics in nuclear medicine. It stems from the consideration of some specific data sets where different models have similar performances. In these situations, it is shown that considerate averaging of a parameter's estimates over the entire model set is better than obtaining the estimates from one model only. Furthermore, it is also shown that the procedure of averaging over a small number of “good” models reduces the “generalization error,” the error introduced when the model selected over a particular data set is applied to different conditions, such as subject populations with altered physiologic parameters, modified acquisition protocols, and different signal-to-noise ratios. The method of averaging over the entire model set uses Akaike coefficients as measures of an individual model's likelihood. To facilitate the understanding of these statistical tools, the authors provide an introduction to model selection criteria and a short technical treatment of Akaike's information-theoretic approach. The new method is illustrated and epitomized by a case example on the modeling of [11C]flumazenil kinetics in the brain, containing both real and simulated data.


2014 ◽  
Vol 2014 ◽  
pp. 1-13
Author(s):  
Qichang Xie ◽  
Meng Du

The essential task of risk investment is to select an optimal tracking portfolio among various portfolios. Statistically, this process can be achieved by choosing an optimal restricted linear model. This paper develops a statistical procedure to do this, based on selecting appropriate weights for averaging approximately restricted models. The method of weighted average least squares is adopted to estimate the approximately restricted models under a dependent error setting. The optimal weights are selected by minimizing a k-class generalized information criterion (k-GIC), which is an estimate of the average squared error from the model average fit. This model selection procedure is shown to be asymptotically optimal in the sense of obtaining the lowest possible average squared error. Monte Carlo simulations illustrate that the suggested method has comparable efficiency to some alternative model selection techniques.
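The weight-selection idea can be illustrated with a toy version of the criterion: average the fitted values of two candidate models and pick the weight that minimizes a penalized squared-error score. This is a Mallows-type stand-in for the paper's k-GIC, not the criterion itself, and all names and numbers below are illustrative.

```python
def averaging_weight(y, pred1, pred2, k1, k2, sigma2, grid=101):
    # choose weight w in [0, 1] for combining two model fits by minimizing
    # a Mallows-type criterion: RSS(w) + 2 * sigma2 * (w*k1 + (1-w)*k2),
    # where k1, k2 are the model sizes and sigma2 an error-variance estimate
    best_w, best_crit = 0.0, float("inf")
    for i in range(grid):
        w = i / (grid - 1)
        rss = sum((yi - (w * p1 + (1.0 - w) * p2)) ** 2
                  for yi, p1, p2 in zip(y, pred1, pred2))
        crit = rss + 2.0 * sigma2 * (w * k1 + (1.0 - w) * k2)
        if crit < best_crit:
            best_w, best_crit = w, crit
    return best_w
```

When one model fits markedly better for the same penalty, the criterion pushes the weight toward that model; when fits are comparable, intermediate weights win, which is exactly the averaging behaviour the paper exploits.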


2018 ◽  
Author(s):  
Yen Ting Lin ◽  
Nicolas E. Buchler

Understanding how stochastic gene expression is regulated in biological systems using snapshots of single-cell transcripts requires state-of-the-art methods of computational analysis and statistical inference. A Bayesian approach to statistical inference is the most complete method for model selection and uncertainty quantification of kinetic parameters from single-cell data. This approach, however, is impractical because current numerical algorithms are too slow to handle typical models of gene expression. To solve this problem, we first show that time-dependent mRNA distributions of discrete-state models of gene expression are dynamic Poisson mixtures, whose mixing kernels are characterized by a piece-wise deterministic Markov process. We combined this analytical result with a kinetic Monte Carlo algorithm to create a hybrid numerical method that accelerates the calculation of time-dependent mRNA distributions by 1000-fold compared to current methods. We then integrated the hybrid algorithm into an existing Monte Carlo sampler to estimate the Bayesian posterior distribution of many different, competing models in a reasonable amount of time. We validated our method of accelerated Bayesian inference on several synthetic data sets. Our results show that kinetic parameters can be reasonably constrained for modestly sampled data sets, if the model is known a priori. If the model is unknown, the Bayesian evidence can be used to rigorously quantify the likelihood of a model relative to other models from the data. We demonstrate that Bayesian evidence selects the true model and outperforms approximate metrics, e.g., Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC), often used for model selection.
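The key analytical result above is that the mRNA count distribution is a Poisson mixture whose kernels are set by the promoter process. A minimal sketch of such a mixture follows; the mixing weights and rates here are hypothetical placeholders for what the piece-wise deterministic Markov process of the paper would supply.

```python
import math

def poisson_pmf(n, lam):
    # probability of observing n mRNA copies under a Poisson kernel with rate lam
    return math.exp(-lam) * lam ** n / math.factorial(n)

def mixture_pmf(n, weights, rates):
    # dynamic Poisson mixture: the mRNA distribution is a weighted sum of
    # Poisson kernels, one per promoter state (weights and rates illustrative)
    return sum(w * poisson_pmf(n, lam) for w, lam in zip(weights, rates))
```

For a two-state promoter spending 30% of its time in a low-expression state (rate 2) and 70% in a high-expression state (rate 10), the mixture's mean is 0.3·2 + 0.7·10 = 7.6, and the distribution is typically bimodal, which is what makes it informative for inference.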


2011 ◽  
Vol 35 (1) ◽  
pp. 97-104 ◽  
Author(s):  
Marcio Paulo de Oliveira ◽  
Maria Hermínia Ferreira Tavares ◽  
Miguel Angel Uribe-Opazo ◽  
Luis Carlos Timm

Statistical models allow the representation of data sets and the estimation and/or prediction of the behavior of a given variable through its interaction with the other variables involved in a phenomenon. Among these are autoregressive state-space models (ARSS) and linear regression models (LR), which allow the quantification of the relationships among soil-plant-atmosphere system variables. To compare the quality of the ARSS and LR models for modeling the relationships between soybean yield and soil physical properties, Akaike's Information Criterion, which provides a coefficient for the selection of the best model, was used in this study. The data sets were sampled in a Rhodic Acrudox soil, along a spatial transect with 84 points spaced 3 m apart. At each sampling point, soybean samples were collected for yield quantification. At the same site, soil penetration resistance was also measured and soil samples were collected to measure soil bulk density in the 0-0.10 m and 0.10-0.20 m layers. Results showed autocorrelation and a cross-correlation structure between soybean yield and soil penetration resistance data. Soil bulk density data, however, were only autocorrelated in the 0-0.10 m layer and not cross-correlated with soybean yield. Based on Akaike's Information Criterion, the autoregressive state-space models were more efficient than the equivalent simple and multiple linear regression models: their AIC values were lower than those of the regression models for all combinations of explanatory variables.
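For least-squares fits such as the regression models compared above, AIC can be computed directly from the residual sum of squares, so no likelihood function needs to be written down. A minimal sketch (the function name and the example numbers are illustrative, not taken from the study):

```python
import math

def aic_ls(rss, n, k):
    # least-squares form of AIC: n * ln(RSS / n) + 2k, with additive constants
    # dropped; k counts the regression coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k
```

Comparing two fits to the same n observations then amounts to comparing these numbers: a model with an extra parameter is preferred only if it lowers the residual sum of squares enough to pay the 2-unit penalty.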


2021 ◽  
Vol 2 (2) ◽  
pp. 40-47
Author(s):  
Sunil Kumar ◽  
Vaibhav Bhatnagar

Machine learning is one of the most active fields in the effort to realize artificial intelligence (AI). The complexity of machine learning algorithms makes it difficult to predict which algorithm will perform best. Because many algorithms are available for finding regression trends, establishing the correct association among variables is challenging; this paper therefore reviews the different types of regression used in machine learning. Six main regression models are covered: linear, logistic, polynomial, ridge, Bayesian linear, and lasso. The paper gives an overview of each of these models and compares their suitability for machine learning tasks. Data analysis requires establishing associations among the many variables in a data set, and such associations are essential for forecasting and for exploring data. Regression analysis is a procedure for establishing these associations. The paper focuses on the various regression models and on how they can be applied to different data sets in machine learning. Selecting the correct model for a given analysis is the most challenging task, and the models are therefore examined thoroughly in this study. Applied appropriately to a suitable data set, these models allow data exploration and forecasting to yield highly accurate results.
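Of the six models listed, ridge regression illustrates most compactly how the penalized variants differ from plain linear regression: a penalty added to the least-squares objective shrinks the coefficient toward zero. A minimal one-feature, no-intercept sketch (illustrative only, not from the paper):

```python
def ridge_slope(x, y, alpha=0.0):
    # one-feature ridge estimate (no intercept): beta = sum(x*y) / (sum(x*x) + alpha);
    # alpha = 0 recovers ordinary least squares, larger alpha shrinks beta toward 0
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)
```

Lasso behaves similarly but its penalty can set coefficients exactly to zero, which is why it doubles as a variable-selection tool; the choice among the six models hinges on such trade-offs.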


2010 ◽  
Vol 62 (4) ◽  
pp. 875-882 ◽  
Author(s):  
A. Dembélé ◽  
J.-L. Bertrand-Krajewski ◽  
B. Barillon

Regression models are among the most frequently used models to estimate pollutant event mean concentrations (EMC) in wet weather discharges in urban catchments. Two main questions dealing with the calibration of EMC regression models are investigated: i) the sensitivity of models to the size and content of the data sets used for their calibration, and ii) how modelling results change when models are re-calibrated as data sets grow and evolve over time with newly collected experimental data. Based on an experimental data set of 64 rain events monitored in a densely urbanised catchment, four TSS EMC regression models (two log-linear and two linear models) with two or three explanatory variables were derived and analysed. Model calibration with the iteratively re-weighted least squares method is less sensitive and leads to more robust results than the ordinary least squares method. Three calibration options were investigated: two accounting for the chronological order of the observations, and one using random samples of events from the whole available data set. Results obtained with the best performing nonlinear model clearly indicate that the model is highly sensitive to the size and content of the data set used for its calibration.
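The iteratively re-weighted least squares idea mentioned above can be sketched for a one-parameter model: residuals are recomputed at each pass and large ones are down-weighted, so outlying events pull the fit less than under ordinary least squares. The no-intercept form, the Huber weighting, and all names below are illustrative simplifications, not the authors' models.

```python
def ols_slope(x, y):
    # ordinary least squares slope for a no-intercept model
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def irls_slope(x, y, c=1.345, iters=30):
    # iteratively re-weighted least squares with Huber weights: points whose
    # residual exceeds c get weight c/|r| instead of 1, damping outliers
    beta = ols_slope(x, y)
    for _ in range(iters):
        resid = [yi - beta * xi for xi, yi in zip(x, y)]
        w = [1.0 if abs(r) <= c else c / abs(r) for r in resid]
        beta = (sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
                / sum(wi * xi * xi for wi, xi in zip(w, x)))
    return beta
```

On data with one grossly outlying event, the OLS slope is dragged toward the outlier while the IRLS slope stays near the trend of the remaining points, which is the robustness property the abstract reports.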


2006 ◽  
Vol 45 (01) ◽  
pp. 44-50 ◽  
Author(s):  
N. H. Augustin ◽  
W. Sauerbrei ◽  
N. Holländer

Summary Objectives: We illustrate a recently proposed two-step bootstrap model averaging (bootstrap MA) approach to cope with model selection uncertainty. The predictive performance is investigated in an example and in a simulation study. Results are compared to those derived from other model selection methods. Methods: In the framework of the linear regression model we use the two-step bootstrap MA, which consists of a screening step to eliminate covariates thought to have no influence on the response, and a model-averaging step. We also apply the full model, variable selection using backward elimination based on Akaike’s Information Criterion (AIC), the Bayes Information Criterion (BIC) and the bagging approach. The predictive performance is measured by the mean squared error (MSE) and the coverage of confidence intervals for the true response. Results: We obtained similar results for all approaches in the example. In the simulation the MSE was reduced by all approaches in comparison to the full model. The smallest values are obtained for bootstrap MA. Only the bootstrap MA and the full model correctly estimated the nominal coverage. The backward elimination procedures led to substantial underestimation and bagging to an overestimation of the true coverage. The screening step of bootstrap MA eliminates most of the unimportant factors. Conclusion: The new bootstrap MA approach shows promising results for predictive performance. It increases practical usefulness by eliminating unimportant factors in the screening step.
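The screening step of the two-step bootstrap MA can be sketched schematically: covariates that are rarely "selected" across bootstrap resamples are eliminated before the model-averaging step. The sketch below substitutes a simple correlation cutoff for the AIC-based selection used in the paper, so it is illustrative only; every name and threshold is an assumption.

```python
import random

def corr(a, b):
    # plain sample correlation; returns 0.0 for a degenerate (constant) sample
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def screen(x_cols, y, n_boot=200, cutoff=0.5, keep_frac=0.5, seed=1):
    # Step 1 (screening): on each bootstrap resample, flag a covariate as
    # selected if its absolute correlation with y exceeds the cutoff; keep
    # covariates selected in at least keep_frac of the resamples
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(x_cols)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        for j, col in enumerate(x_cols):
            xb = [col[i] for i in idx]
            if abs(corr(xb, yb)) > cutoff:
                counts[j] += 1
    return [j for j, c in enumerate(counts) if c / n_boot >= keep_frac]
```

The surviving covariates would then enter the second, model-averaging step; discarding the noise variables first is what gives the two-step procedure its practical advantage over averaging across all subsets.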

