Bootstrapping Linear Models

Mapping Intimacies ◽

10.1093/oso/9780198505044.003.0016 ◽

2017 ◽

Author(s):

Russell Cheng

Keyword(s):

Model Selection ◽

Linear Models ◽

Parametric Bootstrap ◽

Real Data ◽

Original Data ◽

Difficult Problem ◽

Full Model ◽

Number Of Factors ◽

Bootstrap Model ◽

Reliability Check

Bootstrap model selection is proposed for the difficult problem of selecting important factors in non-orthogonal linear models when the number of factors, P, is large. In the method, the full model is first fitted to the original data. Then B parametric bootstrap samples are drawn from the fitted model, and the full model is fitted to each. A submodel is obtained from each fitted full model by rejecting those factors found unimportant in the fit. Each distinct selected submodel is then fitted to the original data and its Mallows Cp statistic is calculated. A subset of good submodels is then obtained based on the Cp values. As a reliability check, this subset can also be fitted to the bootstrap samples to see how often each submodel is found to be a good fit. Use of the method is illustrated using a real-data sample.
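The procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation. It assumes Gaussian errors, a design matrix X of full column rank, and a simple |t| cutoff (the hypothetical parameter t_crit) as the rule for rejecting unimportant factors; the abstract does not specify the rejection rule. The function names are also hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


def t_statistics(X, y):
    """t statistic of every coefficient in the full-model fit."""
    n, p = X.shape
    beta, rss = fit_ols(X, y)
    sigma2 = rss / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta / np.sqrt(np.diag(cov))


def bootstrap_model_selection(X, y, B=200, t_crit=2.0, seed=0):
    """Return (Cp, columns) pairs for each distinct submodel, best Cp first."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Step 1: fit the full model to the original data.
    beta_full, rss_full = fit_ols(X, y)
    sigma2_full = rss_full / (n - p)

    # Steps 2-3: parametric bootstrap samples from the fitted full model;
    # refit the full model to each and keep factors with |t| >= t_crit
    # (assumed rejection rule).
    submodels = set()
    for _ in range(B):
        noise = rng.normal(0.0, np.sqrt(sigma2_full), size=n)
        y_star = X @ beta_full + noise
        t = t_statistics(X, y_star)
        keep = tuple(int(j) for j in np.flatnonzero(np.abs(t) >= t_crit))
        if keep:
            submodels.add(keep)

    # Step 4: fit each distinct submodel to the ORIGINAL data and compute
    # Mallows Cp = RSS_sub / s^2 - n + 2 * (number of fitted terms).
    ranked = []
    for cols in submodels:
        _, rss = fit_ols(X[:, list(cols)], y)
        cp = rss / sigma2_full - n + 2 * len(cols)
        ranked.append((float(cp), cols))
    return sorted(ranked)
```

Submodels whose Cp is close to their number of fitted terms are the usual candidates for the "good" subset. The reliability check mentioned in the abstract would repeat the last step on each bootstrap sample and count how often each submodel fits well.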

Investigation on the Improvement of Prediction by Bootstrap Model Averaging

Methods of Information in Medicine ◽

10.1055/s-0038-1634035 ◽

2006 ◽

Vol 45 (01) ◽

pp. 44-50 ◽

Cited By ~ 8

Author(s):

N. H. Augustin ◽

W. Sauerbrei ◽

N. Holländer

Keyword(s):

Model Selection ◽

Mean Squared Error ◽

Model Averaging ◽

Predictive Performance ◽

Information Criterion ◽

Full Model ◽

Backward Elimination ◽

Study Results ◽

Model Selection Uncertainty ◽

Bootstrap Model

Objectives: We illustrate a recently proposed two-step bootstrap model averaging (bootstrap MA) approach to cope with model selection uncertainty. The predictive performance is investigated in an example and in a simulation study. Results are compared to those derived from other model selection methods. Methods: In the framework of the linear regression model we use the two-step bootstrap MA, which consists of a screening step to eliminate covariates thought to have no influence on the response, and a model-averaging step. We also apply the full model, variable selection using backward elimination based on Akaike’s Information Criterion (AIC), the Bayes Information Criterion (BIC) and the bagging approach. The predictive performance is measured by the mean squared error (MSE) and the coverage of confidence intervals for the true response. Results: We obtained similar results for all approaches in the example. In the simulation the MSE was reduced by all approaches in comparison to the full model. The smallest values are obtained for bootstrap MA. Only the bootstrap MA and the full model correctly estimated the nominal coverage. The backward elimination procedures led to substantial underestimation and bagging to an overestimation of the true coverage. The screening step of bootstrap MA eliminates most of the unimportant factors. Conclusion: The new bootstrap MA approach shows promising results for predictive performance. It increases practical usefulness by eliminating unimportant factors in the screening step.
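A rough sketch of a two-step scheme of this kind is given below. It is not the authors' code. The details are assumptions: nonparametric resampling of rows, backward elimination by AIC within each resample, screening by bootstrap inclusion frequency with a hypothetical threshold min_freq, and averaging of original-data fits weighted by how often each model is selected in a second bootstrap round. All function names are hypothetical.

```python
import numpy as np
from collections import Counter


def design(X, cols):
    """Intercept column followed by the selected covariates."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])


def aic(X, y, cols):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    A = design(X, cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n = len(y)
    return n * np.log(float(resid @ resid) / n) + 2 * (len(cols) + 1)


def backward_elimination(X, y, candidates):
    """Drop one covariate at a time while doing so lowers the AIC."""
    current = list(candidates)
    best = aic(X, y, current)
    while current:
        score, drop = min(
            (aic(X, y, [c for c in current if c != j]), j) for j in current
        )
        if score >= best:
            break
        best = score
        current.remove(drop)
    return tuple(current)


def bootstrap_models(X, y, candidates, B, rng):
    """Model selected by backward elimination in each of B row resamples."""
    n = len(y)
    models = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        models.append(backward_elimination(X[idx], y[idx], candidates))
    return models


def predict(X, y, cols, X_new):
    """Fit the model on (X, y) and predict at the rows of X_new."""
    beta, *_ = np.linalg.lstsq(design(X, cols), y, rcond=None)
    return design(X_new, cols) @ beta


def bootstrap_ma(X, y, X_new, B=100, min_freq=0.3, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]

    # Step 1 (screening): keep covariates selected in at least min_freq
    # of the bootstrap resamples (assumed screening rule).
    first = bootstrap_models(X, y, range(p), B, rng)
    freq = [sum(j in m for m in first) / B for j in range(p)]
    kept = [j for j in range(p) if freq[j] >= min_freq]

    # Step 2 (model averaging): re-run selection on the screened covariates
    # and weight each selected model by its bootstrap selection frequency.
    counts = Counter(bootstrap_models(X, y, kept, B, rng))
    pred = sum(c / B * predict(X, y, cols, X_new) for cols, c in counts.items())
    return kept, pred
```

The returned kept list shows which covariates survive screening, and pred is the model-averaged prediction at X_new.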

Bootstrap model selection in generalized linear models

Journal of Agricultural Biological and Environmental Statistics ◽

10.1198/108571101300325139 ◽

2001 ◽

Vol 6 (1) ◽

pp. 49-61 ◽

Cited By ~ 7

Author(s):

Wei Pan ◽

Chap T. Le

Keyword(s):

Model Selection ◽

Generalized Linear Models ◽

Linear Models ◽

Bootstrap Model

Variable Selection Using Nonlocal Priors in High-Dimensional Generalized Linear Models With Application to fMRI Data Analysis

Entropy ◽

10.3390/e22080807 ◽

2020 ◽

Vol 22 (8) ◽

pp. 807

Author(s):

Xuan Cao ◽

Kyoungjae Lee

Keyword(s):

Model Selection ◽

Linear Regression ◽

Variable Selection ◽

Generalized Linear Models ◽

Linear Models ◽

Real Data ◽

High Dimensional ◽

Model Selection Consistency ◽

Fmri Study ◽

Dimensional Variable

High-dimensional variable selection is an important research topic in modern statistics. While methods using nonlocal priors have been thoroughly studied for variable selection in linear regression, the crucial high-dimensional model selection properties for nonlocal priors in generalized linear models have not been investigated. In this paper, we consider a hierarchical generalized linear regression model with the product moment nonlocal prior over coefficients and examine its properties. Under standard regularity assumptions, we establish strong model selection consistency in a high-dimensional setting, where the number of covariates is allowed to increase at a sub-exponential rate with the sample size. The Laplace approximation is implemented for computing the posterior probabilities and the shotgun stochastic search procedure is suggested for exploring the posterior space. The proposed method is validated through simulation studies and illustrated by a real data example on functional activity analysis in an fMRI study for predicting Parkinson’s disease.

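Obtención de la matriz de varianzas y covarianzas a través de los productos Kronecker en modelos balanceados de dos y tres vías con aplicaciones en R [Obtaining the variance-covariance matrix through Kronecker products in balanced two- and three-way models, with applications in R]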

Universitas Scientiarum ◽

10.11144/javeriana.sc16-3.ootv ◽

2011 ◽

Vol 16 (3) ◽

pp. 263

Author(s):

Luz Marina Moya-Moya ◽

Milton Januario Rueda-Varón

Keyword(s):

Data Structure ◽

Covariance Matrix ◽

Mixed Models ◽

Linear Models ◽

Real Data ◽

Kronecker Products ◽

Number Of Factors ◽

Starting Point ◽

Balanced Designs ◽

Made In

Objective. To present a methodology based on Kronecker products that facilitates the construction of the variance-covariance matrix for two- and three-way designs with a balanced data structure, together with an application in R that facilitates its calculation and use in different areas. Materials and methods. We provide a starting point for people interested in using R in the analysis of variance. Results. We use an application made in R for a methodology based on Kronecker products, through which we build the covariance matrix for designs with a balanced data structure, as developed by Moya (2003). We also present an application of the method to real data. Conclusions. This methodology can speed up the development and solution of some practical problems. It can be applied to mixed models with fixed or random effects and any number of factors. Key words: Kronecker products, variance and covariance matrix, balanced designs, linear models, R Gui.
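As an illustration of the idea, the sketch below builds the covariance matrix of a balanced two-way random-effects layout from Kronecker products. It uses the standard textbook decomposition, which may differ in detail from the construction in the paper, and it is written in Python rather than R. It assumes observations are ordered lexicographically by (level of factor A, level of factor B, replicate); the function and parameter names are hypothetical.

```python
import numpy as np
from functools import reduce


def kron_all(*mats):
    """Kronecker product of several matrices, taken left to right."""
    return reduce(np.kron, mats)


def two_way_covariance(a, b, n, var_a, var_b, var_ab, var_e):
    """Covariance of y_ijk = mu + A_i + B_j + (AB)_ij + e_ijk, balanced case.

    a, b : numbers of levels of factors A and B
    n    : replicates per cell
    var_*: variance components (assumed independent random effects)
    """
    I = np.eye                          # identity: distinct levels are uncorrelated
    J = lambda m: np.ones((m, m))       # all-ones: effect shared across the index
    return (
        var_a * kron_all(I(a), J(b), J(n))      # same level of A
        + var_b * kron_all(J(a), I(b), J(n))    # same level of B
        + var_ab * kron_all(I(a), I(b), J(n))   # same (A, B) cell
        + var_e * kron_all(I(a), I(b), I(n))    # same observation
    )
```

A three-way layout follows the same pattern, with one more factor in each Kronecker product and one term per main effect and interaction.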

The Comparison of Model Selection Criteria When Selecting Among Competing Hierarchical Linear Models

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1241136840 ◽

2009 ◽

Vol 8 (1) ◽

pp. 173-193 ◽

Cited By ~ 14

Author(s):

Tiffany A. Whittaker ◽

Carolyn F. Furlow

Keyword(s):

Model Selection ◽

Selection Criteria ◽

Linear Models ◽

Hierarchical Linear Models ◽

Model Selection Criteria

Goodness-of-Fit Tests for Bivariate Time Series of Counts

Econometrics ◽

10.3390/econometrics9010010 ◽

2021 ◽

Vol 9 (1) ◽

pp. 10

Author(s):

Šárka Hudecová ◽

Marie Hušková ◽

Simos G. Meintanis

Keyword(s):

Goodness Of Fit ◽

Probability Generating Function ◽

Parametric Bootstrap ◽

Real Data ◽

Data Sets ◽

Test Statistics ◽

Finite Sample ◽

Generalized Poisson ◽

Goodness Of Fit Tests ◽

Monte Carlo Experiments

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one entirely nonparametric and the other semiparametric, computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics is derived under both the null hypotheses and the alternatives, and consistency is proved. The case of testing bivariate generalized Poisson autoregression and the extension of the methods to dimensions higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications to real data sets and a discussion.

A New Method for Characterizing Replacement Rate Variation in Molecular Sequences: Application of the Fourier and Wavelet Models to Drosophila and Mammalian Proteins

Genetics ◽

10.1093/genetics/154.1.381 ◽

2000 ◽

Vol 154 (1) ◽

pp. 381-395

Author(s):

Pavel Morozov ◽

Tatyana Sitnikova ◽

Gary Churchill ◽

Francisco José Ayala ◽

Andrey Rzhetsky

Keyword(s):

Wavelet Transforms ◽

Parametric Bootstrap ◽

Real Data ◽

New Method ◽

Rate Variation ◽

Discrete Wavelet ◽

Data Sets ◽

Ratio Test ◽

Replacement Rate ◽

New Models

We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series, and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed and in their most parameter-rich versions, both Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. These models are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.

Multi-objective Full Model Selection in temporal databases: Optimizing time and performance

2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC) ◽

10.1109/ropec.2016.7830617 ◽

2016 ◽

Cited By ~ 2

Author(s):

Nancy Perez-Castro ◽

Hector Gabriel Acosta-Mesa ◽

Efren Mezura-Montes ◽

Hugo Jair Escalante

Keyword(s):

Model Selection ◽

Temporal Databases ◽

Full Model ◽

Multi Objective ◽

And Performance

Bootstrap model selection for possibly dependent and heterogeneous data

Annals of the Institute of Statistical Mathematics ◽

10.1007/s10463-008-0183-3 ◽

2008 ◽

Vol 62 (3) ◽

pp. 515-546

Author(s):

Alessio Sancetta

Keyword(s):

Model Selection ◽

Heterogeneous Data ◽

Bootstrap Model ◽

Selection For

Identification of Health Expenditures Determinants: A Model to Manage the Economic Burden of Cardiovascular Disease

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18094652 ◽

2021 ◽

Vol 18 (9) ◽

pp. 4652

Author(s):

Fiorella Pia Salvatore ◽

Alessia Spada ◽

Francesca Fortunato ◽

Demetris Vrontis ◽

Mariantonietta Fiore

Keyword(s):

Cardiovascular Disease ◽

Hospital Discharge ◽

Random Effects ◽

Economic Burden ◽

Linear Models ◽

Health Management ◽

Model Performance ◽

Real Data ◽

Large Set ◽

Regional Health

The purpose of this paper is to investigate the determinants influencing the costs of cardiovascular disease in the regional health service in Italy’s Apulia region from 2014 to 2016. Data for patients with acute myocardial infarction (AMI), heart failure (HF), and atrial fibrillation (AF) were collected from the hospital discharge registry. Generalized linear models (GLM) and generalized linear mixed models (GLMM) were used to identify the role of random effects in improving the model performance. The study was based on socio-demographic variables and disease-specific variables (diagnosis-related group, hospitalization type, hospital stay, surgery, and economic burden of the hospital discharge form). Firstly, both models indicated an increase in health costs in 2016 and lower spending values for women (p < 0.001). GLMM indicates a significant increase in health expenditure with increasing age (p < 0.001). Day-hospital has the lowest cost, surgery increases the cost, and AMI is the most expensive pathology, contrary to AF (p < 0.001). Secondly, AIC and BIC assume the lowest values for the GLMM model, indicating the random effects’ relevance in improving the model performance. This study is the first that considers real data to estimate the economic burden of CVD from the regional health service’s perspective. It appears significant for its ability to provide a large set of estimates of the economic burden of CVD, providing information to managers for health management and planning.