Model Selection Consistency of Lasso for Empirical Data

2018 ◽  
Vol 39 (4) ◽  
pp. 607-620 ◽  
Author(s):  
Yuehan Yang ◽  
Hu Yang

2015 ◽  
Vol 9 (1) ◽  
pp. 608-642 ◽  
Author(s):  
Jason D. Lee ◽  
Yuekai Sun ◽  
Jonathan E. Taylor

Biometrika ◽  
2021 ◽  
Author(s):  
Emre Demirkaya ◽  
Yang Feng ◽  
Pallavi Basu ◽  
Jinchi Lv

Summary: Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications, in which the goal is to pinpoint the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or of fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under misspecified generalized linear models presented in Lv and Liu (2014) and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impact of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultra-high dimensions under mild regularity conditions. Numerical studies demonstrate that our new method enjoys improved model selection consistency compared with its main competitors.
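For orientation, criteria in this family take a BIC-type form in which the model prior enters as an additional penalty. The display below is a generic schematic of such a prior-weighted criterion, not the authors' exact one, whose prior additionally incorporates the Kullback–Leibler divergence and whose penalty carries a misspecification correction built from the covariance contrast matrix. With a sparsity-encouraging prior π(M) ∝ p^{-|M|} over candidate models M drawn from p covariates,

    BIC_p(M) = -2 ℓ_n(β̂_M) + |M| log n + 2 |M| log p,

where ℓ_n(β̂_M) is the maximized log-likelihood of the working model M on n observations. The extra 2|M| log p term, which equals -2 log π(M) up to a constant, guards against the 2^p-sized model space, and the model minimizing the criterion is selected.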


2017 ◽  
Author(s):  
Craig R. Miller ◽  
James T. Van Leuven ◽  
Holly A. Wichman ◽  
Paul Joyce

Abstract: Fitness landscapes map genotypes to organismal fitness. Their topography depends on how mutational effects interact (epistasis) and is important for understanding evolutionary processes such as speciation, the rate of adaptation, the advantage of recombination, and the predictability versus stochasticity of evolution. The growing body of empirical data has made it possible to test landscape models more rigorously. We argue that this endeavor will benefit from the development and use of meaningful null models against which to compare more complex models. Here we develop statistical and computational methods for fitting fitness data from mutation combinatorial networks to three simple models: additive, multiplicative, and stickbreaking. We employ a Bayesian framework for model selection. Using simulations, we demonstrate that our methods work, and we explore their statistical performance: bias, error, and the power to discriminate among models. We then illustrate our approach and its flexibility by analyzing several previously published datasets. An R package that implements our methods is available in the CRAN repository under the name Stickbreaker.
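As a concrete illustration of the comparison the abstract describes, the R sketch below simulates fitness for every combination of three mutations under the stickbreaking model (each mutation claims a fixed fraction of the remaining distance to a fitness boundary), fits all three models by least squares, and ranks them by BIC as a simple stand-in for the paper's Bayesian model selection. It is a hypothetical sketch and does not use the Stickbreaker package's actual interface.

    ## Hypothetical sketch, not the Stickbreaker package interface.
    set.seed(1)
    w_wt <- 1.0                                    # wild-type fitness
    geno <- as.matrix(expand.grid(0:1, 0:1, 0:1))  # all 2^3 genotypes, 0/1 coded

    ## Predicted fitness of one genotype g under each model, given coefficients th
    pred <- list(
      additive       = function(g, th) w_wt + sum(th * g),
      multiplicative = function(g, th) w_wt * prod((1 + th)^g),
      ## stickbreaking: th = (u1, u2, u3, d); mutation i claims fraction u_i
      ## of the remaining distance d to the fitness boundary
      stickbreaking  = function(g, th) w_wt + th[4] * (1 - prod((1 - th[1:3])^g))
    )
    npar <- c(additive = 3, multiplicative = 3, stickbreaking = 4)

    ## Simulate noisy fitness measurements with stickbreaking as the truth
    truth <- c(0.30, 0.25, 0.20, 0.5)
    w_obs <- apply(geno, 1, pred$stickbreaking, truth) + rnorm(nrow(geno), sd = 0.02)

    ## Least-squares fit of each model; BIC = n log(SSE/n) + k log(n)
    n <- nrow(geno)
    bics <- sapply(names(pred), function(m) {
      sse <- function(th) sum((w_obs - apply(geno, 1, pred[[m]], th))^2)
      fit <- optim(rep(0.2, npar[m]), sse)
      n * log(fit$value / n) + npar[m] * log(n)
    })
    print(sort(bics))  # lowest BIC indicates the preferred model

On such simulated data the stickbreaking model should typically attain the lowest BIC, though with only eight genotypes and noisy measurements the power to discriminate is limited, which is precisely the kind of statistical performance the paper's simulations quantify.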

