Variable Selection Using Nonlocal Priors in High-Dimensional Generalized Linear Models With Application to fMRI Data Analysis

High-dimensional variable selection is an important research topic in modern statistics. While methods using nonlocal priors have been thoroughly studied for variable selection in linear regression, the crucial high-dimensional model selection properties for nonlocal priors in generalized linear models have not been investigated. In this paper, we consider a hierarchical generalized linear regression model with the product moment nonlocal prior over coefficients and examine its properties. Under standard regularity assumptions, we establish strong model selection consistency in a high-dimensional setting, where the number of covariates is allowed to increase at a sub-exponential rate with the sample size. The Laplace approximation is implemented for computing the posterior probabilities and the shotgun stochastic search procedure is suggested for exploring the posterior space. The proposed method is validated through simulation studies and illustrated by a real data example on functional activity analysis in fMRI study for predicting Parkinson’s disease.

Download Full-text

Large-scale model selection in misspecified generalized linear models

Biometrika ◽

10.1093/biomet/asab005 ◽

2021 ◽

Author(s):

Emre Demirkaya ◽

Yang Feng ◽

Pallavi Basu ◽

Jinchi Lv

Keyword(s):

Model Selection ◽

Generalized Linear Models ◽

Large Scale ◽

Linear Models ◽

Information Criterion ◽

Scale Model ◽

High Dimensional ◽

Model Selection Consistency ◽

New Information ◽

Large Scale Model

Summary Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work assumes implicitly that the models are correctly specified or have fixed dimensionality, yet both are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv and Liu (2014) and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models.With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultra-high dimensions under some mild regularity conditions. The numerical studies demonstrate that our new method enjoys improved model selection consistency compared to its main competitors.

Download Full-text

Variable selection in high-dimensional double generalized linear models

Statistical Papers ◽

10.1007/s00362-012-0481-y ◽

2012 ◽

Vol 55 (2) ◽

pp. 327-347 ◽

Cited By ~ 11

Author(s):

Dengke Xu ◽

Zhongzhan Zhang ◽

Liucang Wu

Keyword(s):

Variable Selection ◽

Generalized Linear Models ◽

Linear Models ◽

High Dimensional

Download Full-text

Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities

The Annals of Statistics ◽

10.1214/009053607000000019 ◽

2007 ◽

Vol 35 (4) ◽

pp. 1487-1511 ◽

Cited By ~ 34

Author(s):

Wenxin Jiang

Keyword(s):

Variable Selection ◽

Generalized Linear Models ◽

Linear Models ◽

Convergence Rates ◽

Bayesian Variable Selection ◽

High Dimensional ◽

Selection For

Download Full-text

False discovery control for penalized variable selections with high-dimensional covariates

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2018-0038 ◽

2018 ◽

Vol 17 (6) ◽

Cited By ~ 1

Author(s):

Kevin He ◽

Xiang Zhou ◽

Hui Jiang ◽

Xiaoquan Wen ◽

Yi Li

Keyword(s):

Variable Selection ◽

Linear Models ◽

Broad Class ◽

High Dimensional ◽

High Throughput Data ◽

False Discovery ◽

Dimensional Variable ◽

Selection Algorithms ◽

False Discoveries ◽

Linear Regressions

Abstract Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors much exceeding the sample size. Penalized variable selection has emerged as a powerful and efficient dimension reduction tool. However, control of false discoveries (i.e. inclusion of irrelevant variables) for penalized high-dimensional variable selection presents serious challenges. To effectively control the fraction of false discoveries for penalized variable selections, we propose a false discovery controlling procedure. The proposed method is general and flexible, and can work with a broad class of variable selection algorithms, not only for linear regressions, but also for generalized linear models and survival analysis.

Download Full-text

Variable selection for high-dimensional generalized linear models with the weighted elastic-net procedure

Journal of Applied Statistics ◽

10.1080/02664763.2015.1078300 ◽

2015 ◽

Vol 43 (5) ◽

pp. 796-809 ◽

Cited By ~ 17

Author(s):

Xiuli Wang ◽

Mingqiu Wang

Keyword(s):

Variable Selection ◽

Generalized Linear Models ◽

Linear Models ◽

Elastic Net ◽

High Dimensional ◽

Selection For

Download Full-text

Transformed low-rank ANOVA models for high-dimensional variable selection

Statistical Methods in Medical Research ◽

10.1177/0962280217753726 ◽

2018 ◽

Vol 28 (4) ◽

pp. 1230-1246 ◽

Cited By ~ 1

Author(s):

Yoonsuh Jung ◽

Hong Zhang ◽

Jianhua Hu

Keyword(s):

Variable Selection ◽

Selection Criterion ◽

Real Data ◽

Interaction Space ◽

Low Rank ◽

High Dimensional ◽

Detection Accuracy ◽

Model Framework ◽

Conventional Solution ◽

Dimensional Variable

High-dimensional data are often encountered in biomedical, environmental, and other studies. For example, in biomedical studies that involve high-throughput omic data, an important problem is to search for genetic variables that are predictive of a particular phenotype. A conventional solution is to characterize such relationships through regression models in which a phenotype is treated as the response variable and the variables are treated as covariates; this approach becomes particularly challenging when the number of variables exceeds the number of samples. We propose a general framework for expressing the transformed mean of high-dimensional variables in an exponential distribution family via ANOVA models in which a low-rank interaction space captures the association between the phenotype and the variables. This alternative method transforms the variable selection problem into a well-posed problem with the number of observations larger than the number of variables. In addition, we propose a model selection criterion for the new model framework with a diverging number of parameters, and establish the consistency of the selection criterion. We demonstrate the appealing performance of the proposed method in terms of prediction and detection accuracy through simulations and real data analyses.

Download Full-text

Law of iterated logarithm and model selection consistency for generalized linear models with independent and dependent responses

Frontiers of Mathematics in China ◽

10.1007/s11464-021-0900-2 ◽

2021 ◽

Author(s):

Xiaowei Yang ◽

Shuang Song ◽

Huiming Zhang

Keyword(s):

Model Selection ◽

Generalized Linear Models ◽

Linear Models ◽

Law Of Iterated Logarithm ◽

Model Selection Consistency ◽

Iterated Logarithm ◽

Selection Consistency

Download Full-text

Robust and consistent variable selection in high-dimensional generalized linear models

Biometrika ◽

10.1093/biomet/asx070 ◽

2017 ◽

Vol 105 (1) ◽

pp. 31-44 ◽

Cited By ~ 4

Author(s):

Marco Avella-Medina ◽

Elvezio Ronchetti

Keyword(s):

Variable Selection ◽

Generalized Linear Models ◽

Linear Models ◽

High Dimensional

Download Full-text

Independently Interpretable Lasso for Generalized Linear Models

Neural Computation ◽

10.1162/neco_a_01279 ◽

2020 ◽

Vol 32 (6) ◽

pp. 1168-1221

Author(s):

Masaaki Takada ◽

Taiji Suzuki ◽

Hironori Fujisawa

Keyword(s):

Generalized Linear Models ◽

Regularization Method ◽

Linear Models ◽

Estimation Error ◽

Real Data ◽

Regression Coefficients ◽

Learning Problems ◽

High Dimensional ◽

Optimal Convergence Rate ◽

Sparse Regularization

Sparse regularization such as [Formula: see text] regularization is a quite powerful and widely used strategy for high-dimensional learning problems. The effectiveness of sparse regularization has been supported practically and theoretically by several studies. However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features. Ordinary [Formula: see text] regularization selects variables correlated with each other under weak regularizations, which results in deterioration of not only its estimation error but also interpretability. In this letter, we propose a new regularization method, independently interpretable lasso (IILasso), for generalized linear models. Our proposed regularizer suppresses selecting correlated variables, so that each active variable affects the response independently in the model. Hence, we can interpret regression coefficients intuitively, and the performance is also improved by avoiding overfitting. We analyze the theoretical property of the IILasso and show that the proposed method is advantageous for its sign recovery and achieves almost minimax optimal convergence rate. Synthetic and real data analyses also indicate the effectiveness of the IILasso.

Download Full-text

Correction for Optimisation Bias in Structured Sparse High-Dimensional Variable Selection

Springer Proceedings in Mathematics & Statistics - Nonparametric Statistics ◽

10.1007/978-3-030-57306-5_32 ◽

2020 ◽

pp. 357-365

Author(s):

Bastien Marquis ◽

Maarten Jansen

Keyword(s):

Variable Selection ◽

High Dimensional ◽

Dimensional Variable

Download Full-text