Variable Selection with Shrinkage Priors via Sparse Posterior Summaries

2021 ◽  
pp. 179-198
Author(s):  
Yan Dora Zhang ◽  
Weichang Yu ◽  
Howard D. Bondell
2021 ◽  
Author(s):  
Arinjita Bhattacharyya ◽  
Subhadip Pal ◽  
Riten Mitra ◽  
Shesh Rai

Abstract Background: Prediction and classification algorithms are commonly used in clinical research to identify patients susceptible to conditions such as diabetes, colon cancer, and Alzheimer’s disease. Developing accurate prediction and classification methods has implications for personalized medicine. Building a good predictive model involves selecting the features most strongly associated with the response. These features can include biological and demographic characteristics, such as genomic biomarkers and health history. Such variable selection becomes challenging when the number of potential predictors is large. Bayesian shrinkage models have emerged as popular and flexible methods for variable selection in regression settings. The article discusses variable selection with three shrinkage priors and illustrates its application to clinical data sets such as the Pima Indians Diabetes, colon cancer, ADNI, and OASIS Alzheimer’s data sets. Methods: We present a unified Bayesian hierarchical framework that implements and compares shrinkage priors in binary and multinomial logistic regression models. The key feature is the representation of the likelihood by Pólya-Gamma data augmentation, which integrates naturally with a family of shrinkage priors. We focus on the Horseshoe, Dirichlet-Laplace, and Double Pareto priors. Extensive simulation studies assess performance under different data dimensions and parameter settings. Accuracy, AUC, Brier score, L1 error, cross-entropy, and ROC surface plots serve as evaluation criteria for comparing the priors with frequentist methods such as Lasso, Elastic-Net, and Ridge regression. Results: All three priors can be used for robust prediction, performing well on these metrics regardless of whether the binary or the multinomial response model is used.
In the simulation study, the mean prediction accuracy was 91% (95% CI: 90.7, 91.2) for the logistic regression model and 74% (95% CI: 73.8, 74.1) for the multinomial logistic model. The model can identify significant variables for disease risk prediction and is computationally efficient. Conclusions: Because of their strong shrinkage properties and applicability to a broad range of classification problems, the models are robust enough for both variable selection and future prediction.
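The Pólya-Gamma augmentation described above makes the logistic likelihood conditionally Gaussian in the coefficients, so a shrinkage prior can be bolted on with ordinary Gibbs steps. The following is a minimal sketch, not the authors' implementation, covering the binary case with a Horseshoe prior only. Assumptions: PG(1, z) draws are approximated by truncating the sum-of-gammas representation at 200 terms (an exact sampler would be used in practice), the Horseshoe scales are updated with the inverse-gamma auxiliary-variable scheme of Makalic and Schmidt (a swapped-in choice), and the function names and simulated data are hypothetical.

```python
import numpy as np


def sample_pg1(z, rng, n_terms=200):
    """Approximate PG(1, z) draws via a truncated sum-of-gammas representation."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    k = np.arange(1, n_terms + 1)
    # omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + z^2 / (4 pi^2)), g_k ~ Gamma(1, 1)
    g = rng.gamma(1.0, 1.0, size=(z.size, n_terms))
    denom = (k - 0.5) ** 2 + (z[:, None] ** 2) / (4.0 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


def inv_gamma(shape, rate, rng):
    """Draw from InvGamma(shape, rate) as 1 / Gamma(shape, scale=1/rate)."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)


def horseshoe_logit_gibbs(X, y, n_iter=600, burn=200, seed=0):
    """Gibbs sampler for y_i ~ Bernoulli(logistic(x_i' beta)) with
    beta_j ~ N(0, lambda_j^2 tau^2) and half-Cauchy lambda_j, tau."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5
    beta = np.zeros(p)
    lam2, nu = np.ones(p), np.ones(p)
    tau2, xi = 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # 1. Polya-Gamma latent variables given the current linear predictor.
        omega = sample_pg1(X @ beta, rng)
        # 2. beta | omega is Gaussian with precision X' Omega X + prior precision.
        prec = X.T @ (omega[:, None] * X) + np.diag(1.0 / (lam2 * tau2))
        chol = np.linalg.cholesky(prec)
        mean = np.linalg.solve(prec, X.T @ kappa)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(p))
        # 3. Horseshoe local and global scales via auxiliary inverse-gamma steps.
        lam2 = inv_gamma(1.0, 1.0 / nu + beta ** 2 / (2.0 * tau2), rng)
        nu = inv_gamma(1.0, 1.0 + 1.0 / lam2, rng)
        tau2 = inv_gamma((p + 1) / 2.0, 1.0 / xi + np.sum(beta ** 2 / lam2) / 2.0, rng)
        xi = inv_gamma(1.0, 1.0 + 1.0 / tau2, rng)
        if it >= burn:
            draws.append(beta.copy())
    return np.array(draws)


# Hypothetical simulated data: two strong signals among ten predictors.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 10))
beta_true = np.array([3.0, -3.0] + [0.0] * 8)
y = data_rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
post_mean = horseshoe_logit_gibbs(X, y).mean(axis=0)
print(np.round(post_mean, 2))
```

Swapping the Horseshoe updates in step 3 for another prior's conditional updates leaves steps 1 and 2 unchanged, which is the modularity the abstract attributes to the Pólya-Gamma representation.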


Sankhya A ◽  
2017 ◽  
Vol 80 (2) ◽  
pp. 215-246 ◽  
Author(s):  
Xueying Tang ◽  
Xiaofan Xu ◽  
Malay Ghosh ◽  
Prasenjit Ghosh

2020 ◽  
Vol 80 (6) ◽  
pp. 1025-1058 ◽  
Author(s):  
Xinya Liang

Bayesian structural equation modeling (BSEM) is a flexible tool for exploring and estimating sparse factor loading structures, in which most cross-loading entries are zero and only a few important cross-loadings are nonzero. The current investigation focused on BSEM with small-variance normal distribution priors (BSEM-N) for both variable selection and model estimation. The prior sensitivity of BSEM-N was explored in factor analysis models with sparse loading structures through a simulation study (Study 1) and an empirical example (Study 2). Study 1 examined the prior sensitivity of BSEM-N based on model fit, population model recovery, true and false positive rates, and parameter estimation. Seven shrinkage priors on cross-loadings and five noninformative/vague priors on other model parameters were examined. Study 2 provided a real data example to illustrate the impact of various priors on model fit and on parameter selection and estimation. Results indicated that shrinkage priors whose 95% credible intervals barely covered the population cross-loading values gave the best balance between true and false positives. If the goal is variable selection, a sparse cross-loading structure is required, preferably with a minimal number of nontrivial cross-loadings and relatively high primary loading values. To improve parameter estimates, a relatively large prior variance is preferred. When cross-loadings are relatively large, BSEM-N with zero-mean priors is not recommended for estimating cross-loadings and factor correlations.
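For a zero-mean normal prior N(0, σ²) on a cross-loading, the central 95% prior interval is ±1.96σ, so "barely covering" a loading can be read as choosing the smallest prior variance whose interval still contains it. The sketch below assumes that reading; the candidate variances are a hypothetical grid, not the seven priors examined in the study.

```python
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def prior_half_width(variance):
    """Half-width of the central 95% interval of a N(0, variance) prior."""
    return Z975 * variance ** 0.5


def smallest_covering_variance(loading, variances):
    """Smallest candidate variance whose 95% prior interval still covers `loading`."""
    covering = [v for v in sorted(variances) if prior_half_width(v) >= abs(loading)]
    return covering[0] if covering else None


# Hypothetical grid of small prior variances for the cross-loadings.
candidate_variances = [0.001, 0.005, 0.01, 0.02, 0.05]
for v in candidate_variances:
    print(f"N(0, {v}): 95% prior interval = +/-{prior_half_width(v):.3f}")

print(smallest_covering_variance(0.2, candidate_variances))
```

For example, N(0, 0.01) gives ±0.196, just short of a 0.2 cross-loading, so on this grid the next variance up (0.02, about ±0.277) is the first to cover it.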


2021 ◽  
Author(s):  
Kazuhiro Yamaguchi ◽  
Jihong Zhang

This study proposed efficient Gibbs sampling algorithms for variable selection in a latent regression model under a unidimensional two-parameter logistic item response theory model. Three types of shrinkage priors were employed to obtain shrinkage estimates: double-exponential (i.e., Laplace), horseshoe, and horseshoe+ priors. These shrinkage priors were compared with a uniform prior in both simulation and real data analysis. The simulation study revealed that the two horseshoe-type priors yielded smaller root mean square errors and shorter 95% credible intervals than the double-exponential or uniform priors. In addition, the horseshoe+ prior was slightly more stable than the horseshoe prior. The real data example demonstrated the utility of the horseshoe and horseshoe+ priors in selecting effective predictive covariates for math achievement. In the final section, we discuss the benefits and limitations of the three types of Bayesian variable selection methods.
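One standard way to see why horseshoe-type priors can beat the double-exponential prior on error for strong signals is the shrinkage weight κ = 1/(1 + λ²) in the normal-means model y | β ~ N(β, 1), β | λ ~ N(0, λ²), where the conditional posterior mean is (1 - κ)y. The Monte Carlo sketch below, assuming τ = σ = 1 and writing the Laplace(0, 1) prior as a normal scale mixture with exponentially distributed variance, compares where each prior puts κ. It is an illustration of this known property, not the study's analysis, and the horseshoe+ prior is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Horseshoe with tau = sigma = 1: local scale lambda ~ half-Cauchy(0, 1).
lam = np.abs(rng.standard_cauchy(n))
kappa_hs = 1.0 / (1.0 + lam ** 2)  # kappa ~ Beta(1/2, 1/2) in this setting

# Laplace(0, 1) as a normal scale mixture: variance s^2 ~ Exponential(mean 2).
s2 = rng.exponential(2.0, n)
kappa_lap = 1.0 / (1.0 + s2)


def mass(kappa, lo, hi):
    """Monte Carlo estimate of P(lo <= kappa <= hi)."""
    return float(np.mean((kappa >= lo) & (kappa <= hi)))


for name, k in [("horseshoe", kappa_hs), ("Laplace", kappa_lap)]:
    print(f"{name:>9}: P(kappa<=0.1)={mass(k, 0.0, 0.1):.3f}  "
          f"P(0.45<=kappa<=0.55)={mass(k, 0.45, 0.55):.3f}  "
          f"P(kappa>=0.9)={mass(k, 0.9, 1.0):.3f}")
```

The horseshoe's U-shaped κ distribution places substantial mass near κ = 0 (large signals left almost unshrunk) and near κ = 1 (noise shrunk to zero), whereas the Laplace prior puts very little mass near κ = 0, so it tends to shrink large effects as well.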


2020 ◽  
Author(s):  
Connor Donegan ◽  
Yongwan Chun ◽  
Amy E. Hughes

This paper proposes a Bayesian method for spatial regression using eigenvector spatial filtering (ESF) and Piironen and Vehtari's (2017) regularized horseshoe (RHS) prior. ESF models are most often estimated using variable selection procedures such as stepwise selection, but in the absence of a Bayesian model averaging procedure, variable selection methods cannot properly account for parameter uncertainty. Hierarchical shrinkage priors such as the RHS address this concern in a computationally efficient manner by encoding prior information about spatial filters into an adaptive prior distribution, which shrinks posterior estimates towards zero in the absence of a strong signal while only minimally regularizing the coefficients of important eigenvectors. This paper presents results from a large simulation study that compares the performance of the proposed Bayesian model (RHS-ESF) with alternative spatial models under a variety of spatial autocorrelation scenarios. The performance of the RHS-ESF model matched that of the SAR model from which the data were generated. The study highlights that reliable uncertainty estimates require greater attention to spatial autocorrelation in covariates than is typically given. A demonstration analysis of the 2016 U.S. Presidential election results further demonstrates the robustness of results to hyperprior specifications, as well as the advantages of estimating spatial models with the Stan probabilistic programming language.
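The two ingredients named above can be sketched in a few lines. The spatial filters are eigenvectors of M C M, where C is a spatial connectivity matrix and M = I - 11'/n is the centering projector; eigenvectors with large positive eigenvalues describe patterns of positive spatial autocorrelation and enter the regression as covariates. The RHS then gives each filter coefficient prior standard deviation τλ̃, with λ̃² = c²λ²/(c² + τ²λ²), which caps the effective scale at c. Assumptions: a rook-contiguity lattice stands in for real areal data, the candidate cutoff of 0.25 times the largest eigenvalue is an illustrative heuristic, and the function names are hypothetical; this is not the paper's Stan model.

```python
import numpy as np


def rook_adjacency(rows, cols):
    """Binary rook-contiguity matrix C for a rows x cols lattice."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:
                C[i, i + cols] = C[i + cols, i] = 1.0
            if c + 1 < cols:
                C[i, i + 1] = C[i + 1, i] = 1.0
    return C


def spatial_filters(C, rel_cutoff=0.25):
    """Eigenvectors of M C M (M = I - 11'/n) whose eigenvalue exceeds
    rel_cutoff times the largest one (assumed candidate-selection rule)."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(vals)[::-1]  # most positively autocorrelated first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > rel_cutoff * vals[0]
    return vals[keep], vecs[:, keep]


def rhs_effective_sd(lam, tau, c):
    """Regularized-horseshoe prior sd tau * lambda_tilde, where
    lambda_tilde^2 = c^2 lambda^2 / (c^2 + tau^2 lambda^2)."""
    lam_tilde2 = c ** 2 * lam ** 2 / (c ** 2 + tau ** 2 * lam ** 2)
    return tau * np.sqrt(lam_tilde2)


vals, E = spatial_filters(rook_adjacency(6, 6))
print(E.shape[1], "candidate spatial filters")
# Small lambda: sd is close to tau * lambda; huge lambda: sd approaches c.
print(rhs_effective_sd(np.array([0.1, 1.0, 1e6]), tau=0.1, c=2.0))
```

Because M removes the constant vector, every retained filter is centered and orthogonal to the intercept, and the slab width c is what keeps large eigenvector coefficients from being left completely unregularized.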


2017 ◽  
Vol 107 ◽  
pp. 107-119 ◽  
Author(s):  
Hanning Li ◽  
Debdeep Pati

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Zihang Lu ◽  
Wendy Lou

Abstract In many clinical studies, researchers are interested in parsimonious models that simultaneously achieve consistent variable selection and optimal prediction. Such parsimonious models facilitate meaningful biological interpretation and scientific findings. Variable selection via Bayesian inference has advanced significantly in recent years. Despite its increasing popularity, there is limited practical guidance for implementing these Bayesian approaches and evaluating their comparative performance on clinical datasets. In this paper, we review several commonly used Bayesian approaches to variable selection, with emphasis on application and implementation in R. These approaches can be roughly categorized into four classes: Bayesian model selection, spike-and-slab priors, shrinkage priors, and hybrids of spike-and-slab and shrinkage priors. To evaluate their variable selection performance under various scenarios, we compare these four classes of approaches using real and simulated datasets. The results provide practical guidance to researchers interested in applying Bayesian approaches for variable selection.
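In the simplest setting, a single normal mean y ~ N(β, σ²) under the point-mass spike-and-slab prior β ~ π N(0, v) + (1 - π) δ₀, the posterior inclusion probability has a closed form, and a common selection rule (the median probability model) keeps every variable whose inclusion probability exceeds 0.5. The sketch below assumes independent normal means with known noise variance; the parameter values and function names are hypothetical, and it is not one of the R implementations reviewed in the paper.

```python
from math import exp, log


def inclusion_probability(y, prior_incl=0.5, slab_var=10.0, noise_var=1.0):
    """P(beta != 0 | y) for y ~ N(beta, noise_var) under the prior
    beta ~ prior_incl * N(0, slab_var) + (1 - prior_incl) * delta_0.
    Computed on the log-odds scale: log[pi N(y; 0, s + v)] - log[(1 - pi) N(y; 0, s)]."""
    log_odds = (log(prior_incl / (1.0 - prior_incl))
                - 0.5 * log((noise_var + slab_var) / noise_var)
                + 0.5 * y * y * (1.0 / noise_var - 1.0 / (noise_var + slab_var)))
    return 1.0 / (1.0 + exp(-log_odds))


def median_probability_model(ys, **prior):
    """Indices of the observations whose inclusion probability exceeds 0.5."""
    return [j for j, y in enumerate(ys) if inclusion_probability(y, **prior) > 0.5]


ys = [0.3, -2.5, 1.0, 4.0]
print([round(inclusion_probability(y), 3) for y in ys])
print(median_probability_model(ys))
```

With π = 0.5 and v = 10, an observation at y = 0 has inclusion probability 1/(1 + √11) ≈ 0.232, and the 0.5 threshold corresponds to |y| above roughly 1.62, so the rule selects indices 1 and 3 here.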


2020 ◽  
Vol 12 (1) ◽  
pp. 17-22
Author(s):  
Alexander Nadel

This paper is a system description of the anytime MaxSAT solver TT-Open-WBO-Inc, which won both of the weighted incomplete tracks of MaxSAT Evaluation 2019. We implemented the recently introduced polarity and variable selection heuristics, TORC and TSB, respectively, in the Open-WBO-Inc-BMO algorithm within the open-source anytime MaxSAT solver Open-WBO-Inc. As a result, the solver is substantially more efficient.
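In weighted MaxSAT, every hard clause must be satisfied, and the objective is to minimize the total weight of falsified soft clauses. An anytime solver such as the one described above reports the best-cost assignment found so far whenever it is stopped. The brute-force sketch below only illustrates that objective on a toy instance; it does not reproduce the TORC or TSB heuristics or the Open-WBO-Inc-BMO algorithm, and the clause encoding (DIMACS-style signed integers) and function names are assumptions for illustration.

```python
from itertools import product


def satisfied(clause, assignment):
    """A clause (list of DIMACS-style literals) holds if any literal is true.
    assignment[v - 1] is the truth value of variable v."""
    return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)


def cost(assignment, hard, soft):
    """Total weight of falsified soft clauses, or None if a hard clause fails."""
    if not all(satisfied(c, assignment) for c in hard):
        return None
    return sum(w for c, w in soft if not satisfied(c, assignment))


def brute_force_maxsat(n_vars, hard, soft):
    """Exhaustively search all assignments, keeping the best feasible one."""
    best = None
    for assignment in product([False, True], repeat=n_vars):
        c = cost(assignment, hard, soft)
        if c is not None and (best is None or c < best[0]):
            best = (c, assignment)
    return best


hard = [[1, 2]]                           # x1 OR x2 must hold
soft = [([-1], 3), ([-2], 2), ([1], 1)]   # (clause, weight) pairs
print(brute_force_maxsat(2, hard, soft))
```

Exhaustive search is only feasible for a handful of variables; incomplete solvers instead search heuristically (here, via polarity and variable selection heuristics) and improve the best-known cost over time.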


2019 ◽  
Vol 139 (8) ◽  
pp. 850-857
Author(s):  
Hiromu Imaji ◽  
Takuya Kinoshita ◽  
Toru Yamamoto ◽  
Keisuke Ito ◽  
Masahiro Yoshida ◽  
...  
