Bayesian Variable Selection Utilizing Posterior Probability Credible Intervals

Author(s):  
Mengtian Du ◽  
Stacy L. Andersen ◽  
Thomas T. Perls ◽  
Paola Sebastiani

Abstract: In recent years, there has been growing interest in the problem of model selection in the Bayesian framework. Current approaches include methods based on computing model probabilities, such as Stochastic Search Variable Selection (SSVS) and the Bayesian LASSO, and methods based on model choice criteria, such as the Deviance Information Criterion (DIC). Methods in the first group compute the posterior probabilities of models or model parameters, often using a Markov Chain Monte Carlo (MCMC) technique, and select a subset of the variables based on a prespecified threshold on the posterior probability. However, these methods rely heavily on the prior choices for the parameters, and the results can be highly sensitive to changes in the priors. DIC is a Bayesian generalization of Akaike's Information Criterion (AIC) that penalizes a large number of parameters; it has the advantage that it can be used to select mixed effects models, but it tends to prefer overparameterized models. We propose a novel variable selection algorithm that uses the parameters' credible intervals to select the variables to be kept in the model. We show in a simulation study and a real-world example that this algorithm on average performs better than DIC and produces more parsimonious models.
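
As a rough illustration of the credible-interval rule described above, here is a minimal numpy sketch, assuming posterior draws of the regression coefficients are already available from some MCMC fit; the array name beta_samples and the 95% level are illustrative, not the paper's exact settings.

```python
import numpy as np

def select_by_credible_interval(beta_samples, level=0.95):
    """Keep variables whose central credible interval excludes zero.

    beta_samples : array of shape (n_draws, n_predictors) holding
                   posterior MCMC draws of the regression coefficients.
    Returns a boolean mask over the predictors.
    """
    alpha = 1.0 - level
    lower = np.quantile(beta_samples, alpha / 2, axis=0)
    upper = np.quantile(beta_samples, 1 - alpha / 2, axis=0)
    # A predictor is retained when 0 lies outside its credible interval.
    return (lower > 0) | (upper < 0)

# Hypothetical usage with draws from some MCMC fit:
# keep = select_by_credible_interval(beta_samples)
# selected_columns = np.flatnonzero(keep)
```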

2020 ◽  
pp. 016001762095982 ◽  
Author(s):  
Zhihua Ma ◽  
Yishu Xue ◽  
Guanyu Hu

Geographically weighted regression (GWR) is a well-known statistical approach for exploring spatial non-stationarity of the regression relationship in spatial data analysis. In this paper, we discuss a Bayesian recourse of GWR. Bayesian variable selection based on a spike-and-slab prior, bandwidth selection based on a range prior, and model assessment using a modified deviance information criterion and a modified logarithm of the pseudo-marginal likelihood are fully discussed in this paper. The use of graph distance in modeling areal data is also introduced. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods in scenarios with both small and large numbers of locations, and a comparison with the classical frequentist GWR is made. The variable selection and estimation performance of the proposed methodology under different circumstances is satisfactory. We further apply the proposed methodology to the analysis of province-level macroeconomic data from thirty selected provinces in China. The estimation and variable selection results reveal insights about China's economy that are convincing and agree with previous studies and facts.
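
To make the graph-distance idea above concrete, here is a minimal sketch of Gaussian kernel weights for GWR on areal data, with shortest-path (graph) distance on the adjacency graph replacing the usual Euclidean distance; the kernel form and the fixed bandwidth are illustrative assumptions (in the paper the bandwidth carries a range prior and is estimated).

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def gwr_weights_graph(adjacency, bandwidth):
    """Gaussian kernel weights for GWR built from graph distance.

    adjacency : (n, n) 0/1 matrix; 1 when two areal units share a border.
    bandwidth : kernel bandwidth, treated as a fixed number here.
    Returns an (n, n) matrix W where W[i, j] weights unit j in the local
    regression fitted at unit i.
    """
    # Shortest-path (hop count) distance between areal units replaces the
    # usual Euclidean distance between centroids.
    d = shortest_path(adjacency, method="D", unweighted=True)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

# Hypothetical usage: weighted least squares for the local fit at unit i
# W = gwr_weights_graph(adjacency, bandwidth=2.0)
# beta_i = np.linalg.solve(X.T @ (W[i, None].T * X), X.T @ (W[i] * y))
```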


2019 ◽  
Author(s):  
Sierra Bainter ◽  
Thomas Granville McCauley ◽  
Tor D Wager ◽  
Elizabeth Reynolds Losin

In this paper we address the problem of selecting important predictors from a larger set of candidate predictors. Standard techniques are limited by low power and high false positive rates. A Bayesian variable selection approach widely used in biostatistics, stochastic search variable selection, can instead be used to combat these issues by accounting for uncertainty in the other predictors of the model. In this paper we present Bayesian variable selection to aid researchers facing this common scenario, along with an online application (https://ssvsforpsych.shinyapps.io/ssvsforpsych/) to perform the analysis and visualize the results. Using an application to predict pain ratings, we demonstrate how this approach quickly identifies reliable predictors, even when the set of possible predictors is larger than the sample size. This technique is widely applicable to research questions that may be relatively data-rich but have limited information or theory to guide variable selection.
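
For readers unfamiliar with the mechanics, the following is a minimal numpy sketch of George and McCulloch-style SSVS for linear regression; the spike and slab scales, the inclusion prior, and the inverse-gamma hyperparameters are illustrative defaults, not the settings used in this paper or its online application.

```python
import numpy as np

def ssvs_linear(X, y, n_iter=5000, burn=1000,
                tau=0.01, c=100.0, p_incl=0.5, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for SSVS in y = X beta + noise.

    Prior: beta_j ~ N(0, tau^2) if gamma_j = 0 (spike), N(0, (c*tau)^2)
    if gamma_j = 1 (slab); gamma_j ~ Bernoulli(p_incl);
    sigma^2 ~ Inverse-Gamma(a0, b0).
    Returns the marginal posterior inclusion probability of each predictor.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    gamma, sigma2 = np.ones(k, dtype=int), 1.0
    XtX, Xty = X.T @ X, X.T @ y
    incl = np.zeros(k)
    for it in range(n_iter):
        # 1) beta | gamma, sigma2, y  --  multivariate normal
        prior_var = np.where(gamma == 1, (c * tau) ** 2, tau ** 2)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / prior_var))
        cov = (cov + cov.T) / 2            # symmetrize for numerical safety
        beta = rng.multivariate_normal(cov @ Xty / sigma2, cov)
        # 2) gamma_j | beta_j  --  Bernoulli, slab density vs spike density
        slab = p_incl * np.exp(-0.5 * beta ** 2 / (c * tau) ** 2) / (c * tau)
        spike = (1 - p_incl) * np.exp(-0.5 * beta ** 2 / tau ** 2) / tau
        gamma = (rng.uniform(size=k) < slab / (slab + spike)).astype(int)
        # 3) sigma2 | beta, y  --  inverse gamma
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))
        if it >= burn:
            incl += gamma
    return incl / (n_iter - burn)

# Hypothetical usage: rank standardized predictors by inclusion probability
# mip = ssvs_linear(X, y)
```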


2011 ◽  
Vol 93 (4) ◽  
pp. 303-318 ◽  
Author(s):  
TIMO KNÜRR ◽  
ESA LÄÄRÄ ◽  
MIKKO J. SILLANPÄÄ

Summary: A new estimation-based Bayesian variable selection approach is presented for genetic analysis of complex traits based on linear or logistic regression. By assigning a mixture of uniform priors (MU) to genetic effects, the approach provides an intuitive way of specifying hyperparameters that control the selection of multiple influential loci. It aims to avoid the difficulty of interpreting the assumptions made in the specification of priors. The method is compared on two real datasets with two other approaches, stochastic search variable selection (SSVS) and a re-formulation of Bayes B utilizing indicator variables and adaptive Student's t-distributions (IAt). The Markov Chain Monte Carlo (MCMC) sampling performance of the three methods is evaluated using the publicly available software OpenBUGS (model scripts are provided in the Supplementary material). The sensitivity of MU to the specification of hyperparameters is assessed in one of the data examples.


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Xiaofei Wu ◽  
Shuzhen Zhu ◽  
Junjie Zhou

This paper captures the RMB exchange rate volatility using Markov-switching GARCH (MSGARCH) models and traditional single-regime GARCH models. The model parameters are estimated through the Markov Chain Monte Carlo (MCMC) method to study the volatility dynamics of the RMB exchange rate. Furthermore, we compare the MSGARCH models with the single-regime GARCH specifications in terms of Value-at-Risk (VaR) prediction accuracy. According to the deviance information criterion, MSGARCH models outperform the single-regime specifications in capturing the complexity of RMB exchange rate volatility. After the RMB exchange rate reform in 2015, volatility became more asymmetric and persistent, and the probability of being in the turbulent volatility regime increased significantly. The continuous escalation of Sino-US trade friction has increased the VaR of RMB exchange rate log-returns. From the evaluation results of the actual over expected exceedance ratio (AE), the conditional coverage (CC) test, and the dynamic quantile (DQ) test, we find strong evidence that two-regime MSGARCH models forecast VaR more accurately, which provides practical value for China's foreign exchange management authorities in managing financial risk.
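
As a small companion to the backtesting metrics listed above, here is a minimal sketch of the actual-over-expected (AE) exceedance ratio for VaR forecasts; the array names are placeholders, and the CC and DQ tests are not reproduced here.

```python
import numpy as np

def actual_over_expected(returns, var_forecasts, alpha=0.01):
    """Actual-over-expected (AE) exceedance ratio for VaR backtesting.

    returns       : realized log-returns, shape (T,).
    var_forecasts : one-step-ahead VaR forecasts at level alpha, shape (T,),
                    negative values for the loss tail.
    An AE close to 1 means the number of VaR violations matches expectation.
    """
    violations = np.asarray(returns) < np.asarray(var_forecasts)
    expected = alpha * len(violations)
    return violations.sum() / expected

# Hypothetical usage comparing two models' 1% VaR forecasts:
# ae_msgarch = actual_over_expected(r, var_msgarch)
# ae_garch = actual_over_expected(r, var_garch)
```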


2020 ◽  
Vol 3 (1) ◽  
pp. 66-80 ◽  
Author(s):  
Sierra A. Bainter ◽  
Thomas G. McCauley ◽  
Tor Wager ◽  
Elizabeth A. Reynolds Losin

Frequently, researchers in psychology are faced with the challenge of narrowing down a large set of predictors to a smaller subset. There are a variety of ways to do this, but commonly it is done by choosing predictors with the strongest bivariate correlations with the outcome. However, when predictors are correlated, bivariate relationships may not translate into multivariate relationships. Further, any attempts to control for multiple testing are likely to result in extremely low power. Here we introduce a Bayesian variable-selection procedure frequently used in other disciplines, stochastic search variable selection (SSVS). We apply this technique to choosing the best set of predictors of the perceived unpleasantness of an experimental pain stimulus from among a large group of sociocultural, psychological, and neurobiological (functional MRI) individual-difference measures. Using SSVS provides information about which variables predict the outcome, controlling for uncertainty in the other variables of the model. This approach yields new, useful information to guide the choice of relevant predictors. We have provided Web-based open-source software for performing SSVS and visualizing the results.
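
The point that bivariate correlations can mislead when predictors are correlated is easy to see in a small synthetic example (purely illustrative data, not the pain data analysed in the paper): x2 is related to the outcome only through its correlation with x1, yet bivariate screening would flag both predictors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # x2 is correlated with x1
y = 1.0 * x1 + rng.normal(size=n)          # only x1 affects the outcome

# Bivariate screening: both predictors show sizeable correlations with y.
print(np.corrcoef(x1, y)[0, 1], np.corrcoef(x2, y)[0, 1])

# Multiple regression: x2's coefficient is near zero once x1 is included.
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.lstsq(X, y, rcond=None)[0])
```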


Genetics ◽  
2003 ◽  
Vol 164 (3) ◽  
pp. 1129-1138 ◽  
Author(s):  
Nengjun Yi ◽  
Varghese George ◽  
David B Allison

Abstract: In this article, we utilize stochastic search variable selection methodology to develop a Bayesian method for identifying multiple quantitative trait loci (QTL) for complex traits in experimental designs. The proposed procedure entails embedding multiple regression in a hierarchical normal mixture model, where latent indicators for all markers are used to identify the multiple markers. Markers with significant effects can be identified as those with higher posterior probabilities of inclusion in the model. A simple and easy-to-use Gibbs sampler is employed to generate samples from the joint posterior distribution of all unknowns, including the latent indicators, the genetic effects for all markers, and other model parameters. The proposed method was evaluated using simulated data and illustrated using a real data set. The results demonstrate that the proposed method works well under conditions typical of most QTL studies in terms of the number of markers and marker density.


2021 ◽  
pp. 1-38
Author(s):  
Hongxuan Yan ◽  
Gareth W. Peters ◽  
Jennifer Chan

Abstract: Mortality projection and forecasting of life expectancy are two important aspects of the study of demography and life insurance modelling. We demonstrate in this work the existence of long memory in mortality data. Furthermore, models incorporating a long memory structure provide a new approach to enhancing mortality forecasts in terms of accuracy and reliability, which can improve the understanding of mortality. Novel mortality models are developed by extending the Lee–Carter (LC) model for death counts to incorporate a long memory time series structure. To link our extensions to existing actuarial work, we detail the relationship between the classical models of death counts developed under a Generalised Linear Model (GLM) formulation and our proposed extensions, which are developed under an extension of the GLM framework known in the time series literature as Generalised Linear Autoregressive Moving Average (GLARMA) regression models. Bayesian inference is applied to estimate the model parameters. The Deviance Information Criterion (DIC) is evaluated to select among the different LC model extensions we propose in terms of both in-sample fit and out-of-sample forecast performance. Furthermore, we compare our new models against existing model structures proposed in the literature when applied to the analysis of death count data sets from 16 countries divided according to gender and age group. Estimates of mortality rates are used to calculate life expectancies when constructing life tables. By comparing different life expectancy estimates, results show that the LC model without the long memory component may underestimate life expectancy, while the long memory extensions reduce this effect. In summary, it is valuable to investigate how the long memory feature in mortality influences life expectancies in the construction of life tables.
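
For reference, the classical Lee–Carter death-count model in its Poisson GLM formulation, which is the baseline that the long memory (GLARMA-type) extensions build on; the notation below is the standard one rather than copied from the paper.

```latex
% Classical Lee-Carter model for death counts (Poisson GLM formulation)
\[
  D_{x,t} \sim \operatorname{Poisson}\!\big(E_{x,t}\,\mu_{x,t}\big),
  \qquad
  \log \mu_{x,t} = a_x + b_x \kappa_t ,
\]
% with the usual identifiability constraints
\[
  \sum_x b_x = 1, \qquad \sum_t \kappa_t = 0,
\]
% where D_{x,t} are death counts, E_{x,t} central exposures, a_x the age
% pattern, b_x the age-specific sensitivity, and kappa_t the period index.
% The proposed extensions add a long memory (GLARMA-type) time series
% structure to this GLM formulation.
```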


2020 ◽  
Vol 1 (1) ◽  
pp. 12-24
Author(s):  
Aiwen Xing ◽  
Lifeng Lin

Objectives: Network meta-analysis is a popular tool to simultaneously compare multiple treatments and improve treatment effect estimates. However, no widely accepted guidelines are available for classifying the treatment nodes in a network meta-analysis, and the node-making process is often insufficiently reported. We aim to empirically examine the impact of different treatment classifications on network meta-analysis results. Methods: We collected nine published network meta-analyses with various disease outcomes; each contained some similar treatments that could be lumped. The Bayesian random-effects model was applied to these network meta-analyses before and after lumping the similar treatments. We estimated the odds ratios and their 95% credible intervals in the original and lumped network meta-analyses. We used the adjusted deviance information criterion to assess model performance in the lumped network meta-analyses, and used the ratios of credible interval lengths and the ratios of odds ratios to quantitatively evaluate the changes in the estimates due to lumping. In addition, the unrelated mean effect model was applied to examine the extent of evidence inconsistency. Results: The estimated odds ratios of many treatment comparisons changed noticeably due to lumping, and the precision of many estimates improved substantially. The deviance information criterion values decreased after lumping similar treatments in seven (78%) network meta-analyses, indicating better model performance. Substantial evidence inconsistency was detected in only one network meta-analysis. Conclusions: Different ways of classifying treatment nodes may substantially affect network meta-analysis results. Including many insufficiently compared treatments and analysing them as separate nodes may not yield more precise estimates. Researchers should report the node-making process in detail and investigate the robustness of the results to different ways of classifying treatments.
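
A minimal sketch of the two change metrics described above (ratios of credible interval lengths and ratios of odds ratios) for matched treatment comparisons before and after lumping; measuring interval length on the log odds ratio scale is an assumption made here for symmetry, not necessarily the paper's exact choice.

```python
import numpy as np

def lumping_change_metrics(or_orig, ci_orig, or_lump, ci_lump):
    """Ratio of credible interval lengths and ratio of odds ratios.

    or_orig, or_lump : odds ratio point estimates for the same treatment
                       comparisons, before and after lumping, shape (m,).
    ci_orig, ci_lump : 95% credible interval bounds on the odds ratio
                       scale, shape (m, 2) as (lower, upper).
    A length ratio below 1 indicates improved precision after lumping.
    """
    or_orig, or_lump = np.asarray(or_orig), np.asarray(or_lump)
    ci_orig, ci_lump = np.asarray(ci_orig), np.asarray(ci_lump)
    # Interval lengths are measured on the log scale (an assumption here).
    len_ratio = (np.log(ci_lump[:, 1]) - np.log(ci_lump[:, 0])) / \
                (np.log(ci_orig[:, 1]) - np.log(ci_orig[:, 0]))
    or_ratio = or_lump / or_orig
    return len_ratio, or_ratio
```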


2015 ◽  
Vol 72 (6) ◽  
pp. 879-892 ◽  
Author(s):  
Stephen G. Wischniowski ◽  
Craig R. Kastelle ◽  
Timothy Loher ◽  
Thomas E. Helser

Sagittal otoliths from juvenile Pacific halibut (Hippoglossus stenolepis) of known age were used to create a bomb-produced radiocarbon reference chronology for the eastern Bering Sea (EBS) by fitting a coupled-function model to Δ14C values from each specimen’s birth year. The newly created EBS reference chronology was then compared with a reference chronology previously created for Pacific halibut from the Gulf of Alaska (GOA). Adult Pacific halibut age-validation samples from the EBS were also analyzed for 14C and modeled to validate age-estimation accuracy. A Bayesian model was developed and Markov chain Monte Carlo simulation was used to estimate model parameters and adult Pacific halibut ageing bias. Differences in reference chronologies between ocean basins were reflected in a large deviance information criterion difference (ΔDIC) between models, supporting the hypothesis that two separate coupled-function models were required to adequately describe the data, one each for the EBS and GOA. We determined that regionally specific GOA and EBS oceanography plays a considerable role in the Δ14C values and must be taken into consideration when selecting a reference chronology for bomb-produced 14C age-validation studies. The age-validation samples indicated that the current ageing methodology used in Pacific halibut assessments is accurate and has provided accurate age assignments for Pacific halibut in the EBS.
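
Because the model comparison above rests on ΔDIC, here is a generic sketch of computing DIC from MCMC output using the standard definition DIC = Dbar + pD with pD = Dbar − D(theta_bar); the deviance function and draw arrays are placeholders, not the coupled-function radiocarbon model itself.

```python
import numpy as np

def dic(deviance_fn, theta_draws):
    """Deviance Information Criterion from MCMC output.

    deviance_fn : function mapping a parameter vector theta to the
                  deviance -2 * log p(y | theta) of the model at hand.
    theta_draws : posterior draws, shape (n_draws, n_params).
    """
    d = np.array([deviance_fn(t) for t in theta_draws])
    d_bar = d.mean()                               # posterior mean deviance
    d_hat = deviance_fn(theta_draws.mean(axis=0))  # deviance at posterior mean
    p_d = d_bar - d_hat                            # effective number of parameters
    return d_bar + p_d

# Hypothetical usage: Delta DIC between a pooled model and basin-specific models
# delta_dic = dic(dev_pooled, draws_pooled) - dic(dev_separate, draws_separate)
```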


2016 ◽  
Author(s):  
Anders Eklund ◽  
Martin A. Lindquist ◽  
Mattias Villani

Abstract: We propose a voxel-wise general linear model with autoregressive noise and heteroscedastic noise innovations (GLMH) for analyzing functional magnetic resonance imaging (fMRI) data. The model is analyzed from a Bayesian perspective and has the benefit of automatically down-weighting time points close to motion spikes in a data-driven manner. We develop a highly efficient Markov Chain Monte Carlo (MCMC) algorithm that allows for Bayesian variable selection among the regressors to model both the mean (i.e., the design matrix) and variance. This makes it possible to include a broad range of explanatory variables in both the mean and variance (e.g., time trends, activation stimuli, head motion parameters and their temporal derivatives), and to compute the posterior probability of inclusion from the MCMC output. Variable selection is also applied to the lags in the autoregressive noise process, making it possible to infer the lag order from the data simultaneously with all other model parameters. We use both simulated data and real fMRI data from OpenfMRI to illustrate the importance of proper modeling of heteroscedasticity in fMRI data analysis. Our results show that the GLMH tends to detect more brain activity, compared to its homoscedastic counterpart, by allowing the variance to change over time depending on the degree of head motion.
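
A small generative sketch of the noise component described above: autoregressive noise whose innovation variance is driven by a motion covariate, so time points near motion spikes receive inflated variance (and are correspondingly down-weighted in a Bayesian fit). The AR order, coefficients, and log-variance regression below are illustrative choices, not estimates from the paper.

```python
import numpy as np

def simulate_glmh_noise(motion, rho=(0.3,), lam0=-1.0, lam1=2.0, seed=0):
    """Simulate AR noise with heteroscedastic innovations.

    motion : (T,) variance covariate, e.g. framewise displacement; large
             values (motion spikes) inflate the innovation variance.
    rho    : autoregressive coefficients of the noise process.
    The log innovation variance is lam0 + lam1 * motion[t].
    """
    rng = np.random.default_rng(seed)
    T, p = len(motion), len(rho)
    sd = np.exp(0.5 * (lam0 + lam1 * np.asarray(motion)))  # innovation sd
    u = np.zeros(T)
    for t in range(T):
        ar_part = sum(rho[j] * u[t - 1 - j] for j in range(min(p, t)))
        u[t] = ar_part + sd[t] * rng.normal()
    return u

# Hypothetical usage: a single motion spike at time point 100
# motion = np.zeros(200); motion[100] = 3.0
# noise = simulate_glmh_noise(motion)
```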

