A Tutorial on Bayesian Multi-Model Linear Regression with BAS and JASP

Author(s):  
Don van den Bergh ◽  
Merlise A. Clyde ◽  
Akash R. Komarlu Narendra Gupta ◽  
Tim de Jong ◽  
Quentin F. Gronau ◽  
...  

Linear regression analyses commonly involve two consecutive stages of statistical inquiry. In the first stage, a single ‘best’ model is defined by a specific selection of relevant predictors; in the second stage, the regression coefficients of the winning model are used for prediction and for inference concerning the importance of the predictors. However, such second-stage inference ignores the model uncertainty from the first stage, resulting in overconfident parameter estimates that generalize poorly. These drawbacks can be overcome by model averaging, a technique that retains all models for inference, weighting each model’s contribution by its posterior probability. Although conceptually straightforward, model averaging is rarely used in applied research, possibly due to the lack of easily accessible software. To bridge the gap between theory and practice, we provide a tutorial on linear regression using Bayesian model averaging in JASP, based on the BAS package in R. Firstly, we provide theoretical background on linear regression, Bayesian inference, and Bayesian model averaging. Secondly, we demonstrate the method on an example data set from the World Happiness Report. Lastly, we discuss limitations of model averaging and directions for dealing with violations of model assumptions.
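The posterior-weighted averaging described above can be sketched in a few lines. The following is a minimal illustration, not the BAS implementation: it enumerates every subset of predictors, fits each model by ordinary least squares, and uses BIC-based weights as a rough stand-in for posterior model probabilities. The function name, the uniform model prior, and the BIC approximation are all assumptions of this sketch.

```python
import itertools
import numpy as np

def bma_linear_regression(X, y, names):
    """Enumerate all predictor subsets, weight each model's OLS fit by its
    BIC-implied approximate posterior probability (uniform model prior),
    and return model-averaged coefficients and inclusion probabilities."""
    n = len(y)
    results = []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(range(len(names)), k):
            Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)
            rss = np.sum((y - Xm @ beta) ** 2)
            bic = n * np.log(rss / n) + Xm.shape[1] * np.log(n)
            results.append((subset, beta, bic))
    bics = np.array([r[2] for r in results])
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()                          # approximate posterior model probabilities
    avg = np.zeros(len(names))            # model-averaged coefficients
    incl = np.zeros(len(names))           # posterior inclusion probabilities
    for (subset, beta, _), wi in zip(results, w):
        for i, j in enumerate(subset):
            avg[j] += wi * beta[i + 1]    # beta[0] is the intercept
        incl[list(subset)] += wi
    return avg, incl
```

Predictors that appear only in poorly supported models end up with inclusion probabilities near zero, which is how model averaging guards against the overconfidence of single-model inference.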


2016 ◽  
Author(s):  
Joram Soch ◽  
Achim Pascal Meyer ◽  
John-Dylan Haynes ◽  
Carsten Allefeld

In functional magnetic resonance imaging (fMRI), the model quality of general linear models (GLMs) for first-level analysis is rarely assessed. In recent work (Soch et al., 2016: “How to avoid mismodelling in GLM-based fMRI data analysis: cross-validated Bayesian model selection”, NeuroImage, vol. 141, pp. 469-489; DOI: 10.1016/j.neuroimage.2016.07.047), we introduced cross-validated Bayesian model selection (cvBMS) to infer the best model for a group of subjects and use it to guide second-level analysis. While this is the optimal approach given that the same GLM has to be used for all subjects, a much more efficient procedure exists when model selection only addresses nuisance variables and the regressors of interest are included in all candidate models. In this work, we propose cross-validated Bayesian model averaging (cvBMA) to improve parameter estimates for these regressors of interest by combining information from all models using their posterior probabilities. This is particularly useful because different models can lead to different conclusions regarding experimental effects, and the most complex model is not necessarily the best choice. We find that cvBMS can prevent established effects from going undetected, and that cvBMA can be more sensitive to experimental effects than using only the best model in each subject or the single model that is best across a group of subjects.
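The averaging step of such a scheme reduces to a posterior-probability-weighted mean. The sketch below is not the authors' cvBMA code; it only illustrates how per-model estimates of a shared regressor of interest might be combined once (for example, cross-validated) log model evidences are available, assuming a uniform prior over models.

```python
import numpy as np

def model_averaged_estimate(betas, log_evidences):
    """Combine per-model estimates of a shared regressor of interest,
    weighting each model by its posterior probability derived from its
    log model evidence (uniform model prior assumed)."""
    le = np.asarray(log_evidences, dtype=float)
    w = np.exp(le - le.max())     # subtract max for numerical stability
    w /= w.sum()                  # posterior model probabilities
    return np.dot(w, np.asarray(betas, dtype=float)), w
```

With equal evidences the result is a plain mean; as one model's evidence dominates, the averaged estimate converges to that model's estimate, smoothly interpolating between model averaging and model selection.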


2014 ◽  
Vol 10 (8) ◽  
pp. 2023-2030 ◽  
Author(s):  
Xun Huang ◽  
Zhike Zi

A new method uses Bayesian model averaging for linear regression to infer molecular interactions in biological systems with high prediction accuracy and high computational efficiency.


Energies ◽  
2020 ◽  
Vol 13 (2) ◽  
pp. 295
Author(s):  
Matteo Spada ◽  
Peter Burgherr

The risk of severe (≥5 fatalities) accidents in fossil energy chains (Coal, Oil and Natural Gas) is analyzed. The full chain risk is assessed for Organization for Economic Co-operation and Development (OECD) countries, the 28 Member States of the European Union (EU28), and non-OECD countries. Furthermore, for Coal, Chinese data are analyzed separately for three different periods, i.e., 1994–1999, 2000–2008 and 2009–2016, due to different data sources and highly incomplete data prior to 1994. Bayesian Model Averaging (BMA) is applied to investigate the risk and associated uncertainties of a comprehensive accident data set from the Paul Scherrer Institute’s ENergy-related Severe Accident Database (ENSAD). By means of BMA, frequency and severity distributions are established, and a final posterior distribution including model uncertainty is constructed as a weighted combination of the different models. By dealing with both lack of data and lack of knowledge, the proposed approach allows for a general reduction of the uncertainty in the calculated risk indicators, which is beneficial for informed decision-making under uncertainty.


2020 ◽  
Vol 3 (2) ◽  
pp. 200-215
Author(s):  
Max Hinne ◽  
Quentin F. Gronau ◽  
Don van den Bergh ◽  
Eric-Jan Wagenmakers

Many statistical scenarios initially involve several candidate models that describe the data-generating process. Analysis often proceeds by first selecting the best model according to some criterion and then learning about the parameters of this selected model. Crucially, however, in this approach the parameter estimates are conditioned on the selected model, and any uncertainty about the model-selection process is ignored. An alternative is to learn the parameters for all candidate models and then combine the estimates according to the posterior probabilities of the associated models. This approach is known as Bayesian model averaging (BMA). BMA has several important advantages over all-or-none selection methods, but has been used only sparingly in the social sciences. In this conceptual introduction, we explain the principles of BMA, describe its advantages over all-or-none model selection, and showcase its utility in three examples: analysis of covariance, meta-analysis, and network analysis.


2016 ◽  
Vol 47 (1) ◽  
pp. 153-167 ◽  
Author(s):  
Shujuan Huang ◽  
Brian Hartman ◽  
Vytaras Brazauskas

Episode Treatment Groups (ETGs) classify related services into medically relevant and distinct units describing an episode of care. Proper model selection for these ETG-based costs is essential to adequately price and manage health insurance risks. The optimal claim cost model (or model probabilities) can vary depending on the disease. We compare four potential models (lognormal, gamma, log-skew-t and Lomax) using four different model selection methods (AIC weights, BIC weights, Random Forest feature classification and Bayesian model averaging) on 320 ETGs. Using data from a major health insurer, consisting of more than 33 million observations from 9 million claimants, we compare the various methods on both speed and precision, and also examine the wide range of selected models for the different ETGs. Several case studies are provided for illustration. Random Forest feature selection proves computationally efficient and sufficiently accurate, and is therefore preferred for this large data set. When feasible (on smaller data sets), Bayesian model averaging is preferred because it provides posterior model probabilities.
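The information-criterion-weights step mentioned above can be illustrated with two of the four candidate distributions (lognormal and gamma; the log-skew-t and Lomax fits are omitted for brevity). This is a hedged sketch using SciPy's maximum-likelihood fitting, not the study's code; fixing the location parameter at zero is an assumption made here for positive cost data.

```python
import numpy as np
from scipy import stats

def bic_model_weights(data, dists=(stats.lognorm, stats.gamma)):
    """Fit each candidate distribution to the claim costs by maximum
    likelihood and convert BIC scores into approximate posterior
    model weights (uniform model prior assumed)."""
    n = len(data)
    bics = []
    for dist in dists:
        params = dist.fit(data, floc=0)   # fix location at 0 for cost data
        ll = np.sum(dist.logpdf(data, *params))
        k = len(params) - 1               # loc is fixed, not estimated
        bics.append(k * np.log(n) - 2 * ll)
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))
    return w / w.sum()
```

Replacing the BIC formula with `2 * k - 2 * ll` gives the corresponding AIC weights; both are fast heuristics compared with full Bayesian model averaging, which is one reason they scale better to very large claim data sets.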


NeuroImage ◽  
2017 ◽  
Vol 158 ◽  
pp. 186-195 ◽  
Author(s):  
Joram Soch ◽  
Achim Pascal Meyer ◽  
John-Dylan Haynes ◽  
Carsten Allefeld

Silva Fennica ◽  
2021 ◽  
Vol 55 (2) ◽  
Author(s):  
Lele Lu ◽  
Sophan Chhin ◽  
Xiongqing Zhang ◽  
Jianguo Zhang

Tree height-diameter allometry reflects the response of specific species to above- and belowground resource allocation patterns. However, traditional methods (e.g. stepwise regression (SR)) may ignore model uncertainty during the variable selection process. In this study, 450 trees of Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) grown at five spacings were used. We explored height-diameter allometry in relation to stand and climate variables through Bayesian model averaging (BMA), identified the contributions of these variables to the allometry, and compared the results with the SR method. Results showed that the SR model was identical to the BMA model with the third highest posterior probability. Although parameter estimates from the SR method were similar to those from BMA, BMA produced estimates with slightly narrower 95% intervals. Heights increased with increasing planting density, dominant height, and mean annual temperature, but decreased with increasing stand basal area and summer mean maximum temperature. The results indicated that temperature was the dominant climate variable shaping height-diameter allometry for Chinese fir plantations. While the SR model included the mean coldest month temperature and winter mean minimum temperature, these variables were excluded by BMA, which indicates that redundant variables can be removed through BMA.

