Better Than Just Average: The Many Faces of Bayesian Model Weighting Methods and What They Tell Us about Multi-Model Use

Author(s):  
Marvin Höge ◽  
Anneli Guthke ◽  
Wolfgang Nowak

In environmental modelling, multiple models are usually plausible, e.g. for predicting a certain quantity of interest. Using model rating methods, we typically want to elicit a single best model or an optimal average of these models. However, such methods are often not properly applied, which can lead to false conclusions.

Using three different Bayesian approaches to model selection or averaging as examples (namely 1. Bayesian Model Selection and Averaging (BMS/BMA), 2. Pseudo-BMS/BMA and 3. Bayesian Stacking), we show how similar-looking methods pursue vastly different goals and lead to deviating results for model selection or averaging.

All three yield a weighted average of predictive distributions. Yet, only Bayesian Stacking has the goal of averaging for improved predictions in the sense of an actual (optimal) model combination. The other approaches pursue the quest of finding a single best model as the ultimate goal, albeit on different premises, and use model averaging only as a preliminary stage to prevent rash model choice.

We want to foster their proper use by, first, clarifying their theoretical background and, second, contrasting their behaviours in an applied groundwater modelling task. Third, we show how the insights gained from these Bayesian methods are transferable to other (also non-Bayesian) model rating methods, and we draw general conclusions about multi-model usage based on model weighting.

Water ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 309
Author(s):  
Marvin Höge ◽  
Anneli Guthke ◽  
Wolfgang Nowak

Model averaging makes it possible to use multiple models for one modelling task, like predicting a certain quantity of interest. Several Bayesian approaches exist that all yield a weighted average of predictive distributions. However, they are often not properly applied, which can lead to false conclusions. In this study, we focus on Bayesian Model Selection (BMS) and Averaging (BMA), Pseudo-BMS/BMA and Bayesian Stacking. We want to foster their proper use by, first, clarifying their theoretical background and, second, contrasting their behaviours in an applied groundwater modelling task. We show that only Bayesian Stacking has the goal of model averaging for improved predictions by model combination. The other approaches pursue the quest of finding a single best model as the ultimate goal, and use model averaging only as a preliminary stage to prevent rash model choice. Improved predictions are thereby not guaranteed. In accordance with so-called M-settings that clarify the alleged relations between models and truth, we elicit which method is most promising.
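The contrast between the two weighting goals can be sketched in code. The following minimal Python snippet is illustrative only (the function names `bma_weights` and `stacking_weights` are hypothetical, not from the paper): BMA weights are posterior model probabilities computed from log model evidences under uniform model priors, while stacking weights maximise the mean log score of a weighted mixture of per-model predictive densities, here via a simple grid search restricted to the two-model case.

```python
import numpy as np

def bma_weights(log_evidences):
    """BMA weights: posterior model probabilities from log marginal
    likelihoods (uniform model priors assumed)."""
    z = np.asarray(log_evidences, dtype=float)
    z = z - z.max()                   # stabilise the exponentials
    w = np.exp(z)
    return w / w.sum()

def stacking_weights(pred_densities, n_grid=201):
    """Bayesian-stacking weights: maximise the mean log score of the
    weighted mixture of per-model predictive densities (e.g. leave-one-out
    densities p_k(y_i | y_-i)). Grid search, two-model case only."""
    p = np.asarray(pred_densities, dtype=float)
    assert p.shape[1] == 2, "this illustration handles two models only"
    grid = np.linspace(0.0, 1.0, n_grid)
    scores = [np.mean(np.log(w * p[:, 0] + (1 - w) * p[:, 1] + 1e-300))
              for w in grid]
    w0 = grid[int(np.argmax(scores))]
    return np.array([w0, 1.0 - w0])
```

Note the different objectives: `bma_weights` depends only on how well each model explains the calibration data as a whole, whereas `stacking_weights` directly optimises the predictive performance of the combination.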


2016 ◽  
Vol 30 (15) ◽  
pp. 1541002
Author(s):  
Gianpiero Gervino ◽  
Giovanni Mana ◽  
Carlo Palmisano

In this paper, we consider the problems of identifying the most appropriate model for a given physical system and of assessing the model's contribution to the measurement uncertainty. The above problems are studied in terms of Bayesian model selection and model averaging. As the evaluation of the “evidence” [Formula: see text], i.e., the integral of Likelihood × Prior over the space of the measurand and the parameters, becomes impracticable when this space has [Formula: see text] dimensions, it is necessary to consider an appropriate numerical strategy. Among the many algorithms for calculating [Formula: see text], we have investigated ellipsoidal nested sampling, a technique based on three pillars: the study of the iso-likelihood contour lines of the integrand, a probabilistic estimate of the volume of the parameter space contained within the iso-likelihood contours, and random sampling from hyperellipsoids embedded in the integration variables. This paper lays out the essential ideas of this approach.
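As a rough illustration of the idea behind nested sampling (the basic algorithm only, without the ellipsoidal bounding step that the paper investigates), the following toy Python script estimates the evidence for a standard normal likelihood under a uniform prior on [-5, 5], replacing the worst live point by rejection sampling from the prior. All settings are illustrative; for this toy problem the true evidence is approximately 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_like(theta):
    # standard normal log-likelihood N(0, 1)
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

lo, hi = -5.0, 5.0              # uniform prior support
n_live, n_iter = 100, 600

live = rng.uniform(lo, hi, n_live)
live_ll = log_like(live)
log_z, log_x_prev = -np.inf, 0.0

for i in range(1, n_iter + 1):
    worst = np.argmin(live_ll)
    log_x = -i / n_live                          # E[log X_i] shrinkage
    log_w = np.log(np.exp(log_x_prev) - np.exp(log_x))
    log_z = np.logaddexp(log_z, log_w + live_ll[worst])
    thresh = live_ll[worst]
    while True:                                  # prior draw above threshold
        cand = rng.uniform(lo, hi)
        if log_like(cand) > thresh:
            live[worst], live_ll[worst] = cand, log_like(cand)
            break
    log_x_prev = log_x

# add the remaining prior mass carried by the live points
log_z = np.logaddexp(log_z, log_x_prev + np.log(np.mean(np.exp(live_ll))))
Z = np.exp(log_z)
```

The rejection step is exactly what becomes impracticable in higher dimensions; ellipsoidal nested sampling replaces it with draws from hyperellipsoids fitted around the live points.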


2018 ◽  
Vol 10 (8) ◽  
pp. 2801 ◽  
Author(s):  
Krzysztof Drachal

Forecasting commodity prices in rapidly changing markets is a hard problem to tackle. However, being able to determine important price predictors in a time-varying setting is crucial for sustainability initiatives. For example, the 2000s commodities boom raised the question of whether commodities markets had become over-financialized. In the case of agricultural commodities, it was questioned whether speculative pressures increase food prices. Recently, a new Bayesian model combination scheme has been proposed: Dynamic Model Averaging (DMA). This method has already been applied with success in certain markets. It combines uncertainty about the model and the explanatory variables with a time-varying parameters approach. It can also capture structural breaks and respond to market disturbances, and it can deal with numerous explanatory variables in a data-rich environment. Like Bayesian Model Averaging (BMA), Dynamic Model Averaging (DMA), Dynamic Model Selection (DMS) and the Median Probability Model (MED) start from Time-Varying Parameters (TVP) regressions. All of these methods were applied to 69 spot commodity prices, covering the period between Dec 1983 and Oct 2017. In approximately 80% of cases, according to the Diebold–Mariano test, DMA produced statistically significantly more accurate forecasts than benchmark forecasts (such as the naive method or ARIMA). Moreover, among all the considered model types, DMA was the (significantly) most accurate one in 22% of cases. MED most often minimised the forecast errors (28% of cases); however, as clarified in the text, this was due to a specific initial parameter setting. The second ”best” model type was MED, meaning that, in the case of model selection, relying on the highest posterior probability is not always preferable.
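The DMA weight recursion sketched below follows the usual two-step pattern: a prediction step that flattens the previous weights with a forgetting factor, and a Bayes update that multiplies each weight by the model's one-step predictive density. The function name and toy inputs are illustrative, not taken from the study.

```python
import numpy as np

def dma_weights(pred_dens, alpha=0.99):
    """Dynamic Model Averaging weight recursion.
    pred_dens: (T, K) array of one-step predictive densities
    p_k(y_t | y_1:t-1); alpha is the forgetting factor."""
    p = np.asarray(pred_dens, dtype=float)
    T, K = p.shape
    w = np.full(K, 1.0 / K)           # equal initial model weights
    path = np.empty((T, K))
    for t in range(T):
        pred = w ** alpha
        pred /= pred.sum()            # forgetting-factor prediction step
        w = pred * p[t]
        w /= w.sum()                  # Bayes update with predictive density
        path[t] = w
    return path
```

With `alpha < 1`, past evidence is gradually discounted, which is how DMA reacts to structural breaks; `alpha = 1` recovers standard recursive BMA.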


2021 ◽  
Author(s):  
Carlos R Oliveira ◽  
Eugene D Shapiro ◽  
Daniel M Weinberger

Vaccine effectiveness (VE) studies are often conducted after the introduction of new vaccines to ensure they provide protection in real-world settings. Although susceptible to confounding, the test-negative case-control study design is the most efficient method to assess VE post-licensure. Control of confounding is often needed during the analyses, which is most efficiently done through multivariable modeling. When a large number of potential confounders are being considered, it can be challenging to know which variables need to be included in the final model. This paper highlights the importance of considering model uncertainty by re-analyzing a Lyme VE study using several confounder selection methods. We propose an intuitive Bayesian Model Averaging (BMA) framework for this task and compare the performance of BMA to that of traditional single-best-model-selection methods. We demonstrate how BMA can be advantageous in situations when there is uncertainty about model selection by systematically considering alternative models and increasing transparency.
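A minimal sketch of the BIC-approximated BMA idea for confounder selection, using ordinary least squares on synthetic data rather than the case-control models of the study (function names and data are illustrative): each confounder subset defines a candidate model, models are weighted by exp(-BIC/2), and the exposure effect is averaged across all models instead of being read off a single selected one.

```python
import numpy as np
from itertools import combinations

def ols_bic(X, y):
    """OLS fit plus BIC under a Gaussian likelihood."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return beta, -2.0 * loglik + k * np.log(n)

def bma_over_confounders(y, exposure, confounders):
    """BIC-approximated BMA over all confounder subsets; the exposure
    is kept in every candidate model."""
    n = len(y)
    idx = range(confounders.shape[1])
    models, bics, effects = [], [], []
    for r in range(confounders.shape[1] + 1):
        for subset in combinations(idx, r):
            X = np.column_stack([np.ones(n), exposure,
                                 confounders[:, list(subset)]])
            beta, bic = ols_bic(X, y)
            models.append(subset)
            bics.append(bic)
            effects.append(beta[1])   # exposure coefficient
    b = np.asarray(bics)
    w = np.exp(-0.5 * (b - b.min()))
    w /= w.sum()                      # posterior model probabilities
    return models, w, float(w @ np.asarray(effects))
```

The averaged effect carries the model uncertainty that a single-best-model analysis would hide; the weights themselves document how strongly the data favour each confounder subset.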


2016 ◽  
Author(s):  
Joram Soch ◽  
Achim Pascal Meyer ◽  
John-Dylan Haynes ◽  
Carsten Allefeld

Abstract In functional magnetic resonance imaging (fMRI), the model quality of general linear models (GLMs) for first-level analysis is rarely assessed. In recent work (Soch et al., 2016: “How to avoid mismodelling in GLM-based fMRI data analysis: cross-validated Bayesian model selection”, NeuroImage, vol. 141, pp. 469-489; DOI: 10.1016/j.neuroimage.2016.07.047), we introduced cross-validated Bayesian model selection (cvBMS) to infer the best model for a group of subjects and use it to guide second-level analysis. While this is the optimal approach given that the same GLM has to be used for all subjects, there is a much more efficient procedure when model selection only addresses nuisance variables and the regressors of interest are included in all candidate models. In this work, we propose cross-validated Bayesian model averaging (cvBMA) to improve parameter estimates for these regressors of interest by combining information from all models using their posterior probabilities. This is particularly useful as different models can lead to different conclusions regarding experimental effects, and the most complex model is not necessarily the best choice. We find that cvBMS can prevent failures to detect established effects and that cvBMA can be more sensitive to experimental effects than using even the best model in each subject or the model which is best across the group of subjects.
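The averaging step itself is simple to sketch: per-model parameter estimates for the regressors of interest are combined using posterior model probabilities. The following illustrative Python helper (not the authors' implementation) assumes uniform model priors, so that probabilities follow directly from log model evidences.

```python
import numpy as np

def model_averaged_estimate(betas, log_model_evidences):
    """Average per-model parameter estimates with posterior model
    probabilities (uniform model priors assumed).
    betas: (K, P) array of estimates for P regressors in K models."""
    lme = np.asarray(log_model_evidences, dtype=float)
    lme = lme - lme.max()             # stabilise the exponentials
    prob = np.exp(lme)
    prob /= prob.sum()                # posterior model probabilities
    return prob @ np.asarray(betas, dtype=float)
```

Because the regressors of interest appear in every candidate model, the averaged estimates remain directly interpretable while the nuisance-model uncertainty is integrated out.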


Author(s):  
Giuseppe De Luca ◽  
Jan R. Magnus

In this article, we describe the estimation of linear regression models with uncertainty about the choice of the explanatory variables. We introduce the Stata commands bma and wals, which implement, respectively, the exact Bayesian model-averaging estimator and the weighted-average least-squares estimator developed by Magnus, Powell, and Prüfer (2010, Journal of Econometrics 154: 139–153). Unlike standard pretest estimators that are based on some preliminary diagnostic test, these model-averaging estimators provide a coherent way of making inference on the regression parameters of interest by taking into account the uncertainty due to both the estimation and the model selection steps. Special emphasis is given to several practical issues that users are likely to face in applied work: equivariance to certain transformations of the explanatory variables, stability, accuracy, computing speed, and out-of-memory problems. The performance of our bma and wals commands is illustrated using simulated data and empirical applications from the literature on model-averaging estimation.


2015 ◽  
Vol 51 (4) ◽  
pp. 2825-2846 ◽  
Author(s):  
Thomas Wöhling ◽  
Anneli Schöniger ◽  
Sebastian Gayler ◽  
Wolfgang Nowak

2010 ◽  
Vol 138 (1) ◽  
pp. 190-202 ◽  
Author(s):  
Chris Fraley ◽  
Adrian E. Raftery ◽  
Tilmann Gneiting

Abstract Bayesian model averaging (BMA) is a statistical postprocessing technique that generates calibrated and sharp predictive probability density functions (PDFs) from forecast ensembles. It represents the predictive PDF as a weighted average of PDFs centered on the bias-corrected ensemble members, where the weights reflect the relative skill of the individual members over a training period. This work adapts the BMA approach to situations that arise frequently in practice; namely, when one or more of the member forecasts are exchangeable, and when there are missing ensemble members. Exchangeable members differ in random perturbations only, such as the members of bred ensembles, singular vector ensembles, or ensemble Kalman filter systems. Accounting for exchangeability simplifies the BMA approach, in that the BMA weights and the parameters of the component PDFs can be assumed to be equal within each exchangeable group. With these adaptations, BMA can be applied to postprocess multimodel ensembles of any composition. In experiments with surface temperature and quantitative precipitation forecasts from the University of Washington mesoscale ensemble and ensemble Kalman filter systems over the Pacific Northwest, the proposed extensions yield good results. The BMA method is robust to exchangeability assumptions, and the BMA postprocessed combined ensemble shows better verification results than any of the individual, raw, or BMA postprocessed ensemble systems. These results suggest that statistically postprocessed multimodel ensembles can outperform individual ensemble systems, even in cases in which one of the constituent systems is superior to the others.
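The BMA predictive density described here is a weighted mixture of component PDFs centred on the bias-corrected member forecasts. A minimal Python sketch with Gaussian components and a common spread follows (the function name and the fixed-spread simplification are illustrative; in the exchangeable case, members of a group simply share an equal weight).

```python
import numpy as np

def bma_predictive_pdf(y, members, weights, sigma):
    """BMA predictive density: weighted mixture of Gaussian PDFs centred
    on the (bias-corrected) ensemble member forecasts.
    y: evaluation points; members: (K,) forecasts; weights: (K,) BMA
    weights summing to 1; sigma: common component spread."""
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None]
    comps = (np.exp(-0.5 * ((y - members) / sigma) ** 2)
             / (sigma * np.sqrt(2 * np.pi)))
    return comps @ weights            # mixture density at each point in y
```

In practice the weights and sigma are fitted over a training period (typically by maximum likelihood via EM); a missing member is handled by dropping its component and renormalising the remaining weights.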

