Bayes factors for mixed-effects models

2021
Author(s):
Catriona Silvey
Zoltan Dienes
Elizabeth Wonnacott

In psychology, we often want to know whether or not an effect exists. The traditional way of answering this question is to use frequentist statistics. However, a significance test against a null hypothesis of no effect cannot distinguish between two states of affairs: evidence of absence of an effect, and absence of evidence for or against an effect. Bayes factors can make this distinction; however, uptake of Bayes factors in psychology has so far been low for two reasons. Firstly, they require researchers to specify the range of effect sizes their theory predicts. Researchers are often unsure about how to do this, leading to the use of inappropriate default values which may give misleading results. Secondly, many implementations of Bayes factors have a substantial technical learning curve. We present a case study and simulations demonstrating a simple method for generating a range of plausible effect sizes based on the output from frequentist mixed-effects models. Bayes factors calculated using these estimates provide intuitively reasonable results across a range of real effect sizes. The approach provides a solution to the problem of how to come up with principled estimates of effect size, and produces comparable results to a state-of-the-art method without requiring researchers to learn novel statistical software.
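The recipe can be sketched in a few lines of R. The following is a minimal, hypothetical example rather than the authors' code: the data frame `dat`, its variables `rt`, `cond` (numerically coded), `subject`, and `item`, and the half-normal scale are all placeholder assumptions.

```r
## Minimal sketch, not the authors' code: compute a Bayes factor for a fixed
## effect from the estimate and standard error of a frequentist mixed-effects
## model, modelling H1 as a half-normal distribution whose scale is a plausible
## effect size (e.g., taken from a related or pilot model).
library(lme4)

m   <- lmer(rt ~ cond + (1 + cond | subject) + (1 + cond | item), data = dat)
est <- fixef(m)["cond"]                # observed effect estimate
se  <- sqrt(vcov(m)["cond", "cond"])   # its standard error
## (coefficient name depends on the coding; `cond` is assumed numeric, e.g. -0.5/0.5)

## Normal-approximation ("summary statistics") Bayes factor: marginal likelihood
## of the estimate under a half-normal H1 divided by its likelihood under H0.
bf_halfnormal <- function(est, se, scale) {
  lik_h0 <- dnorm(est, mean = 0, sd = se)
  lik_h1 <- integrate(function(theta) {
    dnorm(est, mean = theta, sd = se) * 2 * dnorm(theta, mean = 0, sd = scale)
  }, lower = 0, upper = Inf)$value
  lik_h1 / lik_h0
}

bf_halfnormal(est, se, scale = 0.1)  # scale = assumed plausible effect size
```

Values well above 1 indicate evidence for the effect, values well below 1 indicate evidence for its absence, and values near 1 indicate that the data are insensitive.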

2017
Author(s):
Mirko Thalmann
Marcel Niklaus
Klaus Oberauer

The use of mixed-effects models and Bayesian statistics has been advocated by statisticians in recent years. Mixed-effects models allow researchers to adequately account for the structure in the data. Bayesian statistics – in contrast to frequentist statistics – can state the evidence in favor of or against an effect of interest. For frequentist methods, it is known that mixed models can lead to serious overestimation of the evidence in favor of an effect (i.e., an inflated Type-I error rate) when they fail to include individual differences in the effect sizes of predictors ("random slopes") that are actually present in the data. Here, we show through simulation that the same problem exists for Bayesian mixed models. Yet at present there is no easy-to-use application that allows for the estimation of Bayes factors for mixed models with random slopes on continuous predictors. We close this gap by introducing a new R package called BayesRS. We tested its functionality in four simulation studies, which show that BayesRS offers a reliable and valid tool to compute Bayes factors. BayesRS also allows users to account for correlations between random effects. In a fifth simulation study we show, however, that doing so leads to a slight underestimation of the evidence in favor of an actually present effect. We therefore recommend modeling correlations between random effects only when they are of primary interest and when the sample size is large enough. BayesRS is available at https://cran.r-project.org/web/packages/BayesRS/; R code for all simulations is available at https://osf.io/nse5x/?view_only=b9a7caccd26a4764a084de3b8d459388
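The model comparison at stake can also be illustrated with brms and bridge sampling rather than BayesRS itself. The sketch below is a hedged illustration under assumed data (`dat` with outcome `y`, continuous predictor `x`, grouping factor `id`) and a placeholder prior; it is not the package's interface.

```r
## Minimal sketch using brms + bridge sampling (not the BayesRS package itself).
## A proper prior on the fixed effect is required for a meaningful Bayes factor.
library(brms)

pri <- set_prior("normal(0, 1)", class = "b")

## Both models keep the random slopes of x, so the Bayes factor isolates the
## fixed effect rather than capitalising on unmodelled slope variability.
h1 <- brm(y ~ x + (1 + x | id), data = dat, prior = pri,
          save_pars = save_pars(all = TRUE))
h0 <- brm(y ~ 1 + (1 + x | id), data = dat,
          save_pars = save_pars(all = TRUE))

bayes_factor(h1, h0)   # BF10 for the fixed effect of x

## Replacing "(1 + x | id)" with "(1 | id)" when slope variability is really
## present tends to overstate the evidence for the fixed effect -- the Bayesian
## analogue of the inflated Type-I error rate described above.
```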


2019
Author(s):
Bence Palfi
Zoltan Dienes

Psychologists are often interested in whether an experimental manipulation has a different effect in condition A than in condition B. To test such a question, one needs to directly compare the conditions (i.e., test the interaction). Yet many tend to stop when they find a significant test in one condition and a non-significant test in the other condition, and deem this sufficient evidence for a difference between the two conditions. This tutorial aims to raise awareness of this inferential mistake when Bayes factors are used with conventional cut-offs to draw conclusions. For instance, some might falsely conclude that there must be good enough evidence for the interaction if they find good enough Bayesian evidence for H1 in condition A and good enough Bayesian evidence for H0 in condition B. The introduced case study highlights that ignoring the test of the interaction can lead to unjustified conclusions and demonstrates that the principle that any assertion about the existence of an interaction necessitates the comparison of the conditions is as true for Bayesian as it is for frequentist statistics. We provide an R script of the analyses of the case study and a Shiny app that can be used with a 2x2 design to develop intuitions on the current issue, and we introduce a rule of thumb with which one can estimate the sample size one might need to have a well-powered design.


2020
Vol 3 (3)
pp. 300-308
Author(s):
Bence Palfi
Zoltan Dienes

Psychologists are often interested in whether an independent variable has a different effect in condition A than in condition B. To test such a question, one needs to directly compare the effect of that variable in the two conditions (i.e., test the interaction). Yet many researchers tend to stop when they find a significant test in one condition and a nonsignificant test in the other condition, deeming this as sufficient evidence for a difference between the two conditions. In this Tutorial, we aim to raise awareness of this inferential mistake when Bayes factors are used with conventional cutoffs to draw conclusions. For instance, some researchers might falsely conclude that there must be good-enough evidence for the interaction if they find good-enough Bayesian evidence for the alternative hypothesis, H1, in condition A and good-enough Bayesian evidence for the null hypothesis, H0, in condition B. The case study we introduce highlights that ignoring the test of the interaction can lead to unjustified conclusions and demonstrates that the principle that any assertion about the existence of an interaction necessitates the direct comparison of the conditions is as true for Bayesian as it is for frequentist statistics. We provide an R script of the analyses of the case study and a Shiny app that can be used with a 2 × 2 design to develop intuitions on this issue, and we introduce a rule of thumb with which one can estimate the sample size one might need to have a well-powered design.
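As a concrete illustration (not the authors' R script or Shiny app), the interaction can be tested directly with the BayesFactor package. The data frame `dat` with outcome `y`, within-subjects factors `A` and `B`, and identifier `subject`, as well as the default priors, are assumptions.

```r
## Minimal sketch, not the tutorial's script: evidence for the A x B interaction
## comes from comparing a model with the interaction against one without it,
## not from separate per-condition Bayes factors.
library(BayesFactor)

## categorical variables must be factors for lmBF
dat[c("A", "B", "subject")] <- lapply(dat[c("A", "B", "subject")], factor)

full <- lmBF(y ~ A + B + A:B + subject, data = dat, whichRandom = "subject")
main <- lmBF(y ~ A + B + subject,       data = dat, whichRandom = "subject")

full / main   # Bayes factor for the interaction itself
## Good-enough evidence for H1 in condition A together with good-enough evidence
## for H0 in condition B does not by itself license a claim about this ratio.
```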


2004
Vol 34 (1)
pp. 221-232
Author(s):
A Robinson

The construction of diameter-distribution models sometimes calls for the simultaneous prediction of population parameters from hierarchical data. Appropriate data for such models have characteristics that should be preserved or accommodated: clustering and contemporaneous correlations. Fitting techniques for such data must allow for these characteristics. Using a case study, I compare two techniques — seemingly unrelated regression (SUR) and principal components analysis (PCA) — in combination with mixed-effects models. I adapt and apply a metric that focuses on volume prediction, which is a key application for diameter distributions. The results suggest that using mixed-effects models provides useful insights into environmental variation, and that SUR is more convenient and produces a slightly better fit than PCA. Both techniques are acceptable with regard to regression assumptions.
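As a rough sketch of the SUR component (not the paper's analysis; the data frame `stands` and all variable names are hypothetical), the systemfit package fits a system of equations whose errors are allowed to be contemporaneously correlated, which is what permits the distribution parameters to be predicted simultaneously.

```r
## Minimal sketch, not the paper's analysis: a seemingly unrelated regression
## (SUR) system predicting two diameter-distribution parameters at once.
library(systemfit)

eq_scale <- weib_scale ~ mean_dbh + stems_ha    # hypothetical equations
eq_shape <- weib_shape ~ mean_dbh + basal_area
fit_sur  <- systemfit(list(scale = eq_scale, shape = eq_shape),
                      method = "SUR", data = stands)

summary(fit_sur)   # coefficients estimated jointly, exploiting the correlation
                   # between the two equations' errors
```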


2016
Vol 25 (6)
pp. 2506-2520
Author(s):
Xicheng Fang
Jialiang Li
Weng Kee Wong
Bo Fu

Mixed-effects models are increasingly used in many areas of applied science. Despite their popularity, there is virtually no systematic approach for examining the homogeneity of the random-effects covariance structure commonly assumed for such models. We propose two tests for evaluating the homogeneity of the covariance structure assumption across subjects: one is based on the covariance matrices computed from the fitted model and the other is based on the empirical variation computed from the estimated random effects. We used simulation studies to compare the performance of the two tests in detecting violations of the homogeneity assumption in mixed-effects models and showed that they were able to identify abnormal clusters of subjects with dissimilar random-effects covariance structures; in particular, removing such clusters from the fitted model can change the signs and magnitudes of the estimated effects of important predictors in the analysis. In a case study, we applied the proposed tests to a longitudinal cohort study of rheumatoid arthritis patients and compared their ability to ascertain whether the assumption of covariance homogeneity for subject-specific random effects holds.
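The raw ingredients of such a check are easy to extract from a fitted model. The sketch below (under assumed variable names, and not the authors' proposed test statistics) simply contrasts the model-implied random-effects covariance with the empirical covariance of the estimated random effects.

```r
## Minimal sketch, not the proposed tests: compare the model-implied
## random-effects covariance (assumed common to all subjects) with the
## empirical covariance of the estimated subject-level effects.
library(lme4)

m <- lmer(y ~ time + (1 + time | subject), data = dat)

VarCorr(m)$subject          # model-implied covariance of intercepts and slopes

re <- as.matrix(ranef(m)$subject)
cov(re)                     # empirical covariance of the estimated effects

## Clusters of subjects whose estimated effects sit far from the bulk, or large
## discrepancies between the two matrices, are the kind of heterogeneity the
## proposed tests formalise.
```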


2018
Author(s):
Daniel Lakens
Neil McLatchie
Peder Mortvedt Isager
Anne M. Scheel
Zoltan Dienes

Researchers often conclude an effect is absent when a null-hypothesis significance test yields a non-significant p-value. However, it is neither logically nor statistically correct to conclude an effect is absent when a hypothesis test is not significant. We present two methods to evaluate the presence or absence of effects: Equivalence testing (based on frequentist statistics) and Bayes factors (based on Bayesian statistics). In four examples from the gerontology literature we illustrate different ways to specify alternative models that can be used to reject the presence of a meaningful or predicted effect in hypothesis tests. We provide detailed explanations of how to calculate, report, and interpret Bayes factors and equivalence tests. We also discuss how to design informative studies that can provide support for a null model or for the absence of a meaningful effect. The conceptual differences between Bayes factors and equivalence tests are discussed, and we also note when and why they might lead to similar or different inferences in practice. It is important that researchers are able to falsify predictions or can provide support for predicted null effects. Bayes factors and equivalence tests provide useful statistical tools to improve inferences about null effects.


2018
Vol 75 (1)
pp. 45-57
Author(s):
Daniël Lakens
Neil McLatchie
Peder M Isager
Anne M Scheel
Zoltan Dienes

Researchers often conclude an effect is absent when a null-hypothesis significance test yields a nonsignificant p value. However, it is neither logically nor statistically correct to conclude an effect is absent when a hypothesis test is not significant. We present two methods to evaluate the presence or absence of effects: Equivalence testing (based on frequentist statistics) and Bayes factors (based on Bayesian statistics). In four examples from the gerontology literature, we illustrate different ways to specify alternative models that can be used to reject the presence of a meaningful or predicted effect in hypothesis tests. We provide detailed explanations of how to calculate, report, and interpret Bayes factors and equivalence tests. We also discuss how to design informative studies that can provide support for a null model or for the absence of a meaningful effect. The conceptual differences between Bayes factors and equivalence tests are discussed, and we also note when and why they might lead to similar or different inferences in practice. It is important that researchers are able to falsify predictions or can quantify the support for predicted null effects. Bayes factors and equivalence tests provide useful statistical tools to improve inferences about null effects.
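For a simple two-group case, both tools can be sketched in a few lines of R. This is not one of the paper's four worked examples; the data, equivalence bounds, and prior are placeholders.

```r
## Minimal sketch, not the paper's worked examples: an equivalence test (two
## one-sided tests, TOST) and a Bayes factor for the same two-group comparison.
set.seed(1)
x <- rnorm(50, mean = 0.0, sd = 1)   # hypothetical samples
y <- rnorm(50, mean = 0.1, sd = 1)

bound <- 0.5   # smallest raw difference still considered meaningful (assumed)

## TOST: the effect is statistically equivalent to zero if it is significantly
## above the lower bound AND significantly below the upper bound.
p_lower <- t.test(x, y, mu = -bound, alternative = "greater")$p.value
p_upper <- t.test(x, y, mu =  bound, alternative = "less")$p.value
max(p_lower, p_upper) < .05   # TRUE -> reject effects as large as the bounds

## Bayes factor for the difference (default Cauchy prior on effect size)
library(BayesFactor)
ttestBF(x = x, y = y)          # BF10; values well below 1 support the null
```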


2021
Author(s):
Daniel W. Heck
Florence Bockting

Bayes factors allow researchers to test the effects of experimental manipulations in within-subjects designs using mixed-effects models. van Doorn et al. (2021) showed that such hypothesis tests can be performed by comparing different pairs of models which vary in the specification of the fixed- and random-effect structure for the within-subjects factor. To discuss the question of which of these model comparisons is most appropriate, van Doorn et al. used a case study to compare the corresponding Bayes factors. We argue that researchers should not only focus on pairwise comparisons of two nested models but rather use the Bayes factor for performing model selection among a larger set of mixed models that represent different auxiliary assumptions. In a standard one-factorial, repeated-measures design, the comparison should include four mixed-effects models: fixed-effects H0, fixed-effects H1, random-effects H0, and random-effects H1. Thereby, the Bayes factor enables testing both the average effect of condition and the heterogeneity of effect sizes across individuals. Bayesian model averaging provides an inclusion Bayes factor which quantifies the evidence for or against the presence of an effect of condition while taking model-selection uncertainty about the heterogeneity of individual effects into account. We present a simulation study showing that model selection among a larger set of mixed models performs well in recovering the true, data-generating model.


Author(s):  
Daniel W. Heck
Florence Bockting

Bayes factors allow researchers to test the effects of experimental manipulations in within-subjects designs using mixed-effects models. van Doorn et al. (2021) showed that such hypothesis tests can be performed by comparing different pairs of models which vary in the specification of the fixed- and random-effect structure for the within-subjects factor. To discuss the question of which model comparison is most appropriate, van Doorn et al. compared three corresponding Bayes factors using a case study. We argue that researchers should not only focus on pairwise comparisons of two nested models but rather use Bayesian model selection for the direct comparison of a larger set of mixed models reflecting different auxiliary assumptions regarding the heterogeneity of effect sizes across individuals. In a standard one-factorial, repeated measures design, the comparison should include four mixed-effects models: fixed-effects H0, fixed-effects H1, random-effects H0, and random-effects H1. Thereby, one can test both the average effect of condition and the heterogeneity of effect sizes across individuals. Bayesian model averaging provides an inclusion Bayes factor which quantifies the evidence for or against the presence of an average effect of condition while taking model selection uncertainty about the heterogeneity of individual effects into account. We present a simulation study showing that model averaging among a larger set of mixed models performs well in recovering the true, data-generating model.
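A minimal sketch of this four-model comparison, using brms and bridge sampling under assumed data (`dat` with outcome `y`, factor `cond`, participant `id`), priors, and settings; it is not the authors' code.

```r
## Minimal sketch, not the authors' code: the four mixed-effects models for a
## one-factorial repeated-measures design and the inclusion Bayes factor for
## the average effect of condition.
library(brms)

pri <- set_prior("normal(0, 1)", class = "b")  # proper prior needed for Bayes factors

fix_h0 <- brm(y ~ 1    + (1 | id),        data = dat,
              save_pars = save_pars(all = TRUE))
fix_h1 <- brm(y ~ cond + (1 | id),        data = dat, prior = pri,
              save_pars = save_pars(all = TRUE))
ran_h0 <- brm(y ~ 1    + (1 + cond | id), data = dat,
              save_pars = save_pars(all = TRUE))
ran_h1 <- brm(y ~ cond + (1 + cond | id), data = dat, prior = pri,
              save_pars = save_pars(all = TRUE))

## Posterior model probabilities via bridge sampling (equal prior probabilities),
## so the ratio below is the inclusion Bayes factor for the presence of an
## average condition effect (H1 models vs. H0 models).
pp <- post_prob(fix_h0, fix_h1, ran_h0, ran_h1)
(pp[2] + pp[4]) / (pp[1] + pp[3])
```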

