p-Values, Bayes Factors, and Sufficiency

This article explains a decision rule that uses Bayesian posterior distributions as the basis for accepting or rejecting null values of parameters. This decision rule focuses on the range of plausible values indicated by the highest density interval of the posterior distribution and the relation between this range and a region of practical equivalence (ROPE) around the null value. The article also discusses considerations for setting the limits of a ROPE and emphasizes that analogous considerations apply to setting the decision thresholds for p values and Bayes factors.
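As a rough illustration of the decision rule summarized above, the following Python sketch computes a highest density interval (HDI) from posterior samples and compares it with a ROPE. The function names, the 95% mass, the ROPE limits of -0.1 and 0.1, and the simulated normal "posteriors" are illustrative assumptions and are not taken from the article.

```python
import random


def hdi(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(round(mass * n)))  # number of samples kept inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def rope_decision(samples, rope, mass=0.95):
    """Three-way decision based on where the HDI falls relative to the ROPE."""
    lo, hi = hdi(samples, mass)
    rope_lo, rope_hi = rope
    if hi < rope_lo or lo > rope_hi:
        return "reject null"   # HDI lies entirely outside the ROPE
    if rope_lo <= lo and hi <= rope_hi:
        return "accept null"   # HDI lies entirely inside the ROPE
    return "undecided"         # HDI and ROPE only partly overlap


# Hypothetical posterior samples (simulated here; in practice they come from MCMC).
random.seed(0)
far = [random.gauss(0.8, 0.1) for _ in range(4000)]
near = [random.gauss(0.0, 0.02) for _ in range(4000)]
wide = [random.gauss(0.1, 0.2) for _ in range(4000)]

rope = (-0.1, 0.1)  # assumed ROPE limits around a null value of 0
for name, draws in [("far", far), ("near", near), ("wide", wide)]:
    print(name, rope_decision(draws, rope))
```

The rule has three outcomes rather than two: a posterior that is too wide to sit wholly inside or outside the ROPE leaves the decision open, which is how this approach separates inconclusive data from support for the null value.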


Methods, Theory, and Innovation: A Bayesian Analysis of Evidence in Support of the Null Hypothesis in Gerontological Psychology (or Lack Thereof)

Innovation in Aging ◽

10.1093/geroni/igz038.2841 ◽

2019 ◽

Vol 3 (Supplement_1) ◽

pp. S773-S773

Author(s):

Christopher Brydges ◽

Allison A Bielak

Keyword(s):

Strong Evidence ◽

Null Hypothesis ◽

Research Method ◽

Statistical Technique ◽

Bayes Factors ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Bayesian Analyses ◽

P Values ◽

Gerontological Research

Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects and cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses in gerontological psychology, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, by using Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and to ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.
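To make the idea of reanalyzing a reported t-test with a Bayes factor concrete, here is a minimal Python sketch. The abstract does not say which Bayes factor the authors computed, so this sketch swaps in the BIC approximation for a one-sample (or paired) t-test, BF01 ≈ sqrt(n) · (1 + t²/(n − 1))^(−n/2), and assumes a cutoff of 10 for "strong" evidence. The function names, the cutoff, and the example t values are all illustrative assumptions.

```python
import math


def bf01_from_t(t, n):
    """BIC approximation to the Bayes factor for H0 over H1.

    Assumes a one-sample or paired t-test with n observations (n - 1 df).
    This is a swapped-in approximation, not necessarily the method used
    in the study described above.
    """
    return math.sqrt(n) * (1.0 + t * t / (n - 1)) ** (-n / 2.0)


def classify(bf01, strong=10.0):
    """Label a Bayes factor using an assumed cutoff for strong evidence."""
    if bf01 >= strong:
        return "strong evidence for the null"
    if bf01 <= 1.0 / strong:
        return "strong evidence for the alternative"
    return "not strong either way"


# Hypothetical reported results: (t statistic, sample size).
for t, n in [(0.5, 30), (0.5, 400), (4.0, 30)]:
    bf = bf01_from_t(t, n)
    print(f"t = {t}, n = {n}: BF01 = {bf:.2f} ({classify(bf)})")
```

Under this approximation a t statistic of exactly zero gives BF01 = sqrt(n), so roughly 100 observations are needed before even a perfectly null result counts as strong evidence at a cutoff of 10. This is one way to see why small studies with non-significant results so often remain inconclusive.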


A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference

Multivariate Behavioral Research ◽

10.1080/00273171.2015.1099032 ◽

2016 ◽

Vol 51 (1) ◽

pp. 23-29 ◽

Cited By ~ 13

Author(s):

Hal S. Stern

Keyword(s):

Statistical Inference ◽

Bayes Factors ◽

P Values


Experiences with Using Bayes Factors for Regression Analysis in Biostatistical Setting

Periodica Polytechnica Electrical Engineering and Computer Science ◽

10.3311/ppee.9898 ◽

2017 ◽

Vol 61 (3) ◽

pp. 246

Author(s):

Tamás Ferenci ◽

Levente Kovács

Keyword(s):

Null Hypothesis ◽

Strong Relationship ◽

Medical Problem ◽

Bayes Factors ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

P Values ◽

Large Samples ◽

Sampling Variation ◽

The Impact

Null hypothesis significance testing dominates current biostatistical practice. However, this routine has many flaws; in particular, p-values are very often misused and misinterpreted. Several solutions have been suggested to remedy this situation, the application of Bayes Factors being perhaps the most well-known. Nevertheless, even Bayes Factors are very seldom applied in medical research. This paper investigates the application of Bayes Factors in the analysis of a realistic medical problem using actual data from a representative US survey, and compares the results to those obtained with traditional means. Linear regression is used as an example as it is one of the most basic tools in biostatistics. The effect of sample size and sampling variation is investigated (with resampling) as well as the impact of the choice of prior. Results show that there is a strong relationship between p-values and Bayes Factors, especially for large samples. The application of Bayes Factors should be encouraged even in spite of this, as the message they convey is much more instructive and scientifically correct than the current typical practice.
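One simple way to see a link between p-values and Bayes factors is the well-known calibration −e · p · ln(p), which gives a lower bound on the Bayes factor in favour of the null for p < 1/e. The Python sketch below is not the regression analysis or the resampling procedure described in the paper. It only illustrates that such a mapping exists, and the function name and the chosen p-values are assumptions.

```python
import math


def min_bf01(p):
    """Lower bound on the Bayes factor for H0 over H1 implied by a p-value.

    Uses the -e * p * ln(p) calibration, which applies for p < 1/e.
    For larger p the bound is uninformative, so 1.0 is returned.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0
    return -math.e * p * math.log(p)


# Illustrative p-values (not taken from the paper).
for p in (0.05, 0.01, 0.001):
    bound = min_bf01(p)
    print(f"p = {p}: BF01 >= {bound:.3f}, so BF10 <= {1.0 / bound:.1f}")
```

Read this way, p = 0.05 corresponds to odds of at most about 2.5 to 1 against the null, which is much weaker than the label "significant" tends to suggest.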


Why p-values are not measures of evidence

10.31234/osf.io/7ng4w ◽

2021 ◽

Author(s):

Daniel Lakens

Keyword(s):

Bayes Factors ◽

P Values ◽

Moderate Evidence ◽

Statistical Inferences ◽

Weak Evidence

The recommendations by Muff and colleagues are an incoherent approach to statistical inferences, and should only be used if one wants to signal a misunderstanding of p-values. Coherent alternatives to quantify evidence exist, such as likelihoods and Bayes factors. Therefore, researchers should not follow the recommendation by Muff and colleagues to report p = 0.08 as ‘weak evidence’, p = 0.03 as ‘moderate evidence’, and p = 0.168 as ‘no evidence’.


A Bayesian Analysis of Evidence in Support of the Null Hypothesis in Gerontological Psychology (or Lack Thereof)

10.31234/osf.io/934ke ◽

2019 ◽

Author(s):

Christopher Brydges

Keyword(s):

Bayesian Analysis ◽

Strong Evidence ◽

Null Hypothesis ◽

Statistical Technique ◽

Bayes Factors ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Bayesian Analyses ◽

P Values ◽

Gerontological Research

Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects and cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses in gerontological psychology, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, by using Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and to ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.


On p-Values and Bayes Factors

Annual Review of Statistics and Its Application ◽

10.1146/annurev-statistics-031017-100307 ◽

2018 ◽

Vol 5 (1) ◽

pp. 393-419 ◽

Cited By ~ 67

Author(s):

Leonhard Held ◽

Manuela Ott

Keyword(s):

Bayes Factors ◽

P Values


Bayesian Meta-analysis of fMRI Image Data

10.31234/osf.io/uycdz ◽

2018 ◽

Author(s):

Hyemin Han ◽

Joonsuk Park

Keyword(s):

Bayesian Method ◽

Classical Method ◽

Meta Analysis ◽

Random Effect ◽

Image Data ◽

Bayes Factors ◽

Test Results ◽

Analysis Method ◽

P Values ◽

Meta Analyses

We composed an R-based script for image-based Bayesian random-effect meta-analysis of previous fMRI studies. It meta-analyzes second-level test results of the studies and calculates Bayes Factors indicating whether the effect in each voxel is significantly different from zero. We compared results from Bayesian and classical meta-analyses by examining the overlap between the result from each method and that created by NeuroSynth as the target. As an example, we analyzed previous fMRI studies focusing on working memory extracted from NeuroSynth. The result from our Bayesian method showed a greater overlap than the classical method. In addition, Bayes Factors proved a better way than p-values to examine whether the evidence supported hypotheses. Given these findings, Bayesian meta-analysis provides neuroscientists with a better meta-analysis method for fMRI studies, owing to the improved overlap with the NeuroSynth result and the practical and epistemological value of Bayes Factors, which can directly test for the presence of an effect.


Studies of cross-lingual long-term priming

10.31234/osf.io/ert8k ◽

2017 ◽

Cited By ~ 2

Author(s):

Eva Denise Poort ◽

Jennifer M Rodd

Keyword(s):

Lexical Decision ◽

Priming Effect ◽

Bayes Factors ◽

Decision Task ◽

Recent Experience ◽

Facilitation Effect ◽

P Values ◽

Interlingual Homographs ◽

Cross Lingual

Poort, Warren and Rodd (2016) showed that bilinguals profit from recent experience with an identical cognate in their native language when they encounter the same word in their second language. We conducted two experiments employing the same cross-lingual long-term priming paradigm to determine whether this is also the case for non-identical cognates, as this would indicate they share an orthographic representation in the bilingual lexicon. In Experiment 1, Dutch–English bilinguals read Dutch sentences containing identical cognates (e.g. “winter”–“winter”), non-identical cognates (e.g. “baard”–“beard”) or the Dutch translations (e.g. “fiets”) of English control words (e.g. “bike”). These words were presented again in an English lexical decision task approximately 19 minutes later. The analysis revealed only weak evidence, based both on p-values and Bayes factors, for a small 6-9 ms facilitative priming effect. Experiment 2 aimed to determine whether including interlingual homographs (e.g. “angel”–“angel”) in the experiment modulates the size of the priming effect. This time, the analysis revealed no evidence for a priming effect, either based on p-values or Bayes factors, in either version of the experiment for either the cognates or the interlingual homographs. In line with previous findings (Poort & Rodd, 2017, May 9), we did find strong evidence for an interlingual homograph inhibition effect and no evidence for a cognate facilitation effect. We conclude that, since the cross-lingual long-term priming effect is largely semantic in nature, the lexical decision tasks we used were not sensitive enough to detect an effect of priming. Note: This manuscript has not been peer-reviewed.
