Why p-values are not measures of evidence

2021 ◽  
Author(s):  
Daniel Lakens

The recommendations by Muff and colleagues are an incoherent approach to statistical inferences, and should only be used if one wants to signal a misunderstanding of p-values. Coherent alternatives to quantify evidence exist, such as likelihoods and Bayes factors. Therefore, researchers should not follow the recommendation by Muff and colleagues to report p = 0.08 as ‘weak evidence’, p = 0.03 as ‘moderate evidence’, and p = 0.168 as ‘no evidence’.
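As a rough illustration of how a likelihood ratio differs from a p-value, the sketch below compares how well two point hypotheses predict one observed z statistic. It is not taken from the article. It assumes a z statistic that is normally distributed with unit variance, and the function names and the two alternative effect sizes are hypothetical choices made only for this example.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(z_obs, ncp_h1):
    """Likelihood of the observed z under H1 (mean ncp_h1) relative to H0 (mean 0).

    Values above 1 favor H1; values below 1 favor H0.
    """
    return normal_pdf(z_obs, mean=ncp_h1) / normal_pdf(z_obs, mean=0.0)

# z = 2.17 corresponds to a two-sided p of roughly 0.03.
z_obs = 2.17

# Hypothetical alternative centered on the observed value: the data favor H1.
lr_close = likelihood_ratio(z_obs, ncp_h1=2.17)

# Hypothetical alternative predicting a much larger effect: the same data favor H0.
lr_far = likelihood_ratio(z_obs, ncp_h1=5.0)

print(f"LR against a nearby alternative:  {lr_close:.2f}")
print(f"LR against a distant alternative: {lr_far:.2f}")
```

The same p-value therefore maps onto relative evidence in either direction, depending on which alternative it is compared against. A fixed verbal label per p-value cannot capture this.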

2018 ◽  
Vol 1 (2) ◽  
pp. 270-280 ◽  
Author(s):  
John K. Kruschke

This article explains a decision rule that uses Bayesian posterior distributions as the basis for accepting or rejecting null values of parameters. This decision rule focuses on the range of plausible values indicated by the highest density interval of the posterior distribution and the relation between this range and a region of practical equivalence (ROPE) around the null value. The article also discusses considerations for setting the limits of a ROPE and emphasizes that analogous considerations apply to setting the decision thresholds for p values and Bayes factors.
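The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the article's own code. It assumes the posterior is available as an array of samples, computes the highest density interval (HDI) as the shortest interval containing the requested mass, and uses hypothetical function names and ROPE limits. The 95% mass is a common default, not a value stated in the abstract.

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(mass * n))          # number of samples inside the interval
    if k < 1 or k > n:
        raise ValueError("mass must be in (0, 1] and samples must be non-empty")
    # Width of every candidate interval that spans k consecutive sorted samples.
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def rope_decision(samples, rope, mass=0.95):
    """Compare the HDI with a region of practical equivalence (ROPE)."""
    lo, hi = hdi(samples, mass)
    rope_lo, rope_hi = rope
    if hi < rope_lo or lo > rope_hi:
        return "reject null value"       # HDI falls entirely outside the ROPE
    if lo >= rope_lo and hi <= rope_hi:
        return "accept null value"       # HDI falls entirely inside the ROPE
    return "undecided"                   # HDI and ROPE partially overlap

# Hypothetical posterior draws for an effect size, with a ROPE of +/- 0.1.
rng = np.random.default_rng(1)
posterior = rng.normal(loc=0.5, scale=0.05, size=10_000)
print(rope_decision(posterior, rope=(-0.1, 0.1)))
```

The three outcomes make the role of the ROPE limits explicit: widening or narrowing the ROPE changes the decision in the same way that moving a threshold for a p value or a Bayes factor would.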


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S773-S773
Author(s):  
Christopher Brydges ◽  
Allison A Bielak

Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects and cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, in gerontological psychology, and used Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and to ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.
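To make the idea of reanalyzing a non-significant t-test concrete, the sketch below converts a reported one-sample (or paired) t statistic and sample size into an approximate Bayes factor for the null. It uses the BIC approximation, which is a substitute chosen here for simplicity and is not necessarily the method used in the study. The function names and the cut-offs (3 for moderate, 10 for strong) are assumptions based on common convention, not values stated in the abstract.

```python
import math

def bf01_bic(t, n):
    """Approximate Bayes factor for H0 over H1 from a one-sample t statistic.

    BIC approximation: BF01 ~= sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.sqrt(n) * (1.0 + t * t / (n - 1)) ** (-n / 2.0)

def label_evidence_for_null(bf01, moderate=3.0, strong=10.0):
    """Verbal label using assumed conventional cut-offs."""
    if bf01 >= strong:
        return "strong evidence for the null"
    if bf01 >= moderate:
        return "moderate evidence for the null"
    return "inconclusive for the null"

# Hypothetical non-significant results given as (t statistic, sample size).
for t, n in [(0.0, 100), (1.0, 50), (1.9, 30)]:
    bf = bf01_bic(t, n)
    print(f"t = {t:.2f}, n = {n}: BF01 = {bf:.2f} ({label_evidence_for_null(bf)})")
```

Under this approximation a non-significant result from a small sample often stays below the moderate cut-off, which is in line with the abstract's point that many null findings are inconclusive unless the sample is large enough.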


Author(s):  
Tamás Ferenci ◽  
Levente Kovács

Null hypothesis significance testing dominates current biostatistical practice. However, this routine has many flaws; in particular, p-values are very often misused and misinterpreted. Several solutions have been suggested to remedy this situation, the application of Bayes Factors being perhaps the most well-known. Nevertheless, even Bayes Factors are very seldom applied in medical research. This paper investigates the application of Bayes Factors in the analysis of a realistic medical problem using actual data from a representative US survey, and compares the results to those obtained with traditional means. Linear regression is used as an example as it is one of the most basic tools in biostatistics. The effect of sample size and sampling variation is investigated (with resampling) as well as the impact of the choice of prior. Results show that there is a strong relationship between p-values and Bayes Factors, especially for large samples. The application of Bayes Factors should be encouraged even in spite of this, as the message they convey is much more instructive and scientifically correct than the current typical practice.


2019 ◽  
Author(s):  
Christopher Brydges

Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects and cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, in gerontological psychology, and used Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and to ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.


2019 ◽  
Vol 62 (12) ◽  
pp. 4544-4553 ◽  
Author(s):  
Christopher R. Brydges ◽  
Laura Gaeta

Purpose: Null hypothesis significance testing is commonly used in audiology research to determine the presence of an effect. Knowledge of study outcomes, including nonsignificant findings, is important for evidence-based practice. Nonsignificant p values obtained from null hypothesis significance testing cannot differentiate between true null effects and underpowered studies. Bayes factors (BFs) are a statistical technique that can distinguish between conclusive and inconclusive nonsignificant results, and quantify the strength of evidence in favor of 1 hypothesis over another. This study aimed to investigate the prevalence of BFs in nonsignificant results in audiology research and the strength of evidence in favor of the null hypothesis in these results. Method: Nonsignificant results mentioned in abstracts of articles published in 2018 volumes of 4 prominent audiology journals were extracted (N = 108) and categorized based on whether BFs were calculated. BFs were calculated from nonsignificant t tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Nonsignificant results were not directly tested with BFs in any study. Bayesian re-analysis of 93 nonsignificant t tests found that only 40.86% of findings provided moderate evidence in favor of the null hypothesis, and none provided strong evidence. Conclusion: BFs are underutilized in audiology research, and a large proportion of null findings were deemed inconclusive when re-analyzed with BFs. Researchers are encouraged to use BFs to test the validity and strength of evidence of nonsignificant results and ensure that sufficient sample sizes are used so that conclusive findings (significant or not) are observed more frequently. Supplemental Material: https://osf.io/b4kc7/


2019 ◽  
Vol 73 (sup1) ◽  
pp. 148-151 ◽  
Author(s):  
Jonathan Rougier

Author(s):  
Leonhard Held ◽  
Manuela Ott

2018 ◽  
Author(s):  
Hyemin Han ◽  
Joonsuk Park

We composed an R-based script for image-based Bayesian random-effect meta-analysis of previous fMRI studies. It meta-analyzes second-level test results of the studies and calculates Bayes Factors indicating whether the effect in each voxel is significantly different from zero. We compared results from Bayesian and classical meta-analyses by examining the overlap between the result from each method and that created by NeuroSynth as the target. As an example, we analyzed previous fMRI studies focusing on working memory extracted from NeuroSynth. The result from our Bayesian method showed a greater overlap than the classical method. In addition, Bayes Factors proved a better way than p-values to examine whether the evidence supported hypotheses. Bayesian meta-analysis therefore provides neuroscientists with a better meta-analysis method for fMRI studies, given the improved overlap with the NeuroSynth result and the practical and epistemological value of Bayes Factors, which can directly test the presence of an effect.

