Bayesian Frequentists: Examining the Paradox Between What Researchers Can Conclude Versus What They Want to Conclude From Statistical Results

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Matthias Haucke ◽  
Jonas Miosga ◽  
Rink Hoekstra ◽  
Don van Ravenzwaaij

A majority of statistically educated scientists draw incorrect conclusions based on the most commonly used statistical technique: null hypothesis significance testing (NHST). Frequentist techniques are often claimed to be incorrectly interpreted as Bayesian outcomes, which suggests that a Bayesian framework may better fit the inferences researchers frequently want to make (Briggs, 2012). The current study set out to test this proposition. Firstly, we investigated whether there is a discrepancy between what researchers think they can conclude and what they want to be able to conclude from NHST. Secondly, we investigated to what extent researchers want to incorporate prior study results and their personal beliefs in their statistical inference. Results show the expected discrepancy between what researchers think they can conclude from NHST and what they want to be able to conclude. Furthermore, researchers were interested in incorporating prior study results, but not their personal beliefs, into their statistical inference.

2020 ◽  
Author(s):  
Matthias Haucke ◽  
Jonas Miosga ◽  
Rink Hoekstra ◽  
Don van Ravenzwaaij

A majority of statistically educated scientists draw incorrect conclusions based on, arguably, the most commonly used statistical technique: null hypothesis significance testing (NHST). Frequentist techniques are often claimed to be incorrectly interpreted as Bayesian outcomes, which suggests that a Bayesian framework may better fit the inferences researchers frequently want to make (Briggs, 2012). The current study set out to test this proposition. Firstly, we investigated whether there is a discrepancy between what researchers think they can conclude and what they want to be able to conclude from NHST. Secondly, we investigated to what extent researchers want to incorporate prior study results and subjective beliefs in their statistical inference. Results show the expected discrepancy between what researchers think they can conclude from NHST and what they want to be able to conclude. Furthermore, researchers were interested in incorporating prior study results, but not subjective beliefs, into their statistical inference.


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S773-S773
Author(s):  
Christopher Brydges ◽  
Allison A Bielak

Abstract Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects or cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses in gerontological psychology, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, by using Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.


2019 ◽  
Author(s):  
Jan Sprenger

The replication crisis poses an enormous challenge to the epistemic authority of science and the logic of statistical inference in particular. Two prominent features of Null Hypothesis Significance Testing (NHST) arguably contribute to the crisis: the lack of guidance for interpreting non-significant results and the impossibility of quantifying support for the null hypothesis. In this paper, I argue that also popular alternatives to NHST, such as confidence intervals and Bayesian inference, do not lead to a satisfactory logic of evaluating hypothesis tests. As an alternative, I motivate and explicate the concept of corroboration of the null hypothesis. Finally I show how degrees of corroboration give an interpretation to non-significant results, combat publication bias and mitigate the replication crisis.


2019 ◽  
Author(s):  
Christopher Brydges

Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects or cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses in gerontological psychology, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, by using Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.


2019 ◽  
Vol 62 (12) ◽  
pp. 4544-4553 ◽  
Author(s):  
Christopher R. Brydges ◽  
Laura Gaeta

Purpose Null hypothesis significance testing is commonly used in audiology research to determine the presence of an effect. Knowledge of study outcomes, including nonsignificant findings, is important for evidence-based practice. Nonsignificant p values obtained from null hypothesis significance testing cannot differentiate between true null effects or underpowered studies. Bayes factors (BFs) are a statistical technique that can distinguish between conclusive and inconclusive nonsignificant results, and quantify the strength of evidence in favor of 1 hypothesis over another. This study aimed to investigate the prevalence of BFs in nonsignificant results in audiology research and the strength of evidence in favor of the null hypothesis in these results. Method Nonsignificant results mentioned in abstracts of articles published in 2018 volumes of 4 prominent audiology journals were extracted (N = 108) and categorized based on whether BFs were calculated. BFs were calculated from nonsignificant t tests within this sample to determine how frequently the null hypothesis was strongly supported. Results Nonsignificant results were not directly tested with BFs in any study. Bayesian re-analysis of 93 nonsignificant t tests found that only 40.86% of findings provided moderate evidence in favor of the null hypothesis, and none provided strong evidence. Conclusion BFs are underutilized in audiology research, and a large proportion of null findings were deemed inconclusive when re-analyzed with BFs. Researchers are encouraged to use BFs to test the validity and strength of evidence of nonsignificant results and ensure that sufficient sample sizes are used so that conclusive findings (significant or not) are observed more frequently. Supplemental Material https://osf.io/b4kc7/
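
To illustrate how a Bayes factor can separate conclusive from inconclusive nonsignificant t tests, here is a minimal Python sketch. It is not the method used in the study above, which the abstract does not specify. It assumes the BIC approximation to the Bayes factor for a one-sample (or paired) t test, and the cutoffs of 3 and 10 for "moderate" and "strong" evidence are a commonly used convention adopted here as an assumption. The function names are hypothetical.

```python
import math


def bf01_bic_one_sample(t: float, n: int) -> float:
    """Approximate BF01 (evidence for H0 over H1) from a one-sample t statistic.

    Uses the BIC approximation: BF01 ~= sqrt(n) * (1 + t^2 / df) ** (-n / 2),
    with df = n - 1. This is a swapped-in approximation, not a default-prior
    (e.g. JZS) Bayes factor.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    df = n - 1
    return math.sqrt(n) * (1.0 + t * t / df) ** (-n / 2.0)


def evidence_label(bf01: float) -> str:
    """Map BF01 to a verbal label using assumed conventional cutoffs (3 and 10)."""
    if bf01 >= 10.0:
        return "strong evidence for H0"
    if bf01 >= 3.0:
        return "moderate evidence for H0"
    if bf01 > 1.0 / 3.0:
        return "inconclusive"
    return "evidence for H1"


if __name__ == "__main__":
    # Hypothetical (t, n) pairs, chosen only for illustration.
    for t, n in [(0.0, 100), (0.0, 25), (1.5, 20), (4.0, 20)]:
        bf = bf01_bic_one_sample(t, n)
        print(f"t={t:.1f}, n={n}: BF01={bf:.3f} -> {evidence_label(bf)}")
```

Under this approximation, a nonsignificant t statistic from a small sample tends to land in the inconclusive band, which is consistent with the pattern the abstract reports.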


2021 ◽  
Vol 70 (2) ◽  
pp. 123-133
Author(s):  
Norbert Hirschauer ◽  
Sven Grüner ◽  
Oliver Mußhoff ◽  
Claudia Becker

It has often been noted that the “null-hypothesis-significance-testing” (NHST) framework is an inconsistent hybrid of Neyman-Pearson’s “hypothesis testing” and Fisher’s “significance testing” that almost inevitably causes misinterpretations. To facilitate a realistic assessment of the potential and the limits of statistical inference, we briefly recall widespread inferential errors and outline the two original approaches of these famous statisticians. Based on the understanding of their irreconcilable perspectives, we propose “going back to the roots” and using the initial evidence in the data in terms of the size and the uncertainty of the estimate for the purpose of statistical inference. Finally, we make six propositions that hopefully contribute to improving the quality of inferences in future research.


2020 ◽  
Author(s):  
Norbert Hirschauer ◽  
Sven Gruener ◽  
Oliver Mußhoff ◽  
Claudia Becker

It has often been noted that the “null-hypothesis-significance-testing” (NHST) framework is an inconsistent hybrid of Neyman-Pearson’s “hypotheses testing” and Fisher’s “significance testing” approach that almost inevitably causes misinterpretations. To facilitate a realistic assessment of the potential and the limits of statistical inference, we briefly recall widespread inferential errors and outline the two original approaches of these famous statisticians. Based on the understanding of their irreconcilable perspectives, we propose “going back to the roots” and using the initial evidence in the data in terms of the size and the uncertainty of the estimate for the purpose of statistical inference. Finally, we make six propositions that hopefully contribute to improving the quality of inferences in future research.


2019 ◽  
Vol 75 (1) ◽  
pp. 58-66 ◽  
Author(s):  
Christopher R Brydges ◽  
Allison A M Bielak

Abstract Objectives Nonsignificant p values derived from null hypothesis significance testing do not distinguish between true null effects or cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses in gerontological psychology, a statistical technique that can distinguish between conclusive and inconclusive nonsignificant results, by using Bayes factors (BFs) to reanalyze nonsignificant results from published gerontological research. Methods Nonsignificant results mentioned in abstracts of articles published in 2017 volumes of 10 top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from nonsignificant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results Nonsignificant results were directly tested with BFs in 1.22% of studies. Bayesian reanalyses of 195 nonsignificant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of nonsignificant results and ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.


Econometrics ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 26 ◽  
Author(s):  
David Trafimow

There has been much debate about null hypothesis significance testing, p-values without null hypothesis significance testing, and confidence intervals. The first major section of the present article addresses some of the main reasons these procedures are problematic. The conclusion is that none of them are satisfactory. However, there is a new procedure, termed the a priori procedure (APP), that validly aids researchers in obtaining sample statistics that have acceptable probabilities of being close to their corresponding population parameters. The second major section provides a description and review of APP advances. Not only does the APP avoid the problems that plague other inferential statistical procedures, but it is easy to perform too. Although the APP can be performed in conjunction with other procedures, the present recommendation is that it be used alone.
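
As a rough illustration of the a priori idea described above, the sketch below computes the sample size needed for a sample mean to fall within a chosen fraction of a standard deviation of the population mean with a chosen probability. It assumes the simplest case (a single mean from a normally distributed population) and the formula n = (z_c / f)^2, where z_c is the two-sided normal quantile for confidence c. The abstract itself gives no formula, so the function name, parameters, and formula are assumptions rather than the article's stated procedure.

```python
import math
from statistics import NormalDist


def app_sample_size(f: float, c: float) -> int:
    """Smallest n so the sample mean lies within f standard deviations of the
    population mean with probability c (assumed single-mean, normal case)."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must be strictly between 0 and 1")
    if f <= 0.0:
        raise ValueError("f must be positive")
    # Two-sided standard normal quantile for confidence level c.
    z_c = NormalDist().inv_cdf((1.0 + c) / 2.0)
    return math.ceil((z_c / f) ** 2)


if __name__ == "__main__":
    # Hypothetical precision and confidence targets.
    for f, c in [(0.1, 0.95), (0.2, 0.95)]:
        print(f"f={f}, c={c}: n >= {app_sample_size(f, c)}")
```

The calculation is done before any data are collected, which is the sense in which the procedure is "a priori": it fixes precision and confidence in advance instead of testing a hypothesis afterwards.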


2016 ◽  
Vol 11 (4) ◽  
pp. 551-554 ◽  
Author(s):  
Martin Buchheit

The first sport-science-oriented and comprehensive paper on magnitude-based inferences (MBI) was published 10 y ago in the first issue of this journal. While debate continues, MBI is today well established in sport science and in other fields, particularly clinical medicine, where practical/clinical significance often takes priority over statistical significance. In this commentary, some reasons why both academics and sport scientists should abandon null-hypothesis significance testing and embrace MBI are reviewed. Apparent limitations and future areas of research are also discussed. The following arguments are presented: P values and, in turn, study conclusions are sample-size dependent, irrespective of the size of the effect; significance does not inform on magnitude of effects, yet magnitude is what matters the most; MBI allows authors to be honest with their sample size and better acknowledge trivial effects; the examination of magnitudes per se helps provide better research questions; MBI can be applied to assess changes in individuals; MBI improves data visualization; and MBI is supported by spreadsheets freely available on the Internet. Finally, recommendations to define the smallest important effect and improve the presentation of standardized effects are presented.
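
The point that P values depend on sample size can be made concrete with a small sketch. It holds a standardized effect fixed and varies only n, using a two-sided one-sample z test with known standard deviation as a simplifying assumption (the commentary does not prescribe any particular test). The effect size of 0.2 and the sample sizes are hypothetical.

```python
import math


def two_sided_p_from_effect(d: float, n: int) -> float:
    """Two-sided p value of a one-sample z test for a standardized effect d
    observed in a sample of size n (known-variance simplification)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    z = abs(d) * math.sqrt(n)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    d = 0.2  # same observed standardized effect in every row
    for n in (10, 100, 1000):
        print(f"d={d}, n={n}: p={two_sided_p_from_effect(d, n):.3g}")
```

The same observed effect moves from nonsignificant to significant at the 0.05 level purely because n grows, while the magnitude of the effect stays unchanged.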

