Why Did It Take So Many Decades for the Behavioral Sciences to Develop a Sense of Crisis Around Methodology and Replication?

Author(s):  
Andrew Gelman ◽  
Simine Vazire

For several decades, leading behavioral scientists have offered strong criticisms of the common practice of null hypothesis significance testing as producing spurious findings without strong theoretical or empirical support. But only in the past decade has this manifested as a full-scale replication crisis. We consider some possible reasons why, on or about December 2010, the behavioral sciences changed.

2015 ◽  
Vol 37 (4) ◽  
pp. 449-461 ◽  
Author(s):  
Andreas Ivarsson ◽  
Mark B. Andersen ◽  
Andreas Stenling ◽  
Urban Johnson ◽  
Magnus Lindwall

Null hypothesis significance testing (NHST) is like an immortal horse that some researchers have been trying to beat to death for over 50 years but without any success. In this article we discuss the flaws in NHST, the historical background in relation to both Fisher’s and Neyman and Pearson’s statistical ideas, the common misunderstandings of what p < .05 actually means, and the 2010 APA publication manual’s clear, but most often ignored, instructions to report effect sizes and to interpret what they all mean in the real world. In addition, we discuss how Bayesian statistics can be used to overcome some of the problems with NHST. We then analyze quantitative articles published over the past three years (2012–2014) in two top-rated sport and exercise psychology journals to determine whether we have learned what we should have learned decades ago about our use and meaningful interpretations of statistics.


Author(s):  
Brian D. Haig

Chapter 3 provides a brief overview of null hypothesis significance testing and points out its primary defects. It then outlines the neo-Fisherian account of tests of statistical significance, along with a second option contained in the philosophy of statistics known as the error-statistical philosophy, both of which are defensible. Tests of statistical significance are the most widely used means for evaluating hypotheses and theories in psychology. A massive critical literature has developed in psychology, and the behavioral sciences more generally, regarding the worth of these tests. The chapter provides a list of important lessons learned from the ongoing debates about tests of significance.


2019 ◽  
Author(s):  
Jan Sprenger

The replication crisis poses an enormous challenge to the epistemic authority of science and the logic of statistical inference in particular. Two prominent features of Null Hypothesis Significance Testing (NHST) arguably contribute to the crisis: the lack of guidance for interpreting non-significant results and the impossibility of quantifying support for the null hypothesis. In this paper, I argue that popular alternatives to NHST, such as confidence intervals and Bayesian inference, also do not lead to a satisfactory logic of evaluating hypothesis tests. As an alternative, I motivate and explicate the concept of corroboration of the null hypothesis. Finally, I show how degrees of corroboration give an interpretation to non-significant results, combat publication bias, and mitigate the replication crisis.


2021 ◽  
Author(s):  
Mark Rubin

Scientists often adjust their significance threshold (alpha level) during null hypothesis significance testing in order to take into account multiple testing and multiple comparisons. This alpha adjustment has become particularly relevant in the context of the replication crisis in science. The present article considers the conditions in which this alpha adjustment is appropriate and the conditions in which it is inappropriate. A distinction is drawn between three types of multiple testing: disjunction testing, conjunction testing, and individual testing. It is argued that alpha adjustment is only appropriate in the case of disjunction testing, in which at least one test result must be significant in order to reject the associated joint null hypothesis. Alpha adjustment is inappropriate in the case of conjunction testing, in which all relevant results must be significant in order to reject the joint null hypothesis. Alpha adjustment is also inappropriate in the case of individual testing, in which each individual result must be significant in order to reject each associated individual null hypothesis. The conditions under which each of these three types of multiple testing is warranted are examined. It is concluded that researchers should not automatically (mindlessly) assume that alpha adjustment is necessary during multiple testing. Illustrations are provided in relation to joint studywise hypotheses and joint multiway ANOVAwise hypotheses.
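The distinction between the three types of multiple testing can be illustrated with a little arithmetic. The following is a minimal sketch, not taken from the article: it assumes m independent tests, all run at the same alpha level, with every null hypothesis true. The function names are hypothetical and chosen only for illustration.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Disjunction testing: probability that at least one of m
    independent tests is significant when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** m


def conjunction_error_rate(alpha: float, m: int) -> float:
    """Conjunction testing: probability that all m independent
    tests are significant when all nulls are true."""
    return alpha ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Bonferroni-adjusted per-test alpha for a family of m tests."""
    return alpha / m


if __name__ == "__main__":
    alpha, m = 0.05, 3
    # Disjunction: the joint error rate grows with m, so adjustment matters.
    print("disjunction, unadjusted:", familywise_error_rate(alpha, m))
    print("disjunction, Bonferroni:",
          familywise_error_rate(bonferroni_alpha(alpha, m), m))
    # Conjunction: the joint error rate is already below alpha.
    print("conjunction, unadjusted:", conjunction_error_rate(alpha, m))
    # Individual testing: each test keeps its own error rate of alpha.
    print("individual, per test:", alpha)
```

Under these assumptions, three unadjusted tests at alpha = 0.05 give a disjunction error rate of roughly 0.143, whereas the conjunction error rate is far below 0.05 without any adjustment, which is consistent with the abstract's claim that adjustment is only needed in the disjunction case.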


Author(s):  
Freddy A. Paniagua

Ferguson (2015) observed that the proportion of studies supporting the experimental hypothesis and rejecting the null hypothesis is very high. This paper argues that the reason for this scenario is that researchers in the behavioral sciences have learned that the null hypothesis can always be rejected if one knows the statistical tricks to reject it (e.g., the probability of rejecting the null hypothesis increases with p = 0.05 compared to p = 0.01). Examples of the advancement of science without the need to formulate the null hypothesis are also discussed, as well as alternatives to null hypothesis significance testing (NHST), such as effect sizes, and the importance of distinguishing the statistical significance from the practical significance of results.


2018 ◽  
Vol 9 (1) ◽  
pp. 11-19
Author(s):  
Lexi Brunner

Reexamining William Rozeboom’s recommendations for the future direction of disciplines such as psychology and philosophy is urgent due to the pressing issues in null hypothesis significance testing (NHST). An overreliance on NHST forms the basis of the replication crisis in psychology. Likewise, the discipline’s stringent guidelines on significance levels convey a pressure to publish, which is also significantly contributing to the replication crisis. As researchers’ careers are staked on the extent to which they publish, reassessing the fundamental issues with NHST within the context of Rozeboom’s ideas is paramount. In this brief review, I focus specifically on Rozeboom’s critiques of NHST and on a proposed alternative for evaluating the depth of findings within the framework of theoretical psychology. Thereafter, I compare his foreboding concerns with modern issues facing the validity of psychology and posit that Rozeboom prophesied the failings of the discipline years ahead of his time. Keywords: Theoretical Psychology; NHST; William Rozeboom; P-Hacking; Explanatory Induction


2019 ◽  
Author(s):  
Felipe Romero ◽  
Jan Sprenger

The enduring replication crisis in many scientific disciplines casts doubt on the ability of science to self-correct its findings and to produce reliable knowledge. Amongst a variety of possible methodological, social, and statistical reforms to address the crisis, we focus on replacing null hypothesis significance testing (NHST) with Bayesian inference. On the basis of a simulation study for meta-analytic aggregation of effect sizes, we study the relative advantages of this Bayesian reform, and its interaction with widespread limitations in experimental research. Moving to Bayesian statistics will not solve the replication crisis single-handedly, but would eliminate important sources of effect size overestimation for the conditions we study.


2009 ◽  
Vol 217 (1) ◽  
pp. 27-37 ◽  
Author(s):  
Fiona Fidler ◽  
Geoffrey R. Loftus

Null-hypothesis significance testing (NHST) is the primary means by which data are analyzed and conclusions made, particularly in the social sciences, but in other sciences as well (notably ecology and economics). Despite this supremacy, however, numerous problems exist with NHST as a means of interpreting and understanding data. These problems have been articulated by various observers over the years, but are being taken seriously by researchers only slowly, if at all, as evidenced by the continuing emphasis on NHST in statistics classes, statistics textbooks, editorial policies and, of course, the day-to-day practices reported in empirical articles themselves (Cumming et al., 2007). Over the past several decades, observers have suggested a simpler approach – plotting the data with appropriate confidence intervals (CIs) around relevant sample statistics – to supplement or take the place of hypothesis testing. This article addresses these issues.
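As a rough illustration of a confidence interval around a sample statistic, the sketch below computes an interval for a sample mean. It is not the article's own method: it assumes a normal approximation (a t-based interval would be wider and is usually preferred for small samples), and the function name `mean_ci` and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of `sample`.

    Returns a (lower, upper) tuple. Assumes the sample is large enough
    for the normal approximation to be reasonable.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * standard_error, centre + z * standard_error


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5]  # made-up data for illustration only
    lower, upper = mean_ci(scores)
    print(f"mean = {mean(scores)}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval reports both the estimate and its precision, which is the information that a bare significant/non-significant verdict leaves out.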

