Significance Tests: Vitiated or Vindicated by the Replication Crisis in Psychology?

Author(s):  
Deborah G. Mayo

2018  
Author(s):  
Jan Peter De Ruiter

Benjamin et al. (2017) proposed improving the reproducibility of findings in psychological research by lowering the alpha level of our conventional Null Hypothesis Significance Tests from .05 to .005, because findings with p-values close to .05 represent insufficient empirical evidence. They argued that findings with a p-value between .005 and .05 should still be published, but no longer be called “significant”. This proposal was criticized and rejected in a response by Lakens et al. (2018), who argued that instead of lowering the traditional alpha threshold to .005, we should stop using the term “statistically significant”, and require researchers to determine and justify their alpha levels before they collect data. In this contribution, I argue that the arguments presented by Lakens et al. against the proposal by Benjamin et al. (2017) are not convincing. Thus, given that it is highly unlikely that our field will abandon the NHST paradigm any time soon, lowering our alpha level to .005 is at this moment the best way to combat the replication crisis in psychology.


2021  
pp. 108926802110465  
Author(s):  
Nicole C. Nelson  
Julie Chung  
Kelsey Ichikawa  
Momin M. Malik

This article outlines what we call the “narrative of psychology exceptionalism” in commentaries on the replication crisis: many thoughtful commentaries link the current crisis to the specificity of psychology’s history, methods, and subject matter, but explorations of the similarities between psychology and other fields are comparatively thin. Historical analyses of the replication crisis in psychology further contribute to this exceptionalism by creating a genealogy of events and personalities that shares little in common with other fields. We aim to rebalance this narrative by examining the emergence and evolution of replication discussions in psychology alongside their emergence and evolution in biomedicine. Through a mixed-methods analysis of commentaries on replication in psychology and the biomedical sciences, we find that these conversations have, from the early years of the crisis, shared a common core that centers on concerns about the effectiveness of traditional peer review, the need for greater transparency in methods and data, and the perverse incentive structure of academia. Drawing on Robert Merton’s framework for analyzing multiple discovery in science, we argue that the nearly simultaneous emergence of this narrative across fields suggests that there are shared historical, cultural, or institutional factors driving disillusionment with established scientific practices.


PeerJ  
2019  
Vol 7  
pp. e6232  
Author(s):  
Richard Wiseman  
Caroline Watt  
Diana Kornbrot

The recent ‘replication crisis’ in psychology has focused attention on ways of increasing methodological rigor within the behavioral sciences. Part of this work has involved promoting ‘Registered Reports’, wherein journals peer review papers prior to data collection and publication. Although this approach is usually seen as a relatively recent development, we note that a prototype of this publishing model was initiated in the mid-1970s by parapsychologist Martin Johnson in the European Journal of Parapsychology (EJP). A retrospective and observational comparison of Registered and non-Registered Reports published in the EJP during a seventeen-year period provides circumstantial evidence to suggest that the approach helped to reduce questionable research practices. This paper aims both to bring Johnson’s pioneering work to a wider audience, and to investigate the positive role that Registered Reports may play in helping to promote higher methodological and statistical standards.


2019  
Author(s):  
Giulia Bertoldo

The present work aims to analyze the replicability crisis in psychology with a focus on statistical inference. The main objective is to highlight the risks to be aware of when performing hypothesis tests in a Frequentist framework. In addition to the classic Type I and Type II errors, two other errors that are not commonly considered are Type M error (magnitude) and Type S error (sign), concerning the size and direction of effects. The first chapter introduces Null Hypothesis Significance Testing (NHST), the prevalent approach to statistical inference in the social sciences, following a historical perspective and also presenting the approaches of the statisticians Ronald Fisher, Jerzy Neyman and Egon Pearson. The second chapter discusses the replicability crisis in psychology with an analysis of its origins, the factors that contributed to the crisis, and the solutions proposed for a change of direction. The third chapter analyzes the role of Type M and Type S errors in the replicability crisis. Studies with a high probability of committing these two types of errors could provide estimates of effects that are exaggerated and/or in the wrong direction. Two types of analysis are presented to examine these errors before conducting a study (prospective design analysis) or once the study has already been conducted (retrospective design analysis). The fourth chapter aims to link Type M and Type S errors with the decline effect, which is the observation that the magnitude of effects decreases with repeated replications. Although there may be multiple reasons behind the decline effect, a possible explanation is that the original study overestimated the effect. A case study illustrates how a retrospective design analysis of the original study can provide information on the probability of Type M and Type S errors and support the hypothesis of overestimation. The final chapter summarizes the contributions of Type M and Type S errors to the replication crisis and the role of design analysis in planning studies and analyzing results.
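The link between low power and Type M/S errors described above can be made concrete with a small simulation. This is an illustrative sketch, not taken from the thesis itself; the true effect size, per-group sample size, and alpha level are assumptions chosen to produce a low-powered study. Among the simulated comparisons that reach significance, the average estimate exaggerates the true effect (Type M), and a small fraction even has the wrong sign (Type S).

```python
import numpy as np

rng = np.random.default_rng(1)

true_effect = 0.2   # assumed true standardized mean difference
n = 30              # assumed per-group sample size (yields low power)
sims = 20000

# Simulate many two-group comparisons of the same underlying effect.
x = rng.normal(true_effect, 1.0, (sims, n))
y = rng.normal(0.0, 1.0, (sims, n))
diff = x.mean(axis=1) - y.mean(axis=1)
se = np.sqrt(x.var(axis=1, ddof=1) / n + y.var(axis=1, ddof=1) / n)
z = diff / se
sig = np.abs(z) > 1.96          # normal approximation, alpha = .05 two-sided

power = sig.mean()
# Type M (exaggeration ratio): mean significant estimate relative to truth.
type_m = np.abs(diff[sig]).mean() / true_effect
# Type S: share of significant estimates pointing in the wrong direction.
type_s = (diff[sig] < 0).mean()

print(f"power = {power:.2f}, exaggeration ratio = {type_m:.1f}, "
      f"sign errors = {type_s:.3f}")
```

With these assumed parameters the simulated power is roughly 10-15%, and the significant estimates overestimate the true effect by a factor of two or more, which is the mechanism behind the decline effect discussed in the fourth chapter.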


2018  
Author(s):  
Bence Palfi  
Peter Lush  
Ryan Bradley Scott  
Zoltan Dienes

Hypnosis and hypnotic suggestions are gradually gaining popularity within the consciousness community as established tools for the experimental manipulation of illusions of involuntariness, hallucinations and delusions. However, hypnosis is still far from being a widespread instrument; a crucial hindrance to its uptake is the amount of time that must be invested in identifying people high and low in responsiveness to suggestion. In this study, we introduced an online assessment of hypnotic response and estimated the extent to which the scores and psychometric properties of an online screening differ from an offline one. We propose that online screening of hypnotic response is viable, as it only slightly reduces measured responsiveness. The application of online screening may prompt researchers to run large-scale studies with more heterogeneous samples, helping to overcome some of the issues underlying the current replication crisis in psychology.


2021  
Author(s):  
Bradford Jay Wiggins  
Cody D Christopherson

Psychology is in a replication crisis that has brought about a period of self-reflection and reform. Yet this reform appears in many ways to focus primarily on methodological and statistical practices, with little consideration for the foundational issues that concern many theoretical and philosophical psychologists and that may provide a richer account of the crisis. In this paper we offer an overview of the history of the replication crisis, the critiques and reforms at the heart of the crisis, and several points of intersection between the reform movement and broader theoretical and philosophical issues. We argue that the problems of the replication crisis and the concerns of the reform movement in fact provide various points of entry for theoretical and philosophical psychologists to collaborate with reformers in providing a more deeply philosophical critique and reform.


2020  
Author(s):  
Anne M. Scheel  
Leonid Tiokhin  
Peder Mortvedt Isager  
Daniel Lakens

For almost half a century, Paul Meehl educated psychologists about how the mindless use of null-hypothesis significance tests made research on theories in the social sciences basically uninterpretable (Meehl, 1990). In response to the replication crisis, reforms in psychology have focused on formalising procedures for testing hypotheses. These reforms were necessary and impactful. However, as an unexpected consequence, psychologists have begun to realise that they may not be ready to test hypotheses. Forcing researchers to prematurely test hypotheses before they have established a sound ‘derivation chain’ between test and theory is counterproductive. Instead, various non-confirmatory research activities should be used to obtain the inputs necessary to make hypothesis tests informative. Before testing hypotheses, researchers should spend more time forming concepts, developing valid measures, establishing the causal relationships between concepts and their functional form, and identifying boundary conditions and auxiliary assumptions. Providing these inputs should be recognised and incentivised as a crucial goal in and of itself. In this article, we discuss how shifting the focus to non-confirmatory research can tie together many loose ends of psychology’s reform movement and help us lay the foundation to develop strong, testable theories, as Paul Meehl urged us to.

