Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking

Author(s):  
Jelte M. Wicherts ◽  
Coosje Lisabet Sterre Veldkamp ◽  
Hilde Augusteijn ◽  
Marjan Bakker ◽  
Robbie Cornelis Maria van Aert ◽  
...  

Designing, collecting, analyzing, and reporting psychological studies entail many choices that are often arbitrary. The opportunistic use of these so-called researcher degrees of freedom aimed at obtaining statistically significant results is problematic because it increases the chance of false positive results and may inflate effect size estimates. In this review article, we present an extensive list of 34 degrees of freedom that researchers have in formulating hypotheses, and in designing, running, analyzing, and reporting psychological research. The list can be used in research methods education, as a checklist to assess the quality of preregistrations, and to determine the potential for bias due to (arbitrary) choices in unregistered studies.

2012 ◽  
Vol 7 (6) ◽  
pp. 562-571 ◽  
Author(s):  
Roger Giner-Sorolla

The current crisis in psychological research involves issues of fraud, replication, publication bias, and false positive results. I argue that this crisis follows the failure of widely adopted solutions to psychology’s similar crisis of the 1970s. The untouched root cause is an information-economic one: Too many studies divided by too few publication outlets equals a bottleneck. Articles cannot pass through just by showing theoretical meaning and methodological rigor; their results must appear to support the hypothesis perfectly. Consequently, psychologists must master the art of presenting perfect-looking results just to survive in the profession. This favors aesthetic criteria of presentation in a way that harms science’s search for truth. Shallow standards of statistical perfection distort analyses and undermine the accuracy of cumulative data; narrative expectations encourage dishonesty about the relationship between results and hypotheses; criteria of novelty suppress replication attempts. Concerns about truth in research are emerging in other sciences and may eventually descend on our heads in the form of difficult and insensitive regulations. I suggest a more palatable solution: to open the bottleneck, putting structures in place to reward broader forms of information sharing beyond the exquisite art of present-day journal publication.


2021 ◽  
Vol 44 ◽  
Author(s):  
Robert M. Ross ◽  
Robbie C. M. van Aert ◽  
Olmo R. van den Akker ◽  
Michiel van Elk

Lee and Schwarz interpret meta-analytic research and replication studies as providing evidence for the robustness of cleansing effects. We argue that the currently available evidence is unconvincing because (a) publication bias and the opportunistic use of researcher degrees of freedom appear to have inflated meta-analytic effect size estimates, and (b) preregistered replications failed to find any evidence of cleansing effects.


2018 ◽  
Author(s):  
Marjan Bakker ◽  
Coosje Lisabet Sterre Veldkamp ◽  
Marcel A. L. M. van Assen ◽  
Elise Anne Victoire Crompvoets ◽  
How Hwee Ong ◽  
...  

Researchers face many, often seemingly arbitrary choices in formulating hypotheses, designing protocols, collecting data, analyzing data, and reporting results. Opportunistic use of ‘researcher degrees of freedom’ aimed at obtaining statistical significance increases the likelihood of obtaining and publishing false positive results and overestimated effect sizes. Preregistration is a mechanism for reducing such degrees of freedom by specifying designs and analysis plans before observing the research outcomes. The effectiveness of preregistration may depend, in part, on whether the process facilitates sufficiently specific articulation of such plans. In this preregistered study, we compared two formats of preregistration available on the OSF: Standard Pre-Data Collection Registration and Prereg Challenge registration (now called “OSF Preregistration”, http://osf.io/prereg/). The Prereg Challenge format was a structured workflow with detailed instructions, and an independent review to confirm completeness; the “Standard” format was unstructured with minimal direct guidance to give researchers flexibility for what to pre-specify. Results of comparing random samples of 53 preregistrations from each format indicate that the structured format restricted the opportunistic use of researcher degrees of freedom better (Cliff’s Delta = 0.49) than the unstructured format, but neither eliminated all researcher degrees of freedom. We also observed very low concordance among coders about the number of hypotheses (14%), indicating that they are often not clearly stated. We conclude that effective preregistration is challenging, and registration formats that provide effective guidance may improve the quality of research.
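The Cliff's Delta reported here is a nonparametric effect size based on pairwise comparisons between two groups: the probability that a value from one group exceeds a value from the other, minus the reverse. A minimal sketch of the statistic, using made-up counts of remaining researcher degrees of freedom rather than the study's data:

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y).

    Ranges from -1 to 1; 0 indicates complete overlap between groups.
    """
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Hypothetical per-preregistration counts of remaining degrees of freedom
unstructured = [4, 5, 6, 6, 7]
structured = [2, 3, 3, 4, 5]
print(cliffs_delta(unstructured, structured))  # → 0.84
```

A delta of 0.49, as found in the study, means that a randomly chosen unstructured preregistration left notably more degrees of freedom open than a randomly chosen structured one, well short of the complete separation a delta of 1 would indicate.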


2008 ◽  
Vol 106 (2) ◽  
pp. 645-649 ◽  
Author(s):  
Andrew Brand ◽  
Michael T. Bradley ◽  
Lisa A. Best ◽  
George Stoica

2021 ◽  
Author(s):  
Anton Olsson-Collentine ◽  
Robbie Cornelis Maria van Aert ◽  
Marjan Bakker ◽  
Jelte M. Wicherts

There are arbitrary decisions to be made (i.e., researcher degrees of freedom) in the execution and reporting of most research. These decisions allow for many possible outcomes from a single study. Selective reporting of results from this 'multiverse' of outcomes, whether intentional (p-hacking) or not, can lead to inflated effect size estimates and false positive results in the literature. In this study, we examine and illustrate the consequences of researcher degrees of freedom in primary research, both for primary outcomes and for subsequent meta-analyses. We used a set of 10 preregistered multi-lab direct replication projects from psychology (Registered Replication Reports) with a total of 14 primary outcome variables, 236 labs and 37,602 participants. By exploiting researcher degrees of freedom in each project, we were able to compute between 3,840 and 2,621,440 outcomes per lab. We show that researcher degrees of freedom in primary research can cause substantial variability in effect size that we denote the Underlying Multiverse Variability (UMV). In our data, the median UMV across labs was 0.1 standard deviations (interquartile range = 0.09–0.15). In one extreme case, the effect size estimate could change by d = 1.27, evidence that p-hacking in some (rare) cases can provide support for almost any conclusion. We also show that researcher degrees of freedom in primary research provide another source of uncertainty in meta-analysis beyond those usually estimated. This would not be a large concern for meta-analysis if researchers made all arbitrary decisions at random. However, emulating selective reporting of lab results led to inflation of meta-analytic average effect size estimates in our data by as much as 0.10–0.48 standard deviations, depending to a large degree on the number of possible outcomes at the lab level (i.e., multiverse size). Our results illustrate the importance of making research decisions transparent (e.g., through preregistration and multiverse analysis), evaluating studies for selective reporting, and whenever feasible making raw data available.
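The selective-reporting mechanism described above can be illustrated with a toy simulation (all parameters are assumed for illustration and are not taken from the study): each lab derives many slightly different outcomes from the same null data, and reporting the largest one inflates the average effect across labs even though the true effect is zero.

```python
import random
import statistics

random.seed(1)

def lab_effect(n=50, multiverse=20):
    """One lab studying a true effect of zero.

    Each of `multiverse` analysis paths perturbs the same base estimate
    slightly. Honest reporting picks one path at random; selective
    reporting picks the path with the largest estimate.
    """
    base = statistics.mean(random.gauss(0, 1) for _ in range(n))
    paths = [base + random.gauss(0, 0.05) for _ in range(multiverse)]
    return random.choice(paths), max(paths)

labs = [lab_effect() for _ in range(200)]
honest = statistics.mean(h for h, _ in labs)
selective = statistics.mean(s for _, s in labs)
print(round(honest, 3), round(selective, 3))
```

The honest average stays near zero, while the selectively reported average is systematically positive; the gap grows with the number of analysis paths, mirroring the dependence on multiverse size reported above.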



2016 ◽  
Vol 12 (3) ◽  
pp. 594-596 ◽  
Author(s):  
Michael J. Rovito

The debate over whether testicular self-examination (TSE) should be promoted among males generally centers on a harm–benefit trade-off. The benefits of TSE include improving health outcomes, inclusive of an increase in both quality of life and knowledge/awareness of potential health concerns, as well as promoting proactivity in achieving wellness. The harms include claims that false-positive results can increase anxiety and produce costs via unnecessary treatments and therapies. Further claims point to the lack of evidence suggesting TSE decreases testicular cancer mortality. This commentary primarily discusses the anxiety portion of this debate from a logic-based perspective. The argument that TSE should not be promoted among males due to the risk of inciting false-positive anxiety appears to be flawed. A 5-point perspective is presented on the illogical discouragement of TSE due to theorized levels of false-positive anxiety while existing evidence suggests late-stage testicular cancer is associated with anxiety and depression.


2019 ◽  
Author(s):  
Tyson S. Barrett ◽  
Ginger Lockhart ◽  
Rick Anthony Cruz

Mediation analysis is a widely used technique within the psychological sciences and has been shown to be an effective tool to evaluate explanatory pathways between predictors and outcomes. Multiple effect size metrics have been developed; however, mediation analysis has been slow to develop accessible, interpretable effect size metrics in the cases of categorical (or otherwise non-normally distributed) mediators and/or outcomes. Herein, we propose the use of average marginal effects within mediation analysis to alleviate these issues—termed Marginal Mediation Analysis. The method provides interpretable indirect and direct effect size estimates in the same units as the outcome even when mediators and/or outcomes are categorical, a count measure, or another non-normal distribution. The approach is shown to fit the causal definitions of mediation analysis. We further present results of Monte Carlo simulations that show the utility of the proposed method in psychological research. We also discuss the assumptions inherent in the approach. We conclude by showing an application of it to adolescent health-risk behavior data (n = 13,600), demonstrating the increased interpretability and information provided compared to other common approaches.
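In the fully linear, continuous case the average marginal effect of each path reduces to its regression coefficient, so the indirect effect is the familiar product of coefficients. A simulated sketch of that baseline case (illustrative only, with made-up coefficients; the paper's contribution is extending marginal effects to categorical, count, and other non-normal mediators and outcomes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: X -> M -> Y plus a direct X -> Y path
n = 5000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)             # a-path: 0.5
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # b-path: 0.4, direct: 0.2

def ols_slopes(predictors, target):
    """Least-squares slope estimates for target ~ predictors (with intercept)."""
    design = np.column_stack([np.ones(len(target))] + predictors)
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[1:]  # drop the intercept

(a,) = ols_slopes([x], m)           # effect of X on M
b, direct = ols_slopes([m, x], y)   # effect of M on Y, and the direct effect
indirect = a * b                    # ≈ 0.5 * 0.4 = 0.2, in Y's units
print(round(indirect, 2), round(direct, 2))
```

Both the indirect and direct estimates come out in the outcome's units, which is the interpretability property the average-marginal-effects formulation preserves when the mediator or outcome is no longer continuous and normal.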

