There are no ‘Small’ or ‘Large’ Effects: A Reply to Götz et al. (2021)

2021 ◽  
Author(s):  
Maximilian Primbs ◽  
Charlotte Rebecca Pennington ◽  
Daniel Lakens ◽  
Miguel Alejandro Silan ◽  
Dwayne Sean Noah Lieck ◽  
...  

Götz et al. (2021) argue that small effects are the indispensable foundation for a cumulative psychological science. Whilst we applaud their efforts to bring this important discussion to the forefront, we argue that their core arguments do not hold up under scrutiny and, if left uncorrected, have the potential to undermine best practices in reporting and interpreting effect size estimates. Their article can be used as a convenient blanket defense to justify ‘small’ effects as meaningful. In our reply, we first argue that comparisons between psychological science and genetics are fundamentally flawed because these disciplines have vastly different goals and methodologies. Second, we argue that p-values, not effect sizes, are the main currency for publication in psychology, meaning that any biases in the literature are caused by the pressure to publish statistically significant results, not a pressure to publish large effects. Third, we contend that claims that small effects are important and consequential must be supported by empirical evidence, or at least require a falsifiable line of reasoning. Finally, we propose that researchers should evaluate effect sizes in relative, not absolute, terms, and we provide several approaches for how this can be achieved.
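
As a rough illustration of one relative approach (not drawn from the reply itself), the sketch below benchmarks a hypothetical observed correlation against an empirical distribution of effects from the same literature; all values are made up.

import numpy as np

# Hypothetical benchmark: correlation effect sizes from prior studies
# in the same literature (values are illustrative only).
benchmark_rs = np.array([0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.22, 0.25, 0.30, 0.40])

observed_r = 0.12  # the new finding to be evaluated

# Percentile of the observed effect within the benchmark distribution:
# an effect that is "small" by absolute conventions may still be typical,
# or even large, relative to its own research area.
percentile = np.mean(benchmark_rs <= observed_r) * 100
print(f"r = {observed_r} sits at the {percentile:.0f}th percentile of the benchmark set")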

Author(s):  
David J. Miller ◽  
James T. Nguyen ◽  
Matteo Bottai

Artificial effect-size magnification (ESM) may occur in underpowered studies, where effects are reported only because they or their associated p-values have passed some threshold. Ioannidis (2008, Epidemiology 19: 640–648) and Gelman and Carlin (2014, Perspectives on Psychological Science 9: 641–651) have suggested that the plausibility of findings for a specific study can be evaluated by computation of ESM, which requires statistical simulation. In this article, we present a new command called emagnification that allows straightforward implementation of such simulations in Stata. The command automates these simulations for epidemiological studies and enables the user to assess ESM routinely for published studies using user-selected, study-specific inputs that are commonly reported in the published literature. The intention of the command is to allow a wider community to use ESM as a tool for evaluating the reliability of reported effect sizes and to put an observed statistically significant effect size into a fuller context with respect to potential implications for study conclusions.
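
The command itself is Stata-based; as a language-neutral illustration of the kind of simulation involved (in the spirit of Gelman and Carlin's design analysis), here is a minimal Python sketch. The true effect, standard error, and significance threshold are hypothetical inputs, not values from the article.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

true_effect = 0.1   # hypothetical true effect (e.g., a mean difference)
se = 0.08           # hypothetical standard error implied by the study design
alpha = 0.05
n_sim = 100_000

# Simulate repeated estimates of the effect under the assumed design.
estimates = rng.normal(true_effect, se, n_sim)
z = estimates / se
significant = np.abs(z) > stats.norm.ppf(1 - alpha / 2)

# Effect-size magnification: how much larger, on average, are the
# estimates that clear the significance threshold than the true effect?
esm = np.mean(np.abs(estimates[significant])) / true_effect
print(f"Power ~ {significant.mean():.2f}, expected magnification ~ {esm:.1f}x")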


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Liansheng Larry Tang ◽  
Michael Caudy ◽  
Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize the meta-analytic results when multiple meta-analyses use the same type of summary effect estimates. When meta-analyses use different types of effect sizes, the meta-analysis results cannot be directly combined. We propose a two-step frequentist procedure to first convert the effect size estimates to the same metric and then summarize them with a weighted mean estimate. Our proposed method offers several advantages over existing methods by Hemming et al. (2012). First, different types of summary effect sizes are considered. Second, our method provides the same overall effect size as conducting a meta-analysis on all individual studies from multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
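
A minimal sketch of the two-step idea, not the authors' implementation: convert a log odds ratio to the standardized-mean-difference metric with the standard ln(OR) * sqrt(3)/pi conversion, then take an inverse-variance weighted mean. The two summary estimates below are hypothetical.

import numpy as np

# Hypothetical summary estimates from two meta-analyses on the same topic:
# one reports a standardized mean difference, the other a log odds ratio.
d1, var_d1 = 0.30, 0.010                 # meta-analysis 1: Cohen's d
log_or2, var_log_or2 = 0.55, 0.030       # meta-analysis 2: log odds ratio

# Step 1: convert the log odds ratio to the d metric
# (d = ln(OR) * sqrt(3) / pi; the variance scales by the square of the constant).
c = np.sqrt(3) / np.pi
d2 = log_or2 * c
var_d2 = var_log_or2 * c**2

# Step 2: inverse-variance weighted mean of the converted estimates.
weights = np.array([1 / var_d1, 1 / var_d2])
d_combined = np.average([d1, d2], weights=weights)
se_combined = np.sqrt(1 / weights.sum())
print(f"Combined d = {d_combined:.2f} (SE = {se_combined:.2f})")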


2018 ◽  
Vol 22 (4) ◽  
pp. 469-476 ◽  
Author(s):  
Ian J. Davidson

The reporting and interpretation of effect sizes is often promoted as a panacea for the ramifications of institutionalized statistical rituals associated with the null-hypothesis significance test. Mechanical objectivity, the conflation of the use of a method with the attainment of truth, is a useful theoretical tool for understanding the possible failure of effect size reporting (Porter, 1995). This article helps elucidate the ouroboros of psychological methodology: the cycle in which improved tools are developed to produce trustworthy knowledge, those tools become institutionalized and adopted as forms of thinking, methodologists eventually admonish researchers for relying too heavily on rituals, and still newer and improved quantitative tools are produced that may follow the same circular path. Despite many critiques and warnings, research psychologists’ superficial adoption of effect sizes might preclude expert interpretation, much as with the null-hypothesis significance test as it is widely practiced. One solution to this situation is bottom-up: promoting a balance of mechanical objectivity and expertise in the teaching of methods and research. This would require the acceptance and encouragement of expert interpretation within psychological science.


2010 ◽  
Vol 3 (2) ◽  
pp. 106-112 ◽  
Author(s):  
Matthew J. Rinella ◽  
Jeremy J. James

Null hypothesis significance testing (NHST) forms the backbone of statistical inference in invasive plant science. Over 95% of research articles in Invasive Plant Science and Management report NHST results such as P-values or statistics closely related to P-values, such as least significant differences. Unfortunately, NHST results are less informative than their ubiquity implies. P-values are hard to interpret and are regularly misinterpreted. Also, P-values do not provide estimates of the magnitudes and uncertainties of studied effects, and these effect size estimates are what invasive plant scientists care about most. In this paper, we reanalyze four datasets (two of our own and two of our colleagues’; studies put forth as examples in this paper are used with permission of their authors) to illustrate limitations of NHST. The reanalyses are used to build a case for confidence intervals as preferable alternatives to P-values. Confidence intervals indicate effect sizes, and compared to P-values, they provide more complete, intuitively appealing information on what data do and do not indicate.
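
To see the contrast the authors draw, here is a minimal sketch with simulated, hypothetical biomass data: the P-value alone says nothing about magnitude, whereas the difference in means with its 95% confidence interval reports both the effect size and its uncertainty.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical biomass measurements (g) under control and treatment.
control = rng.normal(50, 10, 12)
treated = rng.normal(42, 10, 12)

# NHST result: a single P-value, with no direct statement of magnitude.
t, p = stats.ttest_ind(control, treated)

# Effect size with uncertainty: difference in means and its 95% CI.
diff = control.mean() - treated.mean()
se = np.sqrt(control.var(ddof=1) / len(control) + treated.var(ddof=1) / len(treated))
crit = stats.t.ppf(0.975, df=len(control) + len(treated) - 2)
ci = diff + np.array([-1, 1]) * crit * se
print(f"P = {p:.3f}; reduction = {diff:.1f} g, 95% CI [{ci[0]:.1f}, {ci[1]:.1f}]")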


2020 ◽  
Author(s):  
Molly Lewis ◽  
Maya B Mathur ◽  
Tyler VanderWeele ◽  
Michael C. Frank

What is the best way to estimate the size of important effects? Should we aggregate across disparate findings using statistical meta-analysis, or instead run large, multi-lab replications (MLR)? A recent paper by Kvarven, Strømland, and Johannesson (2020) compared effect size estimates derived from these two different methods for 15 different psychological phenomena. The authors report that, for the same phenomenon, the meta-analytic estimate tends to be about three times larger than the MLR estimate. These results pose an important puzzle: What is the relationship between these two estimates? Kvarven et al. suggest that their results undermine the value of meta-analysis. In contrast, we argue that both meta-analysis and MLR are informative, and that the discrepancy between estimates obtained via the two methods is in fact still unexplained. Informed by re-analyses of Kvarven et al.’s data and by other empirical evidence, we discuss possible sources of this discrepancy and argue that understanding the relationship between estimates obtained from these two methods is an important puzzle for future meta-scientific research.


2021 ◽  
Author(s):  
Suresh Muthukumaraswamy ◽  
Anna Forsyth ◽  
Thomas Lumley

There is increasing interest in the potential for psychedelic drugs such as psilocybin, LSD and ketamine to treat a number of mental health disorders. To gain evidence for the therapeutic effectiveness of psychedelics, a number of randomised controlled trials (RCTs) have been conducted using the traditional RCT framework, and these trials have generally shown promising results, with large effect sizes reported. However, in this paper we argue that treatment effect sizes in psychedelic clinical trials are likely over-estimated due to de-blinding of participants and high levels of response expectancy generated by trial contingencies. The degree of over-estimation is at present difficult to quantify. We conduct systematic reviews of psychedelic RCTs and show that currently reported RCTs have failed to measure and report expectancy and de-blinding. To overcome these confounds, we argue that RCTs should routinely measure de-blinding and expectancy, and that careful attention should be paid to the clinical trial design used and the instructions given to participants, so that these confounds can be estimated and removed from effect size estimates. We urge caution in interpreting effect size estimates from extant psychedelic RCTs.


2016 ◽  
Author(s):  
Brian A. Nosek ◽  
Johanna Cohoon ◽  
Mallory Kidwell ◽  
Jeffrey Robert Spies

Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
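
The "combining original and replication results" figure rests on pooling each pair of estimates. Below is a minimal sketch of one common way to do this, inverse-variance pooling of Fisher-z transformed correlations, with hypothetical numbers; it is not necessarily the project's exact procedure.

import numpy as np
from scipy import stats

# Hypothetical Fisher-z transformed correlations and their variances
# for an original study and its replication (variance = 1 / (n - 3)).
z_orig, n_orig = np.arctanh(0.35), 80
z_rep,  n_rep  = np.arctanh(0.12), 320
var_orig, var_rep = 1 / (n_orig - 3), 1 / (n_rep - 3)

# Fixed-effect (inverse-variance) combination of the two estimates.
w = np.array([1 / var_orig, 1 / var_rep])
z_pooled = np.average([z_orig, z_rep], weights=w)
se_pooled = np.sqrt(1 / w.sum())
p = 2 * stats.norm.sf(abs(z_pooled / se_pooled))
print(f"Pooled r = {np.tanh(z_pooled):.2f}, p = {p:.4f}")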


2021 ◽  
pp. 107699862110355
Author(s):  
Seang-Hwane Joo ◽  
Yan Wang ◽  
John Ferron ◽  
S. Natasha Beretvas ◽  
Mariola Moeyaert ◽  
...  

Multiple baseline (MB) designs are becoming more prevalent in educational and behavioral research, and as they do, there is growing interest in combining effect size estimates across studies. To further refine meta-analytic methods for estimating intervention effects, this study developed and compared eight alternative methods of estimating intervention effects from a set of MB studies. The methods differed in the assumptions made and varied in whether they relied on within- or between-series comparisons, modeled raw data or effect sizes, and did or did not standardize. Small-sample performance was examined through two simulation studies, which showed that when data were consistent with assumptions, the bias was consistently less than 5% of the effect size for each method, whereas root mean squared error varied substantially across methods. When assumptions were violated, substantial biases were found. Implications and limitations are discussed.
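
The simulation benchmark used to compare estimators, bias expressed relative to the effect size and root mean squared error, can be sketched generically; the data-generating process and estimator below are simple placeholders, not any of the eight methods studied.

import numpy as np

rng = np.random.default_rng(3)

true_effect = 1.0       # hypothetical intervention effect
n_reps = 2000           # number of simulated datasets
estimates = np.empty(n_reps)

for r in range(n_reps):
    # Placeholder data-generating process: noisy baseline and
    # intervention phases for a single case.
    baseline = rng.normal(0.0, 1.0, 8)
    treatment = rng.normal(true_effect, 1.0, 8)
    estimates[r] = treatment.mean() - baseline.mean()   # placeholder estimator

bias_pct = 100 * (estimates.mean() - true_effect) / true_effect
rmse = np.sqrt(np.mean((estimates - true_effect) ** 2))
print(f"Bias = {bias_pct:.1f}% of the effect; RMSE = {rmse:.2f}")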


2021 ◽  
Author(s):  
Farid Anvari ◽  
Rogier Kievit ◽  
Daniel Lakens ◽  
Andrew K Przybylski ◽  
Leonid Tiokhin ◽  
...  

Psychological researchers currently lack guidance for how to evaluate the practical relevance of observed effect sizes, i.e., whether a finding will have impact when translated to a different context of application. Although psychologists have recently highlighted theoretical justifications for why small effect sizes might be practically relevant, such justifications are simplistic and fail to provide the information necessary for evaluation and falsification. Claims about whether an observed effect size is practically relevant need to consider the mechanisms that amplify and those that counteract practical relevance, as well as the assumptions underlying each mechanism at play. To provide guidance for systematically evaluating whether an observed effect size is practically relevant, we present examples of widely applicable mechanisms and the key assumptions needed to justify whether an observed effect size can be expected to generalize to different contexts. Routine use of these mechanisms to justify claims about practical relevance has the potential to make researchers’ claims about generalizability substantially more transparent. This transparency can help move psychological science towards a more rigorous assessment of when psychological findings can be applied in the world.


2017 ◽  
Author(s):  
Daniel Lakens ◽  
Elizabeth Page-Gould ◽  
Marcel A. L. M. van Assen ◽  
Bobbie Spellman ◽  
Felix D. Schönbrodt ◽  
...  

Meta-analyses are an important tool to evaluate the literature. It is essential that meta-analyses can easily be reproduced, both to allow researchers to evaluate the impact of subjective choices on meta-analytic effect sizes and to update meta-analyses as new data come in or as novel statistical techniques (for example, to correct for publication bias) are developed. Research in medicine has revealed that meta-analyses often cannot be reproduced. In this project, we examined the reproducibility of meta-analyses in psychology by reproducing twenty published meta-analyses. Reproducing published meta-analyses was surprisingly difficult: 96% of meta-analyses published in 2013-2014 did not adhere to reporting guidelines, and a third of these meta-analyses did not contain a table specifying all individual effect sizes. Five of the 20 randomly selected meta-analyses we attempted to reproduce could not be reproduced at all, due to lack of access to raw data, no details about the effect sizes extracted from each study, or a lack of information about how effect sizes were coded. In the remaining meta-analyses, differences between the reported and reproduced effect size or sample size were common. We discuss a range of possible improvements, such as more clearly indicating which data were used to calculate an effect size, specifying all individual effect sizes, adding detailed information about the equations used and about how multiple effect size estimates from the same study are combined, and sharing raw data retrieved from original authors or unpublished research reports. This project clearly illustrates that there is a lot of room for improvement when it comes to the transparency and reproducibility of published meta-analyses.
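
Reproducing a meta-analysis ultimately means recomputing the pooled estimate from the individual effect sizes. A minimal random-effects (DerSimonian-Laird) sketch with hypothetical study-level values illustrates the information such a table needs to contain.

import numpy as np

# Hypothetical study-level effect sizes (Cohen's d) and their variances,
# the minimum a reproducible meta-analysis table needs to report.
d = np.array([0.42, 0.15, 0.30, 0.55, 0.10])
v = np.array([0.04, 0.02, 0.03, 0.06, 0.01])

# DerSimonian-Laird estimate of between-study variance (tau^2).
w_fixed = 1 / v
d_fixed = np.sum(w_fixed * d) / np.sum(w_fixed)
q = np.sum(w_fixed * (d - d_fixed) ** 2)
c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (q - (len(d) - 1)) / c)

# Random-effects pooled estimate and standard error.
w_re = 1 / (v + tau2)
d_re = np.sum(w_re * d) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
print(f"Random-effects d = {d_re:.2f} (SE = {se_re:.2f}), tau^2 = {tau2:.3f}")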

