p-Hacking and Publication Bias Interact to Distort Meta-Analytic Effect Size Estimates

2020
Author(s): Malte Friese, Julius Frankenbach

Science depends on trustworthy evidence. A biased scientific record is therefore of questionable value: it impedes scientific progress, and the public receives advice based on unreliable evidence, with potentially far-reaching detrimental consequences. Meta-analysis is a valid and reliable technique for summarizing research evidence. However, meta-analytic effect size estimates may themselves be biased, threatening the validity and usefulness of meta-analyses for promoting scientific progress. Here, we offer a large-scale simulation study to elucidate how p-hacking and publication bias distort meta-analytic effect size estimates under a broad array of circumstances reflecting the realities of a variety of research areas. The results revealed that, first, very high levels of publication bias can severely distort the cumulative evidence. Second, p-hacking and publication bias interact: at relatively high and low levels of publication bias, p-hacking does comparatively little harm, but at medium levels of publication bias, p-hacking can contribute considerably to bias, especially when the true effects are very small or approach zero. Third, p-hacking can severely increase the rate of false positives. A key implication is that, in addition to preventing p-hacking, policies in research institutions, funding agencies, and scientific journals need to make the prevention of publication bias a top priority to ensure a trustworthy base of evidence.
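
The simulation logic described in this abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' code; the publication-bias probability, the crude p-hacking strategy of keeping the best of several attempts, and all study parameters are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_meta_estimate(true_d=0.0, n_studies=50, n_per_group=30,
                           pub_bias=0.9, p_hack=False, n_attempts=3):
    """Simulate one meta-analysis under publication bias and optional p-hacking.

    pub_bias:  probability that a non-significant study stays unpublished.
    p_hack:    if True, each lab makes several attempts and reports the one
               with the smallest p-value (a crude stand-in for p-hacking).
    """
    effects, variances = [], []
    while len(effects) < n_studies:
        attempts = n_attempts if p_hack else 1
        best_p, best_d, best_v = np.inf, None, None
        for _ in range(attempts):
            g1 = rng.normal(true_d, 1.0, n_per_group)
            g2 = rng.normal(0.0, 1.0, n_per_group)
            _, p = stats.ttest_ind(g1, g2)
            d = (g1.mean() - g2.mean()) / np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
            v = 2 / n_per_group + d**2 / (4 * n_per_group)  # approx. sampling variance of d
            if p < best_p:
                best_p, best_d, best_v = p, d, v
        published = best_p < 0.05 or rng.random() > pub_bias
        if published:
            effects.append(best_d)
            variances.append(best_v)
    w = 1 / np.array(variances)                       # inverse-variance weights
    return np.sum(w * np.array(effects)) / np.sum(w)  # fixed-effect estimate

# True effect is zero, yet the pooled estimate is biased upward,
# and more so when p-hacking is added on top of publication bias.
print(simulate_meta_estimate(true_d=0.0, pub_bias=0.9, p_hack=False))
print(simulate_meta_estimate(true_d=0.0, pub_bias=0.9, p_hack=True))
```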

2019
Author(s): Amanda Kvarven, Eirik Strømland, Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the results of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes that do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.
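
For readers unfamiliar with this kind of benchmarking, a toy sketch of the comparison logic is given below. It does not implement the Andrews-Kasy estimator itself; the adjusted estimates, standard errors, replication estimates, and the "near zero" threshold are all made-up illustrative values:

```python
import numpy as np

# Hypothetical bias-adjusted meta-analytic estimates (Cohen's d), their standard
# errors, and the corresponding multi-lab replication estimates for the same effects.
adjusted_est    = np.array([0.40, 0.25, 0.10, 0.55])
adjusted_se     = np.array([0.10, 0.08, 0.05, 0.12])
replication_est = np.array([0.15, 0.10, 0.02, 0.30])

# Overestimation factor: adjusted estimate relative to the replication benchmark.
ratio = adjusted_est / replication_est
print("median overestimation factor:", np.median(ratio))

# Crude false-positive check: the adjusted estimate is statistically significant
# although the replication benchmark is essentially zero (here: |d| < 0.05).
z = adjusted_est / adjusted_se
false_positive = (np.abs(z) > 1.96) & (np.abs(replication_est) < 0.05)
print("flagged false positives:", int(false_positive.sum()), "of", len(z))
```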


2013, Vol 2013, pp. 1-9
Author(s): Liansheng Larry Tang, Michael Caudy, Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize the meta-analytic results when multiple meta-analyses use the same type of summary effect estimates. When meta-analyses use different types of effect sizes, the meta-analysis results cannot be directly combined. We propose a two-step frequentist procedure to first convert the effect size estimates to the same metric and then summarize them with a weighted mean estimate. Our proposed method offers several advantages over existing methods by Hemming et al. (2012). First, different types of summary effect sizes are considered. Second, our method provides the same overall effect size as conducting a meta-analysis on all individual studies from multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
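
The generic two-step idea (convert to a common metric, then combine with inverse-variance weights) can be sketched as follows. The conversion uses the standard log-odds-ratio-to-Cohen's-d approximation rather than necessarily the exact procedure of Tang et al., and the input numbers are hypothetical:

```python
import numpy as np

def log_or_to_d(log_or, var_log_or):
    """Convert a log odds ratio (and its variance) to Cohen's d (standard approximation)."""
    d = log_or * np.sqrt(3) / np.pi
    var_d = var_log_or * 3 / np.pi**2
    return d, var_d

def inverse_variance_mean(effects, variances):
    """Fixed-effect weighted mean of effect sizes and its standard error."""
    w = 1 / np.asarray(variances)
    estimate = np.sum(w * np.asarray(effects)) / np.sum(w)
    se = np.sqrt(1 / np.sum(w))
    return estimate, se

# Step 1: bring both summary effects onto the same metric (Cohen's d).
d1, v1 = 0.30, 0.01                   # meta-analysis 1 already reports d
d2, v2 = log_or_to_d(0.55, 0.04)      # meta-analysis 2 reports a log odds ratio

# Step 2: combine the converted effects with inverse-variance weights.
estimate, se = inverse_variance_mean([d1, d2], [v1, v2])
print(f"combined d = {estimate:.3f} (SE = {se:.3f})")
```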


2021, Vol 44
Author(s): Robert M. Ross, Robbie C. M. van Aert, Olmo R. van den Akker, Michiel van Elk

Lee and Schwarz interpret meta-analytic research and replication studies as providing evidence for the robustness of cleansing effects. We argue that the currently available evidence is unconvincing because (a) publication bias and the opportunistic use of researcher degrees of freedom appear to have inflated meta-analytic effect size estimates, and (b) preregistered replications failed to find any evidence of cleansing effects.


PLoS ONE, 2021, Vol 16 (6), e0252415
Author(s): Ivan Ropovik, Matus Adamkovic, David Greger

Because negative findings have less chance of getting published, available studies tend to be a biased sample. This leads to an inflation of effect size estimates to an unknown degree. To see how meta-analyses in education account for publication bias, we surveyed all meta-analyses published in the last five years in the Review of Educational Research and Educational Research Review. The results show that meta-analyses usually neglect publication bias adjustment. In the minority of meta-analyses adjusting for bias, mostly non-principled adjustment methods were used, and only rarely were the conclusions based on corrected estimates, rendering the adjustment inconsequential. It is argued that appropriate state-of-the-art adjustment (e.g., selection models) should be attempted by default, yet one needs to take into account the uncertainty inherent in any meta-analytic inference under bias. We conclude by providing practical recommendations on dealing with publication bias.
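
Selection-model adjustments of the kind recommended here are usually fitted with dedicated software (e.g., the selmodel function in R's metafor package). Purely as an illustration of the underlying idea, the sketch below fits a stripped-down, one-step Vevea-and-Hedges-type selection model by maximum likelihood to made-up data; it is not a substitute for a vetted implementation:

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical study-level effects (Hedges' g) and sampling variances.
y = np.array([0.51, 0.43, 0.38, 0.60, 0.12, 0.55, 0.47, 0.05, 0.49, 0.41])
v = np.array([0.04, 0.03, 0.05, 0.06, 0.02, 0.05, 0.04, 0.01, 0.03, 0.04])

Z_CUT = stats.norm.ppf(0.975)  # cutpoint at one-tailed p = .025 (two-tailed .05)

def neg_log_lik(params):
    """Negative log-likelihood of a one-step selection model (weight drops for p >= .025)."""
    mu, log_tau2, logit_delta = params
    tau2 = np.exp(log_tau2)
    delta = 1 / (1 + np.exp(-logit_delta))   # relative publication probability if p >= .025
    total_sd = np.sqrt(v + tau2)

    log_density = stats.norm.logpdf(y, loc=mu, scale=total_sd)   # unselected density
    one_tailed_p = 1 - stats.norm.cdf(y / np.sqrt(v))
    log_weight = np.where(one_tailed_p < 0.025, 0.0, np.log(delta))

    # Normalizing constant: expected selection weight under the model.
    p_sig = 1 - stats.norm.cdf((Z_CUT * np.sqrt(v) - mu) / total_sd)
    log_norm = np.log(p_sig + delta * (1 - p_sig))

    return -np.sum(log_weight + log_density - log_norm)

start = np.array([y.mean(), np.log(0.01), 0.0])
fit = optimize.minimize(neg_log_lik, start, method="Nelder-Mead")

print(f"bias-adjusted mean effect:   {fit.x[0]:.3f}")
print(f"naive inverse-variance mean: {np.sum(y / v) / np.sum(1 / v):.3f}")
```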


2017, Vol 22 (4), pp. 347-377
Author(s): Arlin J. Benjamin, Sven Kepes, Brad J. Bushman

A landmark 1967 study showed that simply seeing a gun can increase aggression, a phenomenon called the “weapons effect.” Since 1967, many other studies have attempted to replicate and explain the weapons effect. This meta-analysis integrates the findings of weapons effect studies conducted from 1967 to 2017 and uses the General Aggression Model (GAM) to explain the weapons effect. It includes 151 effect-size estimates from 78 independent studies involving 7,668 participants. As predicted by the GAM, our naïve meta-analytic results indicate that the mere presence of weapons increased aggressive thoughts, hostile appraisals, and aggression, suggesting a cognitive route from weapons to aggression. Weapons did not significantly increase angry feelings. However, a comprehensive sensitivity analysis indicated that not all naïve mean estimates were robust to the presence of publication bias. In general, these results suggest that the published literature tends to overestimate the weapons effect for some outcomes and moderators.
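
One widely used publication-bias sensitivity analysis (not necessarily the specific procedures used by these authors) is PET-PEESE, which regresses effect sizes on their standard errors or variances and reads the bias-adjusted estimate off the intercept. A minimal sketch with invented study data:

```python
import numpy as np

# Hypothetical study effect sizes (d) and their standard errors.
d  = np.array([0.45, 0.38, 0.52, 0.20, 0.61, 0.33, 0.15, 0.48])
se = np.array([0.20, 0.18, 0.25, 0.10, 0.28, 0.15, 0.08, 0.22])

def wls_intercept(y, x, w):
    """Intercept of a weighted least-squares regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]

# PET regresses effects on standard errors, PEESE on variances; the intercept
# estimates the effect an infinitely precise (bias-free) study would show.
print(f"PET estimate:   {wls_intercept(d, se,    1 / se**2):.3f}")
print(f"PEESE estimate: {wls_intercept(d, se**2, 1 / se**2):.3f}")
```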


2020
Author(s): Molly Lewis, Maya B Mathur, Tyler VanderWeele, Michael C. Frank

What is the best way to estimate the size of important effects? Should we aggregate across disparate findings using statistical meta-analysis, or instead run large, multi-lab replications (MLR)? A recent paper by Kvarven, Strømland, and Johannesson (2020) compared effect size estimates derived from these two different methods for 15 different psychological phenomena. The authors report that, for the same phenomenon, the meta-analytic estimate tends to be about three times larger than the MLR estimate. These results pose an important puzzle: What is the relationship between these two estimates? Kvarven et al. suggest that their results undermine the value of meta-analysis. In contrast, we argue that both meta-analysis and MLR are informative, and that the discrepancy between estimates obtained via the two methods is in fact still unexplained. Informed by re-analyses of Kvarven et al.’s data and by other empirical evidence, we discuss possible sources of this discrepancy and argue that understanding the relationship between estimates obtained from these two methods is an important puzzle for future meta-scientific research.


2020, Vol 46 (2-3), pp. 343-354
Author(s): Timothy R Levine, René Weber

We examined the interplay between how communication researchers use meta-analyses to make claims and the prevalence, causes, and implications of unresolved heterogeneous findings. Heterogeneous findings can result from substantive moderators, methodological artifacts, and combined construct invalidity. An informal content analysis of meta-analyses published in four elite communication journals revealed that unresolved between-study effect heterogeneity was ubiquitous. Communication researchers mainly focus on computing mean effect sizes, to the exclusion of how effect sizes in primary studies are distributed and of what might be driving effect size distributions. We offer four recommendations for future meta-analyses. Researchers are advised to be more diligent and sophisticated in testing for heterogeneity. We encourage greater description of how effects are distributed, coupled with greater reliance on graphical displays. We counsel greater recognition of combined construct invalidity and advocate for content expertise. Finally, we endorse greater awareness of and improved tests for publication bias and questionable research practices.
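
The heterogeneity diagnostics the authors call for are typically summarized with Cochran's Q and Higgins' I². A minimal sketch with invented study-level data:

```python
import numpy as np
from scipy import stats

# Hypothetical study effect sizes (Fisher-z correlations) and sampling variances.
y = np.array([0.10, 0.35, 0.22, 0.05, 0.41, 0.18])
v = np.array([0.020, 0.015, 0.030, 0.010, 0.025, 0.018])

w = 1 / v
mean_effect = np.sum(w * y) / np.sum(w)      # fixed-effect mean

Q = np.sum(w * (y - mean_effect) ** 2)       # Cochran's Q
df = len(y) - 1
p_value = stats.chi2.sf(Q, df)
I2 = max(0.0, (Q - df) / Q) * 100            # % of total variance beyond sampling error

print(f"Q = {Q:.2f} (df = {df}, p = {p_value:.3f}), I^2 = {I2:.1f}%")
```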


Author(s): John C. Norcross, Thomas P. Hogan, Gerald P. Koocher, Lauren A. Maggio

Assessing and interpreting research reports involves examination of individual studies as well as summaries of many studies. Summaries may be conveyed in narrative reviews or, more typically, in meta-analyses. This chapter reviews how researchers conduct a meta-analysis and report the results, especially by means of forest plots, which incorporate measures of effect size and their confidence intervals. A meta-analysis may also use moderator analyses or meta-regressions to identify important influences on the results. Critical appraisal of a study requires careful attention to the details of the sample used, the independent variable (treatment), dependent variable (outcome measure), the comparison groups, and the relation between the stated conclusions and the actual results. The CONSORT flow diagram provides a context for interpreting the sample and comparison groups. Finally, users must be alert to possible artifacts of publication bias.
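
A bare-bones example of the kind of forest plot described here, drawn with matplotlib; the study names, effect sizes, and confidence intervals are invented:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-study effect sizes (d), 95% CIs, and a pooled estimate.
studies = ["Study A", "Study B", "Study C", "Study D", "Pooled"]
effects = np.array([0.42, 0.18, 0.55, 0.30, 0.34])
ci_low  = np.array([0.10, -0.12, 0.20, 0.05, 0.21])
ci_high = np.array([0.74, 0.48, 0.90, 0.55, 0.47])

ypos = np.arange(len(studies))[::-1]          # pooled estimate plotted at the bottom
plt.errorbar(effects, ypos,
             xerr=[effects - ci_low, ci_high - effects],
             fmt="s", color="black", capsize=3)
plt.axvline(0, linestyle="--", color="grey")  # line of no effect
plt.yticks(ypos, studies)
plt.xlabel("Effect size (d) with 95% CI")
plt.tight_layout()
plt.show()
```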


2020, Vol 8 (4), pp. 36
Author(s): Michèle B. Nuijten, Marcel A. L. M. van Assen, Hilde E. M. Augusteijn, Elise A. V. Crompvoets, Jelte M. Wicherts

In this meta-study, we analyzed 2442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson’s correlation of 0.26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We concluded that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.
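
The power figures above are medians across primary studies with varying sample sizes, so they cannot be reproduced from a single calculation; still, the underlying computation can be sketched for one hypothetical study at the median n of 60, using the Fisher z approximation and the conventional r = .10/.30/.50 benchmarks:

```python
import numpy as np
from scipy import stats

def power_correlation(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of rho = 0, via Fisher's z transformation."""
    ncp = np.arctanh(rho) * np.sqrt(n - 3)    # noncentrality of the z statistic
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.sf(z_crit - ncp) + stats.norm.cdf(-z_crit - ncp)

for label, rho in [("small", 0.10), ("medium", 0.30), ("large", 0.50)]:
    print(f"{label} effect (r = {rho:.2f}), n = 60: power = {power_correlation(rho, 60):.2f}")
```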

