Preprint - Meta-Analyzing the Multiverse: A Peek Under the Hood of Selective Reporting

2021 ◽  
Author(s):  
Anton Olsson-Collentine ◽  
Robbie Cornelis Maria van Aert ◽  
Marjan Bakker ◽  
Jelte M. Wicherts

There are arbitrary decisions to be made (i.e., researcher degrees of freedom) in the execution and reporting of most research. These decisions allow for many possible outcomes from a single study. Selective reporting of results from this ‘multiverse’ of outcomes, whether intentional (_p_-hacking) or not, can lead to inflated effect size estimates and false positive results in the literature. In this study, we examine and illustrate the consequences of researcher degrees of freedom in primary research, both for primary outcomes and for subsequent meta-analyses. We used a set of 10 preregistered multi-lab direct replication projects from psychology (Registered Replication Reports) with a total of 14 primary outcome variables, 236 labs and 37,602 participants. By exploiting researcher degrees of freedom in each project, we were able to compute between 3,840 and 2,621,440 outcomes per lab. We show that researcher degrees of freedom in primary research can cause substantial variability in effect size estimates, which we denote the Underlying Multiverse Variability (UMV). In our data, the median UMV across labs was 0.1 standard deviations (interquartile range = 0.09 – 0.15). In one extreme case, the effect size estimate could change by as much as _d_ = 1.27, showing that _p_-hacking can in some (rare) cases provide support for almost any conclusion. We also show that researcher degrees of freedom in primary research add a source of uncertainty to meta-analysis beyond those usually estimated. This would be of little concern for meta-analysis if researchers made all arbitrary decisions at random. However, emulating selective reporting of lab results inflated meta-analytic average effect size estimates in our data by as much as 0.1 – 0.48 standard deviations, depending to a large degree on the number of possible outcomes at the lab level (i.e., multiverse size). Our results illustrate the importance of making research decisions transparent (e.g., through preregistration and multiverse analysis), evaluating studies for selective reporting, and making raw data available whenever feasible.
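
To make the idea of a lab-level multiverse concrete, here is a minimal sketch (not the authors' pipeline) that enumerates a small, hypothetical grid of researcher degrees of freedom, two outlier-handling rules crossed with four cutoffs, and records the effect size each specification yields; the spread of these outcomes is a crude stand-in for the paper's UMV. All data and decision options are invented for illustration.

```python
import itertools
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference between two groups."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

def apply_rule(x, cutoff, rule):
    """Handle outliers by exclusion or winsorizing at `cutoff` SDs."""
    m, s = x.mean(), x.std(ddof=1)
    if rule == "exclude":
        return x[np.abs((x - m) / s) <= cutoff]
    return np.clip(x, m - cutoff * s, m + cutoff * s)

rng = np.random.default_rng(1)
treat, control = rng.normal(0.3, 1, 60), rng.normal(0.0, 1, 60)

# Hypothetical researcher degrees of freedom; real RRR multiverses
# combined many more decisions (3,840 to 2,621,440 outcomes per lab).
cutoffs = [2.0, 2.5, 3.0, np.inf]
rules = ["exclude", "winsorize"]

outcomes = [cohens_d(apply_rule(treat, c, r), apply_rule(control, c, r))
            for c, r in itertools.product(cutoffs, rules)]

# The spread of one lab's multiverse of outcomes; the paper's UMV is a
# variability measure over exactly this kind of set.
print(f"{len(outcomes)} outcomes, range = {max(outcomes) - min(outcomes):.3f}")
```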

2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Liansheng Larry Tang ◽  
Michael Caudy ◽  
Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, yet they may yield different or even discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results of multiple meta-analyses, especially when their results conflict. In this paper, we first introduce a method to synthesize meta-analytic results when multiple meta-analyses use the same type of summary effect estimate. When meta-analyses use different types of effect sizes, their results cannot be directly combined. We propose a two-step frequentist procedure that first converts the effect size estimates to the same metric and then summarizes them with a weighted mean estimate. Our proposed method offers several advantages over the existing method of Hemming et al. (2012). First, different types of summary effect sizes are accommodated. Second, our method yields the same overall effect size as conducting a meta-analysis on all individual studies from the multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
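
A minimal sketch of the two-step idea, using a standard textbook conversion rather than the authors' exact procedure: a log odds ratio is first rescaled to Cohen's d via the logistic approximation of Chinn (2000), and the converted estimates are then combined with an inverse-variance weighted mean. All numbers are hypothetical.

```python
import numpy as np

def logor_to_d(log_or, var_log_or):
    """Convert a log odds ratio and its variance to Cohen's d (Chinn, 2000)."""
    factor = np.sqrt(3) / np.pi
    return log_or * factor, var_log_or * factor**2

# Step 1: bring every meta-analysis onto the same (d) metric.
# Hypothetical inputs: one meta-analysis reported d, another an OR.
d1, v1 = 0.30, 0.010                       # already Cohen's d
d2, v2 = logor_to_d(np.log(1.8), 0.025)    # converted from log(OR)

# Step 2: inverse-variance weighted mean across meta-analyses.
d, v = np.array([d1, d2]), np.array([v1, v2])
w = 1 / v
pooled = np.sum(w * d) / np.sum(w)
se = np.sqrt(1 / np.sum(w))
print(f"pooled d = {pooled:.3f} "
      f"(95% CI {pooled - 1.96 * se:.3f} to {pooled + 1.96 * se:.3f})")
```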


2016 ◽  
Vol 46 (11) ◽  
pp. 2287-2297 ◽  
Author(s):  
A. F. Carvalho ◽  
C. A. Köhler ◽  
B. S. Fernandes ◽  
J. Quevedo ◽  
K. W. Miskowiak ◽  
...  

Background: To date no comprehensive evaluation has appraised the likelihood of bias or the strength of the evidence of peripheral biomarkers for bipolar disorder (BD). Here we performed an umbrella review of meta-analyses of peripheral non-genetic biomarkers for BD. Method: The Pubmed/Medline, EMBASE and PsycInfo electronic databases were searched up to May 2015. Two independent authors conducted searches, examined references for eligibility, and extracted data. Meta-analyses in any language examining peripheral non-genetic biomarkers in participants with BD (across different mood states) compared to unaffected controls were included. Results: Six references, examining 13 biomarkers across 20 meta-analyses (5474 BD cases and 4823 healthy controls), met inclusion criteria. Evidence of excess significance bias (i.e. bias favoring publication of ‘positive’, nominally significant results) was observed in 11 meta-analyses. Heterogeneity was high (I² ⩾ 50%) in 16 meta-analyses. Only two biomarkers met criteria for suggestive evidence, namely the soluble IL-2 receptor and morning cortisol. The median power of included studies, using the effect size of the largest dataset as the plausible true effect size of each meta-analysis, was 15.3%. Conclusions: Our findings suggest that there is an excess of statistically significant results in the literature on peripheral biomarkers for BD. Selective publication of ‘positive’ results and selective reporting of outcomes are possible mechanisms.
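
The median-power calculation reported above can be reproduced in outline: take the largest study's effect size as the plausible true effect and compute each study's power to detect it. A minimal sketch with hypothetical group sizes (the statsmodels power API is real; the data are invented):

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Hypothetical per-study group sizes for one biomarker meta-analysis.
n_cases = np.array([20, 35, 28, 50, 120])
n_controls = np.array([22, 30, 25, 48, 115])

# Take the largest study's effect size as the plausible true effect,
# mirroring the umbrella review's power calculation.
d_largest = 0.25

power = TTestIndPower()
powers = [power.power(effect_size=d_largest, nobs1=nc, ratio=nk / nc, alpha=0.05)
          for nc, nk in zip(n_cases, n_controls)]
print(f"median power = {np.median(powers):.1%}")
```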


2021 ◽  
Vol 44 ◽  
Author(s):  
Robert M. Ross ◽  
Robbie C. M. van Aert ◽  
Olmo R. van den Akker ◽  
Michiel van Elk

Lee and Schwarz interpret meta-analytic research and replication studies as providing evidence for the robustness of cleansing effects. We argue that the currently available evidence is unconvincing because (a) publication bias and the opportunistic use of researcher degrees of freedom appear to have inflated meta-analytic effect size estimates, and (b) preregistered replications failed to find any evidence of cleansing effects.


2020 ◽  
Author(s):  
Molly Lewis ◽  
Maya B Mathur ◽  
Tyler VanderWeele ◽  
Michael C. Frank

What is the best way to estimate the size of important effects? Should we aggregate across disparate findings using statistical meta-analysis, or instead run large, multi-lab replications (MLR)? A recent paper by Kvarven, Strømland, and Johannesson (2020) compared effect size estimates derived from these two different methods for 15 different psychological phenomena. The authors report that, for the same phenomenon, the meta-analytic estimate tends to be about three times larger than the MLR estimate. These results pose an important puzzle: What is the relationship between these two estimates? Kvarven et al. suggest that their results undermine the value of meta-analysis. In contrast, we argue that both meta-analysis and MLR are informative, and that the discrepancy between estimates obtained via the two methods is in fact still unexplained. Informed by re-analyses of Kvarven et al.’s data and by other empirical evidence, we discuss possible sources of this discrepancy and argue that understanding the relationship between estimates obtained from these two methods is an important puzzle for future meta-scientific research.


2020 ◽  
Author(s):  
Malte Friese ◽  
Julius Frankenbach

Science depends on trustworthy evidence. A biased scientific record is therefore of questionable value: it impedes scientific progress, and advice given to the public on the basis of unreliable evidence can have far-reaching detrimental consequences. Meta-analysis is a valid and reliable technique for summarizing research evidence. However, meta-analytic effect size estimates may themselves be biased, threatening the validity and usefulness of meta-analyses for promoting scientific progress. Here, we offer a large-scale simulation study to elucidate how p-hacking and publication bias distort meta-analytic effect size estimates under a broad array of circumstances reflecting conditions found across a variety of research areas. The results revealed that, first, very high levels of publication bias can severely distort the cumulative evidence. Second, p-hacking and publication bias interact: at relatively high and low levels of publication bias, p-hacking does comparatively little harm, but at medium levels of publication bias, p-hacking can contribute considerably to bias, especially when the true effects are very small or approach zero. Third, p-hacking can severely increase the rate of false positives. A key implication is that, in addition to preventing p-hacking, policies at research institutions, funding agencies, and scientific journals need to make the prevention of publication bias a top priority to ensure a trustworthy base of evidence.
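
A toy version of such a simulation can convey the mechanics. The sketch below (my construction, not the authors' design) models p-hacking crudely as up to three analysis attempts per study and publication bias as an 80% chance that a non-significant study stays in the file drawer, then shows how the naive mean of published effects drifts away from a true effect of zero. Modeling attempts as fresh data draws overstates researcher flexibility, so treat the numbers as purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
TRUE_D, N, K_STUDIES = 0.0, 40, 500
PUB_BIAS = 0.8    # chance a non-significant study stays in the file drawer
HACK_TRIES = 3    # analysis attempts before giving up on p < .05

def one_study():
    """Crude p-hacking stand-in: retry (modeled as fresh draws) until significant."""
    for _ in range(HACK_TRIES):
        a = rng.normal(TRUE_D, 1, N)
        b = rng.normal(0.0, 1, N)
        t, p = stats.ttest_ind(a, b)
        d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        if p < 0.05:
            break
    return d, p

published = []
for _ in range(K_STUDIES):
    d, p = one_study()
    if p < 0.05 or rng.random() > PUB_BIAS:   # biased publication filter
        published.append(d)

print(f"true d = {TRUE_D}, mean published d = {np.mean(published):.3f}, "
      f"k published = {len(published)}")
```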


2011 ◽  
Vol 25 (11) ◽  
pp. 1573-1577 ◽  
Author(s):  
Eleanor M Taylor ◽  
Natasha MP Greene ◽  
Celia JA Morgan ◽  
Marcus R Munafò

Studies of the chronic effects of MDMA, or ‘ecstasy’, in humans have been largely inconsistent. We explored whether study-level characteristics are associated with the effect size estimates reported. We based our analyses on the recent systematic review by Rogers and colleagues, focusing on those meta-analyses in that report to which a relatively large number of studies contributed. Linear regression was used to investigate the association between study-level variables and effect size estimate, weighted by the inverse of the SE of the effect size estimate, with cluster correction for studies that contributed multiple estimates. This indicated an association between effect size estimate and both user group (smaller estimates among studies recruiting former users compared with those recruiting current users) and control group (smaller estimates among studies recruiting polydrug user controls compared with those recruiting drug-naïve controls). In addition, increasing year of publication was associated with reduced effect size estimates, and there was a trend-level association with prevalence of ecstasy use, reflecting smaller estimates among studies conducted in countries with higher prevalence of ecstasy use. Our data suggest a number of study-level characteristics that appear to influence individual study effect size estimates. These should be considered when designing future studies, and also when interpreting the ecstasy literature as a whole.
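
The weighting and clustering scheme described above maps directly onto weighted least squares with cluster-robust standard errors. A minimal sketch on simulated data (variable names and effects are hypothetical; the abstract's inverse-SE weighting is used rather than the more common inverse-variance weighting):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
k = 40
se = rng.uniform(0.1, 0.4, k)        # SE of each effect size estimate
year = rng.integers(0, 13, k)        # years since first study (moderator)
former = rng.integers(0, 2, k)       # 1 = former-user sample
study = rng.integers(0, 25, k)       # some studies contribute several estimates
es = 0.5 - 0.02 * year - 0.15 * former + rng.normal(0.0, se)

X = sm.add_constant(np.column_stack([year, former]))
# Inverse-SE weights, as described in the abstract, with cluster-robust
# standard errors to correct for multiple estimates per study.
fit = sm.WLS(es, X, weights=1 / se).fit(cov_type="cluster",
                                        cov_kwds={"groups": study})
print(fit.summary(xname=["intercept", "year", "former_user"]))
```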


2020 ◽  
Vol 228 (1) ◽  
pp. 43-49 ◽  
Author(s):  
Michael Kossmeier ◽  
Ulrich S. Tran ◽  
Martin Voracek

Currently, dedicated graphical displays to depict study-level statistical power in the context of meta-analysis are unavailable. Here, we introduce the sunset (power-enhanced) funnel plot to visualize this relevant information for assessing the credibility, or evidential value, of a set of studies. The sunset funnel plot highlights the statistical power of primary studies to detect an underlying true effect of interest in the well-known funnel display, with color-coded power regions and a second power axis. This graphical display allows meta-analysts to incorporate power considerations into classic funnel plot assessments of small-study effects. Nominally significant but low-powered studies might be seen as less credible and as more likely to be affected by selective reporting. We exemplify the application of the sunset funnel plot with two published meta-analyses from medicine and psychology. Software to create this variation of the funnel plot is provided via a tailored R function. In conclusion, the sunset (power-enhanced) funnel plot is a novel and useful graphical display for critically examining and presenting study-level power in the context of meta-analysis.
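
The authors provide an R function for this display; as a language-agnostic illustration of the underlying idea, here is a rough matplotlib sketch (my construction, not the published software) that shades the funnel background by two-sided z-test power under an assumed true effect:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def z_power(se, true_effect, alpha=0.05):
    """Two-sided z-test power for a study with standard error `se`."""
    crit = norm.ppf(1 - alpha / 2)
    return norm.sf(crit - true_effect / se) + norm.cdf(-crit - true_effect / se)

rng = np.random.default_rng(5)
se = rng.uniform(0.05, 0.5, 30)      # toy study standard errors
es = rng.normal(0.3, se)             # toy effect size estimates
true_effect = 0.3                    # assumed underlying true effect

se_grid = np.linspace(0.02, 0.55, 400)
pw = z_power(se_grid, true_effect)

fig, ax = plt.subplots()
# Color-coded power regions behind the classic funnel (the "sunset").
for lo, hi, color in [(0.0, 1/3, "tomato"), (1/3, 2/3, "gold"),
                      (2/3, 1.01, "seagreen")]:
    ax.fill_betweenx(se_grid, -1.0, 1.5, where=(pw >= lo) & (pw < hi),
                     color=color, alpha=0.35)
ax.scatter(es, se, s=15, color="black", zorder=3)
ax.invert_yaxis()                    # funnel convention: precise studies on top
ax.set_xlim(-1.0, 1.5)
ax.set_xlabel("Effect size")
ax.set_ylabel("Standard error")
plt.show()
```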


2021 ◽  
pp. 152483802110216
Author(s):  
Brooke N. Lombardi ◽  
Todd M. Jensen ◽  
Anna B. Parisi ◽  
Melissa Jenkins ◽  
Sarah E. Bledsoe

Background: The association between a lifetime history of sexual victimization and the well-being of women during the perinatal period has received increasing attention. However, research investigating this relationship has yet to be systematically reviewed or quantitatively synthesized. Aim: This systematic review and meta-analysis aims to calculate the pooled effect size estimate of the statistical association between a lifetime history of sexual victimization and perinatal depression (PND). Method: Four bibliographic databases were systematically searched, and reference harvesting was conducted to identify peer-reviewed articles that empirically examined associations between a lifetime history of sexual victimization and PND. A random effects model was used to ascertain an overall pooled effect size estimate in the form of an odds ratio and corresponding 95% confidence intervals (CIs). Subgroup analyses were also conducted to assess whether particular study features and sample characteristics (e.g., race and ethnicity) influenced the magnitude of effect size estimates. Results: This review included 36 studies, with 45 effect size estimates available for meta-analysis. Women with a lifetime history of sexual victimization had 51% greater odds of experiencing PND relative to women with no history of sexual victimization (OR = 1.51, 95% CI [1.35, 1.67]). Effect size estimates varied considerably according to the PND instrument used in each study and the racial/ethnic composition of each sample. Conclusion: Findings provide compelling evidence for an association between a lifetime history of sexual victimization and PND. Future research should focus on screening practices and interventions that identify and support survivors of sexual victimization during the perinatal period.
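
Random-effects pooling of odds ratios of the kind reported here is typically done on the log scale. A minimal sketch of the DerSimonian-Laird estimator (a common choice; the abstract does not say which estimator was used, and all study values below are hypothetical):

```python
import numpy as np

def dersimonian_laird(y, var):
    """Random-effects pooled estimate via the DerSimonian-Laird method."""
    w = 1 / var
    fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed) ** 2)            # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)     # between-study variance
    w_star = 1 / (var + tau2)
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1 / np.sum(w_star))
    return pooled, se

# Hypothetical per-study odds ratios and variances of log(OR).
ors = np.array([1.4, 1.8, 1.2, 1.6, 1.5])
var = np.array([0.05, 0.08, 0.04, 0.10, 0.06])
pooled, se = dersimonian_laird(np.log(ors), var)
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled OR = {np.exp(pooled):.2f}, "
      f"95% CI [{np.exp(lo):.2f}, {np.exp(hi):.2f}]")
```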


2021 ◽  
pp. 146531252110272
Author(s):  
Despina Koletsi ◽  
Anna Iliadi ◽  
Theodore Eliades

Objective: To evaluate all available evidence on the prediction of rotational tooth movements with aligners. Data sources: Seven databases of published and unpublished literature were searched up to 4 August 2020 for eligible studies. Data selection: Studies were deemed eligible if they included evaluation of rotational tooth movement with any type of aligner, through comparison of software-planned and actually achieved outcomes after patient treatment. Data extraction and data synthesis: Data extraction was done independently and in duplicate, and risk of bias assessment was performed with the QUADAS-2 tool. Random effects meta-analyses with effect sizes and their 95% confidence intervals (CIs) were performed, and the quality of the evidence was assessed through GRADE. Results: Seven articles were included in the qualitative synthesis, of which three contributed to meta-analyses. Overall, the results revealed inaccurate prediction of the achieved outcome by the software-planned data, irrespective of the use of attachments or interproximal enamel reduction (IPR). Maxillary canines demonstrated the lowest percentage accuracy for rotational tooth movement (three studies: effect size = 47.9%; 95% CI = 27.2–69.5; P < 0.001), although high levels of heterogeneity were identified (I² = 86.9%; P < 0.001). In contrast, mandibular incisors presented the highest percentage accuracy for predicted rotational movement (two studies: effect size = 70.7%; 95% CI = 58.9–82.5; P < 0.001; I² = 0.0%; P = 0.48). Risk of bias was unclear to low overall, while the quality of the evidence ranged from low to moderate. Conclusion: Allowing for all identified caveats, prediction of rotational tooth movements with aligner treatment does not appear accurate, especially for canines. Careful selection of patients and malocclusions for aligner treatment remains challenging.
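
The I² statistics quoted above summarize how much of the between-study variability exceeds what sampling error alone would produce. A minimal sketch of Higgins' I² computation (study values and variances are hypothetical, loosely echoing the canine analysis):

```python
import numpy as np

def i_squared(est, var):
    """Higgins' I^2: percentage of total variability due to heterogeneity."""
    w = 1 / var
    pooled = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - pooled) ** 2)      # Cochran's Q
    df = len(est) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical percentage-accuracy estimates for canine rotation (3 studies).
acc = np.array([35.0, 52.0, 58.0])           # percent accuracy
var = np.array([30.0, 45.0, 25.0])           # sampling variances (invented)
print(f"I^2 = {i_squared(acc, var):.1f}%")
```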

