New metrics for meta-analyses of heterogeneous effects

Author(s):  
Maya B Mathur ◽  
Tyler VanderWeele

We provide two simple metrics that could be reported routinely in random-effects meta-analyses to convey evidence strength for scientifically meaningful effects under effect heterogeneity (i.e., a nonzero estimated variance of the true effect distribution). First, given a chosen threshold of meaningful effect size, meta-analyses could report the estimated proportion of true effect sizes above this threshold. Second, meta-analyses could estimate the proportion of effect sizes below a second, possibly symmetric, threshold in the opposite direction from the estimated mean. These metrics could help identify whether: (1) there are few effects of scientifically meaningful size despite a "statistically significant" pooled point estimate; (2) there are some large effects despite an apparently null point estimate; or (3) strong effects in the direction opposite the pooled estimate also occur regularly (and thus potential effect modifiers should be examined). These metrics should be presented with confidence intervals, which can be obtained analytically or, under weaker assumptions, using bias-corrected and accelerated (BCa) bootstrapping. Additionally, these metrics inform relative comparisons of evidence strength across related meta-analyses. We illustrate with applied examples and provide an R package to compute the metrics and confidence intervals.
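For readers who want to see the arithmetic, the sketch below computes the two proposed proportions under the parametric (normal true-effects) assumption. The data frame `dat`, its columns `yi` and `vi`, the threshold of 0.2, and the use of metafor to fit the random-effects model are illustrative assumptions, not necessarily the authors' exact implementation.

```r
# Minimal sketch of the parametric version of the two metrics, assuming the true
# effects are approximately normal and the estimated heterogeneity is nonzero.
# `dat` is a hypothetical data frame of point estimates `yi` and variances `vi`.
library(metafor)

m   <- rma(yi = yi, vi = vi, data = dat, method = "REML")
mu  <- as.numeric(m$beta)   # pooled point estimate
tau <- sqrt(m$tau2)         # estimated SD of the true-effect distribution

q <- 0.2  # chosen threshold of scientifically meaningful effect size

# estimated proportion of true effects above q
p_above <- 1 - pnorm((q - mu) / tau)

# estimated proportion of true effects below the symmetric threshold -q
p_below <- pnorm((-q - mu) / tau)

c(p_above = p_above, p_below = p_below)
```

Confidence intervals for these proportions can then be obtained analytically or, as the abstract notes, by BCa bootstrapping.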

2020 ◽  
Author(s):  
Maya B Mathur ◽  
Tyler VanderWeele

We recently suggested new statistical metrics for routine reporting in random-effects meta-analyses to convey evidence strength for scientifically meaningful effects under effect heterogeneity. First, given a chosen threshold of meaningful effect size, we suggested reporting the estimated proportion of true effect sizes above this threshold. Second, we suggested reporting the proportion of effect sizes below a second, possibly symmetric, threshold in the opposite direction from the estimated mean. Our previous methods applied when the true effects are approximately normal, when the number of studies is relatively large, and when the proportion is between approximately 0.15 and 0.85. Here, we additionally describe robust methods for point estimation and inference that perform well under considerably more general conditions, as we validate in an extensive simulation study. The methods are implemented in the R package MetaUtility (function prop_stronger). We describe application of the robust methods to conducting sensitivity analyses for unmeasured confounding in meta-analyses.
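A hedged sketch of how the packaged function might be called is given below; the argument names follow our reading of the MetaUtility documentation and may differ across versions, and `dat`, its column names, and the threshold of 0.2 are illustrative.

```r
# Sketch only: calling the robust methods via MetaUtility::prop_stronger.
# `dat` is a hypothetical data frame with point estimates `yi` and variances `vi`;
# argument names are assumed and should be checked against the package documentation.
library(MetaUtility)

prop_stronger(q       = 0.2,      # threshold of scientifically meaningful effect size
              tail    = "above",  # proportion of true effects above q
              dat     = dat,
              yi.name = "yi",
              vi.name = "vi")
```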


2019 ◽  
Author(s):  
Shinichi Nakagawa ◽  
Malgorzata Lagisz ◽  
Rose E O'Dea ◽  
Joanna Rutkowska ◽  
Yefeng Yang ◽  
...  

‘Classic’ forest plots show the effect sizes from individual studies and the aggregate effect from a meta-analysis. However, in ecology and evolution, meta-analyses routinely contain over 100 effect sizes, making the classic forest plot of limited use. We surveyed 102 meta-analyses in ecology and evolution, finding that only 11% use the classic forest plot. Instead, most used a ‘forest-like plot’, showing point estimates (with 95% confidence intervals; CIs) from a series of subgroups or categories in a meta-regression. We propose a modification of the forest-like plot, which we name the ‘orchard plot’. Orchard plots, in addition to showing overall mean effects and CIs from meta-analyses/regressions, also include 95% prediction intervals (PIs) and the individual effect sizes scaled by their precision. The PI allows the user and reader to see the range in which an effect size from a future study may be expected to fall. The PI therefore provides an intuitive interpretation of any heterogeneity in the data. Supplementing the PI, the inclusion of underlying effect sizes also allows the user to see any influential or outlying effect sizes. We showcase the orchard plot with example datasets from ecology and evolution, using the R package orchard, which includes several functions for visualizing meta-analytic data using forest-plot derivatives. We consider the orchard plot a variant of the classic forest plot, cultivated to the needs of meta-analysts in ecology and evolution. Hopefully, the orchard plot will prove fruitful for visualizing large collections of heterogeneous effect sizes regardless of the field of study.
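As a rough illustration of the prediction interval that distinguishes the orchard plot from a forest-like plot, the sketch below fits a random-effects model in metafor and reports both the confidence and prediction intervals; the data frame `dat` is hypothetical, and the orchard package itself handles the plotting.

```r
# Minimal sketch, assuming a hypothetical data frame `dat` with effect sizes `yi`
# and sampling variances `vi`; metafor is one common way to obtain the quantities an
# orchard plot displays, not necessarily the package's internal approach.
library(metafor)

m <- rma(yi = yi, vi = vi, data = dat, method = "REML")

# predict() returns the pooled estimate with its 95% confidence interval (ci.lb, ci.ub)
# and, in recent metafor versions, the 95% prediction interval (pi.lb, pi.ub) into
# which the effect of a future study is expected to fall
predict(m)
```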


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3544 ◽  
Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) itself is also hardly replicable: at a good statistical power of 80%, two studies will be ‘conflicting’, meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis or as grounds for falsely concluding that ‘there is no effect’. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should instead be more stringent, that sample sizes could decrease, or that p-values should be abandoned entirely. We conclude that, whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
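The "one third of the cases" figure follows from simple binomial arithmetic; the short R snippet below, which is purely illustrative, makes the step explicit.

```r
# With a true effect and each study run at 80% power, two studies disagree on
# 'significance' whenever exactly one of them is significant:
power      <- 0.80
p_conflict <- 2 * power * (1 - power)
p_conflict  # 0.32, i.e. roughly one third of study pairs
```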


2019 ◽  
Author(s):  
Maya B Mathur ◽  
Tyler VanderWeele

We propose sensitivity analyses for publication bias in meta-analyses. We consider a publication process such that "statistically significant" results are more likely to be published than negative or "nonsignificant" results by an unknown ratio, eta. Our proposed methods also accommodate some plausible forms of selection based on a study's standard error. Using inverse-probability weighting and robust estimation that accommodates non-normal population effects, small meta-analyses, and clustering, we develop sensitivity analyses that enable statements such as: "For publication bias to shift the observed point estimate to the null, 'significant' results would need to be at least 30-fold more likely to be published than negative or 'nonsignificant' results." Comparable statements can be made regarding shifting to a chosen non-null value or shifting the confidence interval. To aid interpretation, we describe empirical benchmarks for plausible values of eta across disciplines. We show that a worst-case meta-analytic point estimate for maximal publication bias under the selection model can be obtained simply by conducting a standard meta-analysis of only the negative and "nonsignificant" studies; this method sometimes indicates that no amount of such publication bias could "explain away" the results. We illustrate the proposed methods using real-life meta-analyses and provide an R package, PublicationBias.
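The worst-case computation described above is simple enough to sketch directly. The PublicationBias package automates it together with the eta-based sensitivity analyses; the version below, with a hypothetical data frame `dat`, normal-approximation p-values, and a conventional two-sided 0.05 cutoff, only mimics the worst-case step.

```r
# Hedged sketch: worst-case point estimate under maximal publication bias of the
# assumed form, obtained by meta-analyzing only the 'nonaffirmative' studies
# (negative or 'nonsignificant'). `dat` has point estimates `yi` and variances `vi`;
# positive effects are assumed to be the favored direction.
library(metafor)

dat$pval        <- 2 * (1 - pnorm(abs(dat$yi) / sqrt(dat$vi)))   # normal-approximation p-values
dat$affirmative <- (dat$pval < 0.05) & (dat$yi > 0)

worst <- rma(yi = yi, vi = vi, data = dat, subset = !affirmative, method = "REML")
worst
```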


2020 ◽  
Author(s):  
Harriet L Mills ◽  
Julian PT Higgins ◽  
Richard W Morris ◽  
David Kessler ◽  
Jon Heron ◽  
...  

Randomised controlled trials (RCTs) with continuous outcomes usually only examine mean differences in response between trial arms. If the intervention has heterogeneous effects (e.g. the effect of the intervention differs by individual characteristics), then outcome variances will also differ between arms. However, the power of an individual trial to assess heterogeneity is lower than the power to detect the same size of main effect. The aim of this work was to describe and implement methods for examining heterogeneity of intervention effects, in trials with individual patient data (IPD) and also in meta-analyses using summary data. Several methods for assessing differences in variance were applied using IPD from a single trial and summary data from two meta-analyses. In the single trial there was agreement between methods, and the difference in variance was largely due to differences in depression at baseline. In the two meta-analyses, most individual trials did not show strong evidence of a difference in variance between arms, with wide confidence intervals. However, both meta-analyses showed evidence of greater variance in the control arm, and in one example this was perhaps because the mean outcome in the control arm was higher. Low power of individual trials to examine differences in variance can be overcome using meta-analysis. Evidence of differences in variance should be followed up to identify potential effect modifiers and to explore other possible causes such as varying compliance.
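One concrete way to carry out the summary-data version of such an analysis is to meta-analyse the log ratio of arm standard deviations. The sketch below uses metafor's variability-ratio effect size and a hypothetical data frame `trials`; it is one of several possible methods rather than the authors' exact implementation.

```r
# Hedged sketch: compare outcome variances between arms across trials by pooling the
# log variability ratio (lnVR). `trials` is a hypothetical data frame with per-arm
# standard deviations and sample sizes.
library(metafor)

es <- escalc(measure = "VR",                 # log ratio of SDs (intervention / control)
             sd1i = sd_int, n1i = n_int,
             sd2i = sd_ctl, n2i = n_ctl,
             data = trials)

rma(yi, vi, data = es, method = "REML")      # pooled lnVR; > 0 means greater variance in the intervention arm
```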


2020 ◽  
Author(s):  
Anton Olsson-Collentine ◽  
Marcel A. L. M. van Assen ◽  
Jelte M. Wicherts

We examined the evidence for heterogeneity (of effect sizes) when only minor changes to sample population and settings were made between studies, and explored the association between heterogeneity and average effect size in a sample of 68 meta-analyses from thirteen pre-registered multi-lab direct replication projects in social and cognitive psychology. Amongst the many examined effects, examples include the Stroop effect, the “verbal overshadowing” effect, and various priming effects such as “anchoring” effects. We found limited heterogeneity: 48/68 (71%) meta-analyses had non-significant heterogeneity, and most (49/68; 72%) were most likely to have zero to small heterogeneity. Power to detect small heterogeneity (as defined by Higgins, 2003) was low for all projects (mean 43%), but good to excellent for medium and large heterogeneity. Our findings thus show little evidence of widespread heterogeneity in direct replication studies in social and cognitive psychology, suggesting that minor changes in sample population and settings are unlikely to affect research outcomes in these fields of psychology. We also found strong correlations between observed average effect sizes (standardized mean differences and log odds ratios) and heterogeneity in our sample. Our results suggest that heterogeneity and moderation of effects are unlikely for a zero average true effect size, but increasingly likely for larger average true effect sizes.
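For concreteness, the per-effect heterogeneity quantities discussed here (Cochran's Q test, tau-squared, and I-squared, with I-squared values of roughly 25%, 50%, and 75% conventionally read as small, medium, and large) can be obtained as in the sketch below; the data frame `labs` of per-lab estimates is hypothetical.

```r
# Minimal sketch for a single multi-lab effect, assuming `labs` holds one row per lab
# with effect size `yi` and sampling variance `vi`
library(metafor)

m <- rma(yi = yi, vi = vi, data = labs, method = "REML")
c(Q = m$QE, Q_pval = m$QEp, tau2 = m$tau2, I2 = m$I2)
```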


2018 ◽  
Author(s):  
Robbie Cornelis Maria van Aert

More and more scientific research gets published nowadays, calling for statistical methods that enable researchers to get an overview of the literature in a particular research field. For that purpose, meta-analysis methods were developed that can be used for statistically combining the effect sizes from independent primary studies on the same topic. My dissertation focuses on two issues that are crucial when conducting a meta-analysis: publication bias and heterogeneity in primary studies’ true effect sizes. Accurate estimation of both the meta-analytic effect size and the between-study variance in true effect size is crucial, since the results of meta-analyses are often used for policy making. Publication bias distorts the results of a meta-analysis, since it refers to situations where publication of a primary study depends on its results. We developed new meta-analysis methods, p-uniform and p-uniform*, which estimate effect sizes corrected for publication bias and also test for publication bias. Although the methods perform well in many conditions, these and the other existing methods are shown not to perform well when researchers use questionable research practices. Additionally, when publication bias is absent or limited, traditional methods that do not correct for publication bias outperform p-uniform and p-uniform*. Surprisingly, we found no strong evidence for the presence of publication bias in our pre-registered study of a large-scale data set consisting of 83 meta-analyses and 499 systematic reviews published in the fields of psychology and medicine. We also developed two methods for meta-analyzing a statistically significant published original study and a replication of that study, which reflects a situation often encountered by researchers. One method is frequentist, whereas the other is Bayesian. Both methods are shown to perform better than traditional meta-analytic methods that do not take the statistical significance of the original study into account. Analytical studies of both methods also show that the original study is sometimes better discarded for optimal estimation of the true effect size. Finally, we developed a program for determining the required sample size in a replication, analogous to power analysis in null hypothesis testing. Computing the required sample size with this method revealed that large sample sizes (approximately 650 participants) are required to be able to distinguish a zero from a small true effect. In the last two chapters, we derived a new multi-step estimator for the between-study variance in primary studies’ true effect sizes, and examined the statistical properties of two methods (the Q-profile and generalized Q-statistic methods) for computing the confidence interval of the between-study variance in true effect size. We proved that the multi-step estimator converges to the Paule-Mandel estimator, which is nowadays one of the recommended methods for estimating the between-study variance in true effect sizes. Two Monte Carlo simulation studies showed that the coverage probabilities of the Q-profile and generalized Q-statistic methods can be substantially below the nominal coverage rate if the assumptions underlying the random-effects meta-analysis model are violated.
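To make the quantities in the last two chapters concrete, the sketch below shows one way to obtain a Paule-Mandel estimate of the between-study variance and a profile-based (Q-profile-type) confidence interval for it with metafor; the data frame `dat` is hypothetical, and this is an illustration rather than the dissertation's own code.

```r
library(metafor)

# Paule-Mandel estimator of the between-study variance tau^2
m <- rma(yi = yi, vi = vi, data = dat, method = "PM")
m$tau2

# confint() on an rma.uni fit reports a profile-based (Q-profile-type) confidence
# interval for tau^2, along with derived I^2 and H^2
confint(m)
```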


2016 ◽  
Vol 42 (2) ◽  
pp. 206-242 ◽  
Author(s):  
Joshua R. Polanin ◽  
Emily A. Hennessy ◽  
Emily E. Tanner-Smith

Meta-analysis is a statistical technique that allows an analyst to synthesize effect sizes from multiple primary studies. To estimate meta-analysis models, the open-source statistical environment R is quickly becoming a popular choice. The meta-analytic community has contributed to this growth by developing numerous packages specific to meta-analysis. The purpose of this study is to locate all publicly available meta-analytic R packages. We located 63 packages via a comprehensive online search. To help elucidate these packages’ functionalities to the field, we describe each of the packages, recommend applications for researchers interested in using R for meta-analysis, provide a brief tutorial of two meta-analysis packages, and make suggestions for future meta-analytic R package creators.
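As a flavour of the kind of brief tutorial the paper provides, a minimal workflow with metafor (one of the most widely used of the surveyed packages) and its bundled BCG vaccine dataset might look like the following; this is our own illustrative sketch, not the paper's tutorial code.

```r
library(metafor)

# compute log risk ratios and sampling variances from the bundled BCG trial data
dat <- escalc(measure = "RR",
              ai = tpos, bi = tneg, ci = cpos, di = cneg,
              data = dat.bcg)

m <- rma(yi, vi, data = dat, method = "REML")   # random-effects model
summary(m)
forest(m)                                       # classic forest plot
```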

