Comparing Meta-Analyses and Pre-Registered Multiple Labs Replication Projects

Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Many researchers rely on meta-analysis to summarize research evidence. However, recent replication projects in the behavioral sciences suggest that effect sizes of original studies are overestimated, and this overestimation is typically attributed to publication bias and selective reporting of scientific results. As the validity of meta-analyses depends on the primary studies, there is a concern that systematic overestimation of effect sizes may translate into biased meta-analytic effect sizes. We compare the results of meta-analyses to large-scale pre-registered replications in psychology carried out at multiple labs. The multiple labs replications provide relatively precisely estimated effect sizes, which do not suffer from publication bias or selective reporting. A search of the literature identified 17 meta-analyses, spanning more than 1,200 effect sizes and more than 370,000 participants, on the same topics as the multiple labs replications. We find that the meta-analytic effect sizes differ significantly from the replication effect sizes for 12 of the 17 meta-analysis-replication pairs. These differences are systematic: on average, meta-analytic effect sizes are about three times as large as the replication effect sizes.
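To make the comparison concrete, the sketch below shows one straightforward way to test whether a meta-analytic estimate and a replication estimate of the same effect differ, given each estimate and its standard error. The function name and all numbers are illustrative assumptions, not values from the paper.

# Minimal sketch: z-test for the difference between a meta-analytic effect size
# and a pre-registered multiple-labs replication estimate of the same effect.
# Input values are hypothetical placeholders, not data from the paper.
from math import sqrt
from scipy.stats import norm

def compare_estimates(d_meta, se_meta, d_rep, se_rep):
    """Two-sided z-test for the difference between two independent estimates."""
    diff = d_meta - d_rep
    se_diff = sqrt(se_meta**2 + se_rep**2)
    z = diff / se_diff
    p = 2 * norm.sf(abs(z))
    ratio = d_meta / d_rep if d_rep != 0 else float("inf")
    return {"difference": diff, "z": z, "p": p, "ratio": ratio}

# Hypothetical pair: meta-analysis d = 0.45 (SE = 0.06), replication d = 0.15 (SE = 0.03)
print(compare_estimates(0.45, 0.06, 0.15, 0.03))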

2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the result of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.
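The Andrews-Kasy estimator itself involves a more general selection model and inference procedure. As a rough illustration of the underlying idea only, the sketch below fits a simplified one-parameter selection model by maximum likelihood, in which statistically significant results are always published and non-significant results are published with probability beta. The data, the fixed-effect simplification, and the function names are assumptions for illustration, not the authors' implementation.

# Simplified selection-model sketch (not the Andrews-Kasy estimator itself):
# significant results (|y/se| > 1.96) are always published, non-significant
# results are published with probability beta. Fit (mu, beta) by maximum likelihood.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_log_lik(params, y, se):
    mu, beta = params
    sig = np.abs(y / se) > 1.96
    # probability that a study with true mean mu yields a significant result
    p_sig = norm.sf((1.96 * se - mu) / se) + norm.cdf((-1.96 * se - mu) / se)
    denom = p_sig + beta * (1.0 - p_sig)          # overall probability of publication
    dens = norm.pdf(y, loc=mu, scale=se)
    weights = np.where(sig, 1.0, beta)
    return -np.sum(np.log(weights * dens / denom))

# Hypothetical published effect sizes and standard errors from a meta-analysis
y = np.array([0.52, 0.41, 0.60, 0.35, 0.48, 0.10])
se = np.array([0.20, 0.18, 0.25, 0.15, 0.22, 0.12])

fit = minimize(neg_log_lik, x0=[0.3, 0.5], args=(y, se),
               bounds=[(-2.0, 2.0), (1e-3, 1.0)])
mu_hat, beta_hat = fit.x
print(f"bias-adjusted effect: {mu_hat:.3f}, publication prob. of non-sig. results: {beta_hat:.3f}")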


2015 ◽  
Vol 24 (2) ◽  
pp. 237-255 ◽  
Author(s):  
Patricia L. Cleave ◽  
Stephanie D. Becker ◽  
Maura K. Curran ◽  
Amanda J. Owen Van Horne ◽  
Marc E. Fey

Purpose: This systematic review and meta-analysis critically evaluated the research evidence on the effectiveness of conversational recasts in grammatical development for children with language impairments. Method: Two different but complementary reviews were conducted and then integrated. Systematic searches of the literature yielded 35 articles for the systematic review. The included studies employed a wide variety of designs, but all examined interventions in which recasts were the key component. The meta-analysis included only studies that allowed the calculation of effect sizes, but it did include package interventions in which recasts were a major part. Fourteen studies were included, 7 of which were also in the systematic review. Studies were grouped according to research phase and were rated for quality. Results: Study quality, and thus strength of evidence, varied substantially. Nevertheless, across all phases, the vast majority of studies supported the use of recasts. Meta-analyses found average effect sizes of .96 for proximal measures and .76 for distal measures, reflecting a positive benefit of about 0.75 to 1.00 standard deviations. Conclusion: The available evidence is limited but supportive of the use of recasts in grammatical intervention. Critical features of recasts in grammatical interventions are discussed.
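The effect sizes reported here are standardized mean differences. For readers unfamiliar with the metric, the sketch below shows a common way to compute Hedges' g (Cohen's d with the usual small-sample correction) from group summary statistics; the summary values are hypothetical, not taken from the reviewed studies.

# Sketch: Hedges' g (small-sample corrected standardized mean difference)
# from treatment/control summary statistics. Numbers are hypothetical.
from math import sqrt

def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    # pooled standard deviation across the two groups
    sd_pooled = sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (m_t - m_c) / sd_pooled                      # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)                # small-sample correction factor
    return j * d

# e.g., recast group gain 12.0 (SD 4.0, n 20) vs. control gain 8.5 (SD 4.5, n 20)
print(round(hedges_g(12.0, 4.0, 20, 8.5, 4.5, 20), 2))   # roughly a 0.8 SD benefit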


2017 ◽  
Author(s):  
Robbie Cornelis Maria van Aert ◽  
Jelte M. Wicherts ◽  
Marcel A. L. M. van Assen

Publication bias is a substantial problem for the credibility of research in general and of meta-analyses in particular, as it yields overestimated effects and may suggest the existence of non-existing effects. Although there is consensus that publication bias exists, how strongly it affects different scientific literatures is currently less well known. We examined evidence of publication bias in a large-scale data set of primary studies that were included in 83 meta-analyses published in Psychological Bulletin (representing meta-analyses from psychology) and 499 systematic reviews from the Cochrane Database of Systematic Reviews (CDSR; representing meta-analyses from medicine). Publication bias was assessed on all homogeneous subsets of primary studies included in the meta-analyses (3.8% of all subsets of meta-analyses published in Psychological Bulletin), because publication bias methods do not have good statistical properties if the true effect size is heterogeneous. A Monte Carlo simulation study revealed that the creation of homogeneous subsets resulted in challenging conditions for publication bias methods, since the number of effect sizes in a subset was rather small (the median number of effect sizes was 6). No evidence of bias was obtained using the publication bias tests. Overestimation was minimal but statistically significant, providing evidence of publication bias that appeared to be similar in both fields. These and other findings, in combination with the small percentages of statistically significant primary effect sizes (28.9% and 18.9% for subsets from Psychological Bulletin and the CDSR, respectively), led to the conclusion that evidence for publication bias in the studied homogeneous subsets is weak, but suggestive of mild publication bias in both psychology and medicine.
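The abstract does not list the specific publication bias tests that were applied. As a generic illustration of this class of methods, the sketch below runs Egger's regression test on a hypothetical homogeneous subset of six effect sizes, which also makes clear why such tests have little power when so few studies are available.

# Sketch: Egger's regression test for small-study effects on a hypothetical
# homogeneous subset of 6 effect sizes. With so few studies the test has
# little power, one of the difficulties noted in the abstract.
import numpy as np
from scipy import stats

y  = np.array([0.30, 0.12, 0.45, 0.08, 0.25, 0.51])   # observed effects (hypothetical)
se = np.array([0.10, 0.08, 0.20, 0.07, 0.12, 0.25])   # their standard errors

# Egger's test: regress the standardized effect (y/se) on precision (1/se);
# funnel-plot asymmetry shows up as an intercept that differs from zero.
x, z = 1.0 / se, y / se
n = len(z)
slope, intercept = np.polyfit(x, z, 1)
resid = z - (slope * x + intercept)
s2 = np.sum(resid**2) / (n - 2)
se_intercept = np.sqrt(s2 * (1.0 / n + x.mean()**2 / np.sum((x - x.mean())**2)))
t = intercept / se_intercept
p = 2 * stats.t.sf(abs(t), df=n - 2)
print(f"Egger intercept = {intercept:.2f}, p = {p:.2f}")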


2020 ◽  
Vol 25 (1) ◽  
pp. 51-72 ◽  
Author(s):  
Christian Franz Josef Woll ◽  
Felix D. Schönbrodt

Recent meta-analyses come to conflicting conclusions about the efficacy of long-term psychoanalytic psychotherapy (LTPP). Our first goal was to reproduce the most recent meta-analysis by Leichsenring, Abbass, Luyten, Hilsenroth, and Rabung (2013), who found evidence for the efficacy of LTPP in the treatment of complex mental disorders. Our replicated effect sizes were in general slightly smaller. Second, we conducted an updated meta-analysis of randomized controlled trials comparing LTPP (lasting for at least 1 year and 40 sessions) to other forms of psychotherapy in the treatment of complex mental disorders. We followed a transparent research process according to open science standards and applied a series of elaborated meta-analytic procedures to test and control for publication bias. Our updated meta-analysis, comprising 191 effect sizes from 14 eligible studies, revealed small, statistically significant effect sizes at post-treatment for the outcome domains psychiatric symptoms, target problems, social functioning, and overall effectiveness (Hedges’ g ranging between 0.24 and 0.35). The effect size for the domain personality functioning (0.24) was not significant (p = .08). No signs of publication bias were detected. In light of a heterogeneous study set and some methodological shortcomings in the primary studies, these results should be interpreted cautiously. In conclusion, LTPP might be superior to other forms of psychotherapy in the treatment of complex mental disorders. Notably, our effect sizes represent the additional gain of LTPP over other forms of primarily long-term psychotherapy, so large differences in effect sizes are not to be expected.
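A minimal sketch of the random-effects pooling step that such a meta-analysis rests on is shown below, using the DerSimonian-Laird estimator of the between-study variance. The per-study effect sizes and variances are hypothetical, and the authors' actual procedures (including the publication bias analyses) are considerably more elaborate.

# Sketch: random-effects meta-analysis of Hedges' g values using the
# DerSimonian-Laird between-study variance estimator. Inputs are hypothetical.
import numpy as np
from scipy.stats import norm

g = np.array([0.31, 0.18, 0.42, 0.25, 0.35])   # per-study Hedges' g
v = np.array([0.02, 0.03, 0.05, 0.02, 0.04])   # per-study sampling variances

w_fixed = 1.0 / v
mu_fixed = np.sum(w_fixed * g) / np.sum(w_fixed)
q = np.sum(w_fixed * (g - mu_fixed) ** 2)                 # Cochran's Q
c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (q - (len(g) - 1)) / c)                   # DL estimate of tau^2

w_re = 1.0 / (v + tau2)
mu_re = np.sum(w_re * g) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
p = 2 * norm.sf(abs(mu_re / se_re))
ci = (mu_re - 1.96 * se_re, mu_re + 1.96 * se_re)
print(f"pooled g = {mu_re:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], p = {p:.3f}")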


2020 ◽  
Vol 46 (2-3) ◽  
pp. 343-354 ◽  
Author(s):  
Timothy R Levine ◽  
René Weber

We examined the interplay between how communication researchers use meta-analyses to make claims and the prevalence, causes, and implications of unresolved heterogeneous findings. Heterogeneous findings can result from substantive moderators, methodological artifacts, and combined construct invalidity. An informal content analysis of meta-analyses published in four elite communication journals revealed that unresolved between-study effect heterogeneity was ubiquitous. Communication researchers mainly focus on computing mean effect sizes, to the exclusion of how effect sizes in primary studies are distributed and of what might be driving effect size distributions. We offer four recommendations for future meta-analyses. First, researchers are advised to be more diligent and sophisticated in testing for heterogeneity. Second, we encourage greater description of how effects are distributed, coupled with greater reliance on graphical displays. Third, we counsel greater recognition of combined construct invalidity and advocate for content expertise. Finally, we endorse greater awareness of, and improved tests for, publication bias and questionable research practices.
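The first recommendation, testing for heterogeneity, typically comes down to Cochran's Q and related statistics. A minimal sketch with hypothetical effect sizes is shown below.

# Sketch: quantify between-study heterogeneity with Cochran's Q and I^2.
# Effect sizes and sampling variances are hypothetical.
import numpy as np
from scipy.stats import chi2

y = np.array([0.10, 0.35, 0.22, 0.05, 0.48, 0.30])        # study effects
v = np.array([0.020, 0.015, 0.030, 0.010, 0.040, 0.025])  # sampling variances

w = 1.0 / v
mu = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - mu) ** 2)
df = len(y) - 1
p = chi2.sf(q, df)                                 # test of homogeneity
i2 = max(0.0, (q - df) / q) * 100                  # % of variation beyond sampling error
print(f"Q = {q:.2f} (df = {df}), p = {p:.3f}, I^2 = {i2:.0f}%")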


2020 ◽  
Vol 8 (4) ◽  
pp. 36
Author(s):  
Michèle B. Nuijten ◽  
Marcel A. L. M. van Assen ◽  
Hilde E. M. Augusteijn ◽  
Elise A. V. Crompvoets ◽  
Jelte M. Wicherts

In this meta-study, we analyzed 2442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson’s correlation of 0.26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We concluded that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.
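As an illustration of the kind of power calculation behind these figures, the sketch below approximates the power to detect small (r = .1), medium (r = .3), and large (r = .5) correlations with the median sample size of 60, using the Fisher z approximation and a two-tailed alpha of .05. The meta-study's power estimates are medians across many primary studies with varying sample sizes, so the results will not match the reported medians exactly.

# Sketch: approximate power to detect a correlation with n = 60 using the
# Fisher z transformation (two-tailed alpha = .05). Illustrative only.
from math import atanh, sqrt
from scipy.stats import norm

def power_correlation(r, n, alpha=0.05):
    z_effect = atanh(r) * sqrt(n - 3)        # expected test statistic
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - z_effect) + norm.cdf(-z_crit - z_effect)

for label, r in [("small", 0.10), ("medium", 0.30), ("large", 0.50)]:
    print(f"{label} (r = {r:.2f}): power approx. {power_correlation(r, 60):.2f}")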


2018 ◽  
Author(s):  
Robbie Cornelis Maria van Aert

More and more scientific research gets published nowadays, calling for statistical methods that enable researchers to get an overview of the literature in a particular research field. For that purpose, meta-analysis methods were developed that can be used for statistically combining the effect sizes from independent primary studies on the same topic. My dissertation focuses on two issues that are crucial when conducting a meta-analysis: publication bias and heterogeneity in primary studies’ true effect sizes. Accurate estimation of both the meta-analytic effect size and the between-study variance in true effect size is crucial, since the results of meta-analyses are often used for policy making. Publication bias distorts the results of a meta-analysis, since it refers to situations where publication of a primary study depends on its results. We developed new meta-analysis methods, p-uniform and p-uniform*, which estimate effect sizes corrected for publication bias and also test for publication bias. Although the methods perform well in many conditions, these and the other existing methods are shown not to perform well when researchers use questionable research practices. Additionally, when publication bias is absent or limited, traditional methods that do not correct for publication bias outperform p-uniform and p-uniform*. Surprisingly, we found no strong evidence for the presence of publication bias in our pre-registered study of a large-scale data set consisting of 83 meta-analyses and 499 systematic reviews published in the fields of psychology and medicine. We also developed two methods for meta-analyzing a statistically significant published original study and a replication of that study, which reflects a situation often encountered by researchers. One method is frequentist, whereas the other is Bayesian. Both methods are shown to perform better than traditional meta-analytic methods that do not take the statistical significance of the original study into account. Analytical studies of both methods also show that sometimes the original study is better discarded for optimal estimation of the true effect size. Finally, we developed a program for determining the required sample size in a replication, analogous to power analysis in null hypothesis testing. Computing the required sample size with this method revealed that large sample sizes (approximately 650 participants) are required to be able to distinguish a zero from a small true effect. In the last two chapters, we derived a new multi-step estimator for the between-study variance in primary studies’ true effect sizes, and examined the statistical properties of two methods (the Q-profile and generalized Q-statistic methods) for computing the confidence interval of the between-study variance in true effect size. We proved that the multi-step estimator converges to the Paule-Mandel estimator, which is currently one of the recommended methods for estimating the between-study variance in true effect sizes. Two Monte Carlo simulation studies showed that the coverage probabilities of the Q-profile and generalized Q-statistic methods can be substantially below the nominal coverage rate if the assumptions underlying the random-effects meta-analysis model are violated.
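As an illustration of the estimator that the multi-step method converges to, the sketch below computes the Paule-Mandel estimate of the between-study variance by solving its moment equation with simple bisection. The effect sizes and variances are hypothetical.

# Sketch: Paule-Mandel estimate of the between-study variance tau^2, obtained
# by solving Q_gen(tau^2) = k - 1 with bisection. Inputs are hypothetical.
import numpy as np

def q_gen(tau2, y, v):
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return np.sum(w * (y - mu) ** 2)

def paule_mandel(y, v, upper=10.0, tol=1e-8):
    k = len(y)
    if q_gen(0.0, y, v) <= k - 1:          # no excess heterogeneity
        return 0.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Q_gen decreases in tau^2, so move toward the root of Q_gen = k - 1
        if q_gen(mid, y, v) > k - 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y = np.array([0.12, 0.45, 0.30, 0.05, 0.55])
v = np.array([0.010, 0.020, 0.015, 0.012, 0.030])
print(f"Paule-Mandel tau^2 = {paule_mandel(y, v):.4f}")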


2020 ◽  
Author(s):  
Malte Friese ◽  
Julius Frankenbach

Science depends on trustworthy evidence. A biased scientific record is therefore of questionable value: it impedes scientific progress, and the public receives advice based on unreliable evidence, which can have far-reaching detrimental consequences. Meta-analysis is a valid and reliable technique that can be used to summarize research evidence. However, meta-analytic effect size estimates may themselves be biased, threatening the validity and usefulness of meta-analyses for promoting scientific progress. Here, we offer a large-scale simulation study to elucidate how p-hacking and publication bias distort meta-analytic effect size estimates under a broad array of circumstances that reflect conditions found across a variety of research areas. The results revealed that, first, very high levels of publication bias can severely distort the cumulative evidence. Second, p-hacking and publication bias interact: at relatively high and low levels of publication bias, p-hacking does comparatively little harm, but at medium levels of publication bias, p-hacking can contribute considerably to bias, especially when the true effects are very small or approach zero. Third, p-hacking can severely increase the rate of false positives. A key implication is that, in addition to preventing p-hacking, policies at research institutions, funding agencies, and scientific journals need to make the prevention of publication bias a top priority in order to ensure a trustworthy base of evidence.
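A stripped-down version of this kind of simulation is sketched below: primary studies are generated with an optional p-hacking step (reporting the best of several correlated outcomes) and a publication filter, and the naive mean of the published effect sizes is compared with the true effect. Every parameter value is an illustrative placeholder; the paper's simulation design is far broader.

# Sketch: how p-hacking plus publication bias can inflate a meta-analytic mean.
# Two-group studies; p-hacking = report the best of 3 correlated outcomes;
# publication bias = non-significant studies published with low probability.
# All parameters are illustrative placeholders.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

def simulate_study(true_d, n=30, hack=True, n_outcomes=3, rho=0.5):
    cov = rho + (1 - rho) * np.eye(n_outcomes)           # equicorrelated outcomes
    treat = rng.multivariate_normal(np.full(n_outcomes, true_d), cov, size=n)
    ctrl = rng.multivariate_normal(np.zeros(n_outcomes), cov, size=n)
    results = []
    for j in range(n_outcomes if hack else 1):
        _, p = ttest_ind(treat[:, j], ctrl[:, j])
        d = (treat[:, j].mean() - ctrl[:, j].mean()) / np.sqrt(
            (treat[:, j].var(ddof=1) + ctrl[:, j].var(ddof=1)) / 2)
        results.append((p, d))
    return min(results)          # p-hacking: keep the outcome with the smallest p

def meta_mean(true_d, n_studies=200, pub_prob_nonsig=0.2, hack=True):
    published = []
    for _ in range(n_studies):
        p, d = simulate_study(true_d, hack=hack)
        if p < 0.05 or rng.random() < pub_prob_nonsig:   # publication filter
            published.append(d)
    return np.mean(published)

print("true d = 0.10, naive mean of published d =", round(meta_mean(0.10), 2))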

