Unresolved Heterogeneity in Meta-Analysis: Combined Construct Invalidity, Confounding, and Other Challenges to Understanding Mean Effect Sizes

2020 ◽  
Vol 46 (2-3) ◽  
pp. 343-354 ◽  
Author(s):  
Timothy R Levine ◽  
René Weber

Abstract. We examined the interplay between how communication researchers use meta-analyses to make claims and the prevalence, causes, and implications of unresolved heterogeneous findings. Heterogeneous findings can result from substantive moderators, methodological artifacts, and combined construct invalidity. An informal content analysis of meta-analyses published in four elite communication journals revealed that unresolved between-study effect heterogeneity was ubiquitous. Communication researchers mainly focus on computing mean effect sizes, to the exclusion of how effect sizes in primary studies are distributed and of what might be driving effect size distributions. We offer four recommendations for future meta-analyses. Researchers are advised to be more diligent and sophisticated in testing for heterogeneity. We encourage greater description of how effects are distributed, coupled with greater reliance on graphical displays. We counsel greater recognition of combined construct invalidity and advocate for content expertise. Finally, we endorse greater awareness of and improved tests for publication bias and questionable research practices.
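
To make the first two recommendations concrete, the sketch below shows the basic heterogeneity diagnostics a meta-analyst can report alongside the mean effect, Cochran's Q and I², together with a simple description of how the effects are spread. It is a minimal illustration with invented effect sizes and variances, not code from the article.

```python
# Minimal heterogeneity diagnostics for a set of study effect sizes.
# All numbers are hypothetical; this is an illustration, not the article's analysis.
import numpy as np
from scipy import stats

effects = np.array([0.10, 0.25, 0.32, 0.05, 0.41, 0.18])      # hypothetical study effects
variances = np.array([0.02, 0.015, 0.03, 0.01, 0.025, 0.02])  # their sampling variances

weights = 1.0 / variances                       # fixed-effect (inverse-variance) weights
pooled = np.sum(weights * effects) / np.sum(weights)

Q = np.sum(weights * (effects - pooled) ** 2)   # Cochran's Q statistic
df = len(effects) - 1
p_Q = stats.chi2.sf(Q, df)                      # test of homogeneity
I2 = max(0.0, (Q - df) / Q) * 100               # % of variability beyond sampling error

print(f"pooled effect = {pooled:.3f}, Q = {Q:.2f} (p = {p_Q:.3f}), I^2 = {I2:.1f}%")
print(f"effects range from {effects.min():.2f} to {effects.max():.2f}")
```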

2018 ◽  
Author(s):  
Michele B. Nuijten ◽  
Marcel A. L. M. van Assen ◽  
Hilde Augusteijn ◽  
Elise Anne Victoire Crompvoets ◽  
Jelte M. Wicherts

In this meta-study, we analyzed 2,442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson’s correlation of .26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We conclude that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.


2020 ◽  
Vol 8 (4) ◽  
pp. 36
Author(s):  
Michèle B. Nuijten ◽  
Marcel A. L. M. van Assen ◽  
Hilde E. M. Augusteijn ◽  
Elise A. V. Crompvoets ◽  
Jelte M. Wicherts

In this meta-study, we analyzed 2442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson’s correlation of 0.26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We concluded that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.
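
For readers who want to see where numbers like these come from, here is a rough illustration of the power of a two-sided correlation test at the reported median sample size of n = 60, using the standard Fisher-z approximation for Cohen's small, medium, and large correlations. It only approximates the paper's medians, which were computed study by study, and it is not the authors' code.

```python
# Approximate power of a two-sided test of H0: rho = 0 via Fisher's z transform.
# Illustrative only; not a reproduction of the meta-study's per-study calculations.
import numpy as np
from scipy import stats

def corr_power(r, n, alpha=0.05):
    """Power to detect a true correlation r with sample size n."""
    z = np.arctanh(r)                    # Fisher z of the assumed true correlation
    se = 1.0 / np.sqrt(n - 3)            # standard error of z
    crit = stats.norm.ppf(1 - alpha / 2)
    ncp = z / se                         # noncentrality of the test statistic
    return stats.norm.sf(crit - ncp) + stats.norm.cdf(-crit - ncp)

for r in (0.1, 0.3, 0.5):                # small / medium / large correlations
    print(f"r = {r:.1f}, n = 60: power ~ {corr_power(r, 60):.3f}")
```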


2018 ◽  
Author(s):  
Robbie Cornelis Maria van Aert

Ever more scientific research is being published, calling for statistical methods that give researchers an overview of the literature in a particular field. For that purpose, meta-analysis methods were developed to statistically combine the effect sizes of independent primary studies on the same topic. My dissertation focuses on two issues that are crucial when conducting a meta-analysis: publication bias and heterogeneity in the primary studies’ true effect sizes. Accurate estimation of both the meta-analytic effect size and the between-study variance in true effect size is crucial, since the results of meta-analyses are often used for policy making.

Publication bias, which refers to situations where publication of a primary study depends on its results, distorts the results of a meta-analysis. We developed new meta-analysis methods, p-uniform and p-uniform*, which estimate effect sizes corrected for publication bias and also test for publication bias. Although the methods perform well in many conditions, these and the other existing methods are shown not to perform well when researchers use questionable research practices. Additionally, when publication bias is absent or limited, traditional methods that do not correct for publication bias outperform p-uniform and p-uniform*. Surprisingly, we found no strong evidence for the presence of publication bias in our pre-registered study of a large-scale data set consisting of 83 meta-analyses and 499 systematic reviews published in the fields of psychology and medicine.

We also developed two methods, one frequentist and one Bayesian, for meta-analyzing a statistically significant published original study and a replication of that study, a situation researchers often encounter. Both methods are shown to perform better than traditional meta-analytic methods that do not take the statistical significance of the original study into account. Analytical studies of both methods also show that the original study is sometimes better discarded for optimal estimation of the true effect size. Finally, we developed a program for determining the required sample size in a replication, analogous to power analysis in null hypothesis testing. Computing the required sample size with this method revealed that large sample sizes (approximately 650 participants) are required to distinguish a zero from a small true effect.

In the last two chapters, we derived a new multi-step estimator for the between-study variance in the primary studies’ true effect sizes and examined the statistical properties of two methods (the Q-profile and generalized Q-statistic methods) for computing a confidence interval for the between-study variance in true effect size. We proved that the multi-step estimator converges to the Paule-Mandel estimator, which is nowadays one of the recommended methods for estimating the between-study variance in true effect sizes. Two Monte Carlo simulation studies showed that the coverage probabilities of the Q-profile and generalized Q-statistic methods can be substantially below the nominal coverage rate when the assumptions underlying the random-effects meta-analysis model are violated.
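
As an illustration of the between-study variance estimation discussed in the last two chapters, the sketch below computes the Paule-Mandel tau² estimate by simple root-finding on the generalized Q equation. It is a minimal reconstruction from the standard definition, not code from the dissertation, and the effect sizes are invented.

```python
# Paule-Mandel estimator of the between-study variance tau^2:
# find tau^2 such that the weighted Q statistic equals its expectation k - 1.
# Illustrative sketch with invented data.
import numpy as np

def paule_mandel(y, v, tol=1e-8, max_iter=200):
    """Solve Q(tau2) = k - 1 for tau2 by bisection (Q decreases as tau2 grows)."""
    k = len(y)

    def Q(tau2):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        return np.sum(w * (y - mu) ** 2)

    lo, hi = 0.0, 10.0 * np.var(y)            # generous search bracket
    if Q(lo) <= k - 1:                        # no excess heterogeneity detected
        return 0.0
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        if Q(mid) > k - 1:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

y = np.array([0.30, 0.10, 0.45, 0.05, 0.25])      # hypothetical study effects
v = np.array([0.02, 0.03, 0.025, 0.015, 0.02])    # their sampling variances
print(f"Paule-Mandel tau^2 = {paule_mandel(y, v):.4f}")
```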


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
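
To give a sense of what such a simulation-and-detection setup looks like, the sketch below generates a small simulated literature subject to publication bias and applies one commonly evaluated tool, Egger's regression test for funnel-plot asymmetry. It is a toy example, not the authors' simulation code; the selection rule (a 10% publication chance for non-significant results) is an invented assumption.

```python
# Simulate a publication-biased literature around a zero true effect,
# then test for small-study effects with Egger's regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d = 0.0
published_d, published_se = [], []

while len(published_d) < 30:                 # collect 30 "published" studies
    n = rng.integers(20, 100)                # per-group sample size
    se = np.sqrt(2.0 / n)                    # approx. SE of Cohen's d near d = 0
    d = rng.normal(true_d, se)               # observed study effect
    significant = abs(d / se) > 1.96
    if significant or rng.random() < 0.10:   # non-significant results rarely published
        published_d.append(d)
        published_se.append(se)

d = np.array(published_d)
se = np.array(published_se)

# Egger's test: regress the standardized effect on precision; an intercept
# far from zero signals funnel-plot asymmetry (small-study effects).
res = stats.linregress(1.0 / se, d / se)
t_int = res.intercept / res.intercept_stderr
p_int = 2 * stats.t.sf(abs(t_int), len(d) - 2)
print(f"Egger intercept = {res.intercept:.2f}, p = {p_int:.4f}")
```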


2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the result of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Liansheng Larry Tang ◽  
Michael Caudy ◽  
Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize meta-analytic results when multiple meta-analyses use the same type of summary effect estimate. When meta-analyses use different types of effect sizes, their results cannot be directly combined. We propose a two-step frequentist procedure that first converts the effect size estimates to the same metric and then summarizes them with a weighted mean estimate. Our proposed method offers several advantages over the existing methods of Hemming et al. (2012). First, it accommodates different types of summary effect sizes. Second, it provides the same overall effect size as a meta-analysis conducted on all the individual studies from the multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
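
The general two-step idea can be sketched as follows: convert each meta-analysis' summary effect to a common metric (here Cohen's d, using standard conversion formulas) and then combine the converted estimates with inverse-variance weights. This is an illustration of the idea with invented numbers, not the authors' exact estimator.

```python
# Step 1: convert summary effects reported on different metrics to Cohen's d.
# Step 2: combine the converted effects with inverse-variance weights.
# All input values are hypothetical.
import numpy as np

def r_to_d(r, var_r):
    """Convert a correlation and its variance to Cohen's d (standard formulas)."""
    d = 2 * r / np.sqrt(1 - r ** 2)
    var_d = 4 * var_r / (1 - r ** 2) ** 3
    return d, var_d

def logodds_to_d(log_or, var_log_or):
    """Convert a log odds ratio and its variance to Cohen's d."""
    factor = np.sqrt(3) / np.pi
    return log_or * factor, var_log_or * factor ** 2

# Hypothetical summary results from three meta-analyses on the same question.
d1, v1 = 0.35, 0.010                        # already on the d metric
d2, v2 = r_to_d(0.20, 0.004)                # reported as a correlation
d3, v3 = logodds_to_d(0.60, 0.050)          # reported as a log odds ratio

effects = np.array([d1, d2, d3])
variances = np.array([v1, v2, v3])
w = 1.0 / variances
combined = np.sum(w * effects) / np.sum(w)
se_combined = np.sqrt(1.0 / np.sum(w))
print(f"combined d = {combined:.3f} (SE = {se_combined:.3f})")
```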


1990 ◽  
Vol 24 (3) ◽  
pp. 405-415 ◽  
Author(s):  
Nathaniel McConaghy

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy concerning the evaluation of treatment effects. Statistical significance measured the reliability of a treatment effect, not its efficacy, and it was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence, but by standardizing the size of the treatment effect it could distort it. Meta-analyses that combine the results of studies employing different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure discussion of these variables, meta-analyses should be used as an aid to, rather than a substitute for, literature review. As long as meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.


2020 ◽  
Vol 25 (1) ◽  
pp. 51-72 ◽  
Author(s):  
Christian Franz Josef Woll ◽  
Felix D. Schönbrodt

Abstract. Recent meta-analyses come to conflicting conclusions about the efficacy of long-term psychoanalytic psychotherapy (LTPP). Our first goal was to reproduce the most recent meta-analysis by Leichsenring, Abbass, Luyten, Hilsenroth, and Rabung (2013), who found evidence for the efficacy of LTPP in the treatment of complex mental disorders. Our replicated effect sizes were in general slightly smaller. Second, we conducted an updated meta-analysis of randomized controlled trials comparing LTPP (lasting for at least 1 year and 40 sessions) to other forms of psychotherapy in the treatment of complex mental disorders. We focused on a transparent research process according to open science standards and applied a series of elaborated meta-analytic procedures to test and control for publication bias. Our updated meta-analysis, comprising 191 effect sizes from 14 eligible studies, revealed small, statistically significant effect sizes at post-treatment for the outcome domains psychiatric symptoms, target problems, social functioning, and overall effectiveness (Hedges’ g ranging between 0.24 and 0.35). The effect size for the domain personality functioning (0.24) was not significant (p = .08). No signs of publication bias were detected. In light of a heterogeneous study set and some methodological shortcomings in the primary studies, these results should be interpreted cautiously. In conclusion, LTPP might be superior to other forms of psychotherapy in the treatment of complex mental disorders. Notably, our effect sizes represent the additional gain of LTPP versus other forms of primarily long-term psychotherapy; in this case, large differences in effect sizes are not to be expected.
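
As a side note for readers unfamiliar with the metric, the snippet below shows how a Hedges' g such as those reported above is typically computed from two treatment arms. The group means, standard deviations, and sample sizes are invented; this is not taken from the meta-analysis itself.

```python
# Hedges' g: standardized mean difference with the small-sample correction.
# All values are hypothetical.
import numpy as np

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference between two groups."""
    sd_pooled = np.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)      # Hedges' small-sample correction factor
    return j * d

# Hypothetical LTPP arm vs. comparison-therapy arm on a symptom scale (lower = better).
print(f"g = {hedges_g(m1=22.1, sd1=8.0, n1=45, m2=24.6, sd2=8.5, n2=43):.2f}")
```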


2021 ◽  
Vol 12 ◽  
Author(s):  
Hanna Suh ◽  
Jisun Jeong

Objectives: Self-compassion functions as a psychological buffer in the face of negative life experiences. Considering that suicidal thoughts and behaviors (STBs) and non-suicidal self-injury (NSSI) are often accompanied by intense negative feelings about the self (e.g., self-loathing, self-isolation), self-compassion may have the potential to alleviate these negative attitudes and feelings toward oneself. This meta-analysis investigated the associations of self-compassion with STBs and NSSI.

Methods: A literature search finalized in August 2020 identified 18 eligible studies (13 STB effect sizes and 7 NSSI effect sizes), including 8,058 participants. Two studies were longitudinal; the remainder were cross-sectional. A random-effects meta-analysis was conducted using CMA 3.0. Subgroup analyses, meta-regression, and publication bias analyses were conducted to probe potential sources of heterogeneity.

Results: With regard to STBs, a moderate effect size was found for self-compassion (r = −0.34, k = 13). The positively worded subscales exhibited statistically significant effect sizes: self-kindness (r = −0.21, k = 4), common humanity (r = −0.20, k = 4), and mindfulness (r = −0.15, k = 4). For NSSI, a small effect size was found for self-compassion (r = −0.29, k = 7). Heterogeneity was large (I² = 80.92% for STBs, I² = 86.25% for NSSI), and publication bias was minimal. Subgroup analyses showed that sample type was a moderator: a larger effect size was observed in clinical patients than in sexually/racially marginalized individuals, college students, and healthy-functioning community adolescents.

Conclusions: Self-compassion was negatively associated with STBs and NSSI, and the effect size was larger for STBs than for NSSI. More evidence from future longitudinal or intervention studies is needed to gauge the clinically significant protective role that self-compassion may play.
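
For orientation, the sketch below shows a generic random-effects pooling of correlations via Fisher's z with a DerSimonian-Laird between-study variance estimate, roughly the kind of computation a program such as CMA performs. The correlations and sample sizes are invented; this is not the authors' analysis.

```python
# Random-effects pooling of correlations via Fisher's z (DerSimonian-Laird tau^2).
# Hypothetical data for illustration only.
import numpy as np

r = np.array([-0.40, -0.25, -0.35, -0.20, -0.45])   # hypothetical study correlations
n = np.array([220, 150, 480, 310, 95])               # their sample sizes

z = np.arctanh(r)                  # Fisher z transform
v = 1.0 / (n - 3)                  # sampling variance of z

w = 1.0 / v
z_fixed = np.sum(w * z) / np.sum(w)
Q = np.sum(w * (z - z_fixed) ** 2)
df = len(z) - 1
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)      # DerSimonian-Laird between-study variance

w_re = 1.0 / (v + tau2)            # random-effects weights
z_re = np.sum(w_re * z) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
lo, hi = z_re - 1.96 * se_re, z_re + 1.96 * se_re
print(f"pooled r = {np.tanh(z_re):.2f}, 95% CI [{np.tanh(lo):.2f}, {np.tanh(hi):.2f}]")
```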


2012 ◽  
Vol 82 (3) ◽  
pp. 300-329 ◽  
Author(s):  
Erin Marie Furtak ◽  
Tina Seidel ◽  
Heidi Iverson ◽  
Derek C. Briggs

Although previous meta-analyses have indicated a connection between inquiry-based teaching and improved student learning, the type of instruction characterized as inquiry based has varied greatly, and few have focused on the extent to which activities are led by the teacher or student. This meta-analysis introduces a framework for inquiry-based teaching that distinguishes between cognitive features of the activity and degree of guidance given to students. This framework is used to code 37 experimental and quasi-experimental studies published between 1996 and 2006, a decade during which inquiry was the main focus of science education reform. The overall mean effect size is .50. Studies that contrasted epistemic activities or the combination of procedural, epistemic, and social activities had the highest mean effect sizes. Furthermore, studies involving teacher-led activities had mean effect sizes about .40 larger than those with student-led conditions. The importance of establishing the validity of the treatment construct in meta-analyses is also discussed.

