Posterior Probabilities of Effect Sizes and Heterogeneity in Meta-Analysis: An Intuitive Approach of Dealing with Publication Bias

2021
Author(s):
Hilde Elisabeth Maria Augusteijn
Robbie Cornelis Maria van Aert
Marcel A. L. M. van Assen

Publication bias remains a great challenge when conducting a meta-analysis. It may result in overestimated effect sizes, an increased frequency of false positives, and over- or underestimation of the effect size heterogeneity parameter. A new method is introduced, Bayesian Meta-Analytic Snapshot (BMAS), which evaluates both effect size and its heterogeneity and corrects for potential publication bias. It evaluates the probability that the true effect size and the true heterogeneity are each zero, small, medium, or large. This approach, which provides an intuitive assessment of uncertainty about effect size and heterogeneity, is illustrated with a real-data example, a simulation study, and a Shiny web application of BMAS.
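To illustrate the kind of "snapshot" output described here, the minimal sketch below computes posterior probabilities for a zero, small, medium, or large true effect from a meta-analytic estimate and its standard error, assuming a normal likelihood, equal prior probabilities, and Cohen's benchmarks of 0.2/0.5/0.8. The function name and inputs are illustrative, and the publication-bias correction that BMAS adds is omitted.

```python
import numpy as np
from scipy import stats

def snapshot_probabilities(estimate, se, deltas=(0.0, 0.2, 0.5, 0.8)):
    """Posterior probabilities of a zero, small, medium, or large true effect.

    Assumes a normal likelihood for the meta-analytic estimate and equal prior
    probabilities for the four candidate effect sizes. The published BMAS
    method additionally corrects for publication bias, which is omitted here.
    """
    likelihoods = np.array([stats.norm.pdf(estimate, loc=d, scale=se) for d in deltas])
    posteriors = likelihoods / likelihoods.sum()
    return dict(zip(("zero", "small", "medium", "large"), posteriors))

# Example: a meta-analytic estimate of d = 0.35 with standard error 0.10
print(snapshot_probabilities(0.35, 0.10))
```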

2018
Author(s):  
Robbie Cornelis Maria van Aert

More and more scientific research gets published nowadays, calling for statistical methods that enable researchers to get an overview of the literature in a particular research field. For that purpose, meta-analysis methods were developed that can be used for statistically combining the effect sizes from independent primary studies on the same topic. My dissertation focuses on two issues that are crucial when conducting a meta-analysis: publication bias and heterogeneity in primary studies’ true effect sizes. Accurate estimation of both the meta-analytic effect size and the between-study variance in true effect size is crucial, since the results of meta-analyses are often used for policy making.

Publication bias refers to situations where publication of a primary study depends on its results, and it distorts the results of a meta-analysis. We developed new meta-analysis methods, p-uniform and p-uniform*, which estimate effect sizes corrected for publication bias and also test for publication bias. Although the methods perform well in many conditions, these and the other existing methods are shown not to perform well when researchers use questionable research practices. Additionally, when publication bias is absent or limited, traditional methods that do not correct for publication bias outperform p-uniform and p-uniform*. Surprisingly, we found no strong evidence for the presence of publication bias in our pre-registered study of a large-scale data set consisting of 83 meta-analyses and 499 systematic reviews published in the fields of psychology and medicine.

We also developed two methods for meta-analyzing a statistically significant published original study and a replication of that study, which reflects a situation often encountered by researchers. One method is frequentist, whereas the other is Bayesian. Both methods are shown to perform better than traditional meta-analytic methods that do not take the statistical significance of the original study into account. Analytical studies of both methods also show that the original study is sometimes better discarded for optimal estimation of the true effect size. Finally, we developed a program for determining the required sample size in a replication, analogous to power analysis in null hypothesis testing. Computing the required sample size with this method revealed that large sample sizes (approximately 650 participants) are required to be able to distinguish a zero from a small true effect.

In the last two chapters, we derived a new multi-step estimator for the between-study variance in primary studies’ true effect sizes, and examined the statistical properties of two methods (the Q-profile and generalized Q-statistic methods) for computing the confidence interval of the between-study variance in true effect size. We proved that the multi-step estimator converges to the Paule-Mandel estimator, which is nowadays one of the recommended methods to estimate the between-study variance in true effect sizes. Two Monte-Carlo simulation studies showed that the coverage probabilities of the Q-profile and generalized Q-statistic methods can be substantially below the nominal coverage rate if the assumptions underlying the random-effects meta-analysis model are violated.
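As an illustration of the between-study variance estimation discussed in the final chapters, here is a minimal sketch of the Paule-Mandel estimator, the fixed point to which the multi-step estimator is shown to converge. It assumes known within-study variances; the function name and example data are made up.

```python
import numpy as np
from scipy.optimize import brentq

def paule_mandel_tau2(y, v, tol=1e-8):
    """Paule-Mandel estimate of the between-study variance tau^2.

    y : observed effect sizes from the primary studies
    v : their within-study (sampling) variances

    Solves Q_gen(tau^2) = k - 1, where Q_gen is the generalized Q-statistic
    with weights 1 / (v_i + tau^2). Returns 0 if tau^2 = 0 already yields
    Q_gen at or below k - 1.
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    k = len(y)

    def q_gen(tau2):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        return np.sum(w * (y - mu) ** 2)

    if q_gen(0.0) <= k - 1:            # no between-study variance needed
        return 0.0
    upper = np.var(y, ddof=1) * 10.0   # generous upper bound for the root search
    return brentq(lambda t2: q_gen(t2) - (k - 1), 0.0, upper, xtol=tol)

# Example with five hypothetical studies
tau2 = paule_mandel_tau2(y=[0.1, 0.4, 0.3, 0.6, 0.2],
                         v=[0.02, 0.03, 0.025, 0.04, 0.02])
print(round(tau2, 4))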


2021
Vol 12
Author(s):
Hanna Suh
Jisun Jeong

Objectives: Self-compassion functions as a psychological buffer in the face of negative life experiences. Considering that suicidal thoughts and behaviors (STBs) and non-suicidal self-injury (NSSI) are often accompanied by intense negative feelings about the self (e.g., self-loathing, self-isolation), self-compassion may have the potential to alleviate these negative attitudes and feelings toward oneself. This meta-analysis investigated the associations of self-compassion with STBs and NSSI. Methods: A literature search finalized in August 2020 identified 18 eligible studies (13 STB effect sizes and 7 NSSI effect sizes), including 8,058 participants. Two studies were longitudinal, and the remainder were cross-sectional. A random-effects meta-analysis was conducted using CMA 3.0. Subgroup analyses, meta-regression, and publication bias analyses were conducted to probe potential sources of heterogeneity. Results: With regard to STBs, a moderate effect size was found for self-compassion (r = −0.34, k = 13). Positively worded subscales exhibited statistically significant effect sizes: self-kindness (r = −0.21, k = 4), common humanity (r = −0.20, k = 4), and mindfulness (r = −0.15, k = 4). For NSSI, a small effect size was found for self-compassion (r = −0.29, k = 7). Heterogeneity was large (I² = 80.92% for STBs, I² = 86.25% for NSSI), and publication bias was minimal. Subgroup analyses showed that sample type was a moderator, with a larger effect size in clinical patients than in sexually/racially marginalized individuals, college students, and healthy-functioning community adolescents. Conclusions: Self-compassion was negatively associated with STBs and NSSI, and the effect size was larger for STBs than for NSSI. More evidence from future longitudinal or intervention studies is necessary to gauge the clinically significant protective role that self-compassion may play.
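The pooling itself was done in CMA 3.0; purely as an illustration of the underlying computation, the sketch below pools correlations on Fisher's z scale with a DerSimonian-Laird random-effects model. The function name and the example correlations are hypothetical, not data from this meta-analysis.

```python
import numpy as np

def pool_correlations_random_effects(r, n):
    """Random-effects pooling of Pearson correlations via Fisher's z.

    r : per-study correlations, n : per-study sample sizes (array-like).
    Uses the DerSimonian-Laird estimator for the between-study variance and
    returns the pooled correlation together with that variance (tau^2).
    """
    r, n = np.asarray(r, float), np.asarray(n, float)
    z = np.arctanh(r)                    # Fisher's z transform
    v = 1.0 / (n - 3.0)                  # sampling variance of z
    w = 1.0 / v
    z_fixed = np.sum(w * z) / np.sum(w)
    q = np.sum(w * (z - z_fixed) ** 2)   # Cochran's Q on the z scale
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(r) - 1)) / c)   # DerSimonian-Laird tau^2
    w_star = 1.0 / (v + tau2)
    z_pooled = np.sum(w_star * z) / np.sum(w_star)
    return np.tanh(z_pooled), tau2

# Hypothetical self-compassion/STB correlations from five studies
r_pooled, tau2 = pool_correlations_random_effects(
    r=[-0.25, -0.40, -0.31, -0.38, -0.29], n=[120, 340, 210, 95, 400])
print(round(r_pooled, 3), round(tau2, 4))
```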


2017
Author(s):
Hilde Augusteijn
Robbie Cornelis Maria van Aert
Marcel A. L. M. van Assen

One of the main goals of meta-analysis is to test and estimate the heterogeneity of effect size. We examined the effect of publication bias on the Q-test and assessments of heterogeneity, as a function of true heterogeneity, publication bias, true effect size, number of studies, and variation of sample sizes. The expected values of the heterogeneity measures H² and I² were analytically derived, and the power and the Type I error rate of the Q-test were examined in a Monte-Carlo simulation study. Our results show that the effect of publication bias on the Q-test and assessment of heterogeneity is large, complex, and non-linear. Publication bias can both dramatically decrease and dramatically increase heterogeneity. Extreme homogeneity can occur even when the population heterogeneity is large. Particularly when the number of studies is large and the population effect size is small, publication bias can push both the Type I error rate and the power of the Q-test to extremes, close to 0 or 1. We therefore conclude that the Q-test of homogeneity and the heterogeneity measures H² and I² are generally not valid for assessing and testing heterogeneity when publication bias is present, especially when the true effect size is small and the number of studies is large. We introduce a web application, Q-sense, which can be used to assess the sensitivity of the Q-test to publication bias, and we apply it to two published meta-analyses. Meta-analytic methods should be enhanced so that they can deal with publication bias in their assessment and tests of heterogeneity.
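For reference, the Q-test and the H² and I² measures examined here can be computed as in the following sketch, which assumes known sampling variances and uses the standard definitions H² = Q/(k−1) and I² = 100% · (Q − (k−1))/Q; the example studies are made up.

```python
import numpy as np
from scipy import stats

def q_test(y, v):
    """Cochran's Q-test of homogeneity with the H^2 and I^2 measures.

    y : observed effect sizes, v : their sampling variances.
    Under homogeneity, Q follows a chi-square distribution with k - 1
    degrees of freedom.
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v
    mu_fixed = np.sum(w * y) / np.sum(w)       # fixed-effect estimate
    q = np.sum(w * (y - mu_fixed) ** 2)
    df = len(y) - 1
    p_value = stats.chi2.sf(q, df)
    h2 = q / df
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, p_value, h2, i2

# Example with six hypothetical studies
print(q_test(y=[0.2, 0.5, 0.1, 0.8, 0.3, 0.6],
             v=[0.04, 0.05, 0.03, 0.06, 0.04, 0.05]))
```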


2018
Author(s):
Robbie Cornelis Maria van Aert
Marcel A. L. M. van Assen

Publication bias is a major threat to the validity of a meta-analysis, resulting in overestimated effect sizes. P-uniform is a meta-analysis method that corrects estimates for publication bias but overestimates the average effect size if heterogeneity in true effect sizes (i.e., between-study variance) is present. We propose an extension and improvement of p-uniform called p-uniform*. P-uniform* improves upon p-uniform in three important ways: it (i) entails a more efficient estimator, (ii) eliminates the overestimation of effect size in case of between-study variance in true effect sizes, and (iii) enables estimation of and testing for the between-study variance. We compared the statistical properties of p-uniform* with those of p-uniform, the selection model approach of Hedges (1992), and the random-effects model. The statistical properties of p-uniform* and the selection model approach were comparable, and both generally outperformed p-uniform and the random-effects model when publication bias was present. We demonstrate that p-uniform* and the selection model approach estimate the average effect size and between-study variance rather well with ten or more studies in the meta-analysis when publication bias is not extreme. P-uniform* generally provides more accurate estimates of the between-study variance in meta-analyses containing many studies (e.g., 60 or more) when publication bias is present. However, neither method performs well if the meta-analysis includes only statistically significant studies; p-uniform performed best in this case, but only when the between-study variance was zero or small. We offer recommendations for applied researchers, and provide an R package and an easy-to-use web application for applying p-uniform*.
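To convey the core idea, the sketch below implements the basic p-uniform estimator (not p-uniform* itself), assuming normally distributed effect estimates, one-sided tests at α = .05, and the criterion that the sum of −ln conditional p-values equals its expected value. The search interval and example data are assumptions; the authors' R package is the authoritative implementation.

```python
import numpy as np
from scipy import stats, optimize

def p_uniform_estimate(y, se, alpha=0.05):
    """Minimal sketch of the basic p-uniform effect-size estimator.

    y, se : estimates and standard errors of the statistically significant
    primary studies (one-sided tests at level alpha assumed). At the true
    effect, the conditional p-values q_i are uniform, so sum(-ln q_i) has
    expected value k; the estimate is the delta where this equality holds.
    """
    y, se = np.asarray(y, float), np.asarray(se, float)
    y_cv = stats.norm.isf(alpha) * se            # critical values on the effect-size scale

    def loss(delta):
        # probability of an estimate at least as large as observed,
        # conditional on the study being significant
        q = stats.norm.sf((y - delta) / se) / stats.norm.sf((y_cv - delta) / se)
        return np.sum(-np.log(q)) - len(y)

    return optimize.brentq(loss, -2.0, 2.0)      # search interval is an assumption

# Example with four significant studies
print(round(p_uniform_estimate(y=[0.45, 0.60, 0.38, 0.52],
                               se=[0.20, 0.25, 0.18, 0.22]), 3))
```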


2018
Author(s):
Frank Renkewitz
Melanie Keiner

Publication bias and questionable research practices are assumed to be two of the main causes of the low replication rates observed in the social sciences. Both problems not only increase the proportion of false positives in the literature but can also lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect and correct such bias in meta-analytic results. We present an evaluation of the performance of six of these tools in detecting bias. To assess the Type I error rate and the statistical power of these tools, we simulated a large variety of literatures that differed with regard to underlying true effect size, heterogeneity, number of available primary studies, and variation of sample sizes in these primary studies. Furthermore, the simulated primary studies were subjected to different degrees of publication bias. Our results show that the power of the detection methods follows a complex pattern. Across all simulated conditions, no method consistently outperformed all others. Hence, choosing an optimal method would require knowledge about parameters (e.g., true effect size, heterogeneity) that meta-analysts cannot have. Additionally, all methods performed badly when true effect sizes were heterogeneous or when primary studies had a small chance of being published irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
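The abstract does not name the six tools, so purely as an illustration of how such a detection method operates, here is a sketch of one widely used candidate, Egger's regression test for funnel-plot asymmetry; its inclusion among the six evaluated methods is an assumption, and the simulated example data are made up. (The sketch relies on SciPy ≥ 1.6 for the intercept standard error.)

```python
import numpy as np
from scipy import stats

def egger_test(y, se):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effects y_i / se_i on precision 1 / se_i;
    an intercept that deviates from zero signals small-study effects such
    as those produced by publication bias.
    """
    y, se = np.asarray(y, float), np.asarray(se, float)
    res = stats.linregress(1.0 / se, y / se)       # x = precision, y = standardized effect
    t = res.intercept / res.intercept_stderr       # needs SciPy >= 1.6
    p = 2 * stats.t.sf(abs(t), len(y) - 2)         # two-sided test of the intercept
    return res.intercept, p

# Eight simulated studies without publication bias (illustrative only)
rng = np.random.default_rng(1)
se = rng.uniform(0.1, 0.4, 8)
y = 0.2 + rng.normal(0.0, se)
print(egger_test(y, se))
```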


2018
Author(s):
Michele B. Nuijten
Marcel A. L. M. van Assen
Hilde Augusteijn
Elise Anne Victoire Crompvoets
Jelte M. Wicherts

In this meta-study, we analyzed 2,442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson’s correlation of .26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We conclude that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.
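The reported power figures can be illustrated with a standard power calculation for a two-sided test of a Pearson correlation via the Fisher-z approximation, evaluated at the median sample size of 60 and Cohen's benchmarks r = .10, .30, .50. Because the paper's medians are taken across studies with varying sample sizes, the numbers from this sketch need not match exactly.

```python
import numpy as np
from scipy import stats

def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 against rho = r,
    using the Fisher-z approximation for a sample of size n."""
    z_r = np.arctanh(r) * np.sqrt(n - 3)       # noncentrality on the z scale
    z_crit = stats.norm.isf(alpha / 2)
    return stats.norm.cdf(z_r - z_crit) + stats.norm.cdf(-z_r - z_crit)

# Power at the reported median sample size of 60 for small, medium,
# and large correlation benchmarks
for r in (0.1, 0.3, 0.5):
    print(r, round(correlation_power(r, 60), 3))
```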


2020
Vol 46 (2-3)
pp. 343-354
Author(s):
Timothy R Levine
René Weber

We examined the interplay between how communication researchers use meta-analyses to make claims and the prevalence, causes, and implications of unresolved heterogeneous findings. Heterogeneous findings can result from substantive moderators, methodological artifacts, and combined construct invalidity. An informal content analysis of meta-analyses published in four elite communication journals revealed that unresolved between-study effect heterogeneity was ubiquitous. Communication researchers mainly focus on computing mean effect sizes, to the exclusion of how effect sizes in primary studies are distributed and of what might be driving effect size distributions. We offer four recommendations for future meta-analyses. Researchers are advised to be more diligent and sophisticated in testing for heterogeneity. We encourage greater description of how effects are distributed, coupled with greater reliance on graphical displays. We counsel greater recognition of combined construct invalidity and advocate for content expertise. Finally, we endorse greater awareness of, and improved tests for, publication bias and questionable research practices.


2020
Vol 8 (4)
pp. 36
Author(s):
Michèle B. Nuijten
Marcel A. L. M. van Assen
Hilde E. M. Augusteijn
Elise A. V. Crompvoets
Jelte M. Wicherts

In this meta-study, we analyzed 2442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson’s correlation of 0.26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We concluded that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.


2017
Author(s):
Robbie Cornelis Maria van Aert
Marcel A. L. M. van Assen

The vast majority of published results in the literature are statistically significant, which raises concerns about their reliability. The Reproducibility Project: Psychology (RPP) and the Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and its replication were both statistically significant in only 36.1% of cases in the RPP and 68.8% in the EE-RP, suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and that quantifies the amount of evidence in favor of a zero, small, medium, and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the method's performance and demonstrate the necessity of controlling for the original study's significance to enable the accumulation of evidence for a true zero effect. We then apply the method to the data of the RPP and EE-RP, showing that the underlying effect sizes of the studies included in the EE-RP are generally larger than in the RPP, but that the sample sizes, especially of the studies included in the RPP, are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of a replication, akin to power analysis in null hypothesis significance testing, and present an easy-to-use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method.
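A minimal sketch of the snapshot idea follows, assuming normally distributed effect estimates, a one-sided test of the original study at α = .025, and equal prior probabilities over the four candidate effects; the truncated likelihood for the original study is what adjusts for its statistical significance. Names and example numbers are illustrative, and the web application linked above implements the full method.

```python
import numpy as np
from scipy import stats

def snapshot_hybrid(y_orig, se_orig, y_rep, se_rep,
                    deltas=(0.0, 0.2, 0.5, 0.8), alpha=0.025):
    """Posterior probabilities of a zero, small, medium, or large true effect
    based on a significant original study and its replication.

    The likelihood of the original estimate is truncated at its critical value
    to account for the fact that only significant originals are considered;
    the replication enters with an ordinary normal likelihood. Equal prior
    probabilities over the four candidate effects are assumed.
    """
    deltas = np.asarray(deltas, float)
    y_cv = stats.norm.isf(alpha) * se_orig                    # one-sided critical value
    lik_orig = (stats.norm.pdf(y_orig, deltas, se_orig) /
                stats.norm.sf((y_cv - deltas) / se_orig))     # truncated likelihood
    lik_rep = stats.norm.pdf(y_rep, deltas, se_rep)
    post = lik_orig * lik_rep
    post /= post.sum()
    return dict(zip(("zero", "small", "medium", "large"), post))

# Example: significant original (d = 0.55, SE = 0.20) and a smaller
# replication estimate (d = 0.15, SE = 0.12)
print(snapshot_hybrid(0.55, 0.20, 0.15, 0.12))
```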


2018
Vol 226 (1)
pp. 56-80
Author(s):
Rolf Ulrich
Jeff Miller
Edgar Erdfelder

Publication bias hampers the estimation of true effect sizes. Specifically, effect sizes are systematically overestimated when studies report only significant results. In this paper we show how this overestimation depends on the true effect size and on the sample size. Furthermore, we review and follow up on methods originally suggested by Hedges (1984), Iyengar and Greenhouse (1988), and Rust, Lehmann, and Farley (1990) that allow the estimation of the true effect size from published test statistics (e.g., from the t-values of reported significant results). Moreover, we adapted these methods to allow meta-analysts to estimate the percentage of researchers who consign undesired results in a research domain to the file drawer. We also apply the same logic to the case in which significant results tend to be underreported. We demonstrate the application of these procedures for conventional one-sample and two-sample t-tests. Finally, we provide R and MATLAB versions of a computer program to estimate the true unbiased effect size and the prevalence of publication bias in the literature.
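As a sketch of the estimation logic, the code below computes a maximum-likelihood estimate of the true effect size from one-sample t-values that were published only if significant, modeling each published t as a draw from a noncentral t distribution truncated at the critical value. Equal sample sizes, a one-sided test, and the example values are simplifying assumptions; the authors' R and MATLAB programs are the reference implementations.

```python
import numpy as np
from scipy import stats, optimize

def effect_from_significant_t(t_values, n, alpha=0.05):
    """Maximum-likelihood estimate of the true effect size d from one-sample
    t-values that were published only if significant (one-sided test).

    Each published t is modeled as a draw from a noncentral t distribution
    (df = n - 1, noncentrality d * sqrt(n)) truncated below at the critical
    value, which corrects the naive overestimation of d.
    """
    t_values = np.asarray(t_values, float)
    df = n - 1
    t_crit = stats.t.isf(alpha, df)

    def neg_log_lik(d):
        nc = d * np.sqrt(n)
        dens = stats.nct.pdf(t_values, df, nc)
        trunc = stats.nct.sf(t_crit, df, nc)      # probability of being significant
        return -np.sum(np.log(dens / trunc))

    res = optimize.minimize_scalar(neg_log_lik, bounds=(-1.0, 2.0), method="bounded")
    return res.x

# Example: four published significant t-values from samples of size 25
print(round(effect_from_significant_t([2.4, 2.1, 3.0, 2.6], n=25), 3))
```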

