Examining publication bias – A simulation-based evaluation of statistical tests on publication bias

Author(s):  
Andreas Schneck

Background: Publication bias is a form of scientific misconduct. It threatens the validity of research results and the credibility of science. Although several tests on publication bias exist, no in-depth evaluations are available that suggest which test to use for a specific research problem. Methods: In the study at hand, four tests on publication bias, Egger’s test (FAT), p-uniform, the test of excess significance (TES), as well as the caliper test, were evaluated in a Monte Carlo simulation. Two different types of publication bias, as well as its degree (0%, 50%, 100%), were simulated. The type of publication bias was defined either as file-drawer, meaning the repeated analysis of new datasets, or p-hacking, meaning the inclusion of covariates in order to obtain a significant result. In addition, the underlying effect (β = 0, 0.5, 1, 1.5), effect heterogeneity, the number of observations in the simulated primary studies (N = 100, 500), and the number of observations for the publication bias tests (K = 100, 1,000) were varied. Results: All tests evaluated were able to identify publication bias in both the file-drawer and the p-hacking condition. The false positive rates were, with the exception of the 15%- and 20%-caliper tests, unbiased. The FAT had the largest statistical power in the file-drawer conditions, whereas under p-hacking the TES was, except under effect heterogeneity, slightly better. The caliper test was, however, inferior to the other tests under effect homogeneity and had decent statistical power only in conditions with 1,000 primary studies. Discussion: The FAT is recommended as a test for publication bias in standard meta-analyses with no or only small effect heterogeneity. If no clear direction of publication bias is suspected, the TES is the first alternative to the FAT. The 5%-caliper test is recommended under conditions of effect heterogeneity, which may be found if publication bias is examined in a discipline-wide setting in which primary studies cover different research problems.
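For orientation, the minimal Python sketch below (numpy, scipy, statsmodels) illustrates the two ingredients named above: a file-drawer selection mechanism applied to simulated primary studies, and Egger’s regression test (FAT) applied to the resulting meta-analytic sample. The regression-slope design, the sample sizes N ∈ {100, 500}, the effect β = 0.1, and the 100% file-drawer rule are illustrative assumptions; this is not the simulation code used in the article.

```python
# A minimal sketch (not the article's simulation code): primary studies with a
# regression-slope effect are subjected to a 100% file-drawer rule, and the
# resulting meta-analytic sample is checked with Egger's regression test (FAT).
# All parameter values below are illustrative assumptions.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(42)

def one_study(beta, n):
    """Simulate one primary study; return the slope estimate and its standard error."""
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    return fit.params[1], fit.bse[1]

def published_study(beta, bias=1.0):
    """File-drawer mechanism: with probability `bias`, keep analysing fresh
    datasets until the slope is significant at the two-sided 5% level."""
    while True:
        n = int(rng.choice([100, 500]))          # primary-study sample size
        est, se = one_study(beta, n)
        p = 2 * stats.norm.sf(abs(est / se))
        if p < 0.05 or rng.random() > bias:
            return est, se

def egger_fat(estimates, ses):
    """Egger's regression / FAT: regress the standardized effect on precision;
    an intercept different from zero indicates funnel-plot asymmetry."""
    z = np.asarray(estimates) / np.asarray(ses)
    precision = 1.0 / np.asarray(ses)
    fit = sm.OLS(z, sm.add_constant(precision)).fit()
    return fit.params[0], fit.pvalues[0]         # intercept and its p-value

# A meta-analysis of K = 100 published studies, all subject to file-drawer selection.
studies = [published_study(beta=0.1, bias=1.0) for _ in range(100)]
est, se = map(np.array, zip(*studies))
intercept, p_value = egger_fat(est, se)
print(f"FAT intercept = {intercept:.2f}, p = {p_value:.4f}")
```

Under this selection rule, small studies are published only with inflated estimates, which is exactly the asymmetry that the FAT intercept picks up.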


PeerJ, 2017, Vol 5, pp. e4115
Author(s):  
Andreas Schneck

Background: Publication bias is a form of scientific misconduct. It threatens the validity of research results and the credibility of science. Although several tests on publication bias exist, no in-depth evaluations are available that examine which test performs best for different research settings. Methods: Four tests on publication bias, Egger’s test (FAT), p-uniform, the test of excess significance (TES), as well as the caliper test, were evaluated in a Monte Carlo simulation. Two different types of publication bias and its degree (0%, 50%, 100%) were simulated. The type of publication bias was defined either as file-drawer, meaning the repeated analysis of new datasets, or p-hacking, meaning the inclusion of covariates in order to obtain a significant result. In addition, the underlying effect (β = 0, 0.5, 1, 1.5), effect heterogeneity, the number of observations in the simulated primary studies (N = 100, 500), and the number of observations for the publication bias tests (K = 100, 1,000) were varied. Results: All tests evaluated were able to identify publication bias in both the file-drawer and the p-hacking condition. The false positive rates were, with the exception of the 15%- and 20%-caliper tests, unbiased. The FAT had the largest statistical power in the file-drawer conditions, whereas under p-hacking the TES was, except under effect heterogeneity, slightly better. The caliper tests (CTs) were, however, inferior to the other tests under effect homogeneity and had decent statistical power only in conditions with 1,000 primary studies. Discussion: The FAT is recommended as a test for publication bias in standard meta-analyses with no or only small effect heterogeneity. If two-sided publication bias is suspected, as well as under p-hacking, the TES is the first alternative to the FAT. The 5%-caliper test is recommended under conditions of effect heterogeneity and a large number of primary studies, which may be found if publication bias is examined in a discipline-wide setting in which primary studies cover different research problems.
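The caliper test mentioned in the abstract admits a very compact illustration. The sketch below (Python, numpy and scipy) counts primary-study z-values just above versus just below the 5% critical value and applies an exact binomial test; the 5% caliper width, the 1.96 threshold, and the toy z-values are assumptions for illustration, not the article’s exact specification.

```python
# A minimal sketch of a caliper test on primary-study z-values. The caliper is
# taken as a symmetric band of +/- width around the critical value 1.96; under
# no publication bias, z-values just above and just below the threshold should
# be about equally frequent. The toy data are illustrative only.
import numpy as np
from scipy import stats

def caliper_test(z_values, width=0.05, z_crit=1.96):
    """Binomial test of whether |z| values pile up just above z_crit."""
    z = np.abs(np.asarray(z_values))
    lower, upper = z_crit * (1 - width), z_crit * (1 + width)
    over = int(np.sum((z >= z_crit) & (z <= upper)))
    under = int(np.sum((z >= lower) & (z < z_crit)))
    n = over + under
    if n == 0:
        return np.nan, np.nan
    test = stats.binomtest(over, n=n, p=0.5, alternative="greater")
    return over / n, test.pvalue

# Toy literature of K = 1,000 studies with a pile-up just above significance.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(2.1, 0.1, 400),    # selected / p-hacked results
                    rng.normal(1.0, 0.8, 600)])   # unremarkable results
share_over, p = caliper_test(z, width=0.05)
print(f"share just above 1.96: {share_over:.2f}, binomial p = {p:.4f}")
```

Because the caliper is so narrow, only a small fraction of studies falls inside it, which is consistent with the abstract’s finding that the test has acceptable power only with on the order of 1,000 primary studies.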


2019, Vol 227 (4), pp. 261-279
Author(s):  
Frank Renkewitz
Melanie Keiner

Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
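The evaluation logic described here can be condensed into a short Python sketch (numpy, statsmodels). Egger’s regression is used purely as a stand-in for the six evaluated tools, and all parameter values are assumptions: the rejection rate across simulated literatures is the Type I error rate when no selection is applied and the statistical power when nonsignificant studies are suppressed.

```python
# A minimal sketch of the evaluation logic: simulate many meta-analytic
# literatures under a known condition, apply one bias-detection test (Egger's
# regression here, standing in for the six evaluated tools), and record how
# often it rejects. All parameter values are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

def simulate_literature(k=30, mu=0.2, tau=0.0, keep_nonsig=1.0):
    """k published studies; nonsignificant results survive with prob keep_nonsig."""
    est, se = [], []
    while len(est) < k:
        s = rng.uniform(0.05, 0.3)          # study standard error
        theta = rng.normal(mu, tau)         # true (possibly heterogeneous) effect
        d = rng.normal(theta, s)            # observed effect estimate
        if abs(d / s) > 1.96 or rng.random() < keep_nonsig:
            est.append(d)
            se.append(s)
    return np.array(est), np.array(se)

def egger_rejects(est, se, alpha=0.05):
    """True if the Egger regression intercept differs from zero at level alpha."""
    fit = sm.OLS(est / se, sm.add_constant(1.0 / se)).fit()
    return fit.pvalues[0] < alpha

def rejection_rate(n_rep=500, **kwargs):
    return float(np.mean([egger_rejects(*simulate_literature(**kwargs))
                          for _ in range(n_rep)]))

# keep_nonsig = 1.0 -> no bias -> Type I error; keep_nonsig = 0.2 -> power.
print("Type I error:", rejection_rate(keep_nonsig=1.0))
print("Power:       ", rejection_rate(keep_nonsig=0.2))
```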


2021
Author(s):  
Maximilian Maier
Tyler VanderWeele
Maya B Mathur

In meta-analyses, it is critical to assess the extent to which publication bias might have compromised the results. Classical methods based on the funnel plot, including Egger’s test and Trim-and-Fill, have become the de facto default methods to do so, with a large majority of recent meta-analyses in top medical journals (85%) assessing publication bias exclusively with these methods. However, these classical funnel plot methods have important limitations when used as the sole means of assessing publication bias: they essentially assume that the publication process favors large point estimates for small studies and does not affect the largest studies, and they can perform poorly when effects are heterogeneous. In light of these limitations, we recommend that meta-analyses routinely apply other publication bias methods in addition to, or instead of, classical funnel plot methods. To this end, we describe how to use and interpret selection models. These methods make the often more realistic assumption that publication bias favors "statistically significant" results, and they also directly accommodate effect heterogeneity. Selection models are well established in the statistics literature and are supported by user-friendly software, yet remain rarely reported in many disciplines. We use previously published meta-analyses to demonstrate that selection models can yield insights that extend beyond those provided by funnel plot methods, suggesting the importance of establishing more comprehensive reporting practices for publication bias assessment.
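To give a concrete, if simplified, picture of what a selection model involves, the Python sketch below (numpy, scipy) fits a three-parameter step-function model in the spirit of Vevea and Hedges: a mean effect mu, a between-study SD tau, and delta, the relative publication probability of studies with one-sided p ≥ .025. It is an illustrative reimplementation under simplifying assumptions, not the authors’ recommended software; established implementations exist, for example in the R packages weightr and metafor.

```python
# A minimal sketch of a three-parameter step-function selection model in the
# spirit of Vevea & Hedges: mu (mean effect), tau (between-study SD), and delta,
# the relative publication probability of studies with one-sided p >= .025.
# Illustrative reimplementation only; toy data are generated with artificial
# selection so the corrected mean can be compared with the naive mean.
import numpy as np
from scipy import stats, optimize

def neg_loglik(params, y, s):
    mu, log_tau, log_delta = params
    tau, delta = np.exp(log_tau), np.exp(log_delta)
    v = tau**2 + s**2                            # marginal variance of each estimate
    cut = stats.norm.ppf(0.975) * s              # estimate needed for one-sided p < .025
    w = np.where(y > cut, 1.0, delta)            # relative publication probability
    # Normalizing constant: expected publication weight under N(mu, v).
    p_sig = stats.norm.sf((cut - mu) / np.sqrt(v))
    a = p_sig + delta * (1.0 - p_sig)
    ll = np.log(w) + stats.norm.logpdf(y, loc=mu, scale=np.sqrt(v)) - np.log(a)
    return -np.sum(ll)

def fit_selection_model(y, s):
    """Maximum likelihood estimates (mu, tau, delta) under the step-function model."""
    start = np.array([np.average(y, weights=1.0 / s**2), np.log(0.1), np.log(0.5)])
    res = optimize.minimize(neg_loglik, start, args=(y, s), method="Nelder-Mead")
    mu, log_tau, log_delta = res.x
    return mu, np.exp(log_tau), np.exp(log_delta)

# Toy data: only 30% of nonsignificant studies make it into the meta-analysis.
rng = np.random.default_rng(3)
y, s = [], []
while len(y) < 60:
    se = rng.uniform(0.1, 0.4)
    d = rng.normal(0.1, np.sqrt(0.05**2 + se**2))        # true mean 0.1, tau 0.05
    if d / se > 1.96 or rng.random() < 0.3:
        y.append(d)
        s.append(se)
y, s = np.array(y), np.array(s)
mu_hat, tau_hat, delta_hat = fit_selection_model(y, s)
print(f"naive mean = {np.average(y, weights=1/s**2):.3f}, "
      f"selection-model mu = {mu_hat:.3f}, delta = {delta_hat:.2f}")
```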


2020
Author(s):  
Helen Niemeyer
Robbie Cornelis Maria van Aert

Meta-analyses are susceptible to publication bias, the selective publication of studies with statistically significant results. If publication bias is present in psychotherapy research, the efficacy of interventions will likely be overestimated. This study has two aims: (1) to investigate whether the application of publication bias methods is warranted in psychotherapy research on posttraumatic stress disorder (PTSD), and (2) to investigate the degree and impact of publication bias in meta-analyses of the efficacy of psychotherapeutic treatment for PTSD. A comprehensive literature search was conducted, and 26 meta-analyses were eligible for bias assessment. A Monte Carlo simulation study closely resembling the characteristics of the included meta-analyses revealed that, owing to characteristics of the data, the publication bias tests generally had low statistical power and yielded imprecise publication-bias-corrected estimates. We recommend assessing publication bias with multiple publication bias methods, but including only methods that show acceptable performance in a method performance check that researchers first have to conduct themselves.
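One way to read the recommended method performance check is sketched below in Python (numpy, statsmodels): hold the standard errors of the meta-analysis at hand fixed, simulate literatures under a hypothesized effect and degree of selection, and estimate how often a chosen bias test would detect the bias. Egger’s regression, the effect size, and the selection rule are illustrative assumptions, not the procedure used by the authors.

```python
# A minimal sketch of a simulation-based performance check: keep the observed
# standard errors of the meta-analysis fixed, inject a hypothesized degree of
# publication bias, and estimate the power of a chosen bias test (Egger's
# regression here) for data of exactly this size and precision. The "observed"
# standard errors below are made up for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
observed_se = rng.uniform(0.10, 0.35, size=20)   # stand-in for a real meta-analysis

def simulated_power(se, mu=0.3, keep_nonsig=0.3, n_rep=1000, alpha=0.05):
    """Rejection rate of Egger's test across simulated literatures that reuse `se`."""
    rejections = 0
    for _ in range(n_rep):
        est = []
        for s in se:                              # one simulated estimate per observed SE
            while True:
                d = rng.normal(mu, s)
                if abs(d / s) > 1.96 or rng.random() < keep_nonsig:
                    est.append(d)
                    break
        est = np.asarray(est)
        fit = sm.OLS(est / se, sm.add_constant(1.0 / se)).fit()
        rejections += int(fit.pvalues[0] < alpha)
    return rejections / n_rep

print("estimated power of Egger's test for a meta-analysis of this size:",
      simulated_power(observed_se))
```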


2020, Vol 4
Author(s):  
Helen Niemeyer
Robbie C.M. Van Aert
Sebastian Schmid
Dominik Uelsmann
Christine Knaevelsrud
...  

Meta-analyses are susceptible to publication bias, the selective publication of studies with statistically significant results. If publication bias is present in psychotherapy research, the efficacy of interventions will likely be overestimated. This study has two aims: (1) to investigate whether the application of publication bias methods is warranted in psychotherapy research on posttraumatic stress disorder (PTSD), and (2) to investigate the degree and impact of publication bias in meta-analyses of the efficacy of psychotherapeutic treatment for PTSD. A comprehensive literature search was conducted, and 26 meta-analyses were eligible for bias assessment. A Monte Carlo simulation study closely resembling the characteristics of the included meta-analyses revealed that, owing to characteristics of the data, the publication bias tests generally had low statistical power and yielded imprecise publication-bias-corrected estimates. We recommend assessing publication bias with multiple publication bias methods, but including only methods that show acceptable performance in a method performance check that researchers first have to conduct themselves.


2018
Author(s):  
Frank Renkewitz
Melanie Keiner

Publication biases and questionable research practices are assumed to be two of the main causes of the low replication rates observed in the social sciences. Both of these problems not only increase the proportion of false positives in the literature but can also lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect and correct such bias in meta-analytic results. We present an evaluation of the performance of six of these tools in detecting bias. To assess the Type I error rate and the statistical power of these tools, we simulated a large variety of literatures that differed with regard to the underlying true effect size, heterogeneity, the number of available primary studies, and the variation of sample sizes in these primary studies. Furthermore, simulated primary studies were subjected to different degrees of publication bias. Our results show that the power of the detection methods follows a complex pattern. Across all simulated conditions, no method consistently outperformed all others. Hence, choosing an optimal method would require knowledge about parameters (e.g., true effect size, heterogeneity) that meta-analysts cannot have. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that, in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.


2017
Author(s):  
Freya Acar
Ruth Seurinck
Simon B. Eickhoff
Beatrijs Moerkerke

The importance of integrating research findings is incontrovertible, and coordinate-based meta-analyses have become a popular approach to combining the results of fMRI studies when only peaks of activation are reported. Similar to classical meta-analyses, coordinate-based meta-analyses may be subject to different forms of publication bias, which impact results and possibly invalidate findings. We develop a tool that assesses the robustness to potential publication bias at the cluster level. We investigate the possible influence of the file-drawer effect, where studies that do not report certain results fail to get published, by determining the number of noise studies that can be added to an existing fMRI meta-analysis before the results are no longer statistically significant. In this paper we illustrate this tool through an example and test the effect of several parameters through extensive simulations. We provide an algorithm, for which code is freely available, that generates noise studies and enables users to determine the robustness of meta-analytical results.
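The authors’ tool operates on coordinate-based fMRI meta-analyses; the Python sketch below (numpy) only mirrors the underlying robustness logic in a classical inverse-variance meta-analysis: null "noise" studies are added one at a time until the pooled estimate is no longer significant. The input effect sizes, standard errors, and the noise-study standard error are made up, and this is not the algorithm released with the paper.

```python
# A loose analogue of the noise-study idea for a classical inverse-variance
# meta-analysis (NOT the authors' coordinate-based algorithm): add simulated
# null studies one at a time and report how many are needed before the pooled
# fixed-effect estimate stops being significant at the 5% level.
import numpy as np

def pooled_z(est, se):
    """Fixed-effect (inverse-variance) pooled estimate and its z-value."""
    est, se = np.asarray(est, float), np.asarray(se, float)
    w = 1.0 / se**2
    mu = np.sum(w * est) / np.sum(w)
    return mu, mu * np.sqrt(np.sum(w))

def noise_studies_until_nonsignificant(est, se, noise_se=0.3, max_added=10_000, seed=0):
    """Number of added null studies (true effect 0) after which |z| drops below 1.96."""
    rng = np.random.default_rng(seed)
    est, se = list(est), list(se)
    for added in range(1, max_added + 1):
        est.append(rng.normal(0.0, noise_se))
        se.append(noise_se)
        _, z = pooled_z(est, se)
        if abs(z) < 1.96:
            return added
    return max_added

# Made-up effect sizes and standard errors of an existing meta-analysis.
observed_est = [0.42, 0.31, 0.55, 0.28, 0.40]
observed_se = [0.15, 0.12, 0.20, 0.10, 0.18]
print("noise studies needed:", noise_studies_until_nonsignificant(observed_est, observed_se))
```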

