Publication Bias against Null Results

1997 ◽  
Vol 80 (1) ◽  
pp. 337-338 ◽  
Author(s):  
Raymond Hubbard ◽  
J. Scott Armstrong

Studies suggest a bias against the publication of null (p > .05) results. Instead of significance, we advocate reporting effect sizes and confidence intervals, and using replication studies. If statistical tests are used, power tests should accompany them.
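
The reporting style advocated here can be illustrated with a minimal sketch (simulated data and standard numpy/statsmodels tools, not material from the original note): an effect size with a confidence interval, plus the power of the accompanying test.

```python
# Sketch: report an effect size with a 95% CI and the power of the t-test,
# rather than a bare p-value. Data are simulated for illustration only.
import numpy as np
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(1)
treatment = rng.normal(0.3, 1.0, 40)   # hypothetical treatment scores
control = rng.normal(0.0, 1.0, 40)     # hypothetical control scores

# Cohen's d with a normal-approximation 95% confidence interval
n1, n2 = len(treatment), len(control)
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
d = (treatment.mean() - control.mean()) / pooled_sd
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci = (d - 1.96 * se_d, d + 1.96 * se_d)

# Power of a two-sample t-test to detect an effect of this size at alpha = .05
power = TTestIndPower().power(effect_size=d, nobs1=n1, ratio=n2 / n1, alpha=0.05)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}], power = {power:.2f}")
```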

2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the results of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple-labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.
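
The general idea of likelihood-based bias adjustment can be sketched with a simple one-parameter selection model. This is my own simplification in the spirit of that literature, not the Andrews-Kasy estimator itself, and the effect estimates below are hypothetical: significant results are assumed to always be published, non-significant ones only with probability p, and the truncated likelihood is maximized for the true effect.

```python
# Sketch of a Hedges-type selection model: maximize the likelihood of the
# published estimates given that non-significant results are published
# only with probability p. Hypothetical data; not the authors' method or data.
import numpy as np
from scipy import stats, optimize
from scipy.special import expit

def neg_log_lik(params, est, se):
    theta, logit_p = params
    p = expit(logit_p)                            # publication prob. for null results
    z_lo = (-1.96 * se - theta) / se
    z_hi = (1.96 * se - theta) / se
    prob_sig = 1.0 - (stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo))
    denom = prob_sig + p * (1.0 - prob_sig)       # prob. a study gets published at all
    w = np.where(np.abs(est / se) > 1.96, 1.0, p) # selection weight of each estimate
    dens = stats.norm.pdf(est, loc=theta, scale=se) * w / denom
    return -np.sum(np.log(dens))

# hypothetical published effect estimates and their standard errors
est = np.array([0.45, 0.38, 0.52, 0.10, 0.41, 0.05])
se = np.array([0.20, 0.18, 0.22, 0.20, 0.19, 0.21])
fit = optimize.minimize(neg_log_lik, x0=[0.2, 0.0], args=(est, se))
print(f"bias-adjusted effect estimate: {fit.x[0]:.2f}")
```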


2021 ◽  
Vol 35 (3) ◽  
pp. 175-192
Author(s):  
Maximilian Kasy

A key challenge for interpreting published empirical research is the fact that published findings might be selected by researchers or by journals. Selection might be based on criteria such as significance, consistency with theory, or the surprisingness of findings or their plausibility. Selection leads to biased estimates, reduced coverage of confidence intervals, and distorted posterior beliefs. I review methods for detecting and quantifying selection based on the distribution of p-values, systematic replication studies, and meta-studies. I then discuss the conflicting recommendations regarding selection resulting from alternative objectives, in particular, the validity of inference versus the relevance of findings for decision-makers. Based on this discussion, I consider various reform proposals, such as deemphasizing significance, pre-analysis plans, journals for null results and replication studies, and a functionally differentiated publication system. In conclusion, I argue that we need alternative foundations of statistics that go beyond the single-agent model of decision theory.
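
One simple p-value-based diagnostic of the kind reviewed here is a caliper test: absent selection, test statistics just below and just above the significance cutoff should be roughly equally common. The sketch below uses hypothetical z-values and a plain binomial test; it is an illustration of the idea, not the author's estimators.

```python
# Caliper test sketch: count z-statistics just below vs. just above 1.96 and
# test whether "just significant" results are over-represented.
import numpy as np
from scipy import stats

z = np.array([1.70, 1.98, 2.01, 2.05, 1.92, 2.10, 2.03, 1.99, 2.20, 1.87])
window = 0.30                               # caliper width around z = 1.96
below = np.sum((z > 1.96 - window) & (z <= 1.96))
above = np.sum((z > 1.96) & (z <= 1.96 + window))
res = stats.binomtest(above, above + below, p=0.5, alternative="greater")
print(f"{above} just above vs. {below} just below the cutoff, p = {res.pvalue:.3f}")
```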


2021 ◽  
Vol 4 (3) ◽  
pp. 251524592110351
Author(s):  
Denis Cousineau ◽  
Marc-André Goulet ◽  
Bradley Harding

Plotting the data of an experiment allows researchers to illustrate the main results of a study, show effect sizes, compare conditions, and guide interpretations. To achieve all this, it is necessary to show point estimates of the results and their precision using error bars. Often, and potentially unbeknownst to them, researchers use a type of error bar, the stand-alone confidence interval, that conveys limited information. For instance, such confidence intervals do not allow comparing results (a) between groups, (b) between repeated measures, (c) when participants are sampled in clusters, and (d) when the population size is finite. The use of such stand-alone error bars can lead to discrepancies between the plot’s display and the conclusions derived from statistical tests. To overcome this problem, we propose to generalize the precision of the results (the confidence intervals) by adjusting them so that they take into account the experimental design and the sampling methodology. Unfortunately, most software dedicated to statistical analysis does not offer options to adjust error bars. As a solution, we developed superb, an open-access, open-source library for R that allows users to create summary plots with easily adjusted error bars.
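
One well-known adjustment of this kind is the Cousineau-Morey correction for repeated-measures error bars, which removes between-participant variability before computing the interval. The sketch below implements that correction in Python on hypothetical data; it is not the authors' R package superb.

```python
# Cousineau-Morey sketch: center each participant on the grand mean, then
# apply Morey's c/(c-1) correction before building the error bars.
import numpy as np
from scipy import stats

# hypothetical data: rows = participants, columns = repeated conditions
data = np.array([[4.1, 5.0, 6.2],
                 [3.8, 4.9, 5.7],
                 [5.2, 5.9, 7.1],
                 [4.5, 5.3, 6.0]])
n, c = data.shape

normed = data - data.mean(axis=1, keepdims=True) + data.mean()
adj_se = normed.std(axis=0, ddof=1) / np.sqrt(n) * np.sqrt(c / (c - 1))
half_width = stats.t.ppf(0.975, df=n - 1) * adj_se
for mean, hw in zip(data.mean(axis=0), half_width):
    print(f"condition mean {mean:.2f} ± {hw:.2f}")
```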


2021 ◽  
Vol 35 (3) ◽  
pp. 157-174
Author(s):  
Guido W. Imbens

The use of statistical significance and p-values has become a matter of substantial controversy in various fields using statistical methods. This has gone as far as some journals banning the use of indicators for statistical significance, or even any reports of p-values, and, in one case, any mention of confidence intervals. I discuss three of the issues that have led to these often-heated debates. First, I argue that in many cases, p-values and indicators of statistical significance do not answer the questions of primary interest. Such questions typically involve making (recommendations on) decisions under uncertainty. In that case, point estimates and measures of uncertainty in the form of confidence intervals or, even better, Bayesian intervals, are often more informative summary statistics. In fact, in that case, the presence or absence of statistical significance is essentially irrelevant, and including significance indicators in the discussion may confuse the matter at hand. Second, I argue that there are also cases where testing null hypotheses is a natural goal and where p-values are reasonable and appropriate summary statistics. I conclude that banning them in general is counterproductive. Third, I discuss how the overemphasis on statistical significance in empirical work has led to abuse of p-values in the form of p-hacking and publication bias. The use of pre-analysis plans and replication studies, in combination with a lower emphasis on statistical significance, may help address these problems.
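
The first point can be made concrete with a small worked example (numbers entirely hypothetical): two results can share the verdict "p < .05" yet imply very different decisions once the estimate and its interval are reported.

```python
# Two hypothetical results with (almost) identical p-values but very different
# practical implications, visible only from the estimate and its interval.
from scipy import stats

for label, est, se in [("large, noisy effect", 2.0, 0.9),
                       ("tiny, precise effect", 0.02, 0.009)]:
    z = est / se
    p = 2 * stats.norm.sf(abs(z))
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"{label}: estimate = {est}, 95% CI [{lo:.3f}, {hi:.3f}], p = {p:.3f}")
```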


2015 ◽  
Vol 19 (2) ◽  
pp. 172-182 ◽  
Author(s):  
Michèle B. Nuijten ◽  
Marcel A. L. M. van Assen ◽  
Coosje L. S. Veldkamp ◽  
Jelte M. Wicherts

Replication is often viewed as the demarcation between science and nonscience. However, contrary to the commonly held view, we show that in the current (selective) publication system replications may increase bias in effect size estimates. Specifically, we examine the effect of replication on bias in estimated population effect size as a function of publication bias and the studies’ sample size or power. We analytically show that incorporating the results of published replication studies will in general not lead to less bias in the estimated population effect size. We therefore conclude that mere replication will not solve the problem of overestimation of effect sizes. We discuss the implications of our findings for interpreting results of published and unpublished studies, and for conducting and interpreting results of meta-analyses. We also discuss solutions for the problem of overestimation of effect sizes, such as discarding and not publishing small studies with low power, and implementing practices that completely eliminate publication bias (e.g., study registration).
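
The paper's point can be reproduced with a crude simulation (assumptions mine, not the authors' analytical model): if originals are published only when significant while replications are always published, pooling the two still overestimates a small true effect.

```python
# Simulation sketch: selectively published originals plus their (unselected)
# replications still yield an upward-biased pooled estimate.
import numpy as np

rng = np.random.default_rng(0)
true_d, n = 0.2, 20                       # small true effect, n = 20 per group
se = np.sqrt(2 / n)                       # approximate SE of d for this n

originals = rng.normal(true_d, se, 100_000)
published = originals[originals / se > 1.96]           # only significant originals survive
replications = rng.normal(true_d, se, published.size)  # one replication per published study

pooled = np.concatenate([published, replications]).mean()
print(f"true effect {true_d}, published originals {published.mean():.2f}, "
      f"pooled with replications {pooled:.2f}")
```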


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 407
Author(s):  
Michael Duggan ◽  
Patrizio Tressoldi

Background: This is an update of Mossbridge et al.’s meta-analysis of the physiological anticipation preceding seemingly unpredictable stimuli. The overall effect size observed there was 0.21 (95% confidence interval: 0.13-0.29). Methods: Eighteen new peer-reviewed and non-peer-reviewed studies completed from January 2008 to October 2017 were retrieved, describing a total of 26 experiments and 34 associated effect sizes. Results: The overall weighted effect size, estimated with a frequentist multilevel random-effects model, was 0.29 (95% confidence interval: 0.19-0.38); estimated with a multilevel Bayesian model, it was 0.29 (95% credible interval: 0.18-0.39). Effect sizes of peer-reviewed studies (0.38; 95% CI: 0.27-0.48) were slightly higher than those of non-peer-reviewed articles (0.22; 95% CI: 0.05-0.39). A statistical estimation of publication bias using the Copas model suggests that the main findings are not contaminated by publication bias. Conclusions: In summary, with this update, the main findings reported in Mossbridge et al.’s meta-analysis are confirmed.
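
For readers unfamiliar with weighted pooling, the sketch below shows a standard random-effects (DerSimonian-Laird) pooled estimate with a 95% CI. The effect sizes and standard errors are hypothetical placeholders, and this single-level estimator is a simplification of the multilevel models used in the update.

```python
# DerSimonian-Laird random-effects pooling sketch on hypothetical study data.
import numpy as np

yi = np.array([0.35, 0.21, 0.40, 0.18, 0.30])   # study effect sizes
se = np.array([0.10, 0.12, 0.15, 0.09, 0.11])   # their standard errors

w_fixed = 1 / se**2
y_fixed = np.sum(w_fixed * yi) / np.sum(w_fixed)
q = np.sum(w_fixed * (yi - y_fixed)**2)                      # Cochran's Q
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (q - (len(yi) - 1)) / c)                     # between-study variance

w = 1 / (se**2 + tau2)
pooled = np.sum(w * yi) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
print(f"pooled effect {pooled:.2f}, 95% CI "
      f"[{pooled - 1.96 * pooled_se:.2f}, {pooled + 1.96 * pooled_se:.2f}]")
```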


2016 ◽  
Vol 4 (1) ◽  
pp. 37-58 ◽  
Author(s):  
Keith Lohse ◽  
Taylor Buchanan ◽  
Matthew Miller

Appropriate statistical analysis is essential for accurate and reliable research. Statistical practices have an immediate impact on the perceived results of a single study but also remote effects on the dissemination of information among scientists and the cumulative nature of research. To accurately quantify potential problems facing the field of motor learning, we systematically reviewed publications from seven journals over the past 2 years to find experiments that tested the effects of different training conditions on delayed retention and transfer tests (i.e., classic motor learning paradigms). Eighteen studies were included. These studies had small sample sizes (Mdn n/group = 11.00, interquartile range [IQR] = 9.6–15.5), multiple dependent variables (Mdn = 2, IQR = 2–4), and many statistical tests per article (Mdn = 83.5, IQR = 55.8–112.5). The observed effect sizes were large (d = 0.71, IQR = 0.49–1.11). However, the distribution of effect sizes was biased, t(16) = 3.48, p < .01. These metadata indicate problems with the way motor learning research is conducted (or at least published). We recommend several potential solutions to address these issues: a priori power calculations, prespecified analyses, data sharing, and dissemination of null results. Furthermore, we hope these data will spark serious action from all stakeholders (researchers, editorial boards, and publishers) in the field.
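
A minimal sketch of the a priori power calculation recommended here, using statsmodels: d = 0.71 is the median observed (and likely inflated) effect from the review, while d = 0.4 is an arbitrary, more conservative planning value I have added for contrast. Either target implies far more than the median 11 participants per group.

```python
# A priori power calculation sketch for a two-group design.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.71, 0.4):   # median observed effect vs. a conservative planning value
    n = analysis.solve_power(effect_size=d, power=0.80, alpha=0.05)
    print(f"d = {d}: about {n:.0f} participants per group for 80% power")
```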


2017 ◽  
Author(s):  
Nicholas Alvaro Coles ◽  
Jeff T. Larsen ◽  
Heather Lench

The facial feedback hypothesis suggests that an individual’s experience of emotion is influenced by feedback from their facial movements. To evaluate the cumulative evidence for this hypothesis, we conducted a meta-analysis on 286 effect sizes derived from 138 studies that manipulated facial feedback and collected emotion self-reports. Using random effects meta-regression with robust variance estimates, we found that the overall effect of facial feedback was significant, but small. Results also indicated that feedback effects are stronger in some circumstances than others. We examined 12 potential moderators, and three were associated with differences in effect sizes. (1) Type of emotional outcome: facial feedback influenced emotional experience (e.g., reported amusement) and, to a greater degree, affective judgments of a stimulus (e.g., the objective funniness of a cartoon). Three publication bias detection methods did not reveal evidence of publication bias in studies examining the effects of facial feedback on emotional experience, but all three methods revealed evidence of publication bias in studies examining affective judgments. (2) Presence of emotional stimuli: facial feedback effects on emotional experience were larger in the absence of emotionally evocative stimuli (e.g., cartoons). (3) Type of stimuli: when participants were presented with emotionally evocative stimuli, facial feedback effects were larger in the presence of some types of stimuli (e.g., emotional sentences) than others (e.g., pictures). The available evidence supports the facial feedback hypothesis’ central claim that facial feedback influences emotional experience, although these effects tend to be small and heterogeneous.
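
A moderator analysis of this kind can be sketched as a weighted meta-regression of effect sizes on a study-level dummy. The data below are hypothetical, and ordinary weighted least squares is used in place of the robust-variance meta-regression described in the abstract.

```python
# Weighted meta-regression sketch: effect sizes regressed on a moderator
# (1 = affective-judgment outcome, 0 = emotional-experience outcome).
import numpy as np
import statsmodels.api as sm

yi = np.array([0.10, 0.25, 0.05, 0.30, 0.45, 0.35])   # hypothetical effect sizes
vi = np.array([0.02, 0.03, 0.02, 0.04, 0.03, 0.05])   # their sampling variances
judgment = np.array([0, 0, 0, 1, 1, 1])               # moderator dummy

X = sm.add_constant(judgment)
fit = sm.WLS(yi, X, weights=1 / vi).fit()
print(fit.params)   # intercept = experience effect, slope = judgment difference
```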

