An excess of positive results: Comparing the standard Psychology literature with Registered Reports

Author(s):  
Anne M. Scheel ◽  
Mitchell Schijen ◽  
Daniel Lakens

When studies with positive results that support the tested hypotheses have a higher probability of being published than studies with negative results, the literature will give a distorted view of the evidence for scientific claims. Psychological scientists have been concerned about the degree of distortion in their literature due to publication bias and inflated Type 1 error rates. Registered Reports were developed with the goal of minimising such biases: in this new publication format, peer review and the decision to publish take place before the study results are known. We compared the results in the full population of published Registered Reports in Psychology (N = 71 as of November 2018) with a random sample of hypothesis-testing studies from the standard literature (N = 152), identified by searching 633 journals for the phrase 'test* the hypothes*' (replicating a method by Fanelli, 2010). Analysing the first hypothesis reported in each paper, we found 96% positive results in standard reports but only 44% positive results in Registered Reports. The difference remained nearly as large when direct replications were excluded from the analysis (96% vs 50% positive results). This large gap suggests that psychologists underreport negative results to an extent that threatens cumulative science. Although our study did not directly test the effectiveness of Registered Reports at reducing bias, these results show that the introduction of Registered Reports has led to a much larger proportion of negative results appearing in the published literature compared to standard reports.
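
As a rough illustration of the size of this gap, the following sketch runs a two-proportion z-test on the counts implied by the abstract's percentages (back-calculated here, so approximate; this analysis is not part of the original study):

```python
# Compare the share of positive results in standard reports vs Registered
# Reports, using counts back-calculated from the reported percentages:
# ~96% of 152 standard papers, ~44% of 71 Registered Reports.
from statsmodels.stats.proportion import proportions_ztest

positives = [round(0.96 * 152), round(0.44 * 71)]  # ~146 and ~31 positive results
totals = [152, 71]

stat, p = proportions_ztest(positives, totals)
print(f"z = {stat:.2f}, p = {p:.2g}")  # a gap this large is wildly inconsistent with chance
```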


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kip D. Zimmerman ◽  
Mark A. Espeland ◽  
Carl D. Langefeld

Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
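
The dependence problem described here can be illustrated with a small simulation; this is a sketch of the pseudoreplication issue, not the authors' hurdle-GLMM pipeline, and all sample sizes and variances are made up:

```python
# Cells within an individual are correlated, so treating each cell as an
# independent observation inflates the type 1 error rate, while aggregating
# to per-individual means ("pseudo-bulk") keeps it near the nominal level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_ind, n_cells, n_sims = 10, 100, 2000
naive_fp = pseudobulk_fp = 0

for _ in range(n_sims):
    # individual-level random effects; no true group difference exists
    ind_means = rng.normal(0, 1, size=2 * n_ind)
    cells = ind_means[:, None] + rng.normal(0, 1, size=(2 * n_ind, n_cells))
    g1, g2 = cells[:n_ind].ravel(), cells[n_ind:].ravel()
    naive_fp += stats.ttest_ind(g1, g2).pvalue < 0.05
    pb1, pb2 = cells[:n_ind].mean(axis=1), cells[n_ind:].mean(axis=1)
    pseudobulk_fp += stats.ttest_ind(pb1, pb2).pvalue < 0.05

print(f"naive per-cell test: {naive_fp / n_sims:.2f} false-positive rate at alpha = 0.05")
print(f"pseudo-bulk test:    {pseudobulk_fp / n_sims:.2f}")
```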


1986 ◽  
Vol 20 (2) ◽  
pp. 189-200 ◽  
Author(s):  
Kevin D. Bird ◽  
Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
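
The quantities tabulated in the paper are now available in standard software. A minimal sketch using statsmodels as a stand-in for the printed tables, for a two-group comparison with conventional inputs:

```python
# Sample size and power for an independent two-sample t-test,
# given effect size d, type 1 error rate alpha, and type 2 error rate beta.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# n per group to detect a medium effect (d = 0.5) with alpha = .05, beta = .20
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"required n per group: {n:.0f}")  # ~64

# power achieved by n = 30 per group for the same effect size and alpha
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"power with n = 30:    {power:.2f}")  # ~0.47
```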


2020 ◽  
Author(s):  
Janet Aisbett ◽  
Daniel Lakens ◽  
Kristin Sainani

Magnitude-based inference (MBI) was widely adopted by sport science researchers as an alternative to null hypothesis significance tests. It has been criticized for lacking a theoretical framework, mixing Bayesian and frequentist thinking, and encouraging researchers to run small studies with high Type 1 error rates. MBI terminology describes the position of confidence intervals in relation to smallest meaningful effect sizes. We show these positions correspond to combinations of one-sided tests of hypotheses about the presence or absence of meaningful effects, and we formally describe MBI as a multiple decision procedure. The MBI terminology operates as if tests were conducted at multiple alpha levels. We illustrate how error rates can be controlled by limiting each one-sided hypothesis test to a single alpha level. To provide transparent error control in a Neyman-Pearson framework and to encourage the use of standard statistical software, we recommend replacing MBI with one-sided tests against smallest meaningful effects, or pairs of such tests as in equivalence testing. Researchers should pre-specify their hypotheses and alpha levels, perform a priori sample size calculations, and justify all assumptions. Our recommendations show researchers which tests to use and how to design and report their statistical analyses in accordance with standard frequentist practice.
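
The recommended replacement, pairs of one-sided tests against smallest meaningful effects as in equivalence testing, is available in standard software. A minimal sketch with statsmodels, assuming illustrative data and a smallest meaningful effect of ±0.5 units:

```python
# Two one-sided tests (TOST) against pre-specified equivalence bounds.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(1)
treatment = rng.normal(0.1, 1.0, size=40)  # hypothetical group data
control = rng.normal(0.0, 1.0, size=40)

low, upp = -0.5, 0.5  # smallest meaningful effect, fixed before seeing data
p, lower_test, upper_test = ttost_ind(treatment, control, low, upp)
print(f"TOST p = {p:.3f}")  # small p supports the absence of a meaningful effect
```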


2020 ◽  
Vol 103 (6) ◽  
pp. 1667-1679
Author(s):  
Shizhen S Wang

Background: There are several statistical methods for detecting a difference in detection rates between alternative and reference qualitative microbiological assays in a single-laboratory validation study with a paired design. Objective: We compared the performance of eight methods, including McNemar's test, the sign test, the Wilcoxon signed-rank test, the paired t-test, and regression methods based on conditional logistic (CLOGIT), mixed effects complementary log-log (MCLOGLOG), and mixed effects logistic (MLOGIT) models, and a linear mixed effects model (LMM). Methods: We first compared the minimum detectable difference in the proportion of detections between the alternative and reference detection methods among these statistical methods for a varied number of test portions. We then compared the power and type 1 error rates of these methods using simulated data. Results: The MCLOGLOG and MLOGIT models had the lowest minimum detectable difference, followed by the LMM and paired t-test. The MCLOGLOG and MLOGIT models had the highest average power but were anticonservative when the correlation between the paired outcomes of the alternative and reference methods was high. The LMM and paired t-test mostly had the highest average power when the correlation was low and the second highest average power when the correlation was high. Type 1 error rates of these last two methods approached the nominal significance level when the number of test portions was moderately large (n > 20). Highlights: The LMM and paired t-test are better choices than the other competing methods; we provide an example using real data.
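
For reference, the simplest of the compared methods, McNemar's test, needs only the paired 2x2 table of detection outcomes. A sketch with hypothetical counts (not the paper's data):

```python
# McNemar's exact test on paired detect/non-detect outcomes.
from statsmodels.stats.contingency_tables import mcnemar

# rows: reference method (detect, non-detect)
# columns: alternative method (detect, non-detect)
table = [[45, 3],   # both detect / only the reference detects
         [10, 12]]  # only the alternative detects / neither detects

result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"statistic = {result.statistic}, p = {result.pvalue:.3f}")
```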


2002 ◽  
Vol 51 (3) ◽  
pp. 524-527 ◽  
Author(s):  
Mark Wilkinson ◽  
Pedro R. Peres-Neto ◽  
Peter G. Foster ◽  
Clive B. Moncrieff

2017 ◽  
Author(s):  
Olivier Naret ◽  
Nimisha Chaturvedi ◽  
Istvan Bartha ◽  
Christian Hammer ◽  
Jacques Fellay

Studies of host genetic determinants of pathogen sequence variation can identify sites of genomic conflict by highlighting variants implicated in immune response on the host side and adaptive escape on the pathogen side. However, systematic genetic differences in host and pathogen populations can lead to inflated type I (false positive) and type II (false negative) error rates in genome-wide association analyses. Here, we demonstrate through simulation that correcting for both host and pathogen stratification reduces spurious signals and increases power to detect real associations in a variety of tested scenarios. We confirm the validity of the simulations by showing comparable results in an analysis of paired human and HIV genomes.
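
The correction described amounts to including stratification covariates for both genomes in each association test. A sketch on simulated data with illustrative column names (not the authors' pipeline):

```python
# Regress a binary pathogen variant on host allele dosage while adjusting
# for principal components of both host and pathogen genetic structure.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "pathogen_variant": rng.integers(0, 2, n),  # 0/1 amino-acid variant
    "host_genotype": rng.integers(0, 3, n),     # 0/1/2 allele dosage
    "host_pc1": rng.normal(size=n), "host_pc2": rng.normal(size=n),
    "path_pc1": rng.normal(size=n), "path_pc2": rng.normal(size=n),
})

X = sm.add_constant(df[["host_genotype", "host_pc1", "host_pc2",
                        "path_pc1", "path_pc2"]])
fit = sm.Logit(df["pathogen_variant"], X).fit(disp=0)
print(f"beta = {fit.params['host_genotype']:.3f}, "
      f"p = {fit.pvalues['host_genotype']:.3f}")  # association after correction
```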


2020 ◽  
Author(s):  
Lior Rennert ◽  
Moonseong Heo ◽  
Alain H Litwin ◽  
Victor De Gruttola

Background: Stepped-wedge designs (SWDs) are currently being used to investigate interventions to reduce opioid overdose deaths in communities located in several states. However, these interventions are competing with external factors such as newly initiated public policies limiting opioid prescriptions, media awareness campaigns, and social distancing orders due to the COVID-19 pandemic. Furthermore, control communities may prematurely adopt components of the proposed intervention as they become widely available. These types of events induce confounding of the intervention effect by time. Such confounding is a well-known limitation of SWDs; a common approach to adjusting for it makes use of a mixed effects modeling framework that includes both fixed and random effects for time. However, these models have several shortcomings when multiple confounding factors are present. Methods: We discuss the limitations of existing methods based on mixed effects models in the context of proposed SWDs to investigate interventions intended to reduce mortality associated with the opioid epidemic, and we propose solutions to accommodate deviations from the assumptions that underlie these models. We conduct an extensive simulation study of anticipated data from SWD trials targeting the current opioid epidemic in order to examine the performance of these models under different sources of confounding. We specifically examine the impact of factors external to the study and of premature adoption of intervention components. Results: When only external factors are present, our simulation studies show that commonly used mixed effects models can produce unbiased estimates of the intervention effect but have inflated Type 1 error rates and undercoverage of confidence intervals. These models are severely biased when confounding factors differentially impact intervention and control clusters; premature adoption of intervention components is an example of this scenario. In these scenarios, models that incorporate fixed intervention-by-time interaction terms and an unstructured covariance for the intervention-by-cluster-by-time random effects produce unbiased estimates of the intervention effect, reach nominal confidence interval coverage, and preserve the Type 1 error rate, but may reduce power. Conclusions: The incorporation of fixed and random time effects in mixed effects models requires certain assumptions about the impact of confounding by time in SWDs. Violations of these assumptions can result in severe bias of the intervention effect estimate, undercoverage of confidence intervals, and inflated Type 1 error rates. Since model choice has considerable impact on study power as well as on the validity of results, careful consideration needs to be given to choosing a model that accounts for potential confounding factors.
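
As a point of reference for the model class under discussion, the sketch below fits a basic stepped-wedge mixed model with fixed period effects and cluster-level random effects on simulated data; the richer random-effect covariance structures recommended above would extend this specification. All names and numbers are illustrative:

```python
# A basic stepped-wedge analysis: fixed effects for period, a fixed
# treatment effect, and a random intercept and treatment slope per cluster.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
clusters, periods = 12, 6
rows = []
for c in range(clusters):
    crossover = 1 + c % (periods - 1)  # staggered rollout across clusters
    u = rng.normal(0, 0.5)             # cluster-level random intercept
    for t in range(periods):
        treated = int(t >= crossover)
        y = 1.0 + 0.3 * t + 0.5 * treated + u + rng.normal(0, 0.3)
        rows.append({"y": y, "treated": treated, "period": t, "cluster": c})
df = pd.DataFrame(rows)

fit = smf.mixedlm("y ~ treated + C(period)", data=df,
                  groups="cluster", re_formula="~treated").fit()
print(fit.summary())
```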

