Using simulation and resampling to improve the statistical power and reproducibility of psychological research
The replication crisis has brought about an increased focus on improving the reproducibility of psychological research (Open Science Collaboration, 2015). Although some failed replications reflect false positives in original research findings, many are likely the result of low statistical power, which can cause failed replications even when an effect is real, no questionable research practices are used, and an experiment’s methodology is repeated perfectly. The present paper describes a simulation method (bootstrap resampling) that can be used in combination with pilot data or synthetic data to produce highly powered experimental designs. Unlike other commonly used power analysis approaches (e.g., G*Power), bootstrap resampling can be adapted to any experimental design to account for various factors that influence statistical power, including sample size, number of trials per condition, and participant exclusion criteria. Ignoring some of these factors (e.g., by using G*Power) can lead to overestimating the power of a study or replication, increasing the likelihood that findings will not replicate. By demonstrating how these factors influence the consistency of experimental findings, this paper provides examples of how simulation can be used to improve statistical power and reproducibility. Further, we provide a MATLAB toolbox that can be used to implement these simulation-based methods on existing pilot data (https://harvard-visionlab.github.io/power-sim).
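To give a rough sense of the resampling logic described above (not the toolbox's actual interface), the following MATLAB sketch estimates power for a single candidate sample size by repeatedly resampling hypothetical pilot difference scores and counting how often a one-sample t-test is significant; the variable names, pilot data, and alpha level are illustrative placeholders, not taken from the paper.

% Minimal sketch of bootstrap-resampling power estimation from pilot data.
% Assumes one within-subject difference score per pilot participant;
% "pilotDiffs" here is placeholder data, not real pilot results.
pilotDiffs = randn(20, 1) + 0.3;   % 20 hypothetical pilot participants

nSims   = 10000;   % number of simulated experiments
nSubs   = 40;      % sample size to evaluate
alpha   = 0.05;    % significance threshold
sigHits = 0;

for s = 1:nSims
    % Resample participants with replacement to simulate one experiment
    sample = pilotDiffs(randi(numel(pilotDiffs), nSubs, 1));
    [~, p] = ttest(sample);          % one-sample t-test against zero
    sigHits = sigHits + (p < alpha);
end

power = sigHits / nSims;
fprintf('Estimated power with N = %d: %.3f\n', nSubs, power);

A fuller simulation of the kind the paper describes would extend this loop to also resample trials within each participant, vary the number of trials per condition, and apply the study's exclusion criteria before testing, so that the estimated power reflects the full design rather than sample size alone.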