Towards a simplified justification of the chosen sample size

2021 · Author(s): Nick J. Broers, Henry Otgaar

Since the early work of Cohen (1962), psychological researchers have become aware of the importance of conducting a power analysis to ensure that the predicted effect will be detectable with sufficient statistical power. APA guidelines require researchers to justify the chosen sample size with reference to the expected effect size, an expectation that should be based on previous research. However, we argue that a credible estimate of an expected effect size is reasonable only under two conditions: either the new study is a direct replication of earlier work, or the outcome scale uses meaningful and familiar units that allow a minimal effect of psychological interest to be quantified. In practice, neither of these conditions is usually met. We propose a different rationale for power analysis that ensures researchers can justify their sample size as meaningful and adequate.
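
Purely as an illustration of the second condition the authors mention (a minimal effect stated in familiar scale units), the sketch below shows how such a minimal effect translates into a required sample size. The scale, its SD, and the 2-point threshold are invented for the example, and the calculation assumes an independent-groups t-test via statsmodels.

```python
# Hypothetical illustration: a minimal effect of interest stated in raw scale
# units (here, 2 points on a questionnaire with SD ~ 10) is converted to a
# standardized effect size, which then determines the required sample size.
from statsmodels.stats.power import TTestIndPower

minimal_raw_difference = 2.0   # smallest difference of psychological interest (assumed)
scale_sd = 10.0                # assumed population SD of the outcome scale
d = minimal_raw_difference / scale_sd  # Cohen's d = 0.20

n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.05,
                                          power=0.80, alternative='two-sided')
print(f"d = {d:.2f} -> about {n_per_group:.0f} participants per group")
# d = 0.20 requires close to 400 participants per group
```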

2019 · Author(s): Rob Cribbie, Nataly Beribisky, Udi Alter

Many professional bodies recommend that a sample-planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required to detect a minimally meaningful effect size at a specified level of power and Type I error rate. However, several drawbacks render the procedure “a mess”: identifying the minimally meaningful effect size is often difficult yet unavoidable if the procedure is to be conducted properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size entered into a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers look to tools beyond traditional power analysis when planning samples, such as collecting the maximum feasible sample size.
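
To make the contrast concrete, the sketch below sets a power-based and a precision-based criterion side by side for an invented scenario (standardized outcome, two independent groups); the d = 0.30 threshold and the target CI half-width are assumptions, not values from the review.

```python
# A sketch contrasting the two planning goals the review discusses:
# (1) a priori power analysis and (2) a precision (CI-width) criterion.
from scipy.stats import norm
from statsmodels.stats.power import TTestIndPower

alpha, power = 0.05, 0.80
d_min = 0.30  # assumed minimally meaningful standardized effect

# (1) power-based: n per group needed to detect d_min
n_power = TTestIndPower().solve_power(effect_size=d_min, alpha=alpha, power=power)

# (2) precision-based: n per group so the 95% CI for the standardized mean
#     difference has half-width <= 0.20 (normal approximation; the variance
#     of a difference of two standardized means is roughly 2/n)
target_halfwidth = 0.20
z = norm.ppf(1 - alpha / 2)
n_precision = 2 * (z / target_halfwidth) ** 2

print(f"power criterion:     ~{n_power:.0f} per group")
print(f"precision criterion: ~{n_precision:.0f} per group")
```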


2021 · Vol 3 (1) · pp. 61-89 · Author(s): Stefan Geiß

This study uses Monte Carlo simulation techniques to estimate the minimum levels of intercoder reliability required in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The resulting procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥ .800 or ≥ .667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that take the specifics of each study’s context into account, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power; in studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help in both evaluating and designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
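
As a rough illustration of the logic (not the author’s actual procedure, which models categorical coding decisions and chance-adjusted agreement), the sketch below shows how unreliable coding attenuates an observed correlation and hence the power of the hypothesis test; the sample size, effect size, and reliability values are invented.

```python
# Simplified Monte Carlo sketch: measurement error in the coded variable
# attenuates the X-Y correlation and therefore the power of the test.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

def simulated_power(n, rho, reliability, alpha=0.05, n_sims=2000):
    """Share of simulations in which the X-Y correlation test rejects H0,
    when X is measured with the given reliability."""
    rejections = 0
    for _ in range(n_sims):
        x_true = rng.standard_normal(n)
        y = rho * x_true + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        # add coding error so that Var(true) / Var(observed) = reliability
        error_sd = np.sqrt((1 - reliability) / reliability)
        x_coded = x_true + error_sd * rng.standard_normal(n)
        r, p = pearsonr(x_coded, y)
        if p < alpha:
            rejections += 1
    return rejections / n_sims

for rel in (1.0, 0.8, 0.667):
    print(f"n=200, rho=0.20, reliability={rel}: power ~ {simulated_power(200, 0.20, rel):.2f}")
```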


2017 · Author(s): Chris Aberson

Preprint of a chapter appearing as: Aberson, C. L. (2015). Statistical power analysis. In R. A. Scott & S. M. Kosslyn (Eds.), Emerging trends in the behavioral and social sciences. Hoboken, NJ: Wiley. Statistical power refers to the probability of rejecting a false null hypothesis (i.e., finding what the researcher wants to find). Power analysis allows researchers to determine the sample size needed to design studies with an optimal probability of rejecting false null hypotheses. When conducted correctly, power analysis helps researchers make informed decisions about sample size selection. Statistical power analysis most commonly involves specifying the statistical test criterion (Type I error rate), the desired level of power, and the effect size expected in the population. This article outlines the basic concepts relevant to statistical power, the factors that influence power, how to establish the different parameters for power analysis, and the determination and interpretation of effect size estimates for power. I also address innovative work such as the continued development of software resources for power analysis and protocols for designing for precision of confidence intervals (also known as accuracy in parameter estimation). Finally, I outline understudied areas such as power analysis for designs with multiple predictors, reporting and interpreting power analyses in published work, designing for meaningfully sized effects, and power to detect multiple effects in the same study.
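
The trade-offs described here can be made concrete with a small sketch: once any three of effect size, sample size, α, and power are fixed, the fourth is determined. The numbers below are illustrative assumptions, and the calculation assumes an independent-groups t-test.

```python
# Solving for each of the interchangeable quantities in turn.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# required n per group for a medium effect (d = 0.50), alpha = .05, power = .80
n_needed = analysis.solve_power(effect_size=0.50, alpha=0.05, power=0.80)

# achieved power if only 30 participants per group can be recruited
power_at_30 = analysis.power(effect_size=0.50, nobs1=30, alpha=0.05)

# smallest effect detectable with 80% power and 100 participants per group
d_detectable = analysis.solve_power(nobs1=100, alpha=0.05, power=0.80)

print(f"n per group for d = 0.50: ~{n_needed:.0f}")
print(f"power with n = 30/group:  ~{power_at_30:.2f}")
print(f"detectable d at n = 100:  ~{d_detectable:.2f}")
```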


1990 · Vol 47 (1) · pp. 2-15 · Author(s): Randall M. Peterman

Ninety-eight percent of recently surveyed papers in fisheries and aquatic sciences that did not reject some null hypothesis (H0) failed to report β, the probability of making a type II error (not rejecting H0 when it should have been rejected), or statistical power (1 – β). However, 52% of those papers drew conclusions as if H0 were true. A false H0 could have been missed because of a low-power experiment caused by small sample size or large sampling variability. The costs of type II errors can be large (for example, failing to detect harmful effects of an industrial effluent or a significant effect of fishing on stock depletion). Past statistical power analyses show that abundance estimation techniques usually have high β and that only large effects are detectable. I review the relationships among β, power, detectable effect size, sample size, and sampling variability. I show how statistical power analysis can help interpret past results and improve the design of future experiments, impact assessments, and management regulations. I make recommendations for researchers and decision makers, including routine application of power analysis, more cautious management, and reversal of the burden of proof to place it on industry rather than on management agencies.
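
A hedged sketch of that relationship for a simple two-group comparison (normal approximation; the survey mean and the coefficients of variation below are invented, not taken from the paper): the minimum detectable effect grows with sampling variability and shrinks with sample size.

```python
# Minimum detectable difference in means for a two-sample comparison
# (two-sided z approximation).
from scipy.stats import norm

def minimum_detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return (z_alpha + z_beta) * sd * (2.0 / n_per_group) ** 0.5

# illustrative survey scenarios: mean abundance 100, CVs of 30% and 60%
for cv in (0.3, 0.6):
    for n in (10, 40):
        mdd = minimum_detectable_difference(n, sd=100 * cv)
        print(f"CV={cv:.0%}, n={n}: detectable change ~ {mdd:.0f} (vs a mean of 100)")
```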


2018 · Vol 52 (4) · pp. 341-350 · Author(s): Michael FW Festing

Scientists using laboratory animals are under increasing pressure to justify their sample sizes using a “power analysis”. In this paper I review the three methods currently used to determine sample size: “tradition” or “common sense”, the “resource equation”, and the “power analysis”. I explain how, using the “KISS” approach, scientists can make a provisional choice of sample size by any method and then easily estimate the effect size likely to be detectable according to a power analysis. Should they want to detect a smaller effect, they can increase the provisional sample size and recalculate the detectable effect size. This is simple, requires no software, and provides a justification of the sample size in the terms used in a power analysis.
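
As an optional illustration of the iterate-and-recalculate step (the paper stresses that no software is needed; this is simply the same arithmetic in code, using a normal approximation for a two-group comparison and invented candidate group sizes):

```python
# Detectable standardized effect for a range of provisional group sizes.
from scipy.stats import norm

def detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Approximate smallest standardized effect detectable with n per group."""
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * (2.0 / n_per_group) ** 0.5

for n in (8, 12, 20, 30):   # provisional choices, e.g. from the resource equation
    print(f"n = {n:2d} per group -> detectable d ~ {detectable_d(n):.2f}")
# if the detectable effect is larger than what matters biologically,
# increase the provisional n and recalculate
```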


2020 · pp. 28-63 · Author(s): A. G. Vinogradov

The article belongs to a genre of scholarly publication known as the tutorial: articles that present current methods of design, modeling, or analysis in an accessible format in order to disseminate best practices. It acquaints Ukrainian psychologists with the basics of using the R programming language for the analysis of empirical research data. The article discusses the current state of world psychology in connection with the crisis of confidence that arose from the low reproducibility of empirical research. This problem is caused by the poor quality of psychological measurement tools, insufficient attention to adequate sample planning, typical statistical hypothesis-testing practices, and so-called “questionable research practices.” The tutorial demonstrates methods for determining sample size as a function of the expected effect size and desired statistical power, and for performing basic variable transformations and statistical analysis of psychological research data using the R language and environment. It presents a minimal set of R functions required to carry out modern reliability analysis of measurement scales, sample size calculation, and point and interval estimation of effect size for the four designs most widespread in psychology for analyzing the interdependence of two variables. These typical problems include comparing means and variances across two or more samples and estimating correlations between continuous and categorical variables. Practical information on data preparation, import, basic transformations, and the application of basic statistical methods in the cloud version of RStudio is provided.
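
The tutorial itself works in R; purely to illustrate the kind of calculation it walks through (a sample size determination followed by a point and interval estimate of a standardized mean difference), here is a rough analogue with simulated data, using an approximate standard error for d.

```python
# Planned sample size, then a point and interval estimate of Cohen's d
# from two simulated samples (normal-approximation CI).
import numpy as np
from statsmodels.stats.power import TTestIndPower

# sample size for an expected d = 0.40 at alpha = .05, power = .80
n_planned = TTestIndPower().solve_power(effect_size=0.40, alpha=0.05, power=0.80)

rng = np.random.default_rng(7)
g1, g2 = rng.normal(0.4, 1, 100), rng.normal(0.0, 1, 100)
pooled_sd = np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
d = (g1.mean() - g2.mean()) / pooled_sd
n1, n2 = len(g1), len(g2)
se_d = np.sqrt(1 / n1 + 1 / n2 + d**2 / (2 * (n1 + n2)))  # approximate SE of d
ci = d - 1.96 * se_d, d + 1.96 * se_d

print(f"planned n per group: ~{n_planned:.0f}")
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```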


2008 · Vol 90 (1) · pp. 58-61 · Author(s): SA Sexton, N Ferguson, C Pearce, DM Ricketts

INTRODUCTION Many studies published in medical journals do not consider the statistical power required to detect a meaningful difference between study groups. As a result, these studies are often underpowered: the sample size may not be large enough to detect a statistically significant difference (or other effect of interest) of a given size between the study groups. Therefore, the conclusion that there is no statistically significant difference between groups cannot be drawn unless a study has been shown to have sufficient power. The aim of this study was to establish the prevalence of negative studies with inadequate statistical power in British journals to which orthopaedic surgeons regularly submit. MATERIALS AND METHODS We assessed all papers in the six consecutive issues immediately preceding the start of the study (April 2005) of The Journal of Bone and Joint Surgery (British), Injury, and the Annals of the Royal College of Surgeons of England. We sought published evidence that a power analysis had been performed in association with the main hypothesis of the paper. RESULTS A total of 170 papers included a statistical comparison of two or more groups. Of these 170 papers, 49 (28.8%) stated as their primary conclusion that there was no statistically significant difference between the groups studied. Of these 49 papers, only 3 (6.1%) had performed a power analysis demonstrating adequate sample size. CONCLUSIONS These results demonstrate that the majority of the negative studies we examined in the British orthopaedic literature had not performed the statistical analysis necessary to support their stated conclusions. To remedy this, we recommend that the journals sampled include the following guidance in their instructions to authors: the statement ‘no statistically significant difference was found between study groups’ should be accompanied by the results of a power analysis.
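
A sketch of the kind of power statement the authors recommend attaching to a negative result; the group sizes, event rates, and clinically relevant difference below are hypothetical, not drawn from the surveyed papers.

```python
# Achieved power of a completed two-group comparison of proportions,
# for a pre-specified clinically relevant difference.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

n1, n2 = 60, 55                       # group sizes of the (hypothetical) completed study
p_control, p_clinical = 0.20, 0.10    # assumed control rate and clinically relevant rate

h = proportion_effectsize(p_control, p_clinical)   # Cohen's h for the two proportions
achieved_power = NormalIndPower().power(effect_size=h, nobs1=n1,
                                        alpha=0.05, ratio=n2 / n1)

print(f"Power to detect a drop from {p_control:.0%} to {p_clinical:.0%} "
      f"with n = {n1} vs {n2}: ~{achieved_power:.2f}")
# a value well below 0.80 means 'no significant difference' is not evidence of no effect
```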


2007 · Vol 25 (18_suppl) · pp. 6516-6516 · Author(s): P. Bedard, M. K. Krzyzanowska, M. Pintilie, I. F. Tannock

Background: Underpowered randomized clinical trials (RCTs) may expose participants to the risks and burdens of research without scientific merit. We investigated the prevalence of underpowered RCTs presented at ASCO annual meetings. Methods: We surveyed all two-arm parallel phase III RCTs presented at the ASCO annual meeting from 1995–2003 in which the difference for the primary endpoint was not statistically significant. Post hoc calculations were performed using a power of 80% and α = 0.05 (two-sided) to determine the sample size required to detect a small, medium, or large effect size between the two groups. For studies reporting a proportion or a time to event as the primary endpoint, effect size was expressed as an odds ratio (OR) or hazard ratio (HR), respectively, with a small effect size defined as OR/HR = 1.3, a medium effect size as OR/HR = 1.5, and a large effect size as OR/HR = 2.0. Logistic regression was used to identify factors associated with lack of statistical power. Results: Of 423 negative RCTs for which post hoc sample size calculations could be performed, 45 (10.6%), 138 (32.6%), and 333 (78.7%) had adequate sample size to detect small, medium, and large effect sizes, respectively. Only 35 negative RCTs (7.1%) reported a reason for the inadequate sample size. In a multivariable model, studies presented at plenary or oral sessions (p < 0.0001) and multicenter studies supported by a cooperative group (p < 0.0001) were more likely to have adequate sample size. Conclusion: Two-thirds of the negative RCTs presented at the ASCO annual meeting did not have an adequate sample size to detect a medium-sized treatment effect. Most underpowered negative RCTs did not report a sample size calculation or the reasons for inadequate patient accrual. No significant financial relationships to disclose.
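
The abstract does not state which formula was used; for time-to-event endpoints, a standard approximation is Schoenfeld's formula for the required number of events under 1:1 allocation, sketched below for the three effect sizes defined above.

```python
# Schoenfeld approximation: events required for a two-arm log-rank test
# with equal allocation, 80% power, two-sided alpha = 0.05.
from math import log
from scipy.stats import norm

def events_needed(hazard_ratio, alpha=0.05, power=0.80):
    """Number of events needed to detect the given hazard ratio."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 4 * z**2 / log(hazard_ratio) ** 2

for hr, label in ((1.3, "small"), (1.5, "medium"), (2.0, "large")):
    print(f"HR = {hr} ({label}): ~{events_needed(hr):.0f} events")
# HR = 1.3 -> ~456 events, HR = 1.5 -> ~191, HR = 2.0 -> ~65
```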

