A Priori Power Analysis: What Sample Size Do I Need?

2021 ◽  
pp. 139-149
Author(s):  
Darren George ◽  
Paul Mallery
2021 ◽  
Author(s):  
Daniel Lakens

An important step when designing a study is to justify the sample size that will be collected. The key aim of a sample size justification is to explain how the collected data are expected to provide valuable information given the inferential goals of the researcher. In this overview article six approaches are discussed to justify the sample size in a quantitative empirical study: 1) collecting data from (almost) the entire population, 2) choosing a sample size based on resource constraints, 3) performing an a-priori power analysis, 4) planning for a desired accuracy, 5) using heuristics, or 6) explicitly acknowledging the absence of a justification. An important question to consider when justifying sample sizes is which effect sizes are deemed interesting, and the extent to which the collected data inform inferences about these effect sizes. Depending on the sample size justification chosen, researchers could consider 1) what the smallest effect size of interest is, 2) which minimal effect size will be statistically significant, 3) which effect sizes they expect (and what they base these expectations on), 4) which effect sizes would be rejected based on a confidence interval around the effect size, 5) which ranges of effects a study has sufficient power to detect based on a sensitivity power analysis, and 6) which effect sizes are plausible in a specific research area. Researchers can use the guidelines presented in this article to improve their sample size justification and, hopefully, align the informational value of a study with their inferential goals.
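As a concrete illustration of point 5 above (a sensitivity power analysis), the minimal R sketch below solves for the effect size that an already-fixed sample can detect; the per-group n of 50 and the two-sample t-test design are hypothetical placeholders, not values from the article.

```r
library(pwr)  # assumes the pwr package is installed

# Sensitivity analysis: with n = 50 per group already fixed, which
# standardized mean difference d is detectable with 80% power at
# alpha = .05? (pwr.t.test solves for whichever argument is omitted.)
pwr.t.test(n = 50, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")
```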


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it "a mess." Specifically, identifying the minimally meaningful effect size is often difficult yet unavoidable if the procedure is to be conducted properly, the procedure is not precision oriented, and it does not direct the researcher to collect as many participants as is feasible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning their sample size, such as collecting the maximum sample size that is feasible.
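To make the first drawback concrete, the hedged sketch below runs a traditional a priori power analysis in R for a minimally meaningful effect size; the value d = 0.35 is an invented placeholder that a researcher would need to justify substantively.

```r
library(pwr)  # assumes the pwr package is installed

# A priori power analysis: per-group n for an independent-samples t test,
# given a hypothetical minimally meaningful effect size of d = 0.35.
pwr.t.test(d = 0.35, sig.level = 0.05, power = 0.80, type = "two.sample")
```

Note that nothing in this calculation tells the researcher how precisely the effect will be estimated, which is the precision-oriented gap the authors highlight.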


2018 ◽  
Vol 6 (8) ◽  
pp. 232596711879151 ◽  
Author(s):  
Brandon J. Erickson ◽  
Peter N. Chalmers ◽  
Jon Newgren ◽  
Marissa Malaret ◽  
Michael O’Brien ◽  
...  

Background: The Kerlan-Jobe Orthopaedic Clinic (KJOC) shoulder and elbow outcome score is a functional assessment tool for the upper extremity of the overhead athlete, which is currently validated for administration in person. Purpose/Hypothesis: The purpose of this study was to validate the KJOC score for administration over the phone. The hypothesis was that no difference would exist in KJOC scores for the same patient between administration in person versus over the phone. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: Fifty patients were randomized to fill out the KJOC questionnaire either over the phone first (25 patients) or in person first (25 patients) based on an a priori power analysis. One week after completing the initial KJOC on the phone or in person, patients filled out the score via the opposite method. Results were compared per question and for overall score. Results: There was a mean ± SD of 8 ± 5 days between when patients completed the first and second questionnaires. There were no significant differences in the overall KJOC score between the phone and paper groups (P = .139). The intraclass correlation coefficient comparing paper and phone scores was 0.802 (95% CI, 0.767-0.883; P < .001), with a Cronbach alpha of 0.89. On comparison of individual questions, there were significant differences for questions 1, 3, and 8 (P = .013, .023, and .042, respectively). Conclusion: The KJOC questionnaire can be administered over the phone with no significant difference in overall score as compared with that from in-person administration.
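For readers who want to reproduce this style of agreement analysis, here is a minimal sketch using the irr package; the data frame and all score values are invented for illustration and are not the study's data.

```r
library(irr)  # assumes the irr package is installed

# Hypothetical overall KJOC scores for the same five patients, collected
# once by phone and once on paper (toy values only).
scores <- data.frame(phone = c(78, 85, 62, 91, 70),
                     paper = c(80, 83, 65, 90, 74))

# Two-way, single-measure, absolute-agreement ICC, a common choice for
# test-retest comparisons of the same instrument.
icc(scores, model = "twoway", type = "agreement", unit = "single")
```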


2021 ◽  
Author(s):  
Christopher McCrum ◽  
Jorg van Beek ◽  
Charlotte Schumacher ◽  
Sanne Janssen ◽  
Bas Van Hooren

Background: Context regarding how researchers determine the sample size of their experiments is important for interpreting the results and determining their value and meaning. Between 2018 and 2019, the journal Gait & Posture introduced a requirement for sample size justification in its author guidelines. Research Question: How frequently and in what ways are sample sizes justified in Gait & Posture research articles, and was the inclusion of a guideline requiring sample size justification associated with a change in practice? Methods: The guideline was not in place prior to May 2018 and was in place from 25th July 2019. All articles in the three most recent volumes of the journal (84-86) and the three most recent pre-guideline volumes (60-62) at the time of preregistration were included in this analysis. This provided an initial sample of 324 articles (176 pre-guideline and 148 post-guideline). Articles were screened by two authors to extract author data, article metadata and sample size justification data. Specifically, screeners identified if (yes or no) and how sample sizes were justified. Six potential justification types (Measure Entire Population, Resource Constraints, Accuracy, A priori Power Analysis, Heuristics, No Justification) and an additional option of Other/Unsure/Unclear were used. Results: In most cases, authors of Gait & Posture articles did not provide a justification for their study's sample size. The inclusion of the guideline was associated with a modest increase in the percentage of articles providing a justification (16.6% to 28.1%). A priori power analyses were the dominant type of justification, but many were not reported in enough detail to allow replication. Significance: Gait & Posture researchers should be more transparent in how they determine their sample sizes and carefully consider whether they are suitable. Editors and journals may consider adding a similar guideline as a low-resource way to improve the reporting of sample size justifications.
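As a rough check on the reported change, the sketch below compares the pre- and post-guideline justification rates with a two-sample proportion test in base R; the counts are back-calculated from the reported percentages and article totals, so they are approximate.

```r
# Approximate counts back-calculated from the abstract: 16.6% of 176
# pre-guideline articles (~29) and 28.1% of 148 post-guideline
# articles (~42).
justified <- c(round(0.166 * 176), round(0.281 * 148))
totals <- c(176, 148)
prop.test(justified, totals)  # chi-squared test of equal proportions
```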


2014 ◽  
Vol 67 (9) ◽  
pp. 781-786 ◽  
Author(s):  
Allison Osmond ◽  
Hector Li-Chang ◽  
Richard Kirsch ◽  
Dimitrios Divaris ◽  
Vincent Falck ◽  
...  

Aims: Following the introduction of colorectal cancer screening programmes throughout Canada, it became necessary to standardise the diagnosis of colorectal adenomas. Canadian guidelines for standardised reporting of adenomas were developed in 2011. The aims of the present study were (a) to assess interobserver variability in the classification of dysplasia and architecture in adenomas and (b) to determine whether interobserver variability could be improved by the adoption of criteria specified in the national guidelines. Methods: An a priori power analysis was used to determine an adequate number of cases and participants. Twelve pathologists independently classified 40 whole-slide images of adenomas according to architecture and dysplasia grade. Following a wash-out period, participants were provided with the national guidelines and asked to reclassify the study set. Results: At baseline, there was moderate interobserver agreement for architecture (K=0.4700; 95% CI 0.4427 to 0.4972) and dysplasia grade (K=0.5680; 95% CI 0.5299 to 0.6062). Following distribution of the guidelines, interobserver agreement in assessing architecture improved (K=0.5403; 95% CI 0.5133 to 0.5674). For dysplasia grade, overall interobserver agreement remained moderate but decreased significantly (K=0.4833; 95% CI 0.4452 to 0.5215). Half of the cases contained high-grade dysplasia (HGD). Two pathologists diagnosed HGD in ≥75% of cases. Conclusions: The improvement in interobserver agreement in classifying adenoma architecture suggests that national guidelines can be useful in disseminating knowledge; however, the variability in the diagnosis of HGD, even following guideline review, suggests the need for ongoing knowledge-transfer exercises.
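The abstract does not spell out the power analysis itself, so the base-R sketch below shows one hedged way such planning could be simulated: generate ratings with a built-in level of rater accuracy and examine how variable Fleiss' kappa is for a candidate number of cases. All parameter values are assumptions for illustration, not the study's method.

```r
# Fleiss' kappa for a cases-by-categories matrix of rater counts.
fleiss_kappa <- function(counts) {
  n <- sum(counts[1, ])                            # raters per case
  p_j <- colSums(counts) / sum(counts)             # category proportions
  P_i <- (rowSums(counts^2) - n) / (n * (n - 1))   # per-case agreement
  (mean(P_i) - sum(p_j^2)) / (1 - sum(p_j^2))
}

# Simulate ratings: each case has a true category, and each rater picks
# it with probability `acc`, otherwise errs uniformly (an assumption).
simulate_ratings <- function(n_cases, n_raters = 12, k = 3, acc = 0.7) {
  truth <- sample.int(k, n_cases, replace = TRUE)
  counts <- matrix(0L, n_cases, k)
  for (i in seq_len(n_cases)) {
    p <- rep((1 - acc) / (k - 1), k)
    p[truth[i]] <- acc
    counts[i, ] <- rmultinom(1, n_raters, p)
  }
  counts
}

set.seed(1)
# Spread of kappa estimates across simulated 40-case studies: a narrow
# spread suggests 40 cases estimate agreement with acceptable precision.
summary(replicate(500, fleiss_kappa(simulate_ratings(40))))
```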


2021 ◽  
Vol 4 (1) ◽  
pp. 251524592095150
Author(s):  
Daniël Lakens ◽  
Aaron R. Caldwell

Researchers often rely on analysis of variance (ANOVA) when they report results of experiments. To ensure that a study is adequately powered to yield informative results with an ANOVA, researchers can perform an a priori power analysis. However, power analysis for factorial ANOVA designs is often a challenge. Current software solutions do not allow power analyses for complex designs with several within-participants factors. Moreover, power analyses often need partial eta squared (η²p) or Cohen's f as input, but these effect sizes are not intuitive and do not generalize to different experimental designs. We have created the R package Superpower and online Shiny apps to enable researchers without extensive programming experience to perform simulation-based power analysis for ANOVA designs of up to three within- or between-participants factors. Predicted effects are entered by specifying means, standard deviations, and, for within-participants factors, the correlations. The simulation provides the statistical power for all ANOVA main effects, interactions, and individual comparisons. The software can plot power across a range of sample sizes, can control for multiple comparisons, and can compute power when the homogeneity or sphericity assumption is violated. This Tutorial demonstrates how to perform a priori power analysis to design informative studies for main effects, interactions, and individual comparisons and highlights important factors that determine the statistical power for factorial ANOVA designs.
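A minimal usage sketch of the package described here is shown below; the design, means, standard deviation, and correlation are illustrative values, not figures from the tutorial.

```r
library(Superpower)  # assumes the Superpower package is installed

# 2 (between) x 2 (within) design with n = 40 per group. mu gives the
# four cell means; sd and the within-factor correlation r are assumed.
design_result <- ANOVA_design(design = "2b*2w",
                              n = 40,
                              mu = c(0, 0.5, 0, 0),
                              sd = 1,
                              r = 0.5)

# Simulation-based power for all main effects, the interaction, and the
# individual comparisons.
ANOVA_power(design_result, alpha_level = 0.05, nsims = 1000)
```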


Author(s):  
Xiaoyi Wang ◽  
Alexander Eiselmayer ◽  
Wendy E. Mackay ◽  
Kasper Hornbæk ◽  
Chat Wacharamanotham

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6813 ◽  
Author(s):  
Aleksi Reito ◽  
Lauri Raittio ◽  
Olli Helminen

Background: A recent study concluded that most findings reported as significant in sports medicine and arthroscopic surgery are not "robust" when evaluated with the Fragility Index (FI). A secondary analysis of data from a previous study was performed to investigate (1) the correctness of the findings, (2) the association between the FI, the p-value and post hoc power, (3) the median power to detect a medium effect size, and (4) the implementation of sample size analysis in these randomized controlled trials (RCTs). Methods: In addition to the 48 studies listed in the appendix accompanying the original study by Khan et al. (2017), we performed a follow-up literature search that identified 18 additional studies. In total, 66 studies were included in the analysis. We calculated the post hoc power, p-values and confidence intervals associated with the main outcome variable of each study. Use of an a priori power analysis was recorded. For each included study, we calculated the median power to detect a small (h > 0.2), medium (h > 0.5), or large (h > 0.8) effect with a baseline proportion of events of 10% and 30%. Three simulation data sets were used to validate our findings. Results: Inconsistencies were found in eight studies. An a priori power analysis was missing in one-fourth of the studies (16/66). The median power to detect a medium effect size with a baseline proportion of events of 10% and 30% was 42% and 43%, respectively. The FI was inherently associated with the achieved p-value and post hoc power. Discussion: A relatively high proportion of studies had inconsistencies. The FI is a surrogate measure for the p-value and post hoc power. Based on these studies, the median power in this field of research is suboptimal. There is an urgent need to investigate how well research claims in orthopedics hold up in a replicated setting, and to assess the validity of research findings.
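To illustrate the kind of calculation behind the reported median power, the sketch below computes the power of a two-proportion comparison for a medium effect on Cohen's h; the per-group sample size of 40 is a placeholder, not a value from the study.

```r
library(pwr)  # assumes the pwr package is installed

# Cohen's h for a shift from a 10% baseline event rate to 30%
h <- ES.h(0.30, 0.10)

# Power of a two-proportion test with a hypothetical n = 40 per group
pwr.2p.test(h = h, n = 40, sig.level = 0.05)$power
```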


2003 ◽  
Vol 60 (7) ◽  
pp. 864-872 ◽  
Author(s):  
Richard A Hinrichsen

An a priori power analysis was conducted to aid the design of experiments aimed at estimating the reproductive success of hatchery-born spawners relative to wild-born spawners using parentage assignment. Power was defined as the probability of rejecting the null hypothesis of equal reproductive contributions of hatchery- and wild-born spawners. A maximum likelihood estimator of relative reproductive success and its variance were derived. The estimator accommodates multiple brood years of data, which is an extension of current approaches. Power increased with stock productivity, initial spawner abundance, the fraction of recruits and spawners sampled, and the number of brood years examined. Power decreased as the error variance of the production function increased. Assuming a fixed total number of spawners, power was a concave-down function of the fraction of hatchery-born spawners. Using nominal values of productivity, error variance, and the fraction of hatchery-born spawners, an experiment could achieve a power of 0.8 if it were run for at least 5 years, or if it were applied to a stock with a high initial abundance of female spawners (>200) and run for at least 2 years.
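The paper's ML estimator is not reproduced in this abstract, so the sketch below illustrates the general idea with a deliberately simplified simulation: under relative reproductive success (RRS) r and a hatchery spawner fraction p_H, recruits are assigned to hatchery parents with probability r·p_H / (r·p_H + (1 − p_H)), and power is the rejection rate of a binomial test of equal contributions. All functions and values here are hypothetical simplifications, not the author's method.

```r
# Simplified power simulation for detecting RRS != 1 via parentage
# assignment (a toy stand-in for the paper's ML approach).
power_rrs <- function(rrs, p_hatch, n_recruits, nsims = 5000, alpha = 0.05) {
  p_alt <- rrs * p_hatch / (rrs * p_hatch + (1 - p_hatch))
  rejections <- replicate(nsims, {
    x <- rbinom(1, n_recruits, p_alt)  # recruits with hatchery parents
    binom.test(x, n_recruits, p = p_hatch)$p.value < alpha
  })
  mean(rejections)
}

set.seed(1)
power_rrs(rrs = 0.8, p_hatch = 0.5, n_recruits = 400)
```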

