Sample Size Justification

2021 ◽  
Author(s):  
Daniel Lakens

An important step when designing a study is to justify the sample size that will be collected. The key aim of a sample size justification is to explain how the collected data is expected to provide valuable information given the inferential goals of the researcher. In this overview article six approaches are discussed to justify the sample size in a quantitative empirical study: 1) collecting data from (almost) the entire population, 2) choosing a sample size based on resource constraints, 3) performing an a-priori power analysis, 4) planning for a desired accuracy, 5) using heuristics, or 6) explicitly acknowledging the absence of a justification. An important question to consider when justifying sample sizes is which effect sizes are deemed interesting, and the extent to which the data that is collected informs inferences about these effect sizes. Depending on the sample size justification chosen, researchers could consider 1) what the smallest effect size of interest is, 2) which minimal effect size will be statistically significant, 3) which effect sizes they expect (and what they base these expectations on), 4) which effect sizes would be rejected based on a confidence interval around the effect size, 5) which ranges of effects a study has sufficient power to detect based on a sensitivity power analysis, and 6) which effect sizes are plausible in a specific research area. Researchers can use the guidelines presented in this article to improve their sample size justification and, hopefully, align the informational value of a study with their inferential goals.
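
As an illustration of approach 3, the sketch below runs an a-priori power analysis for an independent-samples t-test in Python's statsmodels (not a tool named in the article); the smallest effect size of interest, alpha, and desired power are assumed values chosen purely for demonstration.

```python
# A minimal a-priori power analysis sketch; the effect size, alpha and power
# below are illustrative assumptions, not values taken from the article.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.4, alpha=0.05,
                                          power=0.90, alternative='two-sided')
print(f"Required sample size per group: {n_per_group:.1f}")  # round up to whole participants
```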

2021 ◽  
Author(s):  
James Edward Bartlett ◽  
Sarah Jane Charles

Authors have highlighted for decades that sample size justification through power analysis is the exception rather than the rule. Even when authors do report a power analysis, there is often no justification for the smallest effect size of interest, or they do not provide enough information for the analysis to be reproducible. We argue one potential reason for these omissions is the lack of a truly accessible introduction to the key concepts and decisions behind power analysis. In this tutorial, we demonstrate a priori and sensitivity power analysis using jamovi for two independent samples and two dependent samples. Respectively, these power analyses allow you to ask the questions: “How many participants do I need to detect a given effect size?”, and “What effect sizes can I detect with a given sample size?”. We emphasise how power analysis is most effective as a reflective process during the planning phase of research to balance your inferential goals with your available resources. By the end of the tutorial, you will be able to understand the fundamental concepts behind power analysis and extend them to more advanced statistical models.
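
For readers not using jamovi, the tutorial's two questions can be sketched in Python with statsmodels; the effect size, alpha, power, and sample size below are placeholder assumptions, not numbers from the tutorial.

```python
from statsmodels.stats.power import TTestIndPower, TTestPower

# A priori: "How many participants do I need to detect a given effect size?"
n_ind = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Two independent samples, d = 0.5: {n_ind:.1f} participants per group")

# Sensitivity: "What effect sizes can I detect with a given sample size?"
d_dep = TTestPower().solve_power(nobs=30, alpha=0.05, power=0.80)
print(f"Two dependent samples, 30 pairs: smallest detectable dz = {d_dep:.2f}")
```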


2021 ◽  
Author(s):  
Christopher McCrum ◽  
Jorg van Beek ◽  
Charlotte Schumacher ◽  
Sanne Janssen ◽  
Bas Van Hooren

Background: Context regarding how researchers determine the sample size of their experiments is important for interpreting the results and determining their value and meaning. Between 2018 and 2019, the journal Gait & Posture introduced a requirement for sample size justification in its author guidelines. Research Question: How frequently and in what ways are sample sizes justified in Gait & Posture research articles, and was the inclusion of a guideline requiring sample size justification associated with a change in practice? Methods: The guideline was not in place prior to May 2018 and was in place from 25th July 2019. All articles in the three most recent volumes of the journal (84-86) and the three most recent pre-guideline volumes (60-62) at the time of preregistration were included in this analysis. This provided an initial sample of 324 articles (176 pre-guideline and 148 post-guideline). Articles were screened by two authors to extract author data, article metadata and sample size justification data. Specifically, screeners identified whether (yes or no) and how sample sizes were justified. Six potential justification types (Measure Entire Population, Resource Constraints, Accuracy, A priori Power Analysis, Heuristics, No Justification) and an additional option of Other/Unsure/Unclear were used. Results: In most cases, authors of Gait & Posture articles did not provide a justification for their study's sample size. The inclusion of the guideline was associated with a modest increase in the percentage of articles providing a justification (16.6% to 28.1%). A priori power calculations were the dominant type of justification, but many were not reported in enough detail to allow replication. Significance: Gait & Posture researchers should be more transparent in how they determine their sample sizes and carefully consider whether they are suitable. Editors and journals may consider adding a similar guideline as a low-resource way to improve sample size justification reporting.


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, the identification of the minimally meaningful effect size is often difficult but unavoidable for conducting the procedure properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether these issues are concerns in practice. To investigate how power analysis is currently used, this study reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. It was found that researchers rarely use the minimally meaningful effect size as a rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning sample sizes, such as collecting the maximum feasible sample size.
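
As a rough illustration of the precision-based alternative the authors mention, the sketch below finds the per-group sample size at which the expected 95% confidence interval for a standardized mean difference reaches a chosen half-width; the target half-width of 0.25 is an assumption for demonstration, and the calculation ignores the sampling variability of the interval width.

```python
from scipy import stats

def n_for_ci_halfwidth(halfwidth, sd=1.0, alpha=0.05):
    """Smallest per-group n whose expected (1 - alpha) CI half-width for a
    two-group mean difference is at most `halfwidth` (in SD units)."""
    n = 4
    while True:
        se = sd * (2 / n) ** 0.5
        if stats.t.ppf(1 - alpha / 2, 2 * n - 2) * se <= halfwidth:
            return n
        n += 1

print(n_for_ci_halfwidth(0.25))  # per-group n for a half-width of 0.25 SD
```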


Author(s):  
Joseph P. Vitta ◽  
Christopher Nicklin ◽  
Stuart McLean

Abstract In this focused methodological synthesis, the sample construction procedures of 110 second language (L2) instructed vocabulary interventions were assessed in relation to effect size–driven sample-size planning, randomization, and multisite usage. These three areas were investigated because considering them during the sample construction process allows inferential tests to support better generalizations. Only nine reports used effect sizes to plan or justify sample sizes in any fashion, with only one engaging in an a priori power procedure that referenced vocabulary-centric effect sizes from previous research. Randomized assignment was observed in 56% of the reports, while no report involved randomized sampling. Approximately 15% of the samples were constructed from multiple sites, and none of these reports empirically investigated the effect of site clustering. Leveraging the synthesized findings, we conclude by offering suggestions for future L2 instructed vocabulary researchers to consider a priori effect size–driven sample planning processes, randomization, and multisite usage when constructing samples.
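
To make the multisite point concrete, a standard design-effect adjustment (not taken from the synthesis itself) inflates a naively planned sample size when learners are nested in intact classes; the class size and intraclass correlation below are assumed for illustration.

```python
def design_effect_n(n_naive, cluster_size, icc):
    """Inflate a naive sample size by DEFF = 1 + (m - 1) * ICC."""
    deff = 1 + (cluster_size - 1) * icc
    return int(round(n_naive * deff))

# 120 learners planned while ignoring clustering, intact classes of 20,
# and an assumed intraclass correlation of .10
print(design_effect_n(120, 20, 0.10))  # -> 348 learners
```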


2017 ◽  
Author(s):  
Jose D. Perezgonzalez

Research often necessitates samples, yet obtaining large enough samples is not always possible. When it is, the researcher may use one of two methods for deciding upon the required sample size: rules of thumb, which are quick yet uncertain, and power estimations, which are mathematically precise yet may overestimate or underestimate sample sizes when effect sizes are unknown. Misestimated sample sizes have negative repercussions in the form of increased costs, abandoned projects or abandoned publication of non-significant results. Here I describe a procedure for estimating sample sizes adequate for the testing approach that is most common in the behavioural, social, and biomedical sciences: Fisher’s tests of significance. The procedure focuses on a desired minimum effect size for the research at hand and finds the minimum sample size required for capturing such an effect size as a statistically significant result. In a similar fashion to power analyses, sensitiveness analyses can also be extended to finding the minimum effect for a given sample size a priori, as well as to calculating sensitiveness a posteriori. The article provides a full tutorial for carrying out a sensitiveness analysis, as well as empirical support via simulation.
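
The sketch below captures the core sensitiveness idea under simplifying assumptions (an independent-samples t-test and an illustrative minimum effect of d = 0.5); it is not the paper's exact procedure, only the smallest per-group n at which that effect would register as statistically significant.

```python
from scipy import stats

def min_n_for_significance(d, alpha=0.05):
    """Smallest per-group n at which an observed standardized difference of
    exactly d reaches two-sided significance in an independent-samples t-test."""
    n = 2
    while True:
        t_observed = d * (n / 2) ** 0.5            # t implied by observing exactly d
        t_critical = stats.t.ppf(1 - alpha / 2, 2 * n - 2)
        if t_observed >= t_critical:
            return n
        n += 1

print(min_n_for_significance(0.5))  # minimum n per group for d = 0.5
```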


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that take the specifics of each study’s context into account, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as an online appendix.
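
A toy version of the general idea (not the author's simulation code, which is supplied as R functions) is sketched below: coding error at an assumed raw-agreement level attenuates a true correlation, and a Monte Carlo loop estimates how much power survives for an assumed sample size and effect size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2021)

def sim_power(n, rho, agreement, n_sims=2000, alpha=0.05):
    """Estimate power to detect a correlation of `rho` with n cases when the
    binary coded variable is flipped with probability (1 - agreement)."""
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)                               # error-free predictor
        latent = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y_true = (latent > 0).astype(float)                  # true binary code
        flips = rng.random(n) > agreement                    # coder errs sometimes
        y_coded = np.where(flips, 1 - y_true, y_true)
        _, p = stats.pearsonr(x, y_coded)
        hits += p < alpha
    return hits / n_sims

for agreement in (0.667, 0.80, 0.95):
    print(agreement, round(sim_power(n=200, rho=0.2, agreement=agreement), 3))
```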


2018 ◽  
Vol 52 (4) ◽  
pp. 341-350 ◽  
Author(s):  
Michael FW Festing

Scientists using laboratory animals are under increasing pressure to justify their sample sizes using a “power analysis”. In this paper I review the three methods currently used to determine sample size: “tradition” or “common sense”, the “resource equation” and the “power analysis”. I explain how, using the “KISS” approach, scientists can make a provisional choice of sample size using any method and then easily estimate the effect size likely to be detectable according to a power analysis. Should they want to detect a smaller effect, they can increase their provisional sample size and recalculate the detectable effect size. This is simple, does not need any software and provides a justification for the sample size in the terms used in a power analysis.
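
In code, the KISS loop might look like the sketch below (the paper itself needs no software): pick provisional group sizes, then let a power analysis report the effect size each could detect; the group sizes, alpha, and power shown are assumptions for illustration.

```python
from statsmodels.stats.power import TTestIndPower

# Provisional group sizes chosen by any method; the loop reports the effect
# size each could detect with 80% power in a two-sided independent t-test.
for n_per_group in (8, 12, 16, 24):
    d = TTestIndPower().solve_power(nobs1=n_per_group, alpha=0.05, power=0.80)
    print(f"n = {n_per_group} per group -> smallest detectable d = {d:.2f}")
```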


2008 ◽  
Vol 90 (1) ◽  
pp. 58-61 ◽  
Author(s):  
SA Sexton ◽  
N Ferguson ◽  
C Pearce ◽  
DM Ricketts

INTRODUCTION Many studies published in medical journals do not consider the statistical power required to detect a meaningful difference between study groups. As a result, these studies are often underpowered: the sample size may not be large enough to pick up a statistically significant difference (or other effect of interest) of a given size between the study groups. Therefore, the conclusion that there is no statistically significant difference between groups cannot be made unless a study has been shown to have sufficient power. The aim of this study was to establish the prevalence of negative studies with inadequate statistical power in British journals to which orthopaedic surgeons regularly submit. MATERIALS AND METHODS We assessed all papers in the six consecutive issues immediately prior to the start of the study (April 2005) in The Journal of Bone and Joint Surgery (British), Injury, and Annals of the Royal College of Surgeons of England. We sought published evidence that a power analysis had been performed in association with the main hypothesis of the paper. RESULTS There were a total of 170 papers in which a statistical comparison of two or more groups was undertaken. Of these 170 papers, 49 (28.8%) stated as their primary conclusion that there was no statistically significant difference between the groups studied. Of these 49 papers, only 3 (6.1%) had performed a power analysis demonstrating adequate sample size. CONCLUSIONS These results demonstrate that the majority of negative studies in the British orthopaedic literature we examined did not perform the statistical analysis necessary to support their stated conclusions. In order to remedy this, we recommend that the journals sampled include the following guidance in their instructions to authors: the statement ‘no statistically significant difference was found between study groups’ should be accompanied by the results of a power analysis.


2018 ◽  
Vol 6 (8) ◽  
pp. 232596711879151 ◽  
Author(s):  
Brandon J. Erickson ◽  
Peter N. Chalmers ◽  
Jon Newgren ◽  
Marissa Malaret ◽  
Michael O’Brien ◽  
...  

Background: The Kerlan-Jobe Orthopaedic Clinic (KJOC) shoulder and elbow outcome score is a functional assessment tool for the upper extremity of the overhead athlete, which is currently validated for administration in person. Purpose/Hypothesis: The purpose of this study was to validate the KJOC score for administration over the phone. The hypothesis was that no difference would exist in KJOC scores for the same patient between administration in person and over the phone. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: Based on an a priori power analysis, 50 patients were randomized to fill out the KJOC questionnaire either over the phone first (25 patients) or in person first (25 patients). One week after completing the initial KJOC over the phone or in person, patients then completed the questionnaire via the opposite method. Results were compared per question and for overall score. Results: There was a mean ± SD of 8 ± 5 days between when patients completed the first and second questionnaires. There were no significant differences in the overall KJOC score between the phone and paper groups (P = .139). The intraclass correlation coefficient comparing paper and phone scores was 0.802 (95% CI, 0.767-0.883; P < .001), with a Cronbach alpha of 0.89. On comparison of individual questions, there were significant differences for questions 1, 3, and 8 (P = .013, .023, and .042, respectively). Conclusion: The KJOC questionnaire can be administered over the phone with no significant difference in overall score as compared with that from in-person administration.

