EFFECT SIZE–DRIVEN SAMPLE-SIZE PLANNING, RANDOMIZATION, AND MULTISITE USE IN L2

Author(s):  
Joseph P. Vitta ◽  
Christopher Nicklin ◽  
Stuart McLean

Abstract In this focused methodological synthesis, the sample construction procedures of 110 second language (L2) instructed vocabulary interventions were assessed in relation to effect size–driven sample-size planning, randomization, and multisite usage. These three areas were investigated because considering them during sample construction supports stronger generalizations from inferential tests. Only nine reports used effect sizes to plan or justify sample sizes in any fashion, and only one engaged in an a priori power procedure that referenced vocabulary-centric effect sizes from previous research. Randomized assignment was observed in 56% of the reports, while no report involved randomized sampling. Approximately 15% of the samples were constructed from multiple sites, and none of these empirically investigated the effect of site clustering. Leveraging the synthesized findings, we conclude by offering suggestions for future L2 instructed vocabulary researchers to consider a priori effect size–driven sample planning, randomization, and multisite usage when constructing samples.
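
As a rough sketch of the kind of planning the synthesis calls for, the snippet below performs an a priori power analysis in R for a two-group vocabulary intervention and then inflates the result by the standard design effect for multisite (clustered) samples. The effect size, intraclass correlation, and cluster size are hypothetical placeholders, not values taken from the synthesized studies.

```r
# A priori power analysis for a two-group vocabulary intervention,
# using a hypothetical effect size (d = 0.60) taken from prior research.
plan <- power.t.test(delta = 0.60, sd = 1, sig.level = 0.05,
                     power = 0.80, type = "two.sample")
n_per_group <- ceiling(plan$n)          # participants needed per group

# If participants come from multiple sites (clusters), inflate the sample
# by the standard design effect: DEFF = 1 + (m - 1) * ICC, where m is the
# average cluster size and ICC the intraclass correlation for sites.
icc <- 0.05   # hypothetical intraclass correlation
m   <- 25     # hypothetical average number of learners per site
deff <- 1 + (m - 1) * icc
n_adjusted <- ceiling(n_per_group * deff)

c(per_group = n_per_group, design_effect = deff, adjusted = n_adjusted)
```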

2017 ◽  
Author(s):  
Jose D. Perezgonzalez

Research often necessitates samples, yet obtaining large enough samples is not always possible. When it is possible, the researcher may use one of two methods for deciding upon the required sample size: rules of thumb, which are quick yet uncertain, and power estimations, which are mathematically precise yet liable to overestimate or underestimate sample sizes when effect sizes are unknown. Misestimated sample sizes have negative repercussions in the form of increased costs, abandoned projects, or abandoned publication of non-significant results. Here I describe a procedure for estimating sample sizes adequate for the testing approach that is most common in the behavioural, social, and biomedical sciences: Fisher’s tests of significance. The procedure focuses on a desired minimum effect size for the research at hand and finds the minimum sample size required for capturing that effect size as a statistically significant result. In a similar fashion to power analyses, sensitiveness analyses can also be extended to finding the minimum effect for a given sample size a priori, as well as to calculating sensitiveness a posteriori. The article provides a full tutorial for carrying out a sensitiveness analysis, as well as empirical support via simulation.
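
The core of the logic can be sketched in R: for a two-sample t test with a given group size, the smallest standardized effect that can emerge as statistically significant is the one whose test statistic lands exactly on the critical value. The function below illustrates that idea under standard t-test assumptions; it is not a reproduction of the article's full tutorial.

```r
# Minimum effect size (Cohen's d) that a two-sample t test can return as
# statistically significant, given n participants per group and alpha.
min_significant_d <- function(n_per_group, alpha = 0.05) {
  df <- 2 * n_per_group - 2
  t_crit <- qt(1 - alpha / 2, df)        # two-sided critical t value
  t_crit * sqrt(2 / n_per_group)         # d that sits exactly at the threshold
}

min_significant_d(n_per_group = 30)      # roughly 0.52
min_significant_d(n_per_group = 100)     # roughly 0.28
```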


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥.800 or ≥.667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help in both evaluating and designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
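
A stripped-down version of the idea can be simulated in R: coding error attenuates the underlying correlation, which in turn lowers the power of the significance test. The sketch below assumes only one of the two variables is coded with error and uses illustrative values; the article's simulations model coder behavior under uncertainty in far more detail.

```r
# Monte Carlo sketch: how imperfect coding reliability attenuates the power
# of a correlation test. Illustrative only.
set.seed(1)
power_given_reliability <- function(n, true_r, reliability, reps = 2000) {
  sig <- replicate(reps, {
    x <- rnorm(n)                                          # true construct
    y <- true_r * x + rnorm(n, sd = sqrt(1 - true_r^2))    # correlated outcome
    # Degrade the coded variable so that var(true) / var(observed)
    # equals the assumed reliability.
    x_coded <- x + rnorm(n, sd = sqrt(1 / reliability - 1))
    cor.test(x_coded, y)$p.value < 0.05
  })
  mean(sig)                                                # estimated power
}

power_given_reliability(n = 150, true_r = 0.20, reliability = 0.90)
power_given_reliability(n = 150, true_r = 0.20, reliability = 0.60)
```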


2009 ◽  
Vol 31 (4) ◽  
pp. 500-506 ◽  
Author(s):  
Robert Slavin ◽  
Dewi Smith

Research in fields other than education has found that studies with small sample sizes tend to have larger effect sizes than those with large samples. This article examines the relationship between sample size and effect size in education. It analyzes data from 185 studies of elementary and secondary mathematics programs that met the standards of the Best Evidence Encyclopedia. As predicted, there was a significant negative correlation between sample size and effect size. The differences in effect sizes between small and large experiments were much greater than those between randomized and matched experiments. Explanations for the effects of sample size on effect size are discussed.


2013 ◽  
Vol 112 (3) ◽  
pp. 835-844 ◽  
Author(s):  
M. T. Bradley ◽  
A. Brand

Tables of alpha values as a function of sample size, effect size, and desired power are presented. The tables indicate the expected alphas for small, medium, and large effect sizes given a variety of sample sizes. It is evident that sample sizes for most psychological studies are adequate for large effect sizes, defined as .8. The typical alpha level of .05 and desired power of 90% can be achieved with 70 participants in two groups. It is doubtful whether these ideal levels of alpha and power have generally been achieved for medium effect sizes in actual research, since 170 participants would be required. Small effect sizes have rarely been tested with an adequate number of participants or power. Implications are discussed.
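
The two headline figures can be checked against base R's power.t.test; the calls below are a sketch of that check, not the article's tables themselves.

```r
# Total N needed for 90% power at alpha = .05 (two-sided) in a two-group design.
large  <- power.t.test(delta = 0.8, sd = 1, sig.level = 0.05, power = 0.90)
medium <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.90)

ceiling(large$n)  * 2   # about 68 participants in total (close to the 70 reported)
ceiling(medium$n) * 2   # about 172 participants in total (close to the 170 reported)
```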


2021 ◽  
Author(s):  
Daniel Lakens

An important step when designing a study is to justify the sample size that will be collected. The key aim of a sample size justification is to explain how the collected data are expected to provide valuable information given the inferential goals of the researcher. In this overview article, six approaches to justifying the sample size in a quantitative empirical study are discussed: 1) collecting data from (almost) the entire population, 2) choosing a sample size based on resource constraints, 3) performing an a priori power analysis, 4) planning for a desired accuracy, 5) using heuristics, or 6) explicitly acknowledging the absence of a justification. An important question to consider when justifying sample sizes is which effect sizes are deemed interesting, and the extent to which the collected data inform inferences about these effect sizes. Depending on the sample size justification chosen, researchers could consider 1) what the smallest effect size of interest is, 2) which minimal effect size will be statistically significant, 3) which effect sizes they expect (and what they base these expectations on), 4) which effect sizes would be rejected based on a confidence interval around the effect size, 5) which ranges of effects a study has sufficient power to detect based on a sensitivity power analysis, and 6) which effect sizes are plausible in a specific research area. Researchers can use the guidelines presented in this article to improve their sample size justification and, hopefully, align the informational value of a study with their inferential goals.
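
As an illustration of point 4, the range of effect sizes compatible with an observed result can be approximated with the common large-sample standard error of Cohen's d; the function below is a sketch under that approximation, with illustrative numbers.

```r
# Approximate 95% CI around an observed Cohen's d for two independent groups,
# using the common large-sample standard error approximation.
d_confint <- function(d, n1, n2, level = 0.95) {
  se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
  z  <- qnorm(1 - (1 - level) / 2)
  c(lower = d - z * se, upper = d + z * se)
}

# With 50 participants per group, an observed d of 0.40 is compatible with
# population effects roughly between 0.0 and 0.8; only values outside this
# interval would be rejected.
d_confint(d = 0.40, n1 = 50, n2 = 50)
```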


2017 ◽  
Vol 28 (11) ◽  
pp. 1547-1562 ◽  
Author(s):  
Samantha F. Anderson ◽  
Ken Kelley ◽  
Scott E. Maxwell

The sample size necessary to obtain a desired level of statistical power depends in part on the population value of the effect size, which is, by definition, unknown. A common approach to sample-size planning uses the sample effect size from a prior study as an estimate of the population value of the effect to be detected in the future study. Although this strategy is intuitively appealing, effect-size estimates, taken at face value, are typically not accurate estimates of the population effect size because of publication bias and uncertainty. We show that the use of this approach often results in underpowered studies, sometimes to an alarming degree. We present an alternative approach that adjusts sample effect sizes for bias and uncertainty, and we demonstrate its effectiveness for several experimental designs. Furthermore, we discuss an open-source R package, BUCSS, and user-friendly Web applications that we have made available to researchers so that they can easily implement our suggested methods.
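
The simulation below illustrates the problem the authors describe, not the BUCSS adjustment itself: when a pilot effect size is only carried forward if it was statistically significant, follow-up studies planned from it tend to have far less power than intended. All values are illustrative.

```r
# Naive strategy: plan a follow-up from a pilot effect size that was only
# reported when it was significant (and in the expected direction).
set.seed(2)
true_d <- 0.30; n_pilot <- 20; reps <- 2000
achieved_power <- replicate(reps, {
  repeat {   # keep drawing pilots until one is "publishable"
    g1 <- rnorm(n_pilot, 0); g2 <- rnorm(n_pilot, true_d)
    if (t.test(g2, g1)$p.value < 0.05 && mean(g2) > mean(g1)) break
  }
  d_obs  <- (mean(g2) - mean(g1)) / sqrt((var(g1) + var(g2)) / 2)
  n_plan <- ceiling(power.t.test(delta = d_obs, power = 0.80)$n)
  # Power the planned study actually has against the true effect:
  power.t.test(n = n_plan, delta = true_d)$power
})
mean(achieved_power)   # typically well below the nominal 0.80
```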


2019 ◽  
Author(s):  
Stefan L.K. Gruijters ◽  
Gjalt-Jorn Ygram Peters

Experimental intervention tests need a sufficient sample size to constitute a robust test of the intervention’s effectiveness with reasonable precision and power. To estimate the required sample size adequately, researchers must specify an effect size. But what effect size should be used to plan the required sample size? Various approaches to selecting the a priori effect size have been suggested in the literature – including using conventions, prior research, and theoretical or practical importance. In this paper, we first discuss problems with some of the proposed methods of selecting the effect size for study planning. Subsequently, we lay out a method that provides a way out of many of these problems. The method requires setting a meaningful change definition and is specifically suited for applied researchers interested in planning tests of intervention effectiveness. We provide a hands-on walk-through of the method and easy-to-use R functions to implement it.
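
A generic sketch of planning from a meaningful-change definition (the outcome scale, meaningful change, and standard deviation below are hypothetical, and this is not necessarily the authors' exact method): express the smallest meaningful change on the raw outcome scale, convert it to a standardized effect, and plan the sample for that effect.

```r
# Hypothetical example: a 5-point change on an outcome scale with SD = 15
# is defined as the smallest meaningful improvement.
meaningful_change <- 5
outcome_sd        <- 15
d_meaningful <- meaningful_change / outcome_sd           # about 0.33

# Sample size per group needed to detect that meaningful change with 80% power.
ceiling(power.t.test(delta = d_meaningful, sd = 1,
                     sig.level = 0.05, power = 0.80)$n)
```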


2021 ◽  
Author(s):  
James Edward Bartlett ◽  
Sarah Jane Charles

Authors have highlighted for decades that sample size justification through power analysis is the exception rather than the rule. Even when authors do report a power analysis, there is often no justification for the smallest effect size of interest, or they do not provide enough information for the analysis to be reproducible. We argue one potential reason for these omissions is the lack of a truly accessible introduction to the key concepts and decisions behind power analysis. In this tutorial, we demonstrate a priori and sensitivity power analysis using jamovi for two independent samples and two dependent samples. Respectively, these power analyses allow you to ask the questions: “How many participants do I need to detect a given effect size?”, and “What effect sizes can I detect with a given sample size?”. We emphasise how power analysis is most effective as a reflective process during the planning phase of research to balance your inferential goals with your available resources. By the end of the tutorial, you will be able to understand the fundamental concepts behind power analysis and extend them to more advanced statistical models.
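
The same two questions can also be asked from R, for instance with the pwr package; the effect size and group size below are illustrative.

```r
library(pwr)

# A priori: how many participants per group are needed to detect d = 0.5
# with 80% power at alpha = .05 (two independent samples)?
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "two.sample")

# Sensitivity: what effect size can 40 participants per group detect
# with 80% power?
pwr.t.test(n = 40, sig.level = 0.05, power = 0.80, type = "two.sample")
```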


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, identifying the minimally meaningful effect size is often difficult but unavoidable for conducting the procedure properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as is feasible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning samples, such as collecting the maximum sample size feasible.
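
As one example of the precision-oriented alternatives the authors mention, sample size can be chosen for a desired confidence-interval width rather than for power. The function below is a sketch using the common large-sample standard error approximation for Cohen's d, with illustrative numbers.

```r
# Precision-based planning sketch: smallest n per group for which the 95% CI
# around an anticipated d is no wider than a target width.
n_for_ci_width <- function(d, width, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  for (n in 5:100000) {
    se <- sqrt(2 / n + d^2 / (4 * n))   # SE of d with n per group
    if (2 * z * se <= width) return(n)
  }
  NA
}

n_for_ci_width(d = 0.5, width = 0.40)   # n per group for a CI no wider than 0.40
```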


2018 ◽  
Vol 52 (4) ◽  
pp. 341-350 ◽  
Author(s):  
Michael FW Festing

Scientists using laboratory animals are under increasing pressure to justify their sample sizes using a “power analysis”. In this paper I review the three methods currently used to determine sample size: “tradition” or “common sense”, the “resource equation” and the “power analysis”. I explain how, using the “KISS” approach, scientists can make a provisional choice of sample size using any method and then easily estimate the effect size likely to be detectable according to a power analysis. Should they want to be able to detect a smaller effect, they can increase their provisional sample size and recalculate the detectable effect size. This is simple, does not need any software, and provides justification for the sample size in the terms used in a power analysis.
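
The KISS recalculation can be reproduced with base R: pick a provisional group size, see what effect it could detect, and enlarge it if that effect is too coarse. The group sizes, power, and alpha below are illustrative.

```r
# KISS-style check: what standardized effect size is detectable (90% power,
# alpha = .05, two groups) for a range of provisional group sizes?
for (n in c(8, 12, 16, 24, 32)) {
  d <- power.t.test(n = n, sig.level = 0.05, power = 0.90)$delta
  cat("n per group =", n, "-> detectable effect of about", round(d, 2), "\n")
}
```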

