Alpha Values as a Function of Sample Size, Effect Size, and Power: Accuracy over Inference

Tables of alpha values as a function of sample size, effect size, and desired power were presented. The tables indicated expected alphas for small, medium, and large effect sizes given a variety of sample sizes. It was evident that sample sizes for most psychological studies are adequate for large effect sizes defined at .8. The typical alpha level of .05 and desired power of 90% can be achieved with 70 participants in two groups. It was perhaps doubtful if these ideal levels of alpha and power have generally been achieved for medium effect sizes in actual research, since 170 participants would be required. Small effect sizes have rarely been tested with an adequate number of participants or power. Implications were discussed.

Download Full-text

Statistical Power in Content Analysis Designs: How Effect Size, Sample Size and Coding Accuracy Jointly Affect Hypothesis Testing ‐ A Monte Carlo Simulation Approach.

Computational Communication Research ◽

10.5117/ccr2021.1.003.geis ◽

2021 ◽

Vol 3 (1) ◽

pp. 61-89

Author(s):

Stefan Geiß

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Content Analysis ◽

Sample Size ◽

Effect Size ◽

Statistical Power ◽

Effect Sizes ◽

Sample Sizes ◽

Expected Effect ◽

Sample Size Effect

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In most widespread sample size/effect size settings, the rule-of-thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, resulting in acceptable α and β error rates. However, this simulation allows making precise power calculations that can consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help in both evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as online appendix.

Download Full-text

Hypothesis Testing: Sample Size, Effect Size, Power, and Type II Errors

Basic Biostatistics for Medical and Biomedical Practitioners ◽

10.1016/b978-0-12-817084-7.00011-5 ◽

2019 ◽

pp. 173-185

Author(s):

Julien I.E. Hoffman

Keyword(s):

Size Effect ◽

Hypothesis Testing ◽

Sample Size ◽

Effect Size ◽

Type Ii ◽

Testing Sample ◽

Sample Size Effect ◽

Type Ii Errors

Download Full-text

What Is the Proportion of Studies Reporting Patient and Practitioner Satisfaction with Software Support Tools Used in the Management of Knee Pain and Is This Related to Sample Size, Effect Size, and Journal Impact Factor?

Telemedicine Journal and e-Health ◽

10.1089/tmj.2017.0207 ◽

2018 ◽

Vol 24 (8) ◽

pp. 562-576 ◽

Cited By ~ 2

Author(s):

Philip Bright ◽

Karen Hambly

Keyword(s):

Size Effect ◽

Impact Factor ◽

Sample Size ◽

Knee Pain ◽

Effect Size ◽

Journal Impact Factor ◽

Software Support ◽

Support Tools ◽

Sample Size Effect ◽

Journal Impact

Download Full-text

Effect Size and Power in fMRI Group Analysis

10.1101/295048 ◽

2018 ◽

Cited By ~ 18

Author(s):

Stephan Geuter ◽

Guanghao Qi ◽

Robert C. Welsh ◽

Tor D. Wager ◽

Martin A. Lindquist

Keyword(s):

Sample Size ◽

Group Analysis ◽

Population Level ◽

Small Sample ◽

Effect Sizes ◽

Medium Effect ◽

Single Subject ◽

Parameter Estimates ◽

Sample Sizes ◽

Human Connectome Project

AbstractMulti-subject functional magnetic resonance imaging (fMRI) analysis is often concerned with determining whether there exists a significant population-wide ‘activation’ in a comparison between two or more conditions. Typically this is assessed by testing the average value of a contrast of parameter estimates (COPE) against zero in a general linear model (GLM) analysis. In this work we investigate several aspects of this type of analysis. First, we study the effects of sample size on the sensitivity and reliability of the group analysis, allowing us to evaluate the ability of small sampled studies to effectively capture population-level effects of interest. Second, we assess the difference in sensitivity and reliability when using volumetric or surface based data. Third, we investigate potential biases in estimating effect sizes as a function of sample size. To perform this analysis we utilize the task-based fMRI data from the 500-subject release from the Human Connectome Project (HCP). We treat the complete collection of subjects (N = 491) as our population of interest, and perform a single-subject analysis on each subject in the population. We investigate the ability to recover population level effects using a subset of the population and standard analytical techniques. Our study shows that sample sizes of 40 are generally able to detect regions with high effect sizes (Cohen’s d > 0.8), while sample sizes closer to 80 are required to reliably recover regions with medium effect sizes (0.5 < d < 0.8). We find little difference in results when using volumetric or surface based data with respect to standard mass-univariate group analysis. Finally, we conclude that special care is needed when estimating effect sizes, particularly for small sample sizes.

Download Full-text

An Examination of Effect Sizes and Statistical Power in Speech, Language, and Hearing Research

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-19-00299 ◽

2020 ◽

Vol 63 (5) ◽

pp. 1572-1580

Author(s):

Laura Gaeta ◽

Christopher R. Brydges

Keyword(s):

Effect Size ◽

Statistical Power ◽

Effect Sizes ◽

Medium Effect ◽

Speech Language Pathology ◽

Sample Sizes ◽

Hearing Research ◽

Language Pathology ◽

Meta Analyses ◽

Published Research

Purpose The purpose was to examine and determine effect size distributions reported in published audiology and speech-language pathology research in order to provide researchers and clinicians with more relevant guidelines for the interpretation of potentially clinically meaningful findings. Method Cohen's d, Hedges' g, Pearson r, and sample sizes ( n = 1,387) were extracted from 32 meta-analyses in journals in speech-language pathology and audiology. Percentile ranks (25th, 50th, 75th) were calculated to determine estimates for small, medium, and large effect sizes, respectively. The median sample size was also used to explore statistical power for small, medium, and large effect sizes. Results For individual differences research, effect sizes of Pearson r = .24, .41, and .64 were found. For group differences, Cohen's d /Hedges' g = 0.25, 0.55, and 0.93. These values can be interpreted as small, medium, and large effect sizes in speech-language pathology and audiology. The majority of published research was inadequately powered to detect a medium effect size. Conclusions Effect size interpretations from published research in audiology and speech-language pathology were found to be underestimated based on Cohen's (1988, 1992) guidelines. Researchers in the field should consider using Pearson r = .25, .40, and .65 and Cohen's d /Hedges' g = 0.25, 0.55, and 0.95 as small, medium, and large effect sizes, respectively, and collect larger sample sizes to ensure that both significant and nonsignificant findings are robust and replicable.

Download Full-text

The Relationship Between Sample Sizes and Effect Sizes in Systematic Reviews in Education

Educational Evaluation and Policy Analysis ◽

10.3102/0162373709352369 ◽

2009 ◽

Vol 31 (4) ◽

pp. 500-506 ◽

Cited By ~ 98

Author(s):

Robert Slavin ◽

Dewi Smith

Keyword(s):

Sample Size ◽

Effect Size ◽

Secondary Mathematics ◽

Significant Negative Correlation ◽

Small Sample ◽

Effect Sizes ◽

Sample Sizes ◽

Large Samples ◽

Small Sample Sizes ◽

The Relationship

Research in fields other than education has found that studies with small sample sizes tend to have larger effect sizes than those with large samples. This article examines the relationship between sample size and effect size in education. It analyzes data from 185 studies of elementary and secondary mathematics programs that met the standards of the Best Evidence Encyclopedia. As predicted, there was a significant negative correlation between sample size and effect size. The differences in effect sizes between small and large experiments were much greater than those between randomized and matched experiments. Explanations for the effects of sample size on effect size are discussed.

Download Full-text

ALPHA VALUES AS A FUNCTION OF SAMPLE SIZE, EFFECT SIZE, AND POWER: ACCURACY OVER INFERENCE1

Psychological Reports ◽

10.2466/03.49.pr0.112.3 ◽

2013 ◽

pp. 130701095909004

Author(s):

M. T. Bradley ◽

A. Brand

Keyword(s):

Size Effect ◽

Sample Size ◽

Effect Size ◽

Sample Size Effect

Download Full-text

Sample size, effect size, and statistical power: a replication study of Weisburd’s paradox

Journal of Experimental Criminology ◽

10.1007/s11292-014-9212-9 ◽

2014 ◽

Vol 11 (1) ◽

pp. 141-163 ◽

Cited By ~ 9

Author(s):

Matthew S. Nelson ◽

Alese Wooditch ◽

Lisa M. Dario

Keyword(s):

Size Effect ◽

Sample Size ◽

Effect Size ◽

Statistical Power ◽

Replication Study ◽

Sample Size Effect

Download Full-text

Implications of sample size and acquired number of steps to investigate running biomechanics

Scientific Reports ◽

10.1038/s41598-021-82876-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Anderson Souza Oliveira ◽

Cristina Ioana Pirscoveanu

Keyword(s):

Sample Size ◽

Statistical Power ◽

Loading Rate ◽

Sequential Estimation ◽

Human Movement ◽

Effect Sizes ◽

Medium Effect ◽

Dependent Manner ◽

Sample Sizes ◽

Running Speed

AbstractLow reproducibility and non-optimal sample sizes are current concerns in scientific research, especially within human movement studies. Therefore, this study aimed to examine the implications of different sample sizes and number of steps on data variability and statistical outcomes from kinematic and kinetics running biomechanical variables. Forty-four participants ran overground using their preferred technique (normal) and minimizing the contact sound volume (silent). Running speed, peak vertical, braking forces, and vertical average loading rate were extracted from > 40 steps/runner. Data stability was computed using a sequential estimation technique. Statistical outcomes (p values and effect sizes) from the comparison normal vs silent running were extracted from 100,000 random samples, using various combinations of sample size (from 10 to 40 runners) and number of steps (from 5 to 40 steps). The results showed that only 35% of the study sample could reach average stability using up to 10 steps across all biomechanical variables. The loading rate was consistently significantly lower during silent running compared to normal running, with large effect sizes across all combinations. However, variables presenting small or medium effect sizes (running speed and peak braking force), required > 20 runners to reach significant differences. Therefore, varying sample sizes and number of steps are shown to influence the normal vs silent running statistical outcomes in a variable-dependent manner. Based on our results, we recommend that studies involving analysis of traditional running biomechanical variables use a minimum of 25 participants and 25 steps from each participant to provide appropriate data stability and statistical power.

Download Full-text

EFFECT SIZE–DRIVEN SAMPLE-SIZE PLANNING, RANDOMIZATION, AND MULTISITE USE IN L2

Studies in Second Language Acquisition ◽

10.1017/s0272263121000541 ◽

2021 ◽

pp. 1-25

Author(s):

Joseph P. Vitta ◽

Christopher Nicklin ◽

Stuart McLean

Keyword(s):

Second Language ◽

Sample Size ◽

Effect Size ◽

A Priori ◽

Effect Sizes ◽

Construction Process ◽

Sample Sizes ◽

Randomized Sampling ◽

Sample Size Planning ◽

Planning Processes

Abstract In this focused methodological synthesis, the sample construction procedures of 110 second language (L2) instructed vocabulary interventions were assessed in relation to effect size–driven sample-size planning, randomization, and multisite usage. These three areas were investigated because inferential testing makes better generalizations when researchers consider them during the sample construction process. Only nine reports used effect sizes to plan or justify sample sizes in any fashion, with only one engaging in an a priori power procedure referencing vocabulary-centric effect sizes from previous research. Randomized assignment was observed in 56% of the reports while no report involved randomized sampling. Approximately 15% of the samples observed were constructed from multiple sites and none of these empirically investigated the effect of site clustering. Leveraging the synthesized findings, we conclude by offering suggestions for future L2 instructed vocabulary researchers to consider a priori effect size–driven sample planning processes, randomization, and multisite usage when constructing samples.

Download Full-text