Effect Size and Power in fMRI Group Analysis

Multi-subject functional magnetic resonance imaging (fMRI) analysis is often concerned with determining whether there exists a significant population-wide ‘activation’ in a comparison between two or more conditions. Typically this is assessed by testing the average value of a contrast of parameter estimates (COPE) against zero in a general linear model (GLM) analysis. In this work we investigate several aspects of this type of analysis. First, we study the effects of sample size on the sensitivity and reliability of the group analysis, allowing us to evaluate the ability of small-sample studies to effectively capture population-level effects of interest. Second, we assess the difference in sensitivity and reliability when using volumetric or surface-based data. Third, we investigate potential biases in estimating effect sizes as a function of sample size. To perform this analysis we utilize the task-based fMRI data from the 500-subject release of the Human Connectome Project (HCP). We treat the complete collection of subjects (N = 491) as our population of interest, and perform a single-subject analysis on each subject in the population. We investigate the ability to recover population-level effects using a subset of the population and standard analytical techniques. Our study shows that sample sizes of 40 are generally able to detect regions with high effect sizes (Cohen’s d > 0.8), while sample sizes closer to 80 are required to reliably recover regions with medium effect sizes (0.5 < d < 0.8). We find little difference in results when using volumetric or surface-based data with respect to standard mass-univariate group analysis. Finally, we conclude that special care is needed when estimating effect sizes, particularly for small sample sizes.
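The subsampling idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the "population" is 491 synthetic values standing in for one voxel's COPE, the function names are hypothetical, and a fixed threshold on the t statistic (`t_threshold`) stands in for a properly corrected group-level test. It is not the authors' pipeline.

```python
import random
import statistics


def cohens_d(values):
    """One-sample Cohen's d: mean divided by the sample standard deviation."""
    return statistics.fmean(values) / statistics.stdev(values)


def subsample_study(population, n, n_draws=2000, t_threshold=3.0, seed=0):
    """Draw repeated subsamples of size n and summarise detection and effect size.

    t_threshold is a hypothetical fixed cutoff, not a corrected critical value.
    """
    rng = random.Random(seed)
    all_d, detected_d = [], []
    for _ in range(n_draws):
        sample = rng.sample(population, n)   # sampling without replacement
        d = cohens_d(sample)
        all_d.append(d)
        if d * n ** 0.5 > t_threshold:       # one-sample t statistic equals d * sqrt(n)
            detected_d.append(d)
    return {
        "detection_rate": len(detected_d) / n_draws,
        "mean_d_all": statistics.fmean(all_d),
        "mean_d_detected": statistics.fmean(detected_d) if detected_d else float("nan"),
    }


# Synthetic stand-in for one voxel's COPE values across 491 subjects (assumed data).
_rng = random.Random(1)
population = [_rng.gauss(0.5, 1.0) for _ in range(491)]
population_d = cohens_d(population)

print("population d:", round(population_d, 3))
for n in (20, 40, 80):
    print(n, subsample_study(population, n))
```

In this toy setting the detection rate rises with the subsample size, while the average d among "detected" subsamples tends to exceed the population value when n is small, which is one way an effect-size bias of the kind the abstract warns about can arise.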

The Relationship Between Sample Sizes and Effect Sizes in Systematic Reviews in Education

Educational Evaluation and Policy Analysis ◽

10.3102/0162373709352369 ◽

2009 ◽

Vol 31 (4) ◽

pp. 500-506 ◽

Cited By ~ 98

Author(s):

Robert Slavin ◽

Dewi Smith

Keyword(s):

Sample Size ◽

Effect Size ◽

Secondary Mathematics ◽

Significant Negative Correlation ◽

Small Sample ◽

Effect Sizes ◽

Sample Sizes ◽

Large Samples ◽

Small Sample Sizes ◽

The Relationship

Research in fields other than education has found that studies with small sample sizes tend to have larger effect sizes than those with large samples. This article examines the relationship between sample size and effect size in education. It analyzes data from 185 studies of elementary and secondary mathematics programs that met the standards of the Best Evidence Encyclopedia. As predicted, there was a significant negative correlation between sample size and effect size. The differences in effect sizes between small and large experiments were much greater than those between randomized and matched experiments. Explanations for the effects of sample size on effect size are discussed.

Alpha Values as a Function of Sample Size, Effect Size, and Power: Accuracy over Inference

Psychological Reports ◽

10.2466/03.49.pr0.112.3.835-844 ◽

2013 ◽

Vol 112 (3) ◽

pp. 835-844 ◽

Cited By ~ 4

Author(s):

M. T. Bradley ◽

A. Brand

Keyword(s):

Size Effect ◽

Sample Size ◽

Effect Size ◽

Effect Sizes ◽

Medium Effect ◽

Sample Sizes ◽

Adequate Number ◽

Alpha Level ◽

Sample Size Effect ◽

Psychological Studies

Tables of alpha values as a function of sample size, effect size, and desired power were presented. The tables indicated expected alphas for small, medium, and large effect sizes given a variety of sample sizes. It was evident that sample sizes for most psychological studies are adequate for large effect sizes defined at .8. The typical alpha level of .05 and desired power of 90% can be achieved with 70 participants in two groups. It was perhaps doubtful if these ideal levels of alpha and power have generally been achieved for medium effect sizes in actual research, since 170 participants would be required. Small effect sizes have rarely been tested with an adequate number of participants or power. Implications were discussed.
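The participant counts quoted in this abstract can be roughly cross-checked with the textbook normal approximation for a two-sided, two-group comparison of means. This is a sketch under stated assumptions (equal group sizes, normal rather than t critical values); the function name `n_per_group` is hypothetical and the paper's own tables are not reproduced here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.90):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, a normal approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    n = n_per_group(d)
    print(f"{label:6s} d={d}: {n} per group, {2 * n} total")
```

Under these assumptions the sketch gives 85 per group (170 in total) for d = 0.5 and 33 per group (66 in total) for d = 0.8. The second figure sits slightly below the 70 quoted above, as expected when the t distribution is replaced by a normal one.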

Implications of sample size and acquired number of steps to investigate running biomechanics

Scientific Reports ◽

10.1038/s41598-021-82876-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Anderson Souza Oliveira ◽

Cristina Ioana Pirscoveanu

Keyword(s):

Sample Size ◽

Statistical Power ◽

Loading Rate ◽

Sequential Estimation ◽

Human Movement ◽

Effect Sizes ◽

Medium Effect ◽

Dependent Manner ◽

Sample Sizes ◽

Running Speed

Low reproducibility and non-optimal sample sizes are current concerns in scientific research, especially within human movement studies. Therefore, this study aimed to examine the implications of different sample sizes and numbers of steps on data variability and statistical outcomes from kinematic and kinetic running biomechanical variables. Forty-four participants ran overground using their preferred technique (normal) and minimizing the contact sound volume (silent). Running speed, peak vertical and braking forces, and vertical average loading rate were extracted from > 40 steps/runner. Data stability was computed using a sequential estimation technique. Statistical outcomes (p values and effect sizes) from the comparison of normal vs silent running were extracted from 100,000 random samples, using various combinations of sample size (from 10 to 40 runners) and number of steps (from 5 to 40 steps). The results showed that only 35% of the study sample could reach average stability using up to 10 steps across all biomechanical variables. The loading rate was consistently significantly lower during silent running compared to normal running, with large effect sizes across all combinations. However, variables presenting small or medium effect sizes (running speed and peak braking force) required > 20 runners to reach significant differences. Therefore, varying sample sizes and numbers of steps are shown to influence the normal vs silent running statistical outcomes in a variable-dependent manner. Based on our results, we recommend that studies involving analysis of traditional running biomechanical variables use a minimum of 25 participants and 25 steps from each participant to provide appropriate data stability and statistical power.
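The resampling design described here (random subsets of runners and of steps per runner) can be sketched as follows. The data are synthetic, all names and distribution parameters are assumptions chosen for illustration, and the sketch uses 500 draws per combination rather than the 100,000 reported; it summarises only the paired effect size, not p values.

```python
import random
import statistics


def make_synthetic_runners(n_runners=44, n_steps=40, seed=0):
    """Hypothetical data: one variable per runner in 'normal' and 'silent' running."""
    rng = random.Random(seed)
    runners = []
    for _ in range(n_runners):
        baseline = rng.gauss(100.0, 10.0)
        shift = rng.gauss(-3.0, 6.0)  # runner-specific normal-vs-silent change (assumed)
        normal = [baseline + rng.gauss(0.0, 8.0) for _ in range(n_steps)]
        silent = [baseline + shift + rng.gauss(0.0, 8.0) for _ in range(n_steps)]
        runners.append((normal, silent))
    return runners


def mean_of_random_steps(steps, k, rng):
    """Average k steps drawn without replacement from one runner's condition."""
    return statistics.fmean(rng.sample(steps, k))


def resampled_effect_sizes(runners, n_runners, n_steps, n_draws=500, seed=1):
    """Paired Cohen's d (mean difference / SD of differences) for each random draw."""
    rng = random.Random(seed)
    effect_sizes = []
    for _ in range(n_draws):
        diffs = [
            mean_of_random_steps(silent, n_steps, rng)
            - mean_of_random_steps(normal, n_steps, rng)
            for normal, silent in rng.sample(runners, n_runners)
        ]
        effect_sizes.append(statistics.fmean(diffs) / statistics.stdev(diffs))
    return effect_sizes


runners = make_synthetic_runners()
for n_r, n_s in ((10, 5), (10, 40), (40, 5), (40, 40)):
    ds = resampled_effect_sizes(runners, n_r, n_s)
    print(n_r, n_s, round(statistics.fmean(ds), 3), round(statistics.stdev(ds), 3))
```

The spread of the resampled effect sizes is the quantity of interest: with few runners and few steps the estimate varies widely from draw to draw, which is the instability the abstract's recommendation is meant to avoid.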

An empirical examination of sample size effects on population demographic estimates in birds using single nucleotide polymorphism (SNP) data

10.1101/2020.03.10.986463 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jessica F. McLaughlin ◽

Kevin Winker

Keyword(s):

Sample Size ◽

Population Size ◽

Effective Population Size ◽

Study Design ◽

Population Level ◽

Small Sample ◽

Sample Sizes ◽

Effective Population ◽

Empirical Examination ◽

Population Demographic

Sample size is a critical aspect of study design in population genomics research, yet few empirical studies have examined the impacts of small sample sizes. We used datasets from eight diverging bird lineages to make pairwise comparisons at different levels of taxonomic divergence (populations, subspecies, and species). Our data are from loci linked to ultraconserved elements (UCEs) and our analyses used one SNP per locus. All individuals were genotyped at all loci (McLaughlin et al. 2020). We estimated population demographic parameters (effective population size, migration rate, and time since divergence) in a coalescent framework using Diffusion Approximation for Demographic Inference (δaδi; Gutenkunst et al. 2009), an allele frequency spectrum (AFS) method. Using divergence-with-gene-flow models optimized with full datasets, we subsampled at sequentially smaller sample sizes from full datasets of 6–8 diploid individuals per population (with both alleles called) down to 1:1, and then we compared estimates and their changes in accuracy. Accuracy was strongly affected by sample size, with considerable differences among estimated parameters and among lineages. Effective population size parameters (ν) tended to be underestimated at low sample sizes (fewer than 3 diploid individuals per population, or 6:6 haplotypes in coalescent terms). Migration (m) was fairly consistently estimated until ≤ 2 individuals per population, and no consistent trend of over- or underestimation was found in either time since divergence (T) or Θ (4N_refμ). Lineages that were taxonomically recognized above the population level (subspecies and species pairs; i.e., deeper divergences) tended to have lower variation in scaled root mean square error (SRMSE) of parameter estimation at smaller sample sizes than population-level divergences, and many parameters were estimated accurately down to 3 diploid individuals per population. Shallower divergence levels (i.e., populations) often required at least 5 individuals per population for reliable demographic inferences using this approach. Although divergence levels might be unknown at the outset of study design, our results provide a framework for planning appropriate sampling and for interpreting results if smaller sample sizes must be used.

What can we Learn from Studies Based on Small Sample Sizes? Comment on Regan, Lakhanpal, and Anguiano (2012)

Psychological Reports ◽

10.2466/21.02.07.pr0.113x12z8 ◽

2013 ◽

Vol 113 (1) ◽

pp. 221-224 ◽

Cited By ~ 3

Author(s):

David R. Johnson ◽

Lauren K. Bachan

Keyword(s):

Sample Size ◽

A Comparison of Mixture Modeling Approaches in Latent Class Models With External Variables Under Small Samples

Educational and Psychological Measurement ◽

10.1177/0013164417726828 ◽

2017 ◽

Vol 78 (6) ◽

pp. 925-951 ◽

Cited By ~ 3

Author(s):

Unkyung No ◽

Sehee Hong

Keyword(s):

Sample Size ◽

Latent Class ◽

Latent Class Model ◽

Mixture Modeling ◽

Small Sample ◽

Outcome Variable ◽

Parameter Estimates ◽

Class Model ◽

Modeling Approaches ◽

Distal Outcome

The purpose of the present study is to compare performances of mixture modeling approaches (i.e., one-step approach, three-step maximum-likelihood approach, three-step BCH approach, and LTB approach) based on diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models, a latent class model with three predictor variables and a latent class model with one distal outcome variable. For the simulation, data were generated under the conditions of different sample sizes (100, 200, 300, 500, 1,000), entropy (0.6, 0.7, 0.8, 0.9), and the variance of a distal outcome (homoscedasticity, heteroscedasticity). For evaluation criteria, parameter estimates bias, standard error bias, mean squared error, and coverage were used. Results demonstrate that the three-step approaches produced more stable and better estimations than the other approaches even with a small sample size of 100. This research differs from previous studies in the sense that various models were used to compare the approaches and smaller sample size conditions were used. Furthermore, the results supporting the superiority of the three-step approaches even in poorly manipulated conditions indicate the advantage of these approaches.

Methodological Reporting Behavior, Sample Sizes, and Statistical Power in Studies of Event-Related Potentials: Barriers to Reproducibility and Replicability

10.31234/osf.io/kgv9z ◽

2019 ◽

Author(s):

Peter E Clayson ◽

Kaylie Amanda Carbine ◽

Scott Baldwin ◽

Michael J. Larson

Keyword(s):

Sample Size ◽

Statistical Power ◽

Event Related Potentials ◽

Reporting Guidelines ◽

Medium Effect ◽

Sample Sizes ◽

Reporting Behavior ◽

Average Sample Size ◽

Related Potentials ◽

Average Sample

Methodological reporting guidelines for studies of event-related potentials (ERPs) were updated in Psychophysiology in 2014. These guidelines facilitate the communication of key methodological parameters (e.g., preprocessing steps). Failing to report key parameters represents a barrier to replication efforts, and difficulty with replicability increases in the presence of small sample sizes and low statistical power. We assessed whether guidelines are followed and estimated the average sample size and power in recent research. Reporting behavior, sample sizes, and statistical designs were coded for 150 randomly sampled articles from five high-impact journals that frequently publish ERP research from 2011 to 2017. An average of 63% of guidelines were reported, and reporting behavior was similar across journals, suggesting that gaps in reporting are a shortcoming of the field rather than any specific journal. Publication of the guidelines paper had no impact on reporting behavior, suggesting that editors and peer reviewers are not enforcing these recommendations. The average sample size per group was 21. Statistical power was conservatively estimated as .72-.98 for a large effect size, .35-.73 for a medium effect, and .10-.18 for a small effect. These findings indicate that failing to report key guidelines is ubiquitous and that ERP studies are primarily powered to detect large effects. Such low power and insufficient following of reporting guidelines represent substantial barriers to replication efforts. The methodological transparency and replicability of studies can be improved by the open sharing of processing code and experimental tasks and by a priori sample size calculations to ensure adequately powered studies.
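To give a feel for what a per-group sample size of 21 implies, the sketch below approximates the power of a two-sided, two-group comparison using a normal approximation. The design (two independent groups, α = .05) and the function name are assumptions; the abstract's ranges cover several designs and were not necessarily computed this way.

```python
from statistics import NormalDist


def approx_power_two_groups(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample test of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5   # expected value of the test statistic
    # Probability of exceeding the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for label, d in (("large", 0.8), ("medium", 0.5), ("small", 0.2)):
    print(f"{label:6s} d={d}: power ~ {approx_power_two_groups(d, 21):.2f}")
```

With these assumptions the approximate power is about 0.74, 0.37, and 0.10 for large, medium, and small effects, which is in the neighbourhood of the lower ends of the ranges quoted above.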

Importance of sample size for estimating prevalence: A case example of infectious hematopoietic necrosis viral RNA detection in mixed-stock Fraser River Sockeye salmon (Oncorhynchus nerka), British Columbia, Canada.

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/cjfas-2020-0279 ◽

2020 ◽

Author(s):

Emilie Laurin ◽

Julia Bradshaw ◽

Laura Hawley ◽

Ian A. Gardner ◽

Kyle A Garver ◽

...

Keyword(s):

British Columbia ◽

Sample Size ◽

Sockeye Salmon ◽

Oncorhynchus Nerka ◽

Small Sample ◽

Viral Rna ◽

Sample Sizes ◽

Apparent Prevalence ◽

Infectious Hematopoietic Necrosis ◽

Mixed Stock

Proper sample size must be considered when designing infectious-agent prevalence studies for mixed-stock fisheries, because bias and uncertainty complicate interpretation of apparent (test)-prevalence estimates. Sample size varies between stocks and is often smaller than expected in wild-salmonid surveys. Our case example of 2010-2016 survey data of Sockeye salmon (Oncorhynchus nerka) from different stocks of origin in British Columbia, Canada, illustrated the effect of sample size on apparent-prevalence interpretation. Molecular testing (viral RNA RT-qPCR) for infectious hematopoietic necrosis virus (IHNv) revealed large differences in apparent prevalence across wild salmon stocks (much higher from Chilko Lake) and sampling location (freshwater or marine), indicating differences in both stock and host life-stage effects. Ten of the 13 marine non-Chilko stock-years with IHNv-positive results had small sample sizes (< 30 samples per stock-year) which, with imperfect diagnostic tests (particularly lower diagnostic sensitivity), could lead to inaccurate apparent-prevalence estimation. When calculating sample size for expected apparent prevalence using different approaches, smaller sample sizes often led to decreased confidence in apparent-prevalence results and decreased power to detect a true difference from a reference value.
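A minimal sketch of why small samples and imperfect tests complicate apparent prevalence follows. It uses the standard relation between true and apparent prevalence and a simple Wald interval; the numerical inputs (true prevalence 0.10, sensitivity 0.80, specificity 0.99) and the function names are hypothetical and are not taken from the study, which compared several sample-size approaches.

```python
from statistics import NormalDist


def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected test-positive fraction: true positives plus false positives."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)


def wald_half_width(prev, n, confidence=0.95):
    """Half-width of a Wald confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * (prev * (1 - prev) / n) ** 0.5


ap = apparent_prevalence(true_prev=0.10, sensitivity=0.80, specificity=0.99)
print("apparent prevalence:", round(ap, 3))
for n in (15, 30, 100, 300):
    print(n, "samples: +/-", round(wald_half_width(ap, n), 3))
```

With these assumed inputs the half-width at 30 samples is larger than the apparent-prevalence estimate itself. The Wald interval is used only for simplicity; it is known to behave poorly at small n and low prevalence, so an exact or Wilson interval may be preferable in practice.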

Statistical Power in Content Analysis Designs: How Effect Size, Sample Size and Coding Accuracy Jointly Affect Hypothesis Testing ‐ A Monte Carlo Simulation Approach.

Computational Communication Research ◽

10.5117/ccr2021.1.003.geis ◽

2021 ◽

Vol 3 (1) ◽

pp. 61-89

Author(s):

Stefan Geiß

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Content Analysis ◽

Sample Size ◽

Effect Size ◽

Statistical Power ◽

Effect Sizes ◽

Sample Sizes ◽

Expected Effect ◽

Sample Size Effect

This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In most widespread sample size/effect size settings, the rule-of-thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, resulting in acceptable α and β error rates. However, this simulation allows precise power calculations that consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as an online appendix.
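The general logic (measurement error from coding lowers the power of a correlational test) can be sketched with a small Monte Carlo simulation. Note the substitution: the study models categorical coder behavior and chance-adjusted agreement and supplies R functions, whereas the sketch below assumes a continuous variable with additive coding noise and classical reliability, and tests the correlation with a Fisher z approximation. All names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_power(rho, n, reliability, n_sims=400, seed=0):
    """Share of simulated studies that reject rho = 0 at a two-sided 5% level.

    reliability = var(true score) / var(coded score) with unit true-score variance.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt((1 - reliability) / reliability)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for a in x]
        coded = [a + noise_sd * rng.gauss(0.0, 1.0) for a in x]  # noisy coding of x
        z = math.atanh(pearson_r(coded, y)) * math.sqrt(n - 3)   # Fisher z statistic
        if abs(z) > 1.959964:
            hits += 1
    return hits / n_sims


for rel in (1.0, 0.8, 0.667, 0.5):
    print(rel, simulated_power(rho=0.3, n=100, reliability=rel))
```

In this simplified model the observed correlation shrinks by the square root of the reliability, so power drops as coding gets noisier, and a larger sample can compensate, in line with the trade-off the abstract describes.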

Sample Size Determination and Optimal Design of Simple Pretest-Posttest Experimental Designs: Introduction, Software, and Illustrations

10.35542/osf.io/k5ey8 ◽

2021 ◽

Author(s):

Metin Bulus

Keyword(s):

Optimal Design ◽

Sample Size ◽

Small Sample ◽

Experimental Designs ◽

Sample Size Determination ◽

Size Determination ◽

Sample Sizes ◽

Control Groups ◽

Small Sample Sizes ◽

And Control

A recent systematic review of experimental studies conducted in Turkey between 2010 and 2020 reported that small sample sizes had been a significant drawback (Bulus and Koyuncu, 2021). A small share of the studies were small-scale true experiments (subjects randomized into the treatment and control groups). The remaining studies consisted of quasi-experiments (subjects in treatment and control groups were matched on pretest or other covariates) and weak experiments (neither randomized nor matched but with a control group). They had an average sample size below 70 for different domains and outcomes. These small sample sizes imply a strong (and perhaps erroneous) assumption about the minimum relevant effect size (MRES) of an intervention before an experiment is conducted; that is, a standardized intervention effect of Cohen’s d < 0.50 is not relevant to education policy or practice. Thus, an introduction to sample size determination for pretest-posttest simple experimental designs is warranted. This study describes the nuts and bolts of sample size determination, derives expressions for optimal design under differential cost per treatment and control unit, provides convenient tables to guide sample size decisions for MRES values in the range 0.20 ≤ Cohen’s d ≤ 0.50, and describes the relevant software along with illustrations.
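The idea of optimal design under differential cost can be illustrated with the textbook allocation for a two-group mean difference: when the variance of the estimate is proportional to 1/n_T + 1/n_C and the budget is c_T·n_T + c_C·n_C, the variance is minimized at n_T/n_C = sqrt(c_C/c_T). The sketch below applies that rule; it assumes equal outcome variances in both groups, uses hypothetical names and costs, and does not reproduce the paper's own expressions for pretest-posttest designs.

```python
from math import sqrt


def optimal_allocation(budget, cost_treatment, cost_control):
    """Group sizes minimizing 1/n_T + 1/n_C subject to a fixed total budget."""
    ratio = sqrt(cost_control / cost_treatment)            # n_T / n_C at the optimum
    n_control = budget / (cost_treatment * ratio + cost_control)
    n_treatment = ratio * n_control
    return n_treatment, n_control


def variance_factor(n_treatment, n_control):
    """Quantity proportional to the variance of the mean difference."""
    return 1 / n_treatment + 1 / n_control


# Hypothetical costs: a treatment unit costs four times as much as a control unit.
n_t, n_c = optimal_allocation(budget=1000, cost_treatment=4, cost_control=1)
n_balanced = 1000 / (4 + 1)                                # equal group sizes, same budget
print("optimal:", round(n_t, 1), round(n_c, 1), round(variance_factor(n_t, n_c), 4))
print("balanced:", n_balanced, n_balanced, round(variance_factor(n_balanced, n_balanced), 4))
```

In practice the resulting sizes would be rounded to whole units. For the assumed costs, the unbalanced design yields a smaller variance factor than equal groups for the same budget, which is the kind of gain an optimal-design calculation is meant to capture.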