Recalibrating expectations about effect size: A multi-method survey of effect sizes in the ABCD study

PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257535
Author(s):  
Max M. Owens ◽  
Alexandra Potter ◽  
Courtland S. Hyatt ◽  
Matthew Albaugh ◽  
Wesley K. Thompson ◽  
...  

Effect sizes are commonly interpreted using heuristics established by Cohen (e.g., small: r = .1, medium: r = .3, large: r = .5), despite mounting evidence that these guidelines are miscalibrated to the effects typically found in psychological research. This study’s aims were to (1) describe the distribution of effect sizes across multiple instruments, (2) consider factors qualifying the effect size distribution, and (3) identify examples as benchmarks for various effect sizes. For aim one, effect size distributions were illustrated from a large, diverse sample of 9/10-year-old children by conducting Pearson’s correlations among 161 variables representing constructs from all questionnaires and tasks in the Adolescent Brain Cognitive Development (ABCD) Study® baseline data. For aim two, factors qualifying this distribution were tested by comparing the distributions of effect size across several modifications of the aim-one analyses: comparisons of effect size distributions for different types of variables, for analyses using statistical thresholds, and for analyses using several covariate strategies. In aim-one analyses, the median in-sample effect size was .03, and the values at the first and third quartiles were .01 and .07. In aim-two analyses, effects were smaller for associations across instruments, content domains, and reporters, as well as when covarying for sociodemographic factors; effect sizes were larger when thresholding for statistical significance. In analyses intended to mimic conditions used in “real-world” analysis of ABCD data, the median in-sample effect size was .05, and the values at the first and third quartiles were .03 and .09. For aim three, examples of varying effect sizes are reported from the ABCD dataset as benchmarks for future work with it. In summary, this report finds that empirically determined effect sizes from a notably large dataset are smaller than would be expected from existing heuristics.
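A minimal sketch of the aim-one computation, for readers who want to reproduce this kind of summary on their own data: all pairwise Pearson correlations among a set of variables, with the distribution of absolute effect sizes summarized by its median and quartiles. The data frame, column names, and sample size below are invented placeholders, not the ABCD release.

```python
import numpy as np
import pandas as pd

# Placeholder data standing in for 161 questionnaire/task summary scores;
# rows are children. Real ABCD variables would be loaded here instead.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(11000, 161)),
                  columns=[f"var_{i:03d}" for i in range(161)])

# All pairwise Pearson correlations.
corr = df.corr(method="pearson").to_numpy()

# Keep each pair once (upper triangle, excluding the diagonal).
iu = np.triu_indices_from(corr, k=1)
abs_r = np.abs(corr[iu])

# Distribution of absolute effect sizes: median and quartiles.
q1, med, q3 = np.percentile(abs_r, [25, 50, 75])
print(f"median |r| = {med:.3f}, quartiles = [{q1:.3f}, {q3:.3f}]")
```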

2020 ◽  
Author(s):  
Max Michael Owens ◽  
Alexandra Potter ◽  
Courtland Hyatt ◽  
Matthew Albaugh ◽  
Wesley Kurt Thompson ◽  
...  

Effect sizes are commonly interpreted using heuristics established by Cohen (e.g., small: r = .1, medium: r = .3, large: r = .5), despite mounting evidence that these guidelines are miscalibrated to the effects typically found in psychological research. Here we summarize Pearson’s correlations among all questionnaire and task data from the ABCD Study® to illustrate effect size distributions in a large, diverse sample of 9/10-year-old children. The median in-sample effect size was .03, and the values at the first and third quartiles were .01 and .07. Effects were smaller for associations across instruments, content domains, and reporters, as well as when covarying for sociodemographic factors. To help modify researcher expectations, we provide benchmark examples for varying effect sizes. In summary, this report finds that empirically determined effect sizes from a notably large dataset are smaller than would be expected from existing heuristics.


2021 ◽  
Author(s):  
Kleber Neves ◽  
Pedro Batista Tan ◽  
Olavo Bohrer Amaral

Diagnostic screening models for the interpretation of null hypothesis significance test (NHST) results have been influential in highlighting the effect of selective publication on the reproducibility of the published literature, leading to John Ioannidis’ much-cited claim that most published research findings are false. These models, however, typically assume that hypotheses are dichotomously true or false, without considering that effect sizes differ across hypotheses. To address this limitation, we develop a simulation model that represents effect sizes explicitly, using different continuous distributions, while retaining other aspects of previous models such as publication bias and the pursuit of statistical significance. Our results show that the combination of selective publication, bias, low statistical power, and unlikely hypotheses consistently leads to high proportions of false positives, irrespective of the effect size distribution assumed. Using continuous effect sizes also allows us to evaluate the degree of effect size overestimation and the prevalence of estimates with the wrong sign in the literature, showing that the same factors that drive false-positive results also lead to errors in estimating effect size direction and magnitude. Nevertheless, the relative influence of these factors on different metrics varies depending on the distribution assumed for effect sizes. The model is made available as an interactive R Shiny app, allowing one to explore features of the literature under various scenarios.
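A minimal sketch of this kind of simulation (not the authors' model or their Shiny app): true effects are drawn from a continuous distribution rather than being dichotomously true or false, low-powered studies are simulated, and only significant results are "published." The distribution parameters and sample sizes are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Continuous true effects: most near zero, none exactly "true" or "false".
n_studies, n_per_group = 20000, 30
true_d = rng.normal(loc=0.0, scale=0.2, size=n_studies)

# Observed standardized effects from two-group studies (approximate
# standard error of Cohen's d with equal groups).
se = np.sqrt(2 / n_per_group)
obs_d = true_d + rng.normal(scale=se, size=n_studies)
p = 2 * stats.norm.sf(np.abs(obs_d / se))

# Selective publication: only significant results appear in print.
published = p < 0.05
pub_true, pub_obs = true_d[published], obs_d[published]

print("published fraction:", published.mean())
print("overestimation factor:",
      np.mean(np.abs(pub_obs)) / np.mean(np.abs(pub_true)))
print("wrong-sign rate among published:",
      np.mean(np.sign(pub_obs) != np.sign(pub_true)))
```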


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Liansheng Larry Tang ◽  
Michael Caudy ◽  
Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, yet yield different or even discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to interpret the results of multiple meta-analyses properly, especially when they conflict. In this paper, we first introduce a method to synthesize meta-analytic results when multiple meta-analyses use the same type of summary effect estimate. When meta-analyses use different types of effect sizes, their results cannot be combined directly. We propose a two-step frequentist procedure that first converts the effect size estimates to a common metric and then summarizes them with a weighted mean estimate. Our proposed method offers several advantages over the existing methods of Hemming et al. (2012). First, different types of summary effect sizes are accommodated. Second, our method yields the same overall effect size as conducting a meta-analysis on all individual studies from the multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
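A hedged sketch of the two-step idea: convert each meta-analysis's summary effect to a common metric (here Cohen's d to r via the standard equal-groups approximation), then combine on the Fisher-z scale with sample-size-based weights. The paper's exact conversions and weights may differ; the numbers below are hypothetical.

```python
import numpy as np

def d_to_r(d):
    """Cohen's d -> Pearson r, assuming equal group sizes."""
    return d / np.sqrt(d**2 + 4)

# Hypothetical summary effects from three meta-analyses:
# (metric, value, total N). Two report r, one reports d.
summaries = [("r", 0.21, 1500), ("r", 0.34, 800), ("d", 0.45, 1200)]

# Step 1: convert every summary effect to the same metric (r).
rs = np.array([v if m == "r" else d_to_r(v) for m, v, _ in summaries])
ns = np.array([n for _, _, n in summaries])

# Step 2: weighted mean on the Fisher-z scale (weights n - 3),
# back-transformed to r.
pooled_r = np.tanh(np.average(np.arctanh(rs), weights=ns - 3))
print(f"pooled r = {pooled_r:.3f}")
```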


Author(s):  
H. S. Steyn ◽  
S. M. Ellis

Determining the significance of differences in means and of relationships between variables is important in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. In studies based on probability samples, effect size indices should be reported alongside statistical significance tests in order to comment on practical significance. When complete populations or convenience samples are used, the determination of statistical significance is, strictly speaking, no longer relevant, while effect size indices can still serve as a basis for judging significance. In this article attention is paid to the use of effect size indices to establish practical significance. It is also shown how these indices are utilized in a few fields of statistical application and how they are treated in the statistical literature and in computer packages. The use of effect sizes is illustrated with a few examples from the research literature.
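A small sketch of the population point above: with a complete population (or a convenience sample) in hand, a p-value answers no sampling question, but an effect size index such as Cohen's d still quantifies whether a difference matters. The data and interpretation cutoffs below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two complete population strata (simulated stand-ins).
stratum_a = rng.normal(50, 10, size=5000)
stratum_b = rng.normal(53, 10, size=5000)

# Cohen's d with a pooled standard deviation.
pooled_sd = np.sqrt((stratum_a.var(ddof=1) + stratum_b.var(ddof=1)) / 2)
d = (stratum_b.mean() - stratum_a.mean()) / pooled_sd

# Judge practical significance from d itself (Cohen's guidelines:
# 0.2 small, 0.5 medium, 0.8 large), not from a p-value.
print(f"d = {d:.2f}")
```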


2016 ◽  
Vol 20 (4) ◽  
pp. 639-664 ◽  
Author(s):  
Christopher D. Nye ◽  
Paul R. Sackett

Moderator hypotheses involving categorical variables are prevalent in organizational and psychological research. Despite their importance, current methods of identifying and interpreting these moderation effects have several limitations that may lead to misleading conclusions about their implications. This issue has been particularly salient in the literature on differential prediction, where recent research suggests these limitations have had a significant impact on past findings. To help address these issues, we propose several new effect size indices that provide additional information about categorical moderation analyses. The advantages of these indices are then illustrated in two large databases of respondents: one examining categorical moderation in the prediction of psychological well-being, the other examining the extent of differential prediction in a large sample of job incumbents.
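The paper's proposed indices are defined in the article itself; as a generic illustration of what categorical moderation looks like computationally, the sketch below fits separate regression slopes within two groups and reports their difference, the basic quantity such indices build on. Data are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data where the predictor-outcome slope differs by group.
n = 1000
group = rng.integers(0, 2, size=n)                   # categorical moderator
x = rng.normal(size=n)                               # predictor
y = 0.5 * x + 0.3 * group * x + rng.normal(size=n)   # moderated slope

# Per-group least-squares slopes.
slopes = {g: np.polyfit(x[group == g], y[group == g], 1)[0] for g in (0, 1)}

# A simple moderation summary: the difference between group slopes.
print(f"slope g0 = {slopes[0]:.2f}, slope g1 = {slopes[1]:.2f}, "
      f"difference = {slopes[1] - slopes[0]:.2f}")
```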


1990 ◽  
Vol 24 (3) ◽  
pp. 405-415 ◽  
Author(s):  
Nathaniel McConaghy

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy over the evaluation of treatment effects. Statistical significance measured the reliability of the effect of treatment, not its efficacy, and was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence, but by standardizing the size of the treatment effect it could distort it. Meta-analyses that combine the results of studies employing different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists of varying experience can be misleading. To ensure that these variables are discussed, meta-analysis should be used as an aid to, rather than a substitute for, literature review. While meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.
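The claim that statistical significance tracks the number of subjects while a standardized effect size does not can be shown in a few lines. This worked illustration (not from the article) holds Cohen's d fixed at 0.3 and varies only the group size:

```python
import numpy as np
from scipy import stats

# Same standardized effect, different sample sizes: p changes, d does not.
d = 0.3
for n in (20, 500):
    t = d * np.sqrt(n / 2)                  # two-sample t, equal groups
    p = 2 * stats.t.sf(abs(t), df=2 * n - 2)
    print(f"n per group = {n:3d}: t = {t:.2f}, p = {p:.4f}, d = {d}")
```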


2005 ◽  
Vol 77 (1) ◽  
pp. 45-76 ◽  
Author(s):  
Lee-Ann C. Hayek ◽  
W. Ronald Heyer

Several analytic techniques have been used to determine sexual dimorphism in vertebrate morphological measurement data, with no emergent consensus on which technique is superior. A further confounding problem for frog data is the existence of considerable measurement error. To determine dimorphism, we examine a single hypothesis (H0: equal means) for two groups (females and males). We demonstrate that frog measurement data meet the assumptions for clearly defined statistical hypothesis testing with statistical linear models, rather than those of exploratory multivariate techniques such as principal components, correlation, or correspondence analysis. To distinguish biological from statistical significance of hypotheses, we propose a new protocol that incorporates measurement error and effect size. Measurement error is evaluated with a novel measurement error index. Effect size, widely used in the behavioral sciences and in meta-analysis studies in biology, proves to be the most useful single metric for evaluating whether statistically significant results are biologically meaningful. Definitions of a range of small, medium, and large effect sizes specifically for frog measurement data are provided. Examples with measurement data for species of the frog genus Leptodactylus are presented. The new protocol is recommended not only for evaluating sexual dimorphism in frog data but for any animal measurement data for which the measurement error index and observed or a priori effect sizes can be calculated.
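The measurement error index proposed in the paper is defined in the article itself; the sketch below uses a generic stand-in (within-specimen variance from repeated measurements as a rough share of total variance) alongside a standard Cohen's d for the female-male comparison. All numbers are simulated placeholders, not Leptodactylus data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each specimen measured twice; repeat scatter is measurement error.
n_frogs, n_repeats = 30, 2
true_svl = rng.normal(40.0, 3.0, size=n_frogs)   # snout-vent length, mm
measured = true_svl[:, None] + rng.normal(0, 0.5, size=(n_frogs, n_repeats))

within_var = measured.var(axis=1, ddof=1).mean()        # repeat-to-repeat
among_var = measured.mean(axis=1).var(ddof=1)           # specimen-to-specimen
me_index = 100 * within_var / (within_var + among_var)  # rough % of variance
print(f"measurement error ≈ {me_index:.1f}% of total variance")

# Effect size for dimorphism: Cohen's d between female and male means.
females = rng.normal(42.0, 3.0, size=25)
males = rng.normal(39.0, 3.0, size=25)
pooled_sd = np.sqrt((females.var(ddof=1) + males.var(ddof=1)) / 2)
print(f"dimorphism d = {(females.mean() - males.mean()) / pooled_sd:.2f}")
```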


2013 ◽  
Vol 11 (5) ◽  
pp. 147470491301100 ◽  
Author(s):  
Marco Del Giudice

In the study of group and sex differences in multivariate domains such as personality and aggression, univariate effect sizes may underestimate the extent to which groups differ from one another. When multivariate effect sizes such as Mahalanobis D are employed, sex differences are often found to be considerably larger than commonly assumed. In this paper, I review and discuss recent criticism concerning the validity of D as an effect size in psychological research. I conclude that the main arguments against D are incorrect, logically inconsistent, or easily answered on methodological grounds. When correctly employed and interpreted, D provides a valid, convenient measure of group and sex differences in multivariate domains.
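Mahalanobis D itself is standard and easy to compute: it is the distance between group centroids standardized by the pooled covariance matrix, so it accounts for trait intercorrelations. In the simulated example below (means, correlations, and sample sizes are assumptions), each univariate d is 0.25, yet D comes out larger because small same-direction differences accumulate across dimensions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Five correlated traits; groups differ by d = 0.25 on each trait.
p = 5
cov = 0.6 * np.eye(p) + 0.4 * np.ones((p, p))   # unit variances, r = .4
females = rng.multivariate_normal(np.zeros(p), cov, size=500)
males = rng.multivariate_normal(np.full(p, 0.25), cov, size=500)

# Mahalanobis D from the mean difference and pooled covariance.
diff = males.mean(axis=0) - females.mean(axis=0)
pooled = (np.cov(females, rowvar=False) + np.cov(males, rowvar=False)) / 2
D = np.sqrt(diff @ np.linalg.solve(pooled, diff))
print(f"Mahalanobis D = {D:.2f}")   # exceeds every univariate d of 0.25
```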


2019 ◽  
Author(s):  
Miguel Alejandro Silan

One of the main criticisms of NHST is that statistical significance is not practical significance. This evaluation of the practical significance of effects often takes an implicit but consequential form in the field, from informal conversations among researchers evaluating findings to peer reviewers deciding the importance of an article. This primer seeks to make explicit what we mean when we talk about practical significance, to organize what we know of it, and to propose a framework for how we can evaluate and establish it. The practical significance of effects is appraised by analyzing them (i) along different levels of analysis, (ii) across different outcomes, (iii) across time, and (iv) across relevant moderators; these dimensions also underlie the conditions under which small effect sizes can be consequential. Practical significance is contrasted with often-conflated terms, including statistical significance, effect size, effect size benchmarks, and theoretical significance. Promising directions are then presented.


2020 ◽  
Author(s):  
Luke Jen O’Connor

The genetic effect-size distribution describes the number of variants that affect disease risk and the range of their effect sizes. Accurate estimates of this distribution would provide insights into genetic architecture and set sample-size targets for future genome-wide association studies. We developed Fourier Mixture Regression (FMR) to estimate common-variant effect-size distributions from GWAS summary statistics. We validated FMR in simulations and in analyses of UK Biobank data, using interim-release summary statistics (max N=145k) to predict the results of the full release (N=460k). Analyzing summary statistics for 10 diseases (avg Neff=169k) and 22 other traits, we estimated the sample size required for genome-wide significant SNPs to explain 50% of SNP-heritability. For most diseases the requisite number of cases is 100k-1M, an attainable number; ten times more would be required to explain 90% of heritability. In well-powered GWAS, genome-wide significance is a conservative threshold, and loci at less stringent thresholds have true-positive rates that remain close to 1 if confounding is controlled. Analyzing the shape of the effect-size distribution, we estimate that heritability accumulates across many thousands of SNPs with a wide range of effect sizes: the largest effects (at the 90th percentile of heritability) are 100 times larger than the smallest (at the 10th percentile), and while the midpoint of this range varies across traits, its breadth is similar. These results suggest attainable sample-size targets for future GWAS and underscore the complexity of genetic architecture.
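FMR itself is beyond a short sketch, but the percentile-of-heritability comparison in the final result can be illustrated: simulate per-SNP effects from a simple point-normal mixture (an assumed architecture, not the paper's fitted model), accumulate heritability across sorted contributions, and compare effect sizes at the 10th and 90th percentiles of heritability.

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed architecture: 1% of SNPs causal, normal effect sizes.
n_snps = 100_000
causal = rng.random(n_snps) < 0.01
beta = np.where(causal, rng.normal(0, 1e-2, size=n_snps), 0.0)
h2_per_snp = beta**2                      # variance explained per SNP

# Accumulate heritability from the smallest contributions upward.
h2_sorted = np.sort(h2_per_snp[h2_per_snp > 0])
cum = np.cumsum(h2_sorted) / h2_sorted.sum()

# Effect sizes at the 10th and 90th percentiles of heritability.
h2_p10 = h2_sorted[np.searchsorted(cum, 0.10)]
h2_p90 = h2_sorted[np.searchsorted(cum, 0.90)]
print(f"90th / 10th heritability-percentile effect ratio: "
      f"{h2_p90 / h2_p10:.0f}x")
```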

