Interpreting published effect sizes in behavioral science: a thought-experiment
Standardized effect size measures (e.g., Cohen’s d) express the observed mean difference, m1-m0, relative to the observed standard deviation, s. These measures are commonly used in behavioral science today: in meta-analytical research, to quantify the standardized difference (m1-m0)/s across object-level studies that use different measurement scales, and in theory-construction research, to point-specify (m1-m0)/s as a theoretically predicted parameter. Since standardization conceptually relates to the quality of measurement, a standardized effect size can be interpreted fully only relative to whichever error-theory determines s. Behavioral scientists, however, must typically choose this error-theory freely, because a theoretically motivated measurement scale is normally unavailable. Using a thought-experiment, we show that error-theories of different sophistication make the standardized effect size vary massively given identical observations, because the same raw difference m1-m0 is divided by a different s. This makes the common practice of publishing standardized effect sizes “nakedly”, that is, without a transparent error-theory, appear problematic, because it undermines the goals of a cumulative science of human behavior. We therefore advocate reporting standardized effect sizes together with a transparent error-theory.
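To make this dependence concrete, the following minimal sketch (in Python, using hypothetical simulated data and a hypothetical reliability value of .50, neither of which appears in the paper) standardizes one and the same raw mean difference under three common error-theories: the pooled within-group standard deviation (Cohen’s d), the control-group standard deviation (Glass’s delta), and a standard deviation corrected for assumed measurement unreliability.

```python
import numpy as np

# Hypothetical data: one treatment and one control group, fixed by a seed.
rng = np.random.default_rng(seed=1)
control = rng.normal(loc=0.0, scale=1.0, size=100)
treatment = rng.normal(loc=0.5, scale=1.0, size=100)

m0, m1 = control.mean(), treatment.mean()
raw_diff = m1 - m0  # the raw difference m1-m0 is fixed by the observations

# Error-theory A: pooled within-group standard deviation (classic Cohen's d).
n0, n1 = len(control), len(treatment)
s_pooled = np.sqrt(((n0 - 1) * control.var(ddof=1)
                    + (n1 - 1) * treatment.var(ddof=1)) / (n0 + n1 - 2))
d_pooled = raw_diff / s_pooled

# Error-theory B: control-group standard deviation only (Glass's delta).
d_glass = raw_diff / control.std(ddof=1)

# Error-theory C: standardize by the SD of assumed error-free true scores.
# The reliability of .50 is a hypothetical assumption, not an observed value.
reliability = 0.50
s_true = s_pooled * np.sqrt(reliability)
d_corrected = raw_diff / s_true

print(f"raw difference m1-m0 : {raw_diff:.3f}")
print(f"d (pooled SD)        : {d_pooled:.3f}")
print(f"d (Glass's delta)    : {d_glass:.3f}")
print(f"d (reliability-corr.): {d_corrected:.3f}")
```

Under the classic correction for attenuation, the observed d is divided by the square root of the assumed reliability, so a reliability of .50 alone inflates the reported standardized effect by roughly 41% relative to the uncorrected value, even though m1-m0 itself never changes.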