Estimating the effective sample size in association studies of quantitative traits
Abstract
The effective sample size (ESS) is a quantity estimated in genome-wide association studies (GWAS) that include related individuals and/or use linear mixed models in the analysis. ESS was originally used to measure relative power in family-based GWAS and has recently become important for correcting GWAS summary statistics in post-GWAS analyses. However, existing approaches to ESS have received little attention and rely on empirical estimation. This work presents an analytical form of ESS for mixed-model GWAS of quantitative traits, derived using the expectation of a quadratic form and validated in extensive simulations. We illustrate the performance and relevance of our ESS estimator in common GWAS scenarios and show analytically that (i) family-based studies are consistently underpowered compared to studies of unrelated individuals of the same sample size; (ii) conditioning on the polygenic genetic effect with linear mixed models boosts power; and (iii) power to detect gene-environment interaction can be substantially gained or lost in family-based designs, depending on the exposure distribution. We further analyze the UK Biobank dataset in two samples of 336,347 unrelated and 68,910 related individuals. The analysis of unrelated individuals reveals high accuracy of our ESS estimator relative to the existing empirical approach, and the analysis of related individuals suggests that the loss in effective sample size due to relatedness is at most 0.94x. Overall, we provide an analytical form of ESS for guiding GWAS designs and processing summary statistics in post-GWAS analyses.