Estimating the effective sample size in association studies of quantitative traits

Abstract The effective sample size (ESS) is a quantity estimated in genome-wide association studies (GWAS) with related individuals and/or linear mixed models used in analysis. ESS originally measured relative power in family-based GWAS and has recently become important for correcting GWAS summary statistics in post-GWAS analyses. However, existing ESS approaches have been overlooked and based on empirical estimation. This work presents an analytical form of ESS in mixed-model GWAS of quantitative traits, which is derived using the expectation of quadratic form and validated in extensive simulations. We illustrate the performance and relevance of our ESS estimator in common GWAS scenarios and analytically show that (i) family-based studies are consistently underpowered compared to studies of unrelated individuals of the same sample size; (ii) conditioning on polygenic genetic effect by linear mixed models boosts power; and (iii) power of detecting gene-environment interaction can be substantially gained or lost in family-based designs depending on exposure distribution. We further analyze UK Biobank dataset in two samples of 336,347 unrelated and 68,910 related individuals. Analysis in unrelated individuals reveals a high accuracy of our ESS estimator compared to the existing empirical approach; and analysis of related individuals suggests that the loss in effective sample size due to relatedness is at most 0.94x. Overall, we provide an analytical form of ESS for guiding GWAS designs and processing summary statistics in post-GWAS analyses.

Estimating the effective sample size in association studies of quantitative traits

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab057 ◽

2021 ◽

Author(s):

Andrey Ziyatdinov ◽

Jihye Kim ◽

Dmitry Prokopenko ◽

Florian Privé ◽

Fabien Laporte ◽

...

Keyword(s):

Statistical Power ◽

Quantitative Traits ◽

Mixed Model ◽

Association Studies ◽

Effective Sample Size ◽

Environment Interaction ◽

UK Biobank ◽

Gene Environment Interaction ◽

Gene Environment ◽

The UK

Abstract The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.
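
The idea that an ESS summarizes sample correlation in a single number can be illustrated with a minimal sketch. This is not the estimator derived in the paper: it assumes the simplest textbook setting, estimating a common mean from unit-variance observations with a known correlation matrix V, where the generalized least squares estimate has variance 1 / (1' V^{-1} 1) and the ESS is therefore 1' V^{-1} 1. The function names and the toy design of 100 pairs with within-pair correlation 0.5 are hypothetical.

```python
import numpy as np


def ess_for_mean(V):
    """ESS for estimating a common mean from unit-variance observations.

    Assumption: V is a known, positive-definite correlation matrix.
    The GLS mean has variance 1 / (1' V^{-1} 1), so 1' V^{-1} 1 is the
    number of independent observations giving the same precision.
    """
    ones = np.ones(V.shape[0])
    return float(ones @ np.linalg.solve(V, ones))


def equicorrelated_blocks(n_clusters, m, rho):
    """Block-diagonal correlation matrix: n_clusters clusters of size m,
    with correlation rho inside a cluster and 0 between clusters."""
    block = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
    return np.kron(np.eye(n_clusters), block)


# Toy design (hypothetical): 100 pairs, within-pair correlation 0.5.
n_clusters, m, rho = 100, 2, 0.5
V = equicorrelated_blocks(n_clusters, m, rho)

ess = ess_for_mean(V)
# Closed form for equicorrelated clusters: n / (1 + (m - 1) * rho).
closed = n_clusters * m / (1.0 + (m - 1) * rho)

print(f"nominal n = {n_clusters * m}, ESS = {ess:.2f}, closed form = {closed:.2f}")
```

In this toy setting the ESS falls below the nominal sample size whenever rho is positive. The mixed-model GWAS setting described in the abstract involves genotype and exposure covariates rather than a common mean, so the numbers here should not be read as power estimates for any real study.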

Sample Size for Linear Mixed Models

Linear Model Theory ◽

10.1002/9780470052143.ch23 ◽

2012 ◽

pp. 385-386

Keyword(s):

Sample Size ◽

Mixed Models ◽

Linear Mixed Models

Sample size calculations for population- and family-based case-control association studies on marker genotypes

Genetic Epidemiology ◽

10.1002/gepi.10245 ◽

2003 ◽

Vol 25 (2) ◽

pp. 136-148 ◽

Cited By ~ 28

Author(s):

Ruth M. Pfeiffer ◽

Mitchell H. Gail

Keyword(s):

Sample Size ◽

Association Studies ◽

Case Control ◽

Sample Size Calculations ◽

Family Based ◽

Control Association

Joint genetic analysis using variant sets reveals polygenic gene-context interactions

10.1101/097477 ◽

2016 ◽

Cited By ~ 1

Author(s):

Francesco Paolo Casale ◽

Danilo Horta ◽

Barbara Rakitsch ◽

Oliver Stegle

Keyword(s):

Genetic Analysis ◽

Mixed Models ◽

Association Studies ◽

Statistical Tests ◽

Linear Mixed Models ◽

Genetic Effects ◽

Alternative Methods ◽

Genome Wide Association Studies ◽

Lipid Levels ◽

Multiple Traits

Abstract Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations to validate iSet before applying the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for sub-group specific genetic effects in human lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.

Author summary Genetic effects on phenotypes can depend on external contexts, including environment. Statistical tests for identifying such interactions are important to understand how individual genetic variants may act in different contexts. Interaction effects can either be studied using measurements of a given phenotype in different contexts, under the same genetic backgrounds, or by stratifying a population into subgroups. Here, we derive a method based on linear mixed models that can be applied to both of these designs. iSet enables testing for interactions between context and sets of variants, and accounts for polygenic effects. We validate our model using simulations, before applying it to the genetic analysis of gene expression studies and genome-wide association studies of human blood lipid levels. We find that modeling interactions with variant sets offers increased power, thereby uncovering interactions that cannot be detected by alternative methods.

Power and Sample Size for Fixed-Effects Inference in Reversible Linear Mixed Models

The American Statistician ◽

10.1080/00031305.2017.1415972 ◽

2018 ◽

Vol 73 (4) ◽

pp. 350-359 ◽

Cited By ~ 2

Author(s):

Yueh-Yun Chi ◽

Deborah H. Glueck ◽

Keith E. Muller

Keyword(s):

Sample Size ◽

Mixed Models ◽

Fixed Effects ◽

Linear Mixed Models

Genome-Wide Control of Population Structure and Relatedness in Genetic Association Studies via Linear Mixed Models with Orthogonally Partitioned Structure

10.1101/409953 ◽

2018 ◽

Author(s):

Matthew P. Conomos ◽

Alex P. Reiner ◽

Mary Sara McPeek ◽

Timothy A. Thornton

Keyword(s):

Population Structure ◽

Genetic Association ◽

Mixed Models ◽

Association Studies ◽

Linear Mixed Models ◽

Genetic Association Studies ◽

European Ancestry ◽

Type I ◽

Genome Wide ◽

WBC Count

Abstract Linear mixed models (LMMs) have become the standard approach for genetic association testing in the presence of sample structure. However, the performance of LMMs has primarily been evaluated in relatively homogeneous populations of European ancestry, despite many of the recent genetic association studies including samples from worldwide populations with diverse ancestries. In this paper, we demonstrate that existing LMM methods can have systematic miscalibration of association test statistics genome-wide in samples with heterogeneous ancestry, resulting in both increased type-I error rates and a loss of power. Furthermore, we show that this miscalibration arises due to varying allele frequency differences across the genome among populations. To overcome this problem, we developed LMM-OPS, an LMM approach which orthogonally partitions diverse genetic structure into two components: distant population structure and recent genetic relatedness. In simulation studies with real and simulated genotype data, we demonstrate that LMM-OPS is appropriately calibrated in the presence of ancestry heterogeneity and outperforms existing LMM approaches, including EMMAX, GCTA, and GEMMA. We conduct a GWAS of white blood cell (WBC) count in an admixed sample of 3,551 Hispanic/Latino American women from the Women’s Health Initiative SNP Health Association Resource where LMM-OPS detects genome-wide significant associations with corresponding p-values that are one or more orders of magnitude smaller than those from competing LMM methods. We also identify a genome-wide significant association with regulatory variant rs2814778 in the DARC gene on chromosome 1, which generalizes to Hispanic/Latino Americans a previous association with reduced WBC count identified in African Americans.

Linear mixed models for association analysis of quantitative traits with next‐generation sequencing data

Genetic Epidemiology ◽

10.1002/gepi.22177 ◽

2018 ◽

Cited By ~ 1

Author(s):

Chi‐yang Chiu ◽

Fang Yuan ◽

Bing‐song Zhang ◽

Ao Yuan ◽

Xin Li ◽

...

Keyword(s):

Next Generation Sequencing ◽

Association Analysis ◽

Mixed Models ◽

Quantitative Traits ◽

Linear Mixed Models ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Linear Score Tests for Variance Components in Linear Mixed Models and Applications to Genetic Association Studies

Biometrics ◽

10.1111/biom.12095 ◽

2013 ◽

Vol 69 (4) ◽

pp. 883-892 ◽

Cited By ~ 20

Author(s):

Long Qu ◽

Tobias Guennel ◽

Scott L. Marshall

Keyword(s):

Genetic Association ◽

Mixed Models ◽

Variance Components ◽

Association Studies ◽

Linear Mixed Models ◽

Genetic Association Studies ◽

Score Tests

Sample size and power calculations based on generalized linear mixed models with correlated binary outcomes

Computer Methods and Programs in Biomedicine ◽

10.1016/j.cmpb.2008.03.001 ◽

2008 ◽

Vol 91 (2) ◽

pp. 122-127 ◽

Cited By ~ 23

Author(s):

Qianyu Dang ◽

Sati Mazumdar ◽

Patricia R. Houck

Keyword(s):

Sample Size ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Binary Outcomes ◽

Power Calculations ◽

Correlated Binary Outcomes

Flashfm: A Flexible and Shared Information Fine-mapping Approach for Multiple Quantitative Traits

10.1101/2021.04.09.439186 ◽

2021 ◽

Author(s):

Nicolas Hernandez ◽

Jana Soenksen ◽

Paul Newcombe ◽

Manj Sandhu ◽

Ines Barroso ◽

...

Keyword(s):

Fine Mapping ◽

Quantitative Traits ◽

Causal Variant ◽

Summary Statistics ◽

Computationally Efficient ◽

Multiple Traits ◽

Improve Accuracy ◽

Shared Information ◽

Causal Variants ◽

Related Individuals

Joint fine-mapping that leverages information between quantitative traits could improve accuracy and resolution over single-trait fine-mapping. Using summary statistics, flashfm (FLexible And SHared information Fine-Mapping) fine-maps signals for multiple traits, allowing for missing trait measurements and use of related individuals. In a Bayesian framework, prior model probabilities are formulated to favour model combinations that share causal variants to capitalise on information between traits. Simulation studies demonstrate that both approaches produce broadly equivalent results when traits have no shared causal variants. When traits share at least one causal variant, flashfm reduces the number of potential causal variants by 30% compared with single-trait fine-mapping. In a Ugandan cohort with 33 cardiometabolic traits, flashfm gave a 20% reduction in the total number of potential causal variants from single-trait fine-mapping. Flashfm is computationally efficient and can easily be deployed across publicly available summary statistics for signals in up to six traits.
