Estimating the effective sample size in association studies of quantitative traits

Author(s): Andrey Ziyatdinov, Jihye Kim, Dmitry Prokopenko, Florian Privé, Fabien Laporte, ...

The effective sample size (ESS) is a metric that summarizes in a single number the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to recently proposed empirical estimators. Using our framework, we derive approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and the exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. Notably, we found that the expected power drop due to family relatedness in the UK Biobank is negligible.
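As a minimal illustration of the ESS idea (not the analytical form for mixed-model GWAS derived in this work), consider the textbook case of estimating a mean from correlated observations with unit variances and correlation matrix R. The generalized least squares estimate has variance 1/(1ᵀR⁻¹1), so the equivalent number of independent observations is n_eff = 1ᵀR⁻¹1. For independent clusters of size m with within-cluster correlation ρ, this reduces to the design-effect formula n/(1 + (m − 1)ρ). The sketch below computes both in plain Python; the helper names (`solve`, `cluster_corr`, `ess_for_mean`, `ess_closed_form`) and the sibling-pair numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cluster_corr(size: int, rho: float) -> Matrix:
    """Equicorrelation matrix for one cluster: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(size)] for i in range(size)]


def ess_for_mean(blocks: List[Matrix]) -> float:
    """n_eff = 1' R^-1 1 for a block-diagonal R (independent clusters)."""
    return sum(sum(solve(r, [1.0] * len(r))) for r in blocks)


def ess_closed_form(n: int, size: int, rho: float) -> float:
    """Design-effect formula for equal-sized clusters: n / (1 + (m - 1) rho)."""
    return n / (1.0 + (size - 1) * rho)


if __name__ == "__main__":
    # Hypothetical sample: 500 sibling pairs (n = 1000), within-pair correlation 0.5
    pairs = [cluster_corr(2, 0.5) for _ in range(500)]
    print(round(ess_for_mean(pairs), 2))             # 666.67
    print(round(ess_closed_form(1000, 2, 0.5), 2))   # 666.67
```

With ρ = 0 both functions return n, so any loss in n_eff comes entirely from the within-cluster correlation. A mixed-model association test involves the genotype vector and the fitted covariance rather than a column of ones, so this is only an analogy for the quantity described above.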

2017
Author(s): Po-Ru Loh, Gleb Kichaev, Steven Gazal, Armin P Schoech, Alkes L Price

Biobank-based genome-wide association studies are enabling exciting insights into complex trait genetics, but much uncertainty remains over best practices for optimizing statistical power and computational efficiency in GWAS while controlling confounders. Here, we introduce a much faster version of our BOLT-LMM Bayesian mixed model association method, capable of analyzing the full UK Biobank cohort in a few days on a single compute node, and show that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals). When used to conduct a GWAS for height in the UK Biobank, BOLT-LMM achieved power equivalent to linear regression on 650K samples: a 93% increase in effective sample size over the common practice of analyzing unrelated British samples using linear regression (UK Biobank documentation; Bycroft et al. bioRxiv). Across a broader set of 23 highly heritable traits, the total number of independent GWAS loci detected increased from 5,839 to 10,759, an 84% increase. We recommend using BOLT-LMM (retaining related individuals) for biobank-scale analyses, and we have publicly released BOLT-LMM summary association statistics for the 23 traits analyzed as a resource for all researchers.
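The percentages quoted above are plain relative increases, and "power equivalent to linear regression on 650K samples" can be read as a ratio of effective sample sizes. The sketch below recomputes the locus-count increase and, under the assumption that a 1-df association statistic has expectation 1 plus a noncentrality proportional to N, converts mean chi-square statistics at shared loci into an ESS ratio. The helper names are hypothetical, and this is not necessarily how the BOLT-LMM figures were derived.

```python
from statistics import mean
from typing import Sequence


def percent_increase(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


def ess_ratio(chisq_new: Sequence[float], chisq_ref: Sequence[float]) -> float:
    """ESS ratio implied by 1-df chi-square statistics at the same loci.

    Assumes E[chi2] = 1 + NCP with the noncentrality NCP proportional to N,
    so (mean chi2 - 1) scales linearly with the effective sample size.
    """
    return (mean(chisq_new) - 1.0) / (mean(chisq_ref) - 1.0)


if __name__ == "__main__":
    # Locus counts quoted in the abstract: 5,839 -> 10,759
    print(f"{percent_increase(5839, 10759):.1f}%")  # 84.3%
    # Reference sample size implied by 650K effective samples at a (rounded) 93% increase
    print(f"{650_000 / 1.93:,.0f}")
```

For example, mean statistics of 4.0 versus 2.5 at the same loci would imply an ESS ratio of 3.0/1.5 = 2 under this assumption.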


2021
Author(s): Julian Hecker, Dmitry Prokopenko, Matthew Moll, Sanghun Lee, Wonji Kim, ...

The identification and understanding of gene-environment interactions can provide insights into the pathways and mechanisms underlying complex diseases. However, testing for gene-environment interaction remains a challenge: statistical power is often limited, the specification of environmental effects is nontrivial, and such misspecifications can lead to false-positive findings. To address the lack of statistical power, recent methods aim to identify interactions at an aggregated level, for example using polygenic risk scores. While this strategy increases the power to detect interactions, these global results make it difficult to identify the key contributing genes and pathways.

Here, we propose RITSS (Robust Interaction Testing using Sample Splitting), a gene-environment interaction testing framework for quantitative traits that is based on sample splitting and robust test statistics. RITSS can incorporate multiple genetic variants and/or multiple environmental factors. Using sample splitting, a screening step enables the selection and combination of potential interactions into scores with improved interpretability, based on the user's unrestricted choice of statistical or machine learning approaches. In the testing step, the application of robust test statistics minimizes the susceptibility of the results to main-effect misspecifications.

Using extensive simulation studies, we demonstrate that RITSS controls the type 1 error rate in a wide range of scenarios. In an application to lung function phenotypes and human height in the UK Biobank, RITSS identified genome-wide significant interactions with subcomponents of genetic risk scores. Although the contributing single-variant interactions are moderate, our results indicate interesting interaction patterns that produce strong aggregated signals and provide further insights into gene-environment interaction mechanisms.
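To make the sample-splitting idea concrete, here is a minimal sketch in plain Python, assuming a single exposure, a handful of variants, and ordinary least squares throughout. The screening half estimates a per-variant G x E coefficient and keeps the k largest in absolute value; the testing half combines them into one weighted score and tests score x E using heteroskedasticity-robust (HC0 sandwich) standard errors. The HC0 sandwich is a stand-in for the robust statistics described above, and the function names (`solve`, `ols_hc0`, `split_half`, `screen_and_test`) and simulated data are hypothetical; this is not the RITSS implementation.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_hc0(x: Matrix, y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """OLS coefficients and heteroskedasticity-robust (HC0 sandwich) SEs."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    # Columns of (X'X)^-1, obtained by solving against unit vectors
    inv_cols = [solve(xtx, [float(i == j) for i in range(p)]) for j in range(p)]
    bread = [[inv_cols[j][i] for j in range(p)] for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(x, y)]
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(x, resid)) for j in range(p)]
            for i in range(p)]
    # Diagonal of bread * meat * bread; clamp tiny negative rounding to zero
    bm = [[sum(bread[i][k] * meat[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
    var = [sum(bm[i][k] * bread[k][i] for k in range(p)) for i in range(p)]
    return beta, [max(v, 0.0) ** 0.5 for v in var]


def split_half(n: int, seed: int = 1) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into a screening half and a testing half."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[: n // 2], idx[n // 2:]


def screen_and_test(geno: Matrix, env: Sequence[float], y: Sequence[float],
                    k: int = 2, seed: int = 1) -> Tuple[List[int], float]:
    """Hypothetical two-stage G x E analysis on disjoint halves of the sample."""
    screen, test = split_half(len(y), seed)
    n_snps = len(geno[0])
    # Screening half: per-variant G x E estimate from y ~ 1 + g + e + g*e
    w = []
    for j in range(n_snps):
        x = [[1.0, geno[i][j], env[i], geno[i][j] * env[i]] for i in screen]
        w.append(ols_hc0(x, [y[i] for i in screen])[0][3])
    top = sorted(range(n_snps), key=lambda j: abs(w[j]), reverse=True)[:k]
    # Testing half: weighted score of the selected variants, then a robust score x E test
    score = {i: sum(w[j] * geno[i][j] for j in top) for i in test}
    x = [[1.0, score[i], env[i], score[i] * env[i]] for i in test]
    beta, se = ols_hc0(x, [y[i] for i in test])
    return top, beta[3] / se[3]


if __name__ == "__main__":
    rng = random.Random(42)
    n, n_snps = 600, 5
    # Hypothetical data: 0/1/2 genotypes, a normal exposure, G x E on variants 0 and 1
    geno = [[float(rng.random() < 0.5) + float(rng.random() < 0.5) for _ in range(n_snps)]
            for _ in range(n)]
    env = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.3 * env[i] + (geno[i][0] + geno[i][1]) * env[i] + rng.gauss(0.0, 1.0)
         for i in range(n)]
    top, z = screen_and_test(geno, env, y)
    print(sorted(top), round(z, 1))
```

Because the weights are learned on one half and evaluated on the other, the selection step does not inflate the test statistic on the testing half, which is the point of splitting; the cost is that each step uses only half of the sample.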


2011
Vol 38 (3), pp. 564-566
Author(s): Proton Rahman

Psoriasis and psoriatic arthritis (PsA) are heterogeneous diseases. Both have a strong genetic basis, which is strongest for PsA, yet fewer investigators are studying the genetics of PsA. Over the last year, the number of independent genetic loci associated with psoriasis has increased substantially, mostly owing to the completion of multiple genome-wide association studies (GWAS) in psoriasis. At least 2 GWAS efforts are now under way in PsA to identify novel genes in this disease; a meta-analysis of genome-wide scans and further studies must follow to examine the genetics of disease expression, epistatic interaction, and gene-environment interaction. In the long term, genome-wide sequencing is expected to generate another wave of novel genes in PsA. At the 2009 annual meeting of the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) in Stockholm, Sweden, members discussed issues and challenges regarding the advancement of the genetics of PsA; the results of those discussions are summarized here.


Author(s): Mike Schmidt, Elizabeth R Hauser, Eden R. Martin, Silke Schmidt

We have previously distributed a software package, SIMLA (SIMulation of Linkage and Association), which can be used to generate disease phenotype and marker genotype data in three-generational pedigrees of user-specified structure. To our knowledge, SIMLA is the only publicly available program that can simulate variable levels of both linkage (recombination) and linkage disequilibrium (LD) between marker and disease loci in general pedigrees. While the previous SIMLA version provided flexibility in choosing many parameters relevant for linkage and association mapping of complex human diseases, it did not allow more than one disease locus to segregate in a given pedigree and did not incorporate environmental covariates that may interact with disease susceptibility genes.

Here, we present an extension of the simulation algorithm characterized by a much more general penetrance function, which allows for the joint action of up to two genes and up to two environmental covariates in the simulated pedigrees, with all possible multiplicative interaction effects between them. This makes the program even more useful for comparing the performance of different linkage and association analysis methods applied to complex human phenotypes. SIMLA can assist investigators in planning and designing a variety of linkage and association studies, and can help interpret the results of real data analyses by comparing them with results obtained under a user-controlled data-generating mechanism.

A free download of the SIMLA package is available at http://wwwchg.duhs.duke.edu/software.
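As a sketch of what such a penetrance function can look like, assume a logistic model whose linear predictor contains the two genotypes, the two covariates, and every product of two or more of them; on this scale each product term multiplies the odds of disease. The genotype coding (0/1/2), the coefficient values, and the helper names below are hypothetical assumptions, not SIMLA's code or its parameterization.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Term = Tuple[int, ...]


def all_interaction_terms(n_factors: int) -> List[Term]:
    """Every product of two or more factors, e.g. (0, 2) stands for G1*E1."""
    return [c for r in range(2, n_factors + 1) for c in combinations(range(n_factors), r)]


def penetrance(factors: Sequence[float], alpha: float, main: Sequence[float],
               inter: Dict[Term, float]) -> float:
    """P(affected | factors) under an assumed logistic model.

    factors holds (G1, G2, E1, E2); each product term acts multiplicatively on the odds.
    """
    eta = alpha + sum(b * f for b, f in zip(main, factors))
    for term, coef in inter.items():
        eta += coef * math.prod(factors[i] for i in term)
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_status(factors: Sequence[float], alpha: float, main: Sequence[float],
                    inter: Dict[Term, float], rng: random.Random) -> bool:
    """Draw affection status for one individual from the penetrance."""
    return rng.random() < penetrance(factors, alpha, main, inter)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical parameters: 5% risk at all-zero factors, one G1 x E1 interaction
    alpha = math.log(0.05 / 0.95)
    main = [0.4, 0.0, 0.2, 0.0]
    inter = {term: 0.0 for term in all_interaction_terms(4)}
    inter[(0, 2)] = 0.5
    person = [1.0, 0.0, 1.0, 0.0]  # one risk allele at G1, exposed on E1
    print(round(penetrance(person, alpha, main, inter), 3))
    print(simulate_status(person, alpha, main, inter, rng))
```

With four factors there are 11 product terms (6 pairwise, 4 three-way, 1 four-way), which is what "all possible" interactions amounts to under this assumed parameterization.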


2020
Author(s): Arunabha Majumdar, Kathryn S. Burch, Sriram Sankararaman, Bogdan Pasaniuc, W. James Gauderman, ...

While gene-environment (GxE) interactions contribute substantially to many phenotypes, detecting them requires well-powered studies and has proven difficult. To address this, we combine two approaches to improve GxE power: simultaneously evaluating multiple phenotypes and using a two-step analysis approach. Previous work shows that the power to identify a main genetic effect can be improved by simultaneously analyzing multiple related phenotypes. For a univariate phenotype, two-step methods provide higher power to detect a GxE interaction than single-step analysis. Therefore, we propose a two-step approach to test for an overall GxE effect for multiple phenotypes. Using simulations, we demonstrate that when more than one phenotype has a GxE effect (i.e., GxE pleiotropy), our approach offers a substantial gain in power (18%–43%) to detect an aggregate-level GxE effect for a multivariate phenotype, compared with an analogous two-step method for a univariate phenotype. We applied the proposed approach to simultaneously analyze three lipids (LDL, HDL, and triglycerides) in the UK Biobank, with the frequency of alcohol consumption as the environmental factor. The method identified two independent genome-wide significant signals of an overall GxE effect on the vector of lipids.
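The two-step idea for a univariate phenotype can be sketched as a subset test: a screening statistic computed on all samples selects candidate variants, and only those candidates are tested for GxE at a Bonferroni threshold that counts the candidates rather than all variants. This is one common form of two-step GxE testing, shown as a minimal sketch with a hypothetical function name and hypothetical thresholds; it is not the multivariate method proposed above, and it is valid only when the screening statistic is independent of the step-2 interaction test under the null.

```python
from typing import List, Sequence


def two_step_subset_test(p_screen: Sequence[float], p_gxe: Sequence[float],
                         alpha1: float = 1e-3, alpha: float = 0.05) -> List[int]:
    """Indices of variants declared significant for GxE (hypothetical helper).

    Step 1 keeps variants whose screening p-value is below alpha1.
    Step 2 tests only the survivors, Bonferroni-corrected for their number.
    """
    kept = [j for j, p in enumerate(p_screen) if p < alpha1]
    if not kept:
        return []
    threshold = alpha / len(kept)
    return [j for j in kept if p_gxe[j] < threshold]


if __name__ == "__main__":
    # Hypothetical p-values for four variants
    p_screen = [1e-4, 0.5, 1e-5, 1e-6]
    p_gxe = [0.01, 1e-9, 0.03, 0.001]
    print(two_step_subset_test(p_screen, p_gxe))  # [0, 3]
```

Here variant 1 has the smallest interaction p-value but is never tested because it fails the screen; that is the price of the lighter multiple-testing burden on the variants that pass.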

