Using Collaborative Mixed Models to Account for Imputation Uncertainty in Transcriptome-Wide Association Studies

Author(s):  
Xingjie Shi ◽  
Can Yang ◽  
Jin Liu
2016 ◽  
Author(s):  
Lana S. Martin ◽  
Eleazar Eskin

Abstract: A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to effectively test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.
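The way population structure produces false positives can be illustrated with a small simulation. The sketch below is not taken from the review: the two-subpopulation design, the allele frequencies, the effect size, and the function names `simulate` and `t_stat` are all assumptions chosen for illustration. It also adjusts with a fixed-effect subpopulation covariate rather than a mixed model, which is a simpler stand-in for the same idea.

```python
import numpy as np

def simulate(n_per_pop=1000, seed=0):
    """Two subpopulations; the SNP has NO causal effect on the phenotype."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)             # subpopulation label
    freq = np.where(pop == 0, 0.2, 0.8)            # allele frequency differs by subpopulation
    geno = rng.binomial(2, freq)                   # genotype coded 0/1/2
    pheno = 1.0 * pop + rng.normal(size=pop.size)  # mean shift comes from subpopulation only
    return pop, geno, pheno

def t_stat(y, x, covariates=None):
    """Ordinary least squares t-statistic for x, with optional covariates."""
    n = y.size
    cols = [np.ones(n)]
    if covariates is not None:
        cols.append(covariates)
    cols.append(x)                                 # tested variable goes last
    X = np.column_stack(cols).astype(float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])      # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])

pop, geno, pheno = simulate()
naive = t_stat(pheno, geno)                        # ignores structure
adjusted = t_stat(pheno, geno, covariates=pop)     # conditions on subpopulation
print(f"naive t = {naive:.2f}, adjusted t = {adjusted:.2f}")
```

Under these assumptions the unadjusted test reports a strong association for a SNP with no effect, while conditioning on the subpopulation label removes it. Mixed models address the same confounding when the structure is not a known discrete label, by modeling it through a genetic relatedness matrix.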


2016 ◽  
Author(s):  
Francesco Paolo Casale ◽  
Danilo Horta ◽  
Barbara Rakitsch ◽  
Oliver Stegle

Abstract: Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations to validate iSet before applying the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for sub-group specific genetic effects in human lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.

Author summary: Genetic effects on phenotypes can depend on external contexts, including environment. Statistical tests for identifying such interactions are important to understand how individual genetic variants may act in different contexts. Interaction effects can either be studied using measurements of a given phenotype in different contexts, under the same genetic backgrounds, or by stratifying a population into subgroups. Here, we derive a method based on linear mixed models that can be applied to both of these designs. iSet enables testing for interactions between context and sets of variants, and accounts for polygenic effects. We validate our model using simulations, before applying it to the genetic analysis of gene expression studies and genome-wide association studies of human blood lipid levels. We find that modeling interactions with variant sets offers increased power, thereby uncovering interactions that cannot be detected by alternative methods.


2018 ◽  
Author(s):  
Matthew P. Conomos ◽  
Alex P. Reiner ◽  
Mary Sara McPeek ◽  
Timothy A. Thornton

Abstract: Linear mixed models (LMMs) have become the standard approach for genetic association testing in the presence of sample structure. However, the performance of LMMs has primarily been evaluated in relatively homogeneous populations of European ancestry, despite many of the recent genetic association studies including samples from worldwide populations with diverse ancestries. In this paper, we demonstrate that existing LMM methods can have systematic miscalibration of association test statistics genome-wide in samples with heterogeneous ancestry, resulting in both increased type-I error rates and a loss of power. Furthermore, we show that this miscalibration arises due to varying allele frequency differences across the genome among populations. To overcome this problem, we developed LMM-OPS, an LMM approach which orthogonally partitions diverse genetic structure into two components: distant population structure and recent genetic relatedness. In simulation studies with real and simulated genotype data, we demonstrate that LMM-OPS is appropriately calibrated in the presence of ancestry heterogeneity and outperforms existing LMM approaches, including EMMAX, GCTA, and GEMMA. We conduct a GWAS of white blood cell (WBC) count in an admixed sample of 3,551 Hispanic/Latino American women from the Women’s Health Initiative SNP Health Association Resource where LMM-OPS detects genome-wide significant associations with corresponding p-values that are one or more orders of magnitude smaller than those from competing LMM methods. We also identify a genome-wide significant association with regulatory variant rs2814778 in the DARC gene on chromosome 1, which generalizes to Hispanic/Latino Americans a previous association with reduced WBC count identified in African Americans.


2002 ◽  
Vol 54 (3) ◽  
pp. 132-150 ◽  
Author(s):  
Ruzong Fan ◽  
Jeesun Jung

2017 ◽  
Author(s):  
Carl Kadie ◽  
David Heckerman

Abstract: We have developed Ludicrous Speed Linear Mixed Models, a version of FaST-LMM optimized for the cloud. The approach can perform a genome-wide association analysis on a dataset of one million SNPs across one million individuals at a cost of about 868 CPU days with an elapsed time on the order of two weeks. A Python implementation is available at https://fastlmm.github.io/.

Significance: Identifying SNP-phenotype correlations using GWAS is difficult because effect sizes are so small for common, complex diseases. To address this issue, institutions are creating extremely large cohorts with sample sizes on the order of one million. Unfortunately, such cohorts are likely to contain confounding factors such as population structure and family/cryptic relatedness. The linear mixed model (LMM) can often correct for such confounding factors, but is too slow to use even with algebraic speedups known as FaST-LMM. We present a cloud implementation of FaST-LMM, called Ludicrous Speed LMM, that can process one million samples and one million test SNPs in a reasonable amount of time and at a reasonable cost.
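The scale quoted in the abstract can be put in perspective with a short back-of-the-envelope calculation. This is only a rough sketch: the abstract says "on the order of two weeks", so the 14-day figure below is an assumption, and the variable names are illustrative rather than part of FaST-LMM.

```python
cpu_days = 868            # total compute cost quoted in the abstract
elapsed_days = 14         # ASSUMPTION: "on the order of two weeks" taken as 14 days

# Average number of CPUs that must run concurrently to finish in that time
avg_parallel_cpus = cpu_days / elapsed_days

n_snps = 10**6            # one million SNPs
n_individuals = 10**6     # one million individuals
genotype_entries = n_snps * n_individuals  # size of the full genotype matrix

print(f"average parallel CPUs: {avg_parallel_cpus:.0f}")
print(f"genotype matrix entries: {genotype_entries:.1e}")
```

Under the 14-day assumption, the job needs roughly sixty CPUs running in parallel on average, over a genotype matrix with about a trillion entries, which is consistent with the authors' emphasis on a cloud implementation.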

