Genome-Wide Control of Population Structure and Relatedness in Genetic Association Studies via Linear Mixed Models with Orthogonally Partitioned Structure

2018 ◽  
Author(s):  
Matthew P. Conomos ◽  
Alex P. Reiner ◽  
Mary Sara McPeek ◽  
Timothy A. Thornton

Abstract Linear mixed models (LMMs) have become the standard approach for genetic association testing in the presence of sample structure. However, the performance of LMMs has primarily been evaluated in relatively homogeneous populations of European ancestry, even though many recent genetic association studies include samples from worldwide populations with diverse ancestries. In this paper, we demonstrate that existing LMM methods can have systematic miscalibration of association test statistics genome-wide in samples with heterogeneous ancestry, resulting in both increased type-I error rates and a loss of power. Furthermore, we show that this miscalibration arises because allele frequency differences among populations vary across the genome. To overcome this problem, we developed LMM-OPS, an LMM approach that orthogonally partitions diverse genetic structure into two components: distant population structure and recent genetic relatedness. In simulation studies with real and simulated genotype data, we demonstrate that LMM-OPS is appropriately calibrated in the presence of ancestry heterogeneity and outperforms existing LMM approaches, including EMMAX, GCTA, and GEMMA. We conduct a GWAS of white blood cell (WBC) count in an admixed sample of 3,551 Hispanic/Latino American women from the Women’s Health Initiative SNP Health Association Resource, where LMM-OPS detects genome-wide significant associations with corresponding p-values that are one or more orders of magnitude smaller than those from competing LMM methods. We also identify a genome-wide significant association with regulatory variant rs2814778 in the DARC gene on chromosome 1, which generalizes to Hispanic/Latino Americans a previous association with reduced WBC count identified in African Americans.


2019 ◽  
Author(s):  
Yiqi Yao ◽  
Alejandro Ochoa

Abstract Modern genetic association studies require modeling population structure and family relatedness in order to calculate correct statistics. Principal Components Analysis (PCA) is one of the most common approaches for modeling this population structure, but nowadays the Linear Mixed-Effects Model (LMM) is believed by many to be a superior model. Remarkably, previous comparisons have been limited by testing PCA without varying the number of principal components (PCs), by simulating unrealistically simple population structures, and by not always measuring both type-I error control and predictive power. In this work, we thoroughly evaluate PCA with a varying number of PCs alongside LMM in various realistic scenarios, including admixture together with family structure, measuring both null p-value uniformity and the area under the precision-recall curves. We find that PCA performs as well as LMM when enough PCs are used and the sample size is large, and find a remarkable robustness to extreme numbers of PCs. However, we notice decreased performance for PCA relative to LMM when sample sizes are small and when there is family structure, although LMM performance is highly variable. Altogether, our work suggests that PCA is a favorable approach for association studies when sample sizes are large and no close relatives exist in the data, and a hybrid approach of LMM with PCs may be the best of both worlds.
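The PCA adjustment evaluated in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `top_pcs` and `assoc_t_stat` are hypothetical, and it assumes a complete (samples x SNPs) genotype matrix coded 0/1/2, a quantitative trait, and ordinary least squares with the top PCs as fixed covariates.

```python
import numpy as np

def top_pcs(genotypes, k):
    """Return the top-k principal component scores of a (samples x SNPs) matrix."""
    # Center each SNP column and scale it to unit variance;
    # monomorphic columns are dropped to avoid dividing by zero.
    centered = genotypes - genotypes.mean(axis=0)
    sd = centered.std(axis=0)
    standardized = centered[:, sd > 0] / sd[sd > 0]
    # SVD of the standardized matrix: left singular vectors scaled by
    # their singular values are the PC scores, ordered by variance explained.
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :k] * s[:k]

def assoc_t_stat(trait, snp, covariates):
    """t-statistic for the SNP effect in OLS of trait on intercept + SNP + covariates."""
    n = trait.shape[0]
    design = np.column_stack([np.ones(n), snp, covariates])
    beta, _, _, _ = np.linalg.lstsq(design, trait, rcond=None)
    resid = trait - design @ beta
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof
    # Standard OLS covariance of the coefficient estimates.
    cov = sigma2 * np.linalg.inv(design.T @ design)
    # Column 1 of the design matrix is the SNP.
    return beta[1] / np.sqrt(cov[1, 1])
```

Passing an empty covariate matrix (`np.empty((n, 0))`) gives the unadjusted test, so the same function can be used to compare statistics with and without PCs. Varying `k` corresponds to varying the number of PCs, which is the comparison the abstract describes.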





2010 ◽  
Vol 19 (3) ◽  
pp. 347-352 ◽  
Author(s):  
Jeroen R Huyghe ◽  
Erik Fransen ◽  
Samuli Hannula ◽  
Lut Van Laer ◽  
Els Van Eyken ◽  
...  




2016 ◽  
Vol 98 (4) ◽  
pp. 653-666 ◽  
Author(s):  
Han Chen ◽  
Chaolong Wang ◽  
Matthew P. Conomos ◽  
Adrienne M. Stilp ◽  
Zilin Li ◽  
...  


2018 ◽  
Author(s):  
Hannah Verena Meyer ◽  
Francesco Paolo Casale ◽  
Oliver Stegle ◽  
Ewan Birney

Abstract Genome-wide association studies have helped to shed light on the genetic architecture of complex traits and diseases. Deep phenotyping of population cohorts is increasingly applied, where multi- to high-dimensional phenotypes are recorded in the individuals. Whilst these rich datasets provide important opportunities to analyse complex trait structures and pleiotropic effects at a genome-wide scale, existing statistical methods for joint genetic analyses are hampered by computational limitations posed by high-dimensional phenotypes. Consequently, such multivariate analyses are currently limited to a moderate number of traits. Here, we introduce a method that combines linear mixed models with bootstrapping (LiMMBo) to enable computationally efficient joint genetic analysis of high-dimensional phenotypes. Our method builds on linear mixed models, thereby providing robust control for population structure and other confounding factors, and the model scales to larger datasets with up to hundreds of phenotypes. We first validate LiMMBo using simulations, demonstrating consistent covariance estimates at greatly reduced computational cost compared to existing methods. We also find LiMMBo yields consistent power advantages compared to univariate modelling strategies, where the advantage of multivariate mapping increases substantially with the phenotype dimensionality. Finally, we applied LiMMBo to 41 yeast growth traits to map their genetic determinants, finding previously known and novel pleiotropic relationships in this high-dimensional phenotype space. LiMMBo is accessible as open source software (https://github.com/HannahVMeyer/limmbo).
Author summary In multi-trait genetic association studies one is interested in detecting genetic variants that are associated with one or multiple traits. Genetic variants that influence two or more traits are referred to as pleiotropic. Multivariate linear mixed models have been successfully applied to detect pleiotropic effects, by jointly modelling association signals across traits. However, these models are currently limited to a moderate number of phenotypes as the number of model parameters grows steeply with the number of phenotypes, raising a computational burden. We developed LiMMBo, a new approach for the joint analysis of high-dimensional phenotypes. Our method reduces the number of effective model parameters by introducing an intermediate subsampling step. We validate this strategy using simulations, where we apply LiMMBo for the genetic analysis of hundreds of phenotypes, detecting pleiotropic effects for a wide range of simulated genetic architectures. Finally, to illustrate LiMMBo in practice, we apply the model to a study of growth traits in yeast, where we identify pleiotropic effects for traits with formerly known genetic effects as well as revealing previously unconnected traits.



2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Nicole R. Gay ◽  
Michael Gloudemans ◽  
Margaret L. Antonio ◽  
Nathan S. Abell ◽  
...  

Abstract Background Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the v8 release also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx improves portability of this research across populations and further characterizes the impact of population structure on GWAS colocalization. Results Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in seven tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe 31 loci (0.02%) where a significant colocalization is called only with one eQTL ancestry adjustment method. Notably, both adjustments produce similar numbers of significant colocalizations within each of two different colocalization methods, COLOC and FINEMAP. Finally, we identify a small subset of eQTL-associated variants highly correlated with local ancestry, providing a resource to enhance functional follow-up. Conclusions We provide a local ancestry map for admixed individuals in the GTEx v8 release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of the results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.



2019 ◽  
Author(s):  
Nicole R. Gay ◽  
Michael Gloudemans ◽  
Margaret L. Antonio ◽  
Brunilda Balliu ◽  
YoSon Park ◽  
...  

Abstract Background Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the final release (v8) also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx provides an opportunity to improve portability of this research across populations and to further measure the impact of population structure on GWAS colocalization. Results Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in six tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe only 0.8% of tests with GWAS colocalization posterior probabilities that change by 10% or more. Notably, both adjustments produce similar numbers of significant colocalizations. Finally, we identify a small subset of GTEx v8 eQTL-associated variants highly correlated with local ancestry (R² > 0.7), providing a resource to enhance functional follow-up. Conclusions We provide a local ancestry map for admixed individuals in the final GTEx release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.



2007 ◽  
Vol 16 (20) ◽  
pp. 2494-2505 ◽  
Author(s):  
Yasuhito Nannya ◽  
Kenjiro Taura ◽  
Mineo Kurokawa ◽  
Shigeru Chiba ◽  
Seishi Ogawa

