Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx

2019 ◽  
Author(s):  
Nicole R. Gay ◽  
Michael Gloudemans ◽  
Margaret L. Antonio ◽  
Brunilda Balliu ◽  
YoSon Park ◽  
...  

Abstract Background Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the final release (v8) also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx provides an opportunity to improve portability of this research across populations and to further measure the impact of population structure on GWAS colocalization. Results Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in six tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe only 0.8% of tests with GWAS colocalization posterior probabilities that change by 10% or more. Notably, both adjustments produce similar numbers of significant colocalizations. Finally, we identify a small subset of GTEx v8 eQTL-associated variants highly correlated with local ancestry (R² > 0.7), providing a resource to enhance functional follow-up. Conclusions We provide a local ancestry map for admixed individuals in the final GTEx release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.
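As an illustration of the final result in the abstract above, the following minimal sketch computes the squared Pearson correlation between a variant's genotype dosage and the local ancestry dosage at the same locus, then compares it with the 0.7 threshold quoted there. It is not the authors' pipeline: the function name `r_squared`, the 0/1/2 coding of both vectors, and the sample values are assumptions made purely for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences.

    Returns 0.0 when either sequence has no variance, since the correlation
    is undefined in that case (a choice made for this sketch).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Threshold taken from the abstract; everything else here is hypothetical.
ANCESTRY_R2_THRESHOLD = 0.7

if __name__ == "__main__":
    # Hypothetical per-individual values: alternate-allele count (0/1/2) and
    # number of haplotypes assigned to one ancestry at the locus (0/1/2).
    genotype_dosage = [0, 1, 2, 2, 1, 0, 2, 1]
    local_ancestry_dosage = [0, 1, 2, 2, 1, 0, 1, 1]
    r2 = r_squared(genotype_dosage, local_ancestry_dosage)
    flagged = r2 > ANCESTRY_R2_THRESHOLD
    print(f"R^2 = {r2:.3f}, flagged as ancestry-correlated: {flagged}")
```

A variant flagged this way is hard to separate statistically from local ancestry itself, which is why such variants may deserve extra care in functional follow-up.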

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Nicole R. Gay ◽  
Michael Gloudemans ◽  
Margaret L. Antonio ◽  
Nathan S. Abell ◽  
...  

Abstract Background Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the v8 release also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx improves portability of this research across populations and further characterizes the impact of population structure on GWAS colocalization. Results Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in seven tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe 31 loci (0.02%) where a significant colocalization is called only with one eQTL ancestry adjustment method. Notably, both adjustments produce similar numbers of significant colocalizations within each of two different colocalization methods, COLOC and FINEMAP. Finally, we identify a small subset of eQTL-associated variants highly correlated with local ancestry, providing a resource to enhance functional follow-up. Conclusions We provide a local ancestry map for admixed individuals in the GTEx v8 release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of the results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.


2018 ◽  
Author(s):  
Matthew P. Conomos ◽  
Alex P. Reiner ◽  
Mary Sara McPeek ◽  
Timothy A. Thornton

Abstract Linear mixed models (LMMs) have become the standard approach for genetic association testing in the presence of sample structure. However, the performance of LMMs has primarily been evaluated in relatively homogeneous populations of European ancestry, despite many of the recent genetic association studies including samples from worldwide populations with diverse ancestries. In this paper, we demonstrate that existing LMM methods can have systematic miscalibration of association test statistics genome-wide in samples with heterogeneous ancestry, resulting in both increased type-I error rates and a loss of power. Furthermore, we show that this miscalibration arises due to varying allele frequency differences across the genome among populations. To overcome this problem, we developed LMM-OPS, an LMM approach which orthogonally partitions diverse genetic structure into two components: distant population structure and recent genetic relatedness. In simulation studies with real and simulated genotype data, we demonstrate that LMM-OPS is appropriately calibrated in the presence of ancestry heterogeneity and outperforms existing LMM approaches, including EMMAX, GCTA, and GEMMA. We conduct a GWAS of white blood cell (WBC) count in an admixed sample of 3,551 Hispanic/Latino American women from the Women’s Health Initiative SNP Health Association Resource where LMM-OPS detects genome-wide significant associations with corresponding p-values that are one or more orders of magnitude smaller than those from competing LMM methods. We also identify a genome-wide significant association with regulatory variant rs2814778 in the DARC gene on chromosome 1, which generalizes to Hispanic/Latino Americans a previous association with reduced WBC count identified in African Americans.


2020 ◽  
Author(s):  
Arvind Kumar ◽  
Daniel Mas Montserrat ◽  
Carlos Bustamante ◽  
Alexander Ioannidis

Abstract Genomic medicine promises increased resolution for accurate diagnosis, for personalized treatment, and for identification of population-wide health burdens at rapidly decreasing cost (with a genotype now cheaper than an MRI and dropping). The benefits of this emerging form of affordable, data-driven medicine will accrue predominantly to those populations whose genetic associations have been mapped, so it is of increasing concern that over 80% of such genome-wide association studies (GWAS) have been conducted solely within individuals of European ancestry [1]. The severe under-representation of the majority of the world’s populations in genetic association studies stems in part from an addressable algorithmic weakness: lack of simple, accurate, and easily trained methods for identifying and annotating ancestry along the genome (local ancestry). Here we present such a method (XGMix) based on gradient boosted trees, which, while being accurate, is also simple to use, and fast to train, taking minutes on consumer-level laptops.


2020 ◽  
Vol 29 (16) ◽  
pp. 2803-2811
Author(s):  
James P Cook ◽  
Anubha Mahajan ◽  
Andrew P Morris

Abstract The UK Biobank is a prospective study of more than 500 000 participants, which has aggregated data from questionnaires, physical measures, biomarkers, imaging and follow-up for a wide range of health-related outcomes, together with genome-wide genotyping supplemented with high-density imputation. Previous studies have highlighted fine-scale population structure in the UK on a North-West to South-East cline, but the impact of unmeasured geographical confounding on genome-wide association studies (GWAS) of complex human traits in the UK Biobank has not been investigated. We considered 368 325 white British individuals from the UK Biobank and performed GWAS of their birth location. We demonstrate that widely used approaches to adjust for population structure, including principal component analysis and mixed modelling with a random effect for a genetic relationship matrix, cannot fully account for the fine-scale geographical confounding in the UK Biobank. We observe significant genetic correlation of birth location with a range of lifestyle-related traits, including body-mass index and fat mass, hypertension and lung function, even after adjustment for population structure. Variants driving associations with birth location are also strongly associated with many of these lifestyle-related traits after correction for population structure, indicating that there could be environmental factors that are confounded with geography that have not been adequately accounted for. Our findings highlight the need for caution in the interpretation of lifestyle-related trait GWAS in UK Biobank, particularly in loci demonstrating strong residual association with birth location.


2010 ◽  
Vol 19 (3) ◽  
pp. 347-352 ◽  
Author(s):  
Jeroen R Huyghe ◽  
Erik Fransen ◽  
Samuli Hannula ◽  
Lut Van Laer ◽  
Els Van Eyken ◽  
...  

2019 ◽  
Vol 8 (5) ◽  
pp. 692
Author(s):  
Eun Pyo Hong ◽  
Bong Jun Kim ◽  
Jin Pyeong Jeon

Previous genome-wide association studies did not show a consistent association between the BOLL gene (rs700651, 2q33.1) and intracranial aneurysm (IA) susceptibility. We aimed to perform an updated meta-analysis for the potential IA-susceptibility locus in large-scale multi-ethnic populations. We conducted a systematic review of studies identified by an electronic search from January 1990 to March 2019. The overall estimates of the “G” allele of rs700651, indicating IA susceptibility, were calculated under the fixed- and random-effect models using the inverse-variance method. Subsequent in silico function and cis-expression quantitative trait loci (cis-eQTL) analyses were performed to evaluate biological functions and genotype-specific expressions in human tissues. We included 4513 IA patients and 13,506 controls from five studies with seven independent populations: three European-ancestry, three Japanese, and one Korean population. The overall result showed a genome-wide significant association between rs700651 and IA susceptibility after controlling for study heterogeneity (OR = 1.213, 95% CI: 1.135–1.296). Subsequent cis-eQTL analysis showed genome-wide significant expression associations in three human tissues, i.e., testis (p = 8.04 × 10⁻¹⁵ for ANKRD44), tibial nerves (p = 3.18 × 10⁻¹⁰ for SF3B1), and thyroid glands (p = 4.61 × 10⁻⁹ for SF3B1). The rs700651 common variant of the 2q33.1 region may be involved in genetic mechanisms that increase the risk of IA and may play crucial roles in regulatory functions.
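The inverse-variance pooling named in the abstract above can be sketched as follows. This is a generic illustration, not the authors' analysis code: the function name `inverse_variance_meta` is hypothetical, the random-effects step uses the DerSimonian-Laird estimator as an assumed (common) choice that the abstract does not specify, and the per-study numbers in the usage example are made up.

```python
import math


def inverse_variance_meta(log_ors, ses):
    """Pool per-study log odds ratios using inverse-variance weights.

    Returns a dict with the fixed-effect estimate, the DerSimonian-Laird
    between-study variance (tau2), and the random-effects estimate.
    """
    if not log_ors or len(log_ors) != len(ses):
        raise ValueError("need one standard error per study estimate")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Fixed effect: weight each study by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * b for w, b in zip(weights, log_ors)) / sum_w
    fixed_se = math.sqrt(1.0 / sum_w)

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects: add tau2 to every study variance and re-weight.
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_rw = sum(re_weights)
    random = sum(w * b for w, b in zip(re_weights, log_ors)) / sum_rw
    random_se = math.sqrt(1.0 / sum_rw)

    return {
        "fixed": fixed, "fixed_se": fixed_se,
        "q": q, "tau2": tau2,
        "random": random, "random_se": random_se,
    }


if __name__ == "__main__":
    # Hypothetical per-population log odds ratios and standard errors.
    result = inverse_variance_meta([0.22, 0.15, 0.25], [0.06, 0.08, 0.10])
    low = math.exp(result["random"] - 1.96 * result["random_se"])
    high = math.exp(result["random"] + 1.96 * result["random_se"])
    print(f"random-effects OR = {math.exp(result['random']):.3f} "
          f"(95% CI {low:.3f}-{high:.3f})")
```

When the studies agree closely, `tau2` is estimated as zero and the two models give the same pooled odds ratio; with heterogeneous studies the random-effects interval widens.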


2020 ◽  
Author(s):  
Meiyue Wang ◽  
Gary Peltz

Abstract Population structure (PS) has been shown to cause false positive signals in genome-wide association studies (GWAS). Since PS correction is routinely used in human GWAS, it was assumed that it should be utilized for murine GWAS. Nevertheless, there are fundamental differences between murine and human GWAS, and the impact of PS on murine GWAS results has not been thoroughly investigated. We examined 8223 datasets characterizing biomedical responses in panels of inbred mouse strains to assess the impact of PS on murine GWAS. Surprisingly, we found that PS had a minimal impact on datasets characterizing responses in ≤20 strains; and relatively little impact on the majority of datasets characterizing >20 strains. Moreover, there were examples where association signals within known causative genes could be rejected if PS correction methods were utilized. PS assessment should be carefully used, and considered in conjunction with other criteria, for assessing the candidate genes that are identified in murine GWAS.


2021 ◽  
Vol 12 ◽  
Author(s):  
Meiyue Wang ◽  
Zhuoqing Fang ◽  
Boyoung Yoo ◽  
Gill Bejerano ◽  
Gary Peltz

The ability to use genome-wide association studies (GWAS) for genetic discovery depends upon our ability to distinguish true causative from false positive association signals. Population structure (PS) has been shown to cause false positive signals in GWAS. PS correction is routinely used for analysis of human GWAS results, and it has been assumed that it also should be utilized for murine GWAS using inbred strains. Nevertheless, there are fundamental differences between murine and human GWAS, and the impact of PS on murine GWAS results has not been carefully investigated. To assess the impact of PS on murine GWAS, we examined 8223 datasets that characterized biomedical responses in panels of inbred mouse strains. Rather than treat PS as a confounding variable, we examined it as a response variable. Surprisingly, we found that PS had a minimal impact on datasets measuring responses in ≤20 strains; and had surprisingly little impact on most datasets characterizing 21–40 inbred strains. Moreover, we show that true positive association signals arising from haplotype blocks, SNPs or indels, which were experimentally demonstrated to be causative for trait differences, would be rejected if PS correction were applied to them. Our results indicate that, because of the special conditions created by GWAS (the use of inbred strains, small sample sizes), PS assessment results should be carefully evaluated in conjunction with other criteria when murine GWAS results are evaluated.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Hang-Rai Kim ◽  
Sang-Hyuk Jung ◽  
Jaeho Kim ◽  
Hyemin Jang ◽  
Sung Hoon Kang ◽  
...  

Abstract Background Genome-wide association studies (GWAS) have identified a number of genetic variants for Alzheimer’s disease (AD). However, most GWAS were conducted in individuals of European ancestry, and non-European populations are still underrepresented in genetic discovery efforts. Here, we performed GWAS to identify single nucleotide polymorphisms (SNPs) associated with amyloid β (Aβ) positivity using a large sample from the Korean population. Methods One thousand four hundred seventy-four participants of Korean ancestry were recruited from multiple centers in South Korea. The discovery dataset consisted of 1190 participants (383 cognitively unimpaired [CU], 330 with amnestic mild cognitive impairment [aMCI], and 477 with AD dementia [ADD]) and the replication dataset consisted of 284 participants (46 CU, 167 with aMCI, and 71 with ADD). GWAS was conducted to identify SNPs associated with Aβ positivity (measured by amyloid positron emission tomography). Aβ prediction models were developed using the identified SNPs. Furthermore, bioinformatics analysis was conducted for the identified SNPs. Results In addition to APOE, we identified nine SNPs on chromosome 7, which were associated with a decreased risk of Aβ positivity at a genome-wide suggestive level. Of these nine SNPs, four novel SNPs (rs73375428, rs2903923, rs3828947, and rs11983537) were associated with a decreased risk of Aβ positivity (p < 0.05) in the replication dataset. In a meta-analysis, two SNPs (rs7337542 and rs2903923) reached a genome-wide significant level (p < 5.0 × 10⁻⁸). Prediction performance for Aβ positivity increased when rs73375428 was incorporated (area under curve = 0.75; 95% CI = 0.74–0.76) in addition to clinical factors and APOE genotype. Cis-eQTL analysis demonstrated that rs73375428 was associated with decreased expression levels of FGL2 in the brain. Conclusion The novel genetic variants associated with FGL2 were linked to a decreased risk of Aβ positivity in the Korean population. This finding may provide a candidate therapeutic target for AD, highlighting the importance of genetic studies in diverse populations.

