Genetic architecture of gene expression traits across diverse populations

2018
Author(s):
Lauren S Mogil
Angela Andaleon
Alexa Badalamenti
Scott P Dickinson
Xiuqing Guo
...

For many complex traits, gene regulation is likely to play a crucial mechanistic role. How the genetic architectures of complex traits vary between populations, and how that variation affects genetic prediction, is not well understood, in part due to the historical paucity of GWAS in populations of non-European ancestry. We used data from the MESA (Multi-Ethnic Study of Atherosclerosis) cohort to characterize the genetic architecture of gene expression within and between diverse populations. Genotype and monocyte gene expression data were available for individuals with African American (AFA, n=233), Hispanic (HIS, n=352), and European (CAU, n=578) ancestry. We performed expression quantitative trait loci (eQTL) mapping in each population and show that the genetic correlation of gene expression depends on shared ancestry proportions. Using elastic net modeling with cross-validation to optimize genotypic predictors of gene expression in each population, we show that the genetic architecture of gene expression for most predictable genes is sparse. We found that the best predicted gene, TACSTD2, was the same across populations, with R² > 0.86 in each population. However, we identified a subset of genes that are well predicted in one population but poorly predicted in another. We show that these differences in predictive performance are due to allele frequency differences between populations. Using genotype weights trained in MESA to predict gene expression in independent populations showed that a training set with ancestry similar to the test set is better at predicting gene expression in test populations, demonstrating an urgent need for diverse population sampling in genomics. Our predictive models and performance statistics in diverse cohorts are made publicly available for use in transcriptome mapping methods at https://github.com/WheelerLab/DivPop.
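As a rough illustration of the elastic net modeling mentioned above, the following is a minimal sketch of an elastic net fit by coordinate descent for one gene. It is not the authors' pipeline. The function names, the toy genotype matrix, the penalty `lam`, and the mixing parameter `alpha` are all assumptions made for illustration, and the cross-validation step used to choose the penalty is omitted. Genotypes and expression are assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, n_iter=100, tol=1e-10):
    """Minimize (1/2n)||y - Xw||^2 + lam*(alpha*||w||_1 + (1-alpha)/2*||w||_2^2).

    X is a list of rows (individuals) of centered SNP dosages and y is the
    centered expression of one gene. Both are hypothetical inputs.
    """
    n = len(X)
    p = len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw while all weights are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # SNP with no variation carries no signal
            old = weights[j]
            # Covariance of SNP j with the residual that excludes SNP j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * old) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            if new != old:
                for i in range(n):
                    residual[i] -= X[i][j] * (new - old)
                weights[j] = new
            max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return weights


# Toy data: expression depends only on the first SNP, the second is unrelated.
X = [[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0], [0.0, 0.0]]
y = [-2.0, 0.0, 2.0, 0.0]
weights = fit_elastic_net(X, y, lam=0.1, alpha=0.5)
```

In this toy case the unrelated SNP receives a weight of exactly zero, which is the sense in which a fitted model is called sparse. In practice the penalty would be chosen by cross-validation, as the abstract describes, rather than fixed by hand.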

2019
Author(s):
Anna Mikhaylova
Timothy Thornton

Predicting gene expression with genetic data has garnered significant attention in recent years. PrediXcan is one of the most widely used gene-based association methods for testing the association between imputed gene expression values and a phenotype, owing to the insight it has provided into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The prediction models for PrediXcan, however, were obtained using supervised machine learning methods and training data from the Depression and Gene Network (DGN) and the Genotype-Tissue Expression (GTEx) data, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we assess the accuracy of gene expression predictions with PrediXcan in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Health and Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European populations. Prediction results are obtained using a range of models from PrediXcan weight databases, and Pearson’s correlation coefficient is used to measure prediction accuracy. We demonstrate that the predictive performance of PrediXcan varies across populations (F-test p-value < 0.001), with prediction accuracy worst in the Yoruban sample compared to the European samples. Moreover, the performance of PrediXcan varies not only among distant populations but also among closely related populations. We also find that the qualitative performance of PrediXcan for the populations considered is consistent across all weight databases used.
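The evaluation described above (predict expression from genotype weights, then score it with Pearson's correlation in each population) can be sketched as follows. This is a minimal sketch under stated assumptions, not PrediXcan's implementation. The function names, the SNP identifiers `snp_a` and `snp_b`, the population labels, and all numeric values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


def predict_expression(dosages, weights):
    """Predicted expression per person: sum of weight * allele dosage.

    dosages is a list of {snp_id: dosage} dicts, one per person.
    SNPs without a weight in the model are ignored.
    """
    return [
        sum(weights[snp] * dose for snp, dose in person.items() if snp in weights)
        for person in dosages
    ]


def accuracy_by_population(observed, predicted, labels):
    """Group individuals by population label and compute Pearson r per group."""
    groups = {}
    for obs, pred, label in zip(observed, predicted, labels):
        groups.setdefault(label, ([], []))
        groups[label][0].append(obs)
        groups[label][1].append(pred)
    return {label: pearson_r(obs, pred) for label, (obs, pred) in groups.items()}


# Hypothetical toy inputs: two populations with three individuals each.
observed = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
labels = ["POP_A", "POP_A", "POP_A", "POP_B", "POP_B", "POP_B"]
accuracy = accuracy_by_population(observed, predicted, labels)
```

Computing the correlation separately per population, rather than pooling all individuals, keeps differences in mean expression between groups from inflating or masking the within-population accuracy.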


2018
Author(s):
Yizhen Zhong
Minoli Perera
Eric R. Gamazon

Abstract Background Understanding the nature of the genetic regulation of gene expression promises to advance our understanding of the genetic basis of disease. However, the methodological impact of use of local ancestry on high-dimensional omics analyses, including most prominently expression quantitative trait loci (eQTL) mapping and trait heritability estimation, in admixed populations remains critically underexplored. Results Here we develop a statistical framework that characterizes the relationships among the determinants of the genetic architecture of an important class of molecular traits. We estimate the trait variance explained by ancestry using local admixture relatedness between individuals. Using National Institute of General Medical Sciences (NIGMS) and Genotype-Tissue Expression (GTEx) datasets, we show that use of local ancestry can substantially improve eQTL mapping and heritability estimation and characterize the sparse versus polygenic component of gene expression in admixed and multiethnic populations respectively. Using simulations of diverse genetic architectures to estimate trait heritability and the level of confounding, we show improved accuracy given individual-level data and evaluate a summary statistics based approach. Furthermore, we provide a computationally efficient approach to local ancestry analysis in eQTL mapping while increasing control of type I and type II error over traditional approaches. Conclusion Our study has important methodological implications on genetic analysis of omics traits across a range of genomic contexts, from a single variant to a prioritized region to the entire genome. Our findings highlight the importance of using local ancestry to better characterize the heritability of complex traits and to more accurately map genetic associations.


2018
Author(s):
Sini Nagpal
Xiaoran Meng
Michael P. Epstein
Lam C. Tsoi
Matthew Patrick
...

Transcriptome-wide association studies (TWAS), which test for association between the study trait and gene expression levels imputed from cis-acting expression quantitative trait loci (cis-eQTL) genotypes, have successfully enhanced the discovery of genetic risk loci for complex traits. By using gene expression imputation models fitted from reference datasets that have both genetic and transcriptomic data, TWAS facilitates gene-based tests with GWAS data while accounting for the reference transcriptomic data. Existing TWAS tools such as PrediXcan and FUSION use parametric imputation models that have limitations for modeling the complex genetic architecture of transcriptomic data. Therefore, we propose an improved Bayesian method that assumes a data-driven nonparametric prior to impute gene expression. Our method is general and flexible and includes both the parametric imputation models used by PrediXcan and FUSION as special cases. Our simulation studies showed that the nonparametric Bayesian model improved both imputation R² for transcriptomic data and the TWAS power over PrediXcan. In real applications, our nonparametric Bayesian method fitted transcriptomic imputation models for twice as many genes as PrediXcan, with 1.7 times the average regression R², thus improving the power of follow-up TWAS. Hence, the nonparametric Bayesian model is preferred for modeling the complex genetic architecture of transcriptomes and is expected to enhance transcriptome-integrated genetic association studies. We implement our Bayesian approach in a convenient software tool “TIGAR” (Transcriptome-Integrated Genetic Association Resource), which imputes transcriptomic data and performs subsequent TWAS using individual-level or summary-level GWAS data.


2019
Author(s):
Huwenbo Shi
Kathryn S. Burch
Ruth Johnson
Malika K. Freund
Gleb Kichaev
...

Despite strong transethnic genetic correlations reported in the literature for many complex traits, the non-transferability of polygenic risk scores across populations suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific/shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and a slight upward bias with out-of-sample LD. We analyze 9 complex traits in individuals of East Asian and European ancestry, restricting to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping and observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.8x enrichment of shared high-posterior SNPs, suggesting that population-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other GWAS due to differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.


Author(s):
Ronald I. Clyman
Nancy K. Hills
John M. Dagle
Jeffrey C. Murray
Keegan Kelsey

Abstract Background DNA polymorphisms in PTGIS and TFAP2B have been identified as risk factors for patent ductus arteriosus (PDA) in a population composed of preterm infants with European genetic ancestry but not in more genetically diverse populations. Goal To determine if the effects of TFAP2B and PTGIS polymorphisms on ductus arteriosus (DA) gene expression differ based on genetic ancestry. Methods DA from 273 human second trimester fetuses were genotyped for TFAP2B and PTGIS polymorphisms and for polymorphisms distributing along genetic ancestry lines. RT-PCR was used to measure the RNA expression of 49 candidate genes involved with DA closure. Results Seventeen percent of the DA analyzed were of European ancestry. In multivariable regression analyses we found consistent associations between four PDA-related TFAP2B polymorphisms (rs2817399(A), rs987237(G), rs760900(C), and rs2817416(C)) and expression of the following genes: EPAS1, CACNB2, ECE1, KCNA2, ATP2A3, EDNRA, EDNRB, BMP9, and BMP10, and between the PTGIS haplotype rs493694(G)/rs693649(A) and PTGIS and NOS3. These changes only occurred in DA with European ancestry. No consistent positive or negative associations were found among DA samples unless an interaction between the polymorphisms and genetic ancestry was taken into account. Conclusion PTGIS and TFAP2B polymorphisms were associated with consistent changes in DA gene expression when present in fetuses with European ancestry. Impact DNA polymorphisms in PTGIS and TFAP2B have been identified as risk factors for patent ductus arteriosus (PDA) in a population composed primarily of preterm infants with European genetic ancestry but not in more genetically diverse populations. The same PTGIS and TFAP2B polymorphisms are associated with changes in ductus gene expression when present in ductus from fetuses with European genetic ancestry. 
No consistent associations with gene expression can be found unless an interaction between the polymorphisms and genetic ancestry is taken into account.


2021
Author(s):
Roshni A. Patel
Shaila A. Musharoff
Jeffrey P. Spence
Harold Pimentel
Catherine Tcheandjieu
...

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.


2020
Author(s):
Elena Bernabeu
Oriol Canela-Xandri
Konrad Rawlik
Andrea Talenti
James Prendergast
...

Sex is arguably the most important differentiating characteristic in most mammalian species, separating populations into different groups, with varying behaviors, morphologies, and physiologies based on their complement of sex chromosomes. In humans, despite males and females sharing nearly identical genomes, there are differences between the sexes in complex traits and in the risk of a wide array of diseases. Gene by sex interactions (GxS) are thought to account for some of this sexual dimorphism. However, the extent and basis of these interactions are poorly understood. Here we provide insights into both the scope and mechanism of GxS across the genome of circa 450,000 individuals of European ancestry and 530 complex traits in the UK Biobank. We found small yet widespread differences in genetic architecture across traits through the calculation of sex-specific heritability, genetic correlations, and sex-stratified genome-wide association studies (GWAS). We also found that, in some cases, sex-agnostic GWAS efforts might be missing loci of interest, and looked into possible improvements in the prediction of high-level phenotypes. Finally, we studied the potential functional role of the dimorphism observed through sex-biased eQTL and gene-level analyses. This study marks a broad examination of the genetics of sexual dimorphism. Our findings parallel previous reports, suggesting the presence of sexual genetic heterogeneity across complex traits of generally modest magnitude. Our results suggest the need to consider sex-stratified analyses for future studies in order to shed light on possible sex-specific molecular mechanisms.


Author(s):
Chengran Yang
Fabiana G. Farias
Laura Ibanez
Brooke Sadler
Maria Victoria Fernandez
...

Expression quantitative trait loci (eQTL) mapping has successfully resolved some genome-wide association study (GWAS) loci for complex traits [1–6]. However, there is a need for implementing additional “omic” approaches to untangle additional loci and provide a biological context for GWAS signals. We generated a detailed landscape of the genomic architecture of protein levels in multiple neurologically relevant tissues (brain, cerebrospinal fluid (CSF) and plasma), by profiling thousands of proteins in a large and well-characterized cohort. We identified 274, 127 and 32 protein quantitative trait loci (pQTL) for CSF, plasma and brain respectively. We demonstrated that cis-pQTL are more likely to be shared across tissues but trans-pQTL are tissue-specific. Between 78% and 87% of pQTL are not eQTL, indicating that protein levels have a different genetic architecture than gene expression. By combining our pQTL with Mendelian Randomization approaches we identified potential novel biomarkers and drug targets for neurodegenerative diseases including Alzheimer disease and frontotemporal dementia. In the context of personalized medicine, these results highlight the need for implementing additional functional genomic approaches beyond gene expression in order to understand the biology of complex traits, and to identify novel biomarkers and potential drug targets for those traits.


eLife
2020
Vol 9
Author(s):
Melissa L Spear
Alex Diaz-Papkovich
Elad Ziv
Joseph M Yracheta
Simon Gravel
...

People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry increased by an average of ~20% spanning 1940s-1990s in Mexican Americans. These patterns result from complex interactions between several population and cultural factors which shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. We show for height how polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.


2021
Author(s):
Angel C.Y. Mak
Linda Kachuri
Donglei Hu
Celeste Eng
Scott Huntsman
...

We explored the role of genetic ancestry in shaping the genetic architecture of whole blood gene expression using whole genome and RNA sequencing data from 2,733 African American and Hispanic/Latino children. We find that heritability of gene expression significantly increases with greater proportion of genome-wide African ancestry and decreases with higher levels of Indigenous American ancestry. Fine-mapping of expression quantitative trait loci (eQTLs) in individuals with predominantly African or Indigenous American ancestry revealed ancestry-specific eQTLs in over 30% of heritable genes. We leveraged our data to train genetically derived transcriptome prediction models, which identified significantly more associated genes when applied to 28 traits from a multi-ancestry population. Our findings underscore the importance of increasing representation from ancestrally diverse populations in genomic studies to enable new discoveries and ensure their equitable translation.

