Probabilistic inference of the genetic architecture underlying functional enrichment of complex traits

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Marion Patxot ◽  
Daniel Trejo Banos ◽  
Athanasios Kousathanas ◽  
Etienne J. Orliac ◽  
Sven E. Ojavee ◽  
...  

Abstract. We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10kb upstream of genes, while 12–25% is attributed to coding regions, 32–44% to introns, and 22–28% to distal 10–500kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.

2020 ◽  
Author(s):  
Marion Patxot ◽  
Daniel Trejo Banos ◽  
Athanasios Kousathanas ◽  
Etienne J Orliac ◽  
Sven E Ojavee ◽  
...  

Due to the complexity of linkage disequilibrium (LD) and gene regulation, understanding the genetic basis of common complex traits remains a major challenge. We develop a Bayesian model (BayesRR-RC) implemented in a hybrid-parallel algorithm that scales to whole-genome sequence data on many hundreds of thousands of individuals, taking 22 seconds per iteration to estimate the inclusion probabilities and effect sizes of 8.4 million markers and 78 SNP-heritability parameters in the UK Biobank. Unlike naive penalized regression or mixed-linear model approaches, BayesRR-RC accurately estimates annotation-specific genetic architecture, determines the underlying joint effect size distribution and provides a probabilistic determination of association within marker groups in a single step. Of the genetic variation captured for height, body mass index, cardiovascular disease, and type-2 diabetes in the UK Biobank, only ≤ 10% is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, up to 40% to intronic regions, and 22-28% to distal 10-500kb upstream regions. ≥60% of the variance contributed by these exonic, intronic and distal 10-500kb regions is underlain by many thousands of common variants, each with larger average effect sizes compared to the rest of the genome. We also find differences in the relationship between effect size and heterozygosity across annotation groups and across traits. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance for just these four traits. In the Estonian Biobank, we show improved prediction accuracy over other approaches and generate a posterior predictive distribution for each individual.
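The statements above about the share of genetic variance per annotation group, and about regions having "≥95% probability of contributing ≥0.001%" of the variance, can be illustrated with a small sketch. It is not the BayesRR-RC or GMRM implementation. It assumes standardized genotypes and ignores LD, so that the variance contributed by a set of markers is approximated by the sum of their squared effects. The function names, the annotation labels, and the toy posterior draws are all hypothetical.

```python
from collections import defaultdict


def group_variance_shares(effects, groups):
    """Return each annotation group's share of the summed squared effects.

    Assumption (not from the source): genotypes are standardized and LD is
    ignored, so variance explained is approximated by sum(beta ** 2).
    """
    totals = defaultdict(float)
    for beta, group in zip(effects, groups):
        totals[group] += beta * beta
    grand_total = sum(totals.values())
    if grand_total == 0.0:
        return {g: 0.0 for g in totals}
    return {g: v / grand_total for g, v in totals.items()}


def posterior_prob_above(samples, groups, group, threshold):
    """Fraction of posterior draws in which `group` explains >= threshold."""
    hits = 0
    for effects in samples:
        share = group_variance_shares(effects, groups).get(group, 0.0)
        if share >= threshold:
            hits += 1
    return hits / len(samples)


# Hypothetical toy data: four markers, three posterior draws of their effects.
groups = ["coding", "intron", "intron", "distal"]
samples = [
    [0.10, 0.20, 0.00, 0.10],
    [0.00, 0.20, 0.10, 0.10],
    [0.10, 0.00, 0.20, 0.00],
]

shares = group_variance_shares(samples[0], groups)
# 0.00001 is 0.001% expressed as a proportion.
prob_coding = posterior_prob_above(samples, groups, "coding", 0.00001)
```

In this toy example the coding marker is excluded (effect zero) in one of the three draws, so its posterior probability of exceeding the threshold is two thirds. A real analysis would use many more draws and account for LD among markers.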


Author(s):  
Armin P. Schoech ◽  
Omer Weissbrod ◽  
Luke J. O’Connor ◽  
Nick Patterson ◽  
Huwenbo Shi ◽  
...  

Abstract. Most models of complex trait genetic architecture assume that signed causal effect sizes of each SNP (defined with respect to the minor allele) are uncorrelated with those of nearby SNPs, but it is currently unknown whether this is the case. We develop a new method, autocorrelation LD regression (ACLR), for estimating the genome-wide autocorrelation of causal minor allele effect sizes as a function of genomic distance. Our method estimates these autocorrelations by regressing the products of summary statistics on distance-dependent LD scores. We determined that ACLR robustly assesses the presence or absence of nonzero autocorrelation, producing unbiased estimates with well-calibrated standard errors in null simulations regardless of genetic architecture; if true autocorrelation is nonzero, ACLR correctly detects its sign, although estimates of the autocorrelation magnitude are susceptible to bias in cases of certain genetic architectures. We applied ACLR to 31 diseases and complex traits from the UK Biobank (average N=331K), meta-analyzing results across traits. We determined that autocorrelations were significantly negative at distances of 1-50bp (P = 8 × 10⁻⁶, point estimate −0.35 ± 0.08) and 50-100bp (P = 2 × 10⁻³, point estimate −0.33 ± 0.11). We show that the autocorrelation is primarily driven by pairs of SNPs in positive LD, which is consistent with the expectation that linked SNPs with opposite effects are less impacted by natural selection. Our findings suggest that this mechanism broadly affects complex trait genetic architectures, and we discuss implications for association mapping, heritability estimation, and genetic risk prediction.
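The regression step described above (products of summary statistics regressed on distance-dependent LD scores) can be sketched as an ordinary least-squares fit. This is only a toy illustration of "regress a product on a score". The abstract does not give the exact definition of the distance-dependent LD score, the regression weights, or how standard errors are obtained, so none of those are reproduced here. The function name and the numbers are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical inputs: one distance-dependent LD score and one product of
# summary statistics per observation, for a single distance bin.
ld_scores = [0.0, 1.0, 2.0, 3.0]
stat_products = [0.5 - 0.3 * score for score in ld_scores]

slope, intercept = ols_slope_intercept(ld_scores, stat_products)
```

Here the toy products were generated with a negative slope, so the fitted slope is negative as well. In the method described above, the sign of such a slope is what indicates positive or negative autocorrelation at a given distance.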


2018 ◽  
Author(s):  
Yizhen Zhong ◽  
Minoli Perera ◽  
Eric R. Gamazon

Abstract. Background: Understanding the nature of the genetic regulation of gene expression promises to advance our understanding of the genetic basis of disease. However, the methodological impact of use of local ancestry on high-dimensional omics analyses, including most prominently expression quantitative trait loci (eQTL) mapping and trait heritability estimation, in admixed populations remains critically underexplored. Results: Here we develop a statistical framework that characterizes the relationships among the determinants of the genetic architecture of an important class of molecular traits. We estimate the trait variance explained by ancestry using local admixture relatedness between individuals. Using National Institute of General Medical Sciences (NIGMS) and Genotype-Tissue Expression (GTEx) datasets, we show that use of local ancestry can substantially improve eQTL mapping and heritability estimation and characterize the sparse versus polygenic component of gene expression in admixed and multiethnic populations respectively. Using simulations of diverse genetic architectures to estimate trait heritability and the level of confounding, we show improved accuracy given individual-level data and evaluate a summary statistics based approach. Furthermore, we provide a computationally efficient approach to local ancestry analysis in eQTL mapping while increasing control of type I and type II error over traditional approaches. Conclusion: Our study has important methodological implications on genetic analysis of omics traits across a range of genomic contexts, from a single variant to a prioritized region to the entire genome. Our findings highlight the importance of using local ancestry to better characterize the heritability of complex traits and to more accurately map genetic associations.


2017 ◽  
Vol 114 (32) ◽  
pp. 8602-8607 ◽  
Author(s):  
Loic Yengo ◽  
Zhihong Zhu ◽  
Naomi R. Wray ◽  
Bruce S. Weir ◽  
Jian Yang ◽  
...  

Quantifying the effects of inbreeding is critical to characterizing the genetic architecture of complex traits. This study highlights through theory and simulations the strengths and shortcomings of three SNP-based inbreeding measures commonly used to estimate inbreeding depression (ID). We demonstrate that heterogeneity in linkage disequilibrium (LD) between causal variants and SNPs biases ID estimates, and we develop an approach to correct this bias using LD and minor allele frequency stratified inference (LDMS). We quantified ID in 25 traits measured in ∼140,000 participants of the UK Biobank, using LDMS, and confirmed previously published ID for 4 traits. We find unique evidence of ID for handgrip strength, waist/hip ratio, and visual and auditory acuity (ID between −2.3 and −5.2 phenotypic SDs for complete inbreeding; P<0.001). Our results illustrate that a careful choice of the measure of inbreeding combined with LDMS stratification improves both detection and quantification of ID using SNP data.
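The abstract above does not name the three SNP-based inbreeding measures it compares. As a hedged illustration of what such a measure looks like, the sketch below computes an excess-homozygosity coefficient, F = 1 − H_obs / H_exp, where H_obs is the number of heterozygous genotypes observed in one individual and H_exp is the number expected under Hardy-Weinberg proportions, Σ 2p(1 − p). Treating this as one of the measures in the study is an assumption. The function name and the toy genotypes are hypothetical, and the LDMS stratification is not implemented.

```python
def excess_homozygosity_f(genotypes, allele_freqs):
    """Inbreeding coefficient from excess homozygosity for one individual.

    genotypes: allele counts coded 0, 1 or 2 at each SNP.
    allele_freqs: population frequency p of the counted allele at each SNP.
    Returns F = 1 - observed heterozygosity / expected heterozygosity.
    """
    if len(genotypes) != len(allele_freqs):
        raise ValueError("genotypes and allele_freqs must have equal length")
    observed_het = sum(1 for g in genotypes if g == 1)
    expected_het = sum(2.0 * p * (1.0 - p) for p in allele_freqs)
    if expected_het == 0.0:
        raise ValueError("no polymorphic SNPs: expected heterozygosity is 0")
    return 1.0 - observed_het / expected_het


# Hypothetical toy individual: four SNPs, each with allele frequency 0.5.
toy_genotypes = [0, 2, 1, 0]
toy_freqs = [0.5, 0.5, 0.5, 0.5]
f_hat = excess_homozygosity_f(toy_genotypes, toy_freqs)
```

With four SNPs at frequency 0.5, two heterozygous genotypes are expected. The toy individual has one, so the estimate is 0.5. Estimating inbreeding depression would then involve regressing a phenotype on such coefficients, which this sketch leaves out.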


2016 ◽  
Author(s):  
Tian Ge ◽  
Chia-Yen Chen ◽  
Benjamin M. Neale ◽  
Mert R. Sabuncu ◽  
Jordan W. Smoller

Heritability estimation provides important information about the relative contribution of genetic and environmental factors to phenotypic variation, and provides an upper bound for the utility of genetic risk prediction models. Recent technological and statistical advances have enabled the estimation of additive heritability attributable to common genetic variants (SNP heritability) across a broad phenotypic spectrum. However, assessing the comparative heritability of multiple traits estimated in different cohorts may be misleading due to the population-specific nature of heritability. Here we report the SNP heritability for 551 complex traits derived from the large-scale, population-based UK Biobank, comprising both quantitative phenotypes and disease codes, and examine the moderating effect of three major demographic variables (age, sex and socioeconomic status) on the heritability estimates. Our study represents the first comprehensive phenome-wide heritability analysis in the UK Biobank, and underscores the importance of considering population characteristics in comparing and interpreting heritability.


Author(s):  
Soke Yuen Yong ◽  
Timothy G. Raben ◽  
Louis Lello ◽  
Stephen D.H. Hsu

Abstract. Genomic prediction of complex human traits (e.g., height, cognitive ability, bone density) and disease risks (e.g., breast cancer, diabetes, heart disease, atrial fibrillation) has advanced considerably in recent years. Predictors have been constructed using penalized algorithms that favor sparsity: i.e., which use as few genetic variants as possible. We analyze the specific genetic variants (SNPs) utilized in these predictors, which can vary from dozens to as many as thirty thousand. We find that the fraction of SNPs in or near genic regions varies widely by phenotype. For the majority of disease conditions studied, a large amount of the variance is accounted for by SNPs outside of coding regions. The state of these SNPs cannot be determined from exome-sequencing data. This suggests that exome data alone will miss much of the heritability for these traits – i.e., existing PRS cannot be computed from exome data alone. We also study the fraction of SNPs and of variance that is in common between pairs of predictors. The DNA regions used in disease risk predictors so far constructed seem to be largely disjoint (with a few interesting exceptions), suggesting that individual genetic disease risks are largely uncorrelated. It seems possible in theory for an individual to be a low-risk outlier in all conditions simultaneously.


2018 ◽  
Author(s):  
Guiyan Ni ◽  
Julius van der Werf ◽  
Xuan Zhou ◽  
Elina Hyppönen ◽  
Naomi R. Wray ◽  
...  

Abstract. The genomics era has brought useful tools to dissect the genetic architecture of complex traits. We propose a reaction norm model (RNM) to tackle genotype-environment correlation and interaction problems in the context of genome-wide association analyses of complex traits. In our approach, an environmental risk factor affecting the trait of interest can be modeled as dependent on a continuous covariate that is itself regulated by genetic as well as environmental factors. Our multivariate RNM approach allows the joint modelling of the relation between the genotype (G) and the covariate (C), so that both their correlation (association) and interaction (effect modification) can be estimated. Hence we jointly estimate genotype-covariate correlation and interaction (GCCI). We demonstrate using simulation that the proposed multivariate RNM performs better than the current state-of-the-art methods that ignore G-C correlation. We apply the method to data from the UK Biobank (N = 66,281) in analysis of body mass index using smoking quantity as a covariate. We find a highly significant G-C correlation, but a negligible G-C interaction. In contrast, when a conventional G-C interaction analysis is applied (i.e., G-C correlation is not included in the model), highly significant G-C interaction estimates are found. It is also notable that we find a significant heterogeneity in the estimated residual variances across different covariate levels, probably due to residual-covariate interaction. Using simulation we also show that the residual variances estimated by genomic restricted maximum likelihood (GREML) or linkage disequilibrium score regression (LDSC) can be inflated in the presence of interactions, implying that the currently reported SNP-heritability estimates from these methods should be interpreted with caution.
We conclude that it is essential to correctly account for both interaction and correlation in complex trait analyses and that the failure to do so may lead to substantial biases in inferences relating to genetic architecture of complex traits, including estimated SNP-heritability.


Author(s):  
Brooke Sheppard ◽  
Nadav Rappoport ◽  
Po-Ru Loh ◽  
Stephan J. Sanders ◽  
Andy Dahl ◽  
...  

Abstract. Interactions between genetic variants – epistasis – is pervasive in model systems and can profoundly impact evolutionary adaption, population disease dynamics, genetic mapping, and precision medicine efforts. In this work we develop a model for structured polygenic epistasis, called Coordinated Interaction (CI), and prove that several recent theories of genetic architecture fall under the formal umbrella of CI. Unlike standard polygenic epistasis models that assume interaction and main effects are independent, in the CI model, sets of SNPs broadly interact positively or negatively, on balance skewing the penetrance of main genetic effects. To test for the existence of CI we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CI in 14 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CI is a new dimension of genetic architecture that can capture structured, systemic interactions in complex human traits.


2016 ◽  
Author(s):  
Eric R. Gamazon ◽  
Danny S. Park

Siddharth Krishna Kumar and co-authors [1] claim to have shown that “GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability.” Given the numerous recent studies on the genetic architecture of complex traits that are based on this methodology, these claims have important implications for the field. Through an investigation of the stability of the likelihood function under phenotype perturbation and an analysis of its dependence on the spectral properties of the genetic relatedness matrix, our study characterizes the properties of an important approach to the analysis of GWAS data and identifies crucial errors in the authors’ analyses, invalidating their main conclusions.


2021 ◽  
Vol 118 (15) ◽  
pp. e1922305118
Author(s):  
Brooke Sheppard ◽  
Nadav Rappoport ◽  
Po-Ru Loh ◽  
Stephan J. Sanders ◽  
Noah Zaitlen ◽  
...  

Interactions between genetic variants—epistasis—is pervasive in model systems and can profoundly impact evolutionary adaption, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue–trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.
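The abstract names the even-odd (EO) test but does not spell out its construction. One plausible reading, offered here purely as an assumption, is that a polygenic score is built from even-numbered chromosomes and another from odd-numbered chromosomes, and the phenotype is regressed on both scores and their product, with the product coefficient capturing coordinated epistasis. The sketch below fits only that regression by least squares. Standard errors, calibration, and score construction are omitted, and every name and number is hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_even_odd_interaction(score_even, score_odd, phenotype):
    """Least-squares fit of y = b0 + b1*even + b2*odd + b3*(even*odd)."""
    rows = [[1.0, e, o, e * o] for e, o in zip(score_even, score_odd)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data generated with a known interaction coefficient of 0.2.
even = [-1.0, -1.0, 1.0, 1.0, 0.0, 2.0]
odd = [-1.0, 1.0, -1.0, 1.0, 2.0, 0.0]
pheno = [1.0 + 0.5 * e + 0.5 * o + 0.2 * e * o for e, o in zip(even, odd)]

coefficients = fit_even_odd_interaction(even, odd, pheno)
interaction = coefficients[3]
```

Because the toy phenotype is noise-free, the fit recovers the generating coefficients. With real data the interaction estimate would need a standard error and a null distribution before any claim about CE could be made.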

