Polygenicity of complex traits is explained by negative selection

2018 ◽  
Author(s):  
Luke J. O’Connor ◽  
Armin P. Schoech ◽  
Farhad Hormozdiari ◽  
Steven Gazal ◽  
Nick Patterson ◽  
...  

Complex traits and common disease are highly polygenic: thousands of common variants are causal, and their effect sizes are almost always small. Polygenicity could be explained by negative selection, which constrains common-variant effect sizes and may reshape their distribution across the genome. We refer to this phenomenon as flattening, as genetic signal is flattened relative to the underlying biology. We introduce a mathematical definition of polygenicity, the effective number of associated SNPs, and a robust statistical method to estimate it. This definition of polygenicity differs from the number of causal SNPs, a standard definition; it depends strongly on SNPs with large effects. In analyses of 33 complex traits (average N=361k), we determined that common variants are ∼4x more polygenic than low-frequency variants, consistent with pervasive flattening. Moreover, functionally important regions of the genome have increased polygenicity in proportion to their increased heritability, implying that heritability enrichment reflects differences in the number of associations rather than their magnitude (which is constrained by selection). We conclude that negative selection constrains the genetic signal of biologically important regions and genes, reshaping genetic architecture.
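The abstract above does not state the formula for the effective number of associated SNPs. As a minimal sketch, assume a kurtosis-based form, M_e = 3(Σβ²)² / Σβ⁴, taken over the per-SNP effect sizes β. This form is an assumption made here for illustration, not the authors' stated estimator, and it ignores LD. The function name `effective_number_of_snps` is hypothetical. The sketch illustrates the two properties the abstract names: the quantity differs from the count of causal SNPs, and it is dominated by SNPs with large effects.

```python
def effective_number_of_snps(effects):
    """Kurtosis-based polygenicity sketch (assumed form, not from the abstract).

    Returns 3 * (sum of beta^2)^2 / (sum of beta^4), which equals 3M / kurtosis
    when the kurtosis is taken over all M SNPs, including those with zero effect.
    """
    sum_sq = sum(b ** 2 for b in effects)
    sum_fourth = sum(b ** 4 for b in effects)
    if sum_fourth == 0:
        raise ValueError("at least one SNP must have a nonzero effect")
    return 3.0 * sum_sq ** 2 / sum_fourth


# 10 causal SNPs with equal effects among 100 SNPs.
baseline = [1.0] * 10 + [0.0] * 90

# Same trait plus 50 additional causal SNPs with tiny effects:
# the causal count rises from 10 to 60, but the estimate barely moves.
many_small = [1.0] * 10 + [0.01] * 50 + [0.0] * 40

# Same trait plus a single large-effect SNP: the estimate drops sharply.
one_large = [1.0] * 10 + [5.0] + [0.0] * 89

if __name__ == "__main__":
    for name, effects in [("baseline", baseline),
                          ("many_small", many_small),
                          ("one_large", one_large)]:
        print(name, effective_number_of_snps(effects))
```

Under this assumed form, adding many tiny effects changes the number of causal SNPs without changing the estimate much, whereas one large effect concentrates the signal and lowers it.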

2016 ◽  
Author(s):  
Steven Gazal ◽  
Hilary K. Finucane ◽  
Nicholas A Furlotte ◽  
Po-Ru Loh ◽  
Pier Francesco Palamara ◽  
...  

Recent work has hinted at the linkage disequilibrium (LD) dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability after conditioning on their minor allele frequency (MAF). However, this has not been formally assessed, quantified or biologically interpreted. Here, we analyzed summary statistics from 56 complex diseases and traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with low LLD have significantly larger per-SNP heritability. Roughly half of the LLD signal can be explained by functional annotations that are negatively correlated with LLD, such as DNase I hypersensitivity sites (DHS). The remaining signal is largely driven by our finding that common variants that are more recent tend to have lower LLD and to explain more heritability (P = 2.38 × 10^−104); the youngest 20% of common SNPs explain 3.9x more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that these annotations jointly predict deleterious effects. Our results are consistent with the action of negative selection on deleterious variants that affect complex traits, complementing efforts to learn about negative selection by analyzing much smaller rare variant data sets.


Author(s):  
Ruth Johnson ◽  
Kathryn S. Burch ◽  
Kangcheng Hou ◽  
Mario Paciuc ◽  
Bogdan Pasaniuc ◽  
...  

A key question in human genetics is understanding the proportion of SNPs modulating a particular phenotype or the proportion of susceptibility SNPs for a disease, termed polygenicity. Previous studies have observed that complex traits tend to be highly polygenic, opposing the previous belief that only a handful of SNPs contribute to a trait. Beyond these genome-wide estimates, the distribution of polygenicity across genomic regions as well as the genomic factors that affect regional polygenicity remain poorly understood. A reason for this gap is that methods for estimating polygenicity utilize SNP effect sizes from GWAS. However, estimating regional polygenicity from GWAS effect sizes involves untangling the correlation between SNPs due to LD, leading to intractable computations for even a small number of SNPs. In this work, we propose a scalable method, BEAVR, to estimate the regional polygenicity of a trait given marginal effect sizes from GWAS and LD information. We implement a Gibbs sampler to estimate the posterior distribution of the regional polygenicity and derive a fast, algorithmic update to circumvent the computational bottlenecks associated with LD. The runtime of our algorithm is 𝒪(MK) for M SNPs and K susceptibility SNPs, where the number of susceptibility SNPs is typically K ≪ M. By modeling the full LD structure, we show that BEAVR provides unbiased estimates of polygenicity compared to previous methods that only partially model LD. Finally, we show how estimates of regional polygenicity for BMI, eczema, and high cholesterol provide insight into the regional genetic architecture of each trait.


2019 ◽  
Author(s):  
Arun Durvasula ◽  
Kirk E. Lohmueller

Accurate genetic risk prediction is a key goal for medical genetics and great progress has been made toward identifying individuals with extreme risk across several traits and diseases (Collins and Varmus, 2015). However, many of these studies are done in predominantly European populations (Bustamante et al., 2011; Popejoy and Fullerton, 2016). Although GWAS effect sizes correlate across ancestries (Wojcik et al., 2019), risk scores show substantial reductions in accuracy when applied to non-European populations (Kim et al., 2018; Martin et al., 2019; Scutari et al., 2016). We use simulations to show that human demographic history and negative selection on complex traits result in population-specific genetic architectures. For traits under moderate negative selection, ~50% of the heritability can be accounted for by variants in Europe that are absent from Africa. We show that this directly leads to poor performance in risk prediction when using variants discovered in Europe to predict risk in African populations, especially in the tails of the risk distribution. To evaluate the impact of this effect in genomic data, we built a Bayesian model to stratify heritability between European-specific and shared variants and applied it to 43 traits and diseases in the UK Biobank. Across these phenotypes, we find ~50% of the heritability comes from European-specific variants, setting an upper bound on the accuracy of genetic risk prediction in non-European populations using effect sizes discovered in European populations. We conclude that genetic association studies need to include more diverse populations to enable the utility of genetic risk prediction in all populations.


2021 ◽  
Vol 58 (5) ◽  
pp. 289-296
Author(s):  
Haipeng Pang ◽  
Ying Xia ◽  
Shuoming Luo ◽  
Gan Huang ◽  
Xia Li ◽  
...  

Type 1 diabetes mellitus (T1DM) is defined as an autoimmune disorder and has enormous complexity and heterogeneity. Although its precise pathogenic mechanisms are obscure, this disease is widely acknowledged to be precipitated by environmental factors in individuals with genetic susceptibility. To date, the known susceptibility loci, which have mostly been identified by genome-wide association studies, can explain 80%–85% of the heritability of T1DM. Researchers believe that at least a part of its missing genetic component is caused by undetected rare and low-frequency variants. Most common variants have only small to modest effect sizes, which increases the difficulty of dissecting their functions and restricts their potential clinical application. Intriguingly, many studies have indicated that rare and low-frequency variants have larger effect sizes and play more significant roles in susceptibility to common diseases, including T1DM, than common variants do. Therefore, better recognition of rare and low-frequency variants is beneficial for revealing the genetic architecture of T1DM and for providing new and potent therapeutic targets for this disease. Here, we will discuss existing challenges as well as the great significance of this field and review current knowledge of the contributions of rare and low-frequency variants to T1DM.


2017 ◽  
Author(s):  
Armin P Schoech ◽  
Daniel Jordan ◽  
Po-Ru Loh ◽  
Steven Gazal ◽  
Luke O’Connor ◽  
...  

Understanding the role of rare variants is important in elucidating the genetic basis of human diseases and complex traits. It is widely believed that negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1−p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α by maximizing its profile likelihood in a linear mixed model framework using imputed genotypes, including rare variants (MAF >0.07%). We applied this method to 25 UK Biobank diseases and complex traits (N = 113,851). All traits produced negative α estimates with 20 significantly negative, implying larger rare variant effect sizes. The inferred best-fit distribution of true α values across traits had mean −0.38 (s.e. 0.02) and standard deviation 0.08 (s.e. 0.03), with statistically significant heterogeneity across traits (P = 0.0014). Despite larger rare variant effect sizes, we show that for most traits analyzed, rare variants (MAF <1%) explain less than 10% of total SNP-heritability. Using evolutionary modeling and forward simulations, we validated the α model of MAF-dependent trait effects and estimated the level of coupling between fitness effects and trait effects. Based on this analysis an average genome-wide negative selection coefficient on the order of 10^−4 or stronger is necessary to explain the α values that we inferred.
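The α model in the abstract above can be made concrete with a short sketch. It takes the stated relationship, per-allele effect variance proportional to [p(1−p)]^α, and adds one standard assumption not stated in the abstract: under Hardy-Weinberg equilibrium and an additive model, a SNP with allele frequency p and per-allele effect β explains variance 2p(1−p)β². The function names and the `scale` parameter are hypothetical.

```python
def per_allele_effect_variance(p, alpha, scale=1.0):
    """Variance of the per-allele effect size under the alpha model.

    Implements scale * [p(1-p)]^alpha; `scale` is a hypothetical
    proportionality constant.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency p must lie strictly between 0 and 1")
    return scale * (p * (1.0 - p)) ** alpha


def expected_variance_explained(p, alpha, scale=1.0):
    """Expected trait variance explained by one SNP.

    Assumes Hardy-Weinberg equilibrium and additivity, so the genotype
    variance is 2p(1-p); the result is proportional to [p(1-p)]^(1+alpha).
    """
    return 2.0 * p * (1.0 - p) * per_allele_effect_variance(p, alpha, scale)


if __name__ == "__main__":
    alpha = -0.38  # mean alpha across traits reported in the abstract
    for p in (0.001, 0.01, 0.1, 0.3):
        print(p,
              per_allele_effect_variance(p, alpha),
              expected_variance_explained(p, alpha))
```

With α between −1 and 0, per-allele effect variance rises as p falls, while the expected variance explained per SNP still falls. That is in line with the abstract's observation that rare variants have larger effects yet explain a small share of SNP-heritability, although the share also depends on how many variants exist at each frequency, which this sketch does not model.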


Author(s):  
Marina de Miguel ◽  
Isabel Rodríguez-Quilón ◽  
Myriam Heuertz ◽  
Agathe Hurel ◽  
Delphine Grivet ◽  
...  

A decade of association studies in multiple organisms suggests that most complex traits are polygenic; that is, they have a genetic architecture determined by numerous loci distributed across the genome, each with a small effect size. Thus, determining the degree of polygenicity and its variation across traits, environments and years is useful to understand the genetic basis of phenotypic variation. In this study, we applied multilocus approaches to estimate the degree of polygenicity of fitness-related traits in a long-lived plant (Pinus pinaster Ait., maritime pine) and to analyze how polygenicity changes across environments and years. To do so, we evaluated five categories of fitness-related traits (survival, height, phenology-related, functional, and biotic-stress response traits) in a clonal common garden network, planted in contrasting environments (over 12,500 trees). First, most of the analyzed traits showed evidence of local adaptation based on QST-FST comparisons. Second, we observed a remarkably stable degree of polygenicity, averaging 6% (range of 0-27%), across traits, environments and years. As previously suggested for humans, some of these traits also showed evidence of negative selection, which could explain, at least partially, the high degree of polygenicity. The observed genetic architecture of fitness-related traits in maritime pine supports the polygenic adaptation model. Because polygenic adaptation can occur rapidly, our study suggests that current predictions on the capacity of natural forest tree populations to adapt to new environments should be revised, which is of special relevance in the current context of climate change.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Téo Fournier ◽  
Omar Abou Saada ◽  
Jing Hou ◽  
Jackson Peter ◽  
Elodie Caudal ◽  
...  

Genome-wide association studies (GWAS) make it possible to dissect complex traits and map genetic variants, but the mapped variants often explain relatively little of the heritability. One potential reason is the preponderance of undetected low-frequency variants. To increase their allele frequency and assess their phenotypic impact in a population, we generated a diallel panel of 3025 yeast hybrids, derived from pairwise crosses between natural isolates, and examined a large number of traits. Parental versus hybrid regression analysis showed that while most phenotypic variance is explained by additivity, a third is governed by non-additive effects, with complete dominance having a key role. By performing GWAS on the diallel panel, we found that associated variants with low frequency in the initial population are overrepresented, and that they explain a fraction of the phenotypic variance and have effect sizes similar to those of common variants. Overall, we highlighted the relevance of low-frequency variants to phenotypic variation.


2019 ◽  
Author(s):  
Kevin A Hartman ◽  
Sara R Rashkin ◽  
John S Witte ◽  
Ryan D Hernandez

The genetic architecture of complex human traits remains largely unknown. The distribution of heritability across the minor allele frequency (MAF) spectrum for a trait will be a function of the MAF of its causal variants and their effect sizes. Assumptions about these relationships underpin the tools used to estimate heritability. We examine the performance of two widely used tools, Haseman-Elston (HE) Regression and genomic-relatedness-based restricted maximum-likelihood (GREML). Our simulations show that HE is less biased than GREML under a wide variety of models and that the estimated standard error for HE tends to be substantially overestimated. We then applied HE Regression to infer the heritability of 72 quantitative biomedical traits from up to 50,000 individuals with genotype and imputation data from the UK Biobank. We found that adding each individual's geolocation as covariates corrected for population stratification that could not be accounted for by principal components alone (particularly for rare variants). The biomedical traits we analyzed had an average heritability of 0.27, with low-frequency variants (MAF≤0.05) explaining an average of 47.7% of the total heritability (and lower-frequency variants with MAF≤0.02 explaining a majority of our increased heritability over previous estimates). Variants in regions of low linkage disequilibrium (LD) accounted for 3.3-fold more heritability than the variants in regions of high LD, an effect primarily driven by low-frequency variants. These findings suggest a moderate action of negative selection on the causal variants of these traits.
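For readers unfamiliar with HE regression, the following is a minimal sketch of its textbook cross-product form, not the pipeline used in the study above. It regresses products of standardized phenotypes, y_i·y_j, on the off-diagonal entries of a genomic relationship matrix (GRM) through the origin, and reads the slope as the heritability estimate. The function names, the single-component model, and the toy simulation (independent SNPs, no covariates, no MAF or LD stratification) are all assumptions made for illustration.

```python
import numpy as np


def he_regression(grm, y):
    """Haseman-Elston cross-product estimate of heritability (sketch).

    Standardizes y, then returns the through-origin slope of y_i * y_j
    on GRM_ij over all pairs i < j.
    """
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()
    upper = np.triu_indices(len(y), k=1)        # index pairs with i < j
    relatedness = np.asarray(grm, dtype=float)[upper]
    cross_products = np.outer(y, y)[upper]
    return float(relatedness @ cross_products / (relatedness @ relatedness))


def simulate_grm_and_phenotype(n=1500, m=500, h2=0.5, seed=0):
    """Toy simulation (hypothetical helper): independent SNPs, additive trait."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=m)
    genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
    # Standardize each SNP column to mean 0 and variance 1.
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    y = z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    grm = z @ z.T / m
    return grm, y


if __name__ == "__main__":
    grm, y = simulate_grm_and_phenotype()
    print("HE estimate of h2:", he_regression(grm, y))
```

The estimate is noisy at this sample size, so it should land near the simulated value rather than on it. Partitioning heritability by MAF or LD, as the abstract describes, would require one GRM per variant bin and a multiple regression, which this sketch omits.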


2018 ◽  
Vol 50 (5) ◽  
pp. 746-753 ◽  
Author(s):  
Jian Zeng ◽  
Ronald de Vlaming ◽  
Yang Wu ◽  
Matthew R. Robinson ◽  
Luke R. Lloyd-Jones ◽  
...  
