Linkage disequilibrium dependent architecture of human complex traits reveals action of negative selection

2016
Author(s):
Steven Gazal
Hilary K. Finucane
Nicholas A Furlotte
Po-Ru Loh
Pier Francesco Palamara
...

Abstract: Recent work has hinted at the linkage disequilibrium (LD)-dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability after conditioning on their minor allele frequency (MAF). However, this has not been formally assessed, quantified or biologically interpreted. Here, we analyzed summary statistics from 56 complex diseases and traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with low LLD have significantly larger per-SNP heritability. Roughly half of the LLD signal can be explained by functional annotations that are negatively correlated with LLD, such as DNase I hypersensitivity sites (DHS). The remaining signal is largely driven by our finding that more recent common variants tend to have lower LLD and to explain more heritability (P = 2.38 × 10⁻¹⁰⁴); the youngest 20% of common SNPs explain 3.9x more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that these annotations jointly predict deleterious effects. Our results are consistent with the action of negative selection on deleterious variants that affect complex traits, complementing efforts to learn about negative selection by analyzing much smaller rare variant data sets.
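Extending stratified LD score regression to continuous annotations changes only how an annotation enters the LD score: each SNP's score sums r² with nearby SNPs weighted by their annotation values instead of by 0/1 membership. A minimal sketch, assuming an in-memory r² matrix and a window counted in SNP indices (the LDSC software uses a genetic-distance window); the function name and toy values are hypothetical:

```python
from typing import List, Sequence


def continuous_ld_scores(
    r2: Sequence[Sequence[float]],
    annot: Sequence[float],
    window: int,
) -> List[float]:
    """Annotation-weighted LD score for every SNP j:

        l(j, c) = sum_k a_c(k) * r2[j][k]   over SNPs k within `window` indices of j.

    A binary annotation uses a_c(k) in {0, 1}; a continuous annotation
    (for example a per-SNP LD-level or allele-age value) uses real weights.
    """
    m = len(annot)
    scores: List[float] = []
    for j in range(m):
        lo, hi = max(0, j - window), min(m, j + window + 1)
        scores.append(sum(annot[k] * r2[j][k] for k in range(lo, hi)))
    return scores


# Toy 3-SNP example (hypothetical r^2 values and annotation values).
r2 = [[1.0, 0.5, 0.1],
      [0.5, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
print(continuous_ld_scores(r2, [1.0, 0.5, 2.0], window=1))
```

Setting every annotation value to 1 recovers the ordinary (unstratified) LD score.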

2017
Vol 49 (10)
pp. 1421-1427
Author(s):
Steven Gazal
Hilary K Finucane
Nicholas A Furlotte
Po-Ru Loh
Pier Francesco Palamara
...

2019
Vol 51 (8)
p. 1295
Author(s):
Steven Gazal
Hilary K. Finucane
Nicholas A. Furlotte
Po-Ru Loh
Pier Francesco Palamara
...

2018
Author(s):
Luke J. O’Connor
Armin P. Schoech
Farhad Hormozdiari
Steven Gazal
Nick Patterson
...

Complex traits and common diseases are highly polygenic: thousands of common variants are causal, and their effect sizes are almost always small. Polygenicity could be explained by negative selection, which constrains common-variant effect sizes and may reshape their distribution across the genome. We refer to this phenomenon as flattening, as genetic signal is flattened relative to the underlying biology. We introduce a mathematical definition of polygenicity, the effective number of associated SNPs, and a robust statistical method to estimate it. This definition of polygenicity differs from the number of causal SNPs, a standard definition; it depends strongly on SNPs with large effects. In analyses of 33 complex traits (average N=361k), we determined that common variants are ∼4x more polygenic than low-frequency variants, consistent with pervasive flattening. Moreover, functionally important regions of the genome have increased polygenicity in proportion to their increased heritability, implying that heritability enrichment reflects differences in the number of associations rather than their magnitude (which is constrained by selection). We conclude that negative selection constrains the genetic signal of biologically important regions and genes, reshaping genetic architecture.
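The abstract defines polygenicity as the effective number of associated SNPs and notes that it depends strongly on SNPs with large effects. Assuming a moment-ratio form M_e = M·E[β²]²/E[β⁴] (the paper's exact normalization and its summary-statistic estimator are not reproduced here), a sketch on known effect sizes shows that behaviour; the function name and effect values are hypothetical:

```python
from typing import Sequence


def effective_number_associated(betas: Sequence[float]) -> float:
    """Moment-ratio polygenicity (assumed form):

        M_e = M * E[beta^2]^2 / E[beta^4] = (sum beta^2)^2 / sum beta^4

    With k equal nonzero effects and M - k zeros this returns exactly k;
    a few large effects pull it far below the count of nonzero effects.
    """
    s2 = sum(b * b for b in betas)
    s4 = sum(b ** 4 for b in betas)
    if s4 == 0.0:
        raise ValueError("all effect sizes are zero")
    return s2 * s2 / s4


flat = [0.01] * 100 + [0.0] * 900           # 100 equal, small effects
spiky = [0.01] * 100 + [0.1] + [0.0] * 899  # same, plus one 10x larger effect
print(effective_number_associated(flat))    # 100 (up to rounding)
print(effective_number_associated(spiky))   # roughly 3.96
```

Adding a single large effect to 100 small ones drops M_e from 100 to about 4, which is why this measure differs so sharply from a count of causal SNPs.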


2018
Author(s):
Corbin Quick
Christian Fuchsberger
Daniel Taliun
Gonçalo Abecasis
Michael Boehnke
...

Abstract. Summary: Estimating linkage disequilibrium (LD) is essential for a wide range of summary statistics-based association methods for genome-wide association studies (GWAS). Large genetic data sets, e.g. the TOPMed WGS project and UK Biobank, enable more accurate and comprehensive LD estimates, but increase the computational burden of LD estimation. Here, we describe emeraLD (Efficient Methods for Estimation and Random Access of LD), a computational tool that leverages sparsity and haplotype structure to estimate LD orders of magnitude faster than existing tools. Availability and Implementation: emeraLD is implemented in C++, and is open source under GPLv3. Source code, documentation, an R interface, and utilities for analysis of summary statistics are freely available at http://github.com/statgen/ Supplementary information: Supplementary data are available at Bioinformatics online.
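The abstract credits emeraLD's speed to sparsity and haplotype structure. The snippet below is not emeraLD's algorithm; it is a minimal illustration, assuming phased biallelic haplotypes, of why sparsity helps: storing only the carriers of each minor allele reduces the r² computation to one set intersection. The function name and toy data are hypothetical:

```python
from typing import Set


def sparse_r2(minor_a: Set[int], minor_b: Set[int], n_haps: int) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each SNP is stored sparsely as the set of haplotype indices carrying its
    minor allele, so the only joint count needed (n_ab) costs time roughly
    proportional to the rarer SNP's minor-allele count, not to n_haps.
    """
    p_a = len(minor_a) / n_haps
    p_b = len(minor_b) / n_haps
    n_ab = len(minor_a & minor_b)            # haplotypes with both minor alleles
    d = n_ab / n_haps - p_a * p_b            # D = p_AB - p_A * p_B
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


# 8 hypothetical haplotypes: SNP a has its minor allele on 0, 1, 2 and SNP b on 1, 2.
print(sparse_r2({0, 1, 2}, {1, 2}, 8))
```

For rare variants in large panels, the carrier sets are tiny compared with the number of haplotypes, so most of the work a dense genotype-vector product would do is skipped.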


2020
Vol 21 (1)
Author(s):
Kevin J. Gleason
Fan Yang
Brandon L. Pierce
Xin He
Lin S. Chen

Abstract: To provide a comprehensive mechanistic interpretation of how known trait-associated SNPs affect complex traits, we propose a method, Primo, for integrative analysis of GWAS summary statistics with multiple sets of omics QTL summary statistics from different cellular conditions or studies. Primo examines the association patterns of SNPs with complex and omics traits. In gene regions harboring known susceptibility loci, Primo performs conditional association analysis to account for linkage disequilibrium. Primo allows for unknown study heterogeneity and sample correlations. We show two applications using Primo to examine the molecular mechanisms of known susceptibility loci and to detect and interpret pleiotropic effects.
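Primo's core idea, as the abstract describes it, is to classify each SNP by its pattern of association across a GWAS and several omics QTL studies. A simplified sketch, assuming independent z-scores, a two-component normal model per study with known (hypothetical) effect variances `tau2`, and given pattern proportions `prior` (in practice these would be estimated). Primo itself also handles study heterogeneity and sample correlation, which this sketch ignores; all names here are hypothetical:

```python
from itertools import product
from math import exp, pi, sqrt
from typing import Dict, Sequence, Tuple

Pattern = Tuple[int, ...]


def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) at x."""
    return exp(-x * x / (2.0 * var)) / sqrt(2.0 * pi * var)


def pattern_posteriors(z: Sequence[float],
                       tau2: Sequence[float],
                       prior: Dict[Pattern, float]) -> Dict[Pattern, float]:
    """Posterior probability of each association pattern q in {0,1}^J for one SNP.

    Assumes z_j ~ N(0, 1) when q_j = 0 (no association in study j) and
    z_j ~ N(0, 1 + tau2[j]) when q_j = 1, independently across studies.
    """
    weights: Dict[Pattern, float] = {}
    for q in product((0, 1), repeat=len(z)):
        lik = prior.get(q, 0.0)
        for zj, qj, t2 in zip(z, q, tau2):
            lik *= normal_pdf(zj, 1.0 + t2 if qj else 1.0)
        weights[q] = lik
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}


# Two studies (e.g. a GWAS and one eQTL study), uniform prior over the 4 patterns.
uniform = {q: 0.25 for q in product((0, 1), repeat=2)}
post = pattern_posteriors([6.0, 5.0], [25.0, 25.0], uniform)
print(max(post, key=post.get))  # (1, 1): associated with both traits
```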


2020
Author(s):
Shadi Zabad
Aaron P. Ragsdale
Rosie Sun
Yue Li
Simon Gravel

Abstract: Linkage-Disequilibrium Score Regression (LDSC) is a popular framework for analyzing GWAS summary statistics that allows for estimating SNP heritability, confounding, and functional enrichment of genetic variants with different annotations. Recent work has highlighted the influence of implicit and explicit assumptions of the model on the biological interpretation of the results. In this work, we explored a formulation of LDSC that replaces the r² measure of LD with a recently proposed unbiased estimator of the D² statistic. In addition to modest statistical differences across estimators, this derivation highlighted implicit and unrealistic assumptions about the relationship between allele frequency, effect size, and annotation status. We carry out a systematic comparison of alternative LDSC formulations by applying them to summary statistics from 47 GWAS traits. Our results show that commonly used models likely underestimate functional enrichment. These results highlight the importance of calibrating the LDSC model to achieve a more robust understanding of polygenic traits.
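For readers less familiar with LDSC, the standard r²-based model regresses each SNP's χ² statistic on its LD score: E[χ²_j] = N h² ℓ_j / M + N a + 1, where a captures confounding. A bare-bones sketch, assuming ordinary least squares (the LDSC software uses regression weights and a block jackknife for standard errors, both omitted); the function name and toy data are hypothetical:

```python
from typing import Sequence, Tuple


def ldsc_h2(chi2: Sequence[float], ld_scores: Sequence[float],
            n: float, m: float) -> Tuple[float, float]:
    """Unweighted least-squares fit of  E[chi2_j] = N * h2 * l_j / M + b.

    Returns (h2, b), where b estimates the intercept 1 + N*a.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * m / n, intercept


# Noise-free toy data generated with h2 = 0.5, N = 100,000, M = 1,000,000.
ld = [10.0, 20.0, 50.0, 100.0, 150.0]
chi2 = [1.0 + 100_000 * 0.5 * l / 1_000_000 for l in ld]
print(ldsc_h2(chi2, ld, 100_000, 1_000_000))  # approximately (0.5, 1.0)
```

The assumptions the abstract questions live in how ℓ_j is computed (here, from r²) and in treating the slope as frequency- and annotation-independent.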


2017
Author(s):
Guiyan Ni
Gerhard Moser
Naomi R. Wray
S. Hong Lee

Abstract: Genetic correlation is a key population parameter that describes the shared genetic architecture of complex traits and diseases. It can be estimated by current state-of-the-art methods, namely linkage disequilibrium score regression (LDSC) and genomic restricted maximum likelihood (GREML). The massively reduced computing burden of LDSC compared to GREML makes it an attractive tool, although the accuracy (i.e., magnitude of standard errors) of LDSC estimates has not been thoroughly studied. In simulation, we show that the accuracy of GREML is generally higher than that of LDSC. When there is genetic heterogeneity between the actual sample and reference data from which LD scores are estimated, the accuracy of LDSC decreases further. In real data analyses estimating the genetic correlation between schizophrenia (SCZ) and body mass index, we show that GREML estimates based on ~150,000 individuals give a higher accuracy than LDSC estimates based on ~400,000 individuals (from combined meta-data). A GREML genomic partitioning analysis reveals that the genetic correlation between SCZ and height is significantly negative for regulatory regions, which a whole-genome analysis or LDSC has less power to detect. We conclude that LDSC estimates should be carefully interpreted, as there can be uncertainty about homogeneity among combined meta-data sets. We suggest that any interesting findings from massive LDSC analyses of a large number of complex traits should be followed up, where possible, with more detailed GREML analyses, even if sample sizes are smaller.
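The LDSC side of this comparison estimates genetic covariance by regressing the product of two traits' z-scores on LD scores: E[z1_j z2_j] = √(N1 N2) ρ_g ℓ_j / M + intercept, then rg = ρ_g / √(h²₁ h²₂). A minimal sketch, assuming unweighted least squares, heritabilities supplied from separate single-trait fits, and no standard errors; the function name and toy data are hypothetical:

```python
from math import sqrt
from typing import Sequence


def ldsc_rg(z1: Sequence[float], z2: Sequence[float], ld_scores: Sequence[float],
            n1: float, n2: float, m: float, h2_1: float, h2_2: float) -> float:
    """Genetic correlation from an unweighted cross-trait LD score regression.

    The slope of z1*z2 on LD score equals sqrt(N1*N2) * rho_g / M; rho_g is
    recovered from it and scaled by the two SNP heritabilities.
    """
    prod = [a * b for a, b in zip(z1, z2)]
    k = len(prod)
    mean_l = sum(ld_scores) / k
    mean_p = sum(prod) / k
    sxy = sum((l - mean_l) * (p - mean_p) for l, p in zip(ld_scores, prod))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    rho_g = (sxy / sxx) * m / sqrt(n1 * n2)
    return rho_g / sqrt(h2_1 * h2_2)


# Noise-free toy data: rho_g = 0.2, N1 = N2 = 100,000, M = 1,000,000,
# so the slope is 100,000 * 0.2 / 1,000,000 = 0.02; the 0.1 offset mimics sample overlap.
ld = [10.0, 20.0, 50.0, 100.0, 150.0]
z1 = [1.0] * 5
z2 = [0.02 * l + 0.1 for l in ld]
print(ldsc_rg(z1, z2, ld, 100_000, 100_000, 1_000_000, 0.5, 0.4))  # about 0.447
```

Because the LD scores come from a reference panel, any mismatch between that panel and the GWAS samples feeds directly into the slope, which is the heterogeneity issue the abstract raises.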


2019
Author(s):
Kevin J Gleason
Fan Yang
Brandon L Pierce
Xin He
Lin S Chen

Abstract: To provide a comprehensive mechanistic interpretation of how known trait-associated SNPs affect complex traits, we propose a method – Primo – for integrative analysis of GWAS summary statistics with multiple sets of omics QTL summary statistics from different cellular conditions or studies. Primo examines SNPs’ association patterns with complex and omics traits. In gene regions harboring known susceptibility loci, Primo performs conditional association analysis to account for linkage disequilibrium. Primo allows for unknown study heterogeneity and sample correlations. We show two applications using Primo to examine the molecular mechanisms of known susceptibility loci and to detect and interpret pleiotropic effects.


2021
Author(s):
Nadezhda M Belonogova
Gulnara R Svishcheva
Anatoly V Kirichenko
Yakov A Tsepilov
Tatiana I Axenovich

Gene-based association analysis is an effective gene mapping tool. Many gene-based methods have been proposed recently. However, their power depends on the underlying genetic architecture, which is rarely known for complex traits, and so it is likely that a combination of such methods could serve as a universal approach. Several frameworks combining different gene-based methods have been developed. However, they all imply a fixed set of methods, weights and functional annotations. Moreover, most of them use individual phenotypes and genotypes as input data. Here, we introduce sumSTAAR, a framework for gene-based association analysis using summary statistics obtained from genome-wide association studies (GWAS). It is an extended and modified version of the STAAR framework proposed by Li and colleagues in 2020. The sumSTAAR framework offers a wider range of gene-based methods to combine. It allows the user to arbitrarily define a set of these methods, weighting functions and probabilities of genetic variants being causal. The methods used in the framework were adapted to reduce running time when analysing genes with a large number of SNPs. The framework includes a polygene pruning procedure to guard against the influence of strong GWAS signals outside the gene. We also present new, improved matrices of correlations between the genotypes of variants within genes. These matrices, estimated on a sample of 265,000 individuals, are a state-of-the-art replacement for the widely used matrices based on 1000 Genomes Project data.
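sumSTAAR's own set of tests, weighting functions, and polygene pruning are not reproduced here. As a hedged sketch, the block shows two common building blocks for this setting: a burden test computed from GWAS z-scores and a within-gene SNP correlation matrix R (the kind of matrix the abstract describes), and the Cauchy (ACAT) combination that STAAR uses to merge p-values from several tests. Function names and inputs are hypothetical:

```python
from math import atan, erfc, pi, sqrt, tan
from typing import List, Optional, Sequence


def burden_p(z: Sequence[float], w: Sequence[float],
             R: Sequence[Sequence[float]]) -> float:
    """Two-sided p-value of T = sum(w_j z_j) / sqrt(w' R w), which is N(0, 1) under the null."""
    num = sum(wi * zi for wi, zi in zip(w, z))
    k = len(w)
    var = sum(w[i] * R[i][j] * w[j] for i in range(k) for j in range(k))
    t = num / sqrt(var)
    return erfc(abs(t) / sqrt(2.0))  # P(|Z| > |t|)


def cauchy_combine(pvals: Sequence[float],
                   weights: Optional[List[float]] = None) -> float:
    """Cauchy combination of p-values in (0, 1), with weights normalized to sum to 1.

    For p-values below ~1e-15, tan((0.5 - p) * pi) loses precision; the
    asymptotic form 1 / (p * pi) is preferable there.
    """
    if weights is None:
        weights = [1.0] * len(pvals)
    total = sum(weights)
    t = sum(wi * tan((0.5 - p) * pi) for wi, p in zip(weights, pvals)) / total
    return 0.5 - atan(t) / pi


z = [2.1, 1.4, -0.3]                 # hypothetical GWAS z-scores for 3 SNPs in one gene
R = [[1.0, 0.3, 0.0],
     [0.3, 1.0, 0.1],
     [0.0, 0.1, 1.0]]               # hypothetical within-gene SNP correlation matrix
p_burden = burden_p(z, [1.0, 1.0, 1.0], R)
print(cauchy_combine([p_burden, 0.04, 0.2]))  # 0.04 and 0.2 stand in for other tests
```

The Cauchy combination stays valid under arbitrary dependence among the combined tests in the tail, which is why it suits merging several correlated gene-based statistics.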


2017
Author(s):
Jian Zeng
Ronald de Vlaming
Yang Wu
Matthew R Robinson
Luke Lloyd-Jones
...

Abstract: Estimation of the joint distribution of effect size and minor allele frequency (MAF) for genetic variants is important for understanding the genetic basis of complex trait variation and can be used to detect signatures of natural selection. We develop a Bayesian mixed linear model that simultaneously estimates SNP-based heritability, polygenicity (i.e. the proportion of SNPs with nonzero effects) and the relationship between effect size and MAF for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752), and show that on average across 28 traits, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (p < 0.05/28 = 1.8 × 10⁻³) signatures of natural selection for 23 out of 28 traits including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. We further apply the method to 27,869 gene expression traits (N = 1,748), and identify 30 genes that show significant (p < 2.3 × 10⁻⁶) evidence of natural selection. All the significant estimates of the relationship between effect size and MAF in either complex traits or gene expression traits are consistent with a model of negative selection, as confirmed by forward simulation. We conclude that natural selection acts pervasively on human complex traits, shaping genetic variation in the form of negative selection.
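One widely used way to parameterize the relationship between effect size and MAF, and as far as can be told the one underlying this method (the published model is called BayesS, though the abstract does not name it), uses a single exponent S: Var(β_j) ∝ [2p_j(1 − p_j)]^S. Negative S means rarer variants carry larger effects, the pattern expected under negative selection. The sketch assumes that form and only evaluates expected values; it does not fit S. Function names are hypothetical:

```python
def effect_variance(maf: float, s: float, sigma2: float = 1.0) -> float:
    """Prior variance of a SNP's effect: sigma2 * [2p(1-p)]^S (assumed model form)."""
    het = 2.0 * maf * (1.0 - maf)
    return sigma2 * het ** s


def per_snp_h2(maf: float, s: float, sigma2: float = 1.0) -> float:
    """Expected variance explained by one SNP: 2p(1-p) * Var(beta) = sigma2 * [2p(1-p)]^(1+S)."""
    het = 2.0 * maf * (1.0 - maf)
    return het * effect_variance(maf, s, sigma2)


# Compare a rare (MAF 1%) and a common (MAF 50%) SNP under three values of S.
for s in (0.0, -0.5, -1.0):
    ratio = per_snp_h2(0.01, s) / per_snp_h2(0.5, s)
    print(f"S = {s:+.1f}: rare/common per-SNP h2 ratio = {ratio:.3f}")
```

At S = 0, effect sizes do not depend on MAF and common SNPs explain far more variance each; at S = −1, every SNP is expected to explain the same variance regardless of MAF; intermediate negative values sit between these extremes.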

