Widespread signatures of natural selection across human complex traits and functional genomic categories

2021
Vol 12 (1)
Author(s):
Jian Zeng
Angli Xue
Longda Jiang
Luke R. Lloyd-Jones
Yang Wu
...

Abstract Understanding how natural selection has shaped the genetic architecture of complex traits is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level GWAS data to estimate multiple genetic architecture parameters, including selection signature. Here, we present a method (SBayesS) that only requires GWAS summary statistics. We analyse data for 155 complex traits (n = 27k–547k) and project the estimates onto those obtained from evolutionary simulations. We estimate that, on average across traits, about 1% of the human genome sequence consists of mutational targets, with a mean selection coefficient of ~0.001. Common diseases, on average, show a smaller number of mutational targets and have been under stronger selection, compared to other traits. SBayesS analyses incorporating functional annotations reveal that selection signatures vary across genomic regions, among which coding regions have the strongest selection signature and are enriched for both the number of associated variants and the magnitude of effect sizes.

2019
Author(s):
Jian Zeng
Angli Xue
Longda Jiang
Luke R Lloyd-Jones
Yang Wu
...

Abstract Understanding how natural selection has shaped the genetic architecture of complex traits and diseases is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level data to estimate multiple features of genetic architecture, including signatures of natural selection. Here, we present an enhanced method (SBayesS) that only requires GWAS summary statistics and incorporates functional genomic annotations. We analysed GWAS data with large sample sizes for 155 complex traits and detected pervasive signatures of negative selection with diverse estimates of SNP-based heritability and polygenicity. Projecting these estimates onto a map of genetic architecture obtained from evolutionary simulations revealed relatively strong natural selection on genetic variants associated with cardiorespiratory and cognitive traits and a relatively small number of mutational targets for diseases. Averaging across traits, the joint distribution of SNP effect size and MAF varied across functional genomic regions (likely to be a consequence of natural selection), with enrichment in both the number of associated variants and the magnitude of effect sizes in regions such as transcriptional start sites, coding regions and 5’- and 3’-UTRs.


Heredity
2019
Vol 123 (6)
pp. 746-758
Author(s):
Juliane Friedrich
Erling Strandberg
Per Arvelius
E. Sánchez-Molano
Ricardo Pong-Wong
...

Abstract A favourable genetic structure and diversity of behavioural features highlight the potential of dogs for studying the genetic architecture of behaviour traits. However, behaviours are complex traits, which have been shown to be influenced by numerous genetic and non-genetic factors, complicating their analysis. In this study, the genetic contribution to behaviour variation in German Shepherd dogs (GSDs) was analysed using genomic approaches. GSDs were phenotyped for behaviour traits using the established Canine Behavioural Assessment and Research Questionnaire (C-BARQ). Genome-wide association study (GWAS) and regional heritability mapping (RHM) approaches were employed to identify associations between behaviour traits and genetic variants, while accounting for relevant non-genetic factors. By combining these complementary methods we endeavoured to increase the power to detect loci with small effects. Several behavioural traits exhibited moderate heritabilities, with the highest identified for Human-directed playfulness, a trait characterised by positive interactions with humans. We identified several genomic regions associated with one or more of the analysed behaviour traits. Some candidate genes located in these regions were previously linked to behavioural disorders in humans, suggesting a new context for their influence on behaviour characteristics. Overall, the results support dogs as a valuable resource to dissect the genetic architecture of behaviour traits and also highlight the value of focusing on a single breed in order to control for background genetic effects and thus avoid limitations of between-breed analyses.


2021
Author(s):
Wenmin Zhang
Hamed S Najafabadi
Yue Li

Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct functionally informed statistical fine-mapping. Our method has two major innovations. First, by creating a sparse low-dimensional projection of the high-dimensional genotype, we enable a linear search of causal variants instead of the exponential search of causal configurations used in existing methods. Second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs.
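
To give a sense of scale for the contrast drawn in this abstract between an exponential search over causal configurations and a linear search over variants, the sketch below counts the candidates in each case for a hypothetical locus. The function names and the locus size are illustrative assumptions; this is a counting exercise, not SparsePro's algorithm.

```python
from math import comb


def n_configurations(n_variants: int, max_causal: int) -> int:
    """Count non-empty causal sets containing at most max_causal variants."""
    return sum(comb(n_variants, k) for k in range(1, max_causal + 1))


def n_linear_evaluations(n_variants: int, n_effects: int) -> int:
    """Count evaluations when each of n_effects signals scans every variant once."""
    return n_variants * n_effects


# Hypothetical locus (assumed values): 1,000 variants, up to 5 causal signals.
print(n_configurations(1000, 5))
print(n_linear_evaluations(1000, 5))
```

The first count grows combinatorially with the number of allowed causal variants, while the second grows only with the product of the two inputs.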


Author(s):
Ruth Johnson
Kathryn S. Burch
Kangcheng Hou
Mario Paciuc
Bogdan Pasaniuc
...

Abstract A key question in human genetics is understanding the proportion of SNPs modulating a particular phenotype or the proportion of susceptibility SNPs for a disease, termed polygenicity. Previous studies have observed that complex traits tend to be highly polygenic, opposing the previous belief that only a handful of SNPs contribute to a trait. Beyond these genome-wide estimates, the distribution of polygenicity across genomic regions as well as the genomic factors that affect regional polygenicity remain poorly understood. A reason for this gap is that methods for estimating polygenicity utilize SNP effect sizes from GWAS. However, estimating regional polygenicity from GWAS effect sizes involves untangling the correlation between SNPs due to LD, leading to intractable computations for even a small number of SNPs. In this work, we propose a scalable method, BEAVR, to estimate the regional polygenicity of a trait given marginal effect sizes from GWAS and LD information. We implement a Gibbs sampler to estimate the posterior distribution of the regional polygenicity and derive a fast, algorithmic update to circumvent the computational bottlenecks associated with LD. The runtime of our algorithm is 𝒪(MK) for M SNPs and K susceptibility SNPs, where the number of susceptibility SNPs is typically K ≪ M. By modeling the full LD structure, we show that BEAVR provides unbiased estimates of polygenicity compared to previous methods that only partially model LD. Finally, we show how estimates of regional polygenicity for BMI, eczema, and high cholesterol provide insight into the regional genetic architecture of each trait.
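
The LD problem this abstract describes can be seen in a toy example. Under the common approximation for standardized genotypes, the expected marginal effect vector equals the LD correlation matrix multiplied by the joint (causal) effect vector, so a single causal SNP produces non-zero marginal effects at every correlated SNP. The 3-SNP LD matrix and effect sizes below are made up for illustration, and the code is not the BEAVR sampler.

```python
def expected_marginal_effects(ld, joint_effects):
    """Multiply an LD correlation matrix by a vector of joint effects."""
    return [sum(r * b for r, b in zip(row, joint_effects)) for row in ld]


# Hypothetical LD matrix for three SNPs (assumed values).
ld = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
joint = [0.05, 0.0, 0.0]  # only the first SNP is causal

marginal = expected_marginal_effects(ld, joint)
n_causal = sum(b != 0 for b in joint)
n_nonzero_marginal = sum(abs(m) > 1e-12 for m in marginal)

print(marginal)
print(n_causal, n_nonzero_marginal)
```

Counting non-zero marginal effects would overstate the number of causal SNPs here, which is why polygenicity estimation has to account for LD.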


2021
Vol 288 (1956)
pp. 20210693
Author(s):
Suzanne E. McGaugh
Aaron J. Lorenz
Lex E. Flagel

Variation in complex traits is the result of contributions from many loci of small effect. Based on this principle, genomic prediction methods are used to predict the breeding value of an individual from genome-wide molecular markers. Genomic prediction models have been used in plant and animal breeding for almost two decades to increase rates of genetic improvement and reduce the length of artificial selection experiments. However, evolutionary genomics studies have been slow to incorporate this technique to select individuals for breeding in a conservation context or to learn more about the genetic architecture of traits, the genetic value of missing individuals or microevolution of breeding values. Here, we outline the utility of genomic prediction and provide an overview of the methodology. We highlight opportunities to apply genomic prediction in evolutionary genetics of wild populations and the best practices when using these methods on field-collected phenotypes.


2021
Vol 17 (10)
pp. e1009483
Author(s):
Ruth Johnson
Kathryn S. Burch
Kangcheng Hou
Mario Paciuc
Bogdan Pasaniuc
...

The number of variants that have a non-zero effect on a trait (i.e. polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs, and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.
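
Summaries such as the share of regions containing at least 5 causal SNPs are tallies over per-region estimates. A minimal sketch of that tally follows, using made-up per-region counts rather than the study's estimates; the function name is an illustrative assumption.

```python
def share_with_at_least(counts, threshold):
    """Return the fraction of regions whose causal-SNP count meets the threshold."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(c >= threshold for c in counts) / len(counts)


# Hypothetical estimated causal-SNP counts for eight regions (assumed values).
region_counts = [0, 2, 7, 12, 55, 3, 0, 9]

print(share_with_at_least(region_counts, 1))
print(share_with_at_least(region_counts, 5))
print(share_with_at_least(region_counts, 50))
```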


PLoS Genetics
2021
Vol 17 (7)
pp. e1009697
Author(s):
Geyu Zhou
Hongyu Zhao

Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across different traits, limited access to individual-level data for training and parameter tuning, and the demand for computational resources. To overcome the limitations of most existing methods, which make explicit assumptions about the underlying genetic architecture and need a separate validation data set for parameter tuning, we develop a summary statistics-based nonparametric method that does not rely on validation datasets to tune parameters. In our implementation, we refine the commonly used likelihood assumption to deal with the discrepancy between summary statistics and the external reference panel. We also leverage the block structure of the reference linkage disequilibrium matrix to implement a parallel algorithm. Through simulations and applications to twelve traits, we show that our method is adaptive to different genetic architectures, statistically robust, and computationally efficient. Our method is available at https://github.com/eldronzhou/SDPR.


2020
Author(s):
Geyu Zhou
Hongyu Zhao

Abstract Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across different traits, limited access to individual-level data for training and parameter tuning, and the demand for computational resources. To overcome the limitations of most existing methods, which make explicit assumptions about the underlying genetic architecture and need a separate validation data set for parameter tuning, we develop a summary statistics-based nonparametric method that relies neither on specific parametric assumptions about genetic architecture nor on validation datasets to tune parameters. In our implementation, we refine the commonly used likelihood assumption to deal with the discrepancy between summary statistics and the external reference panel. We also leverage the block structure of the reference linkage disequilibrium matrix to implement a parallel algorithm. Through simulations and applications to twelve traits, we show that our method is adaptive to different genetic architectures, statistically robust, and computationally efficient. Our method is available at https://github.com/eldronzhou/SDPR.


2018
Author(s):
Farhad Hormozdiari
Bryce van de Geijn
Joseph Nasser
Omer Weissbrod
Steven Gazal
...

Abstract Transposable elements (TE) comprise roughly half of the human genome. Though initially derided as “junk DNA”, they have been widely hypothesized to contribute to the evolution of gene regulation. However, the contribution of TE to the genetic architecture of diseases and complex traits remains unknown. Here, we analyze data from 41 independent diseases and complex traits (average N=320K) to draw three main conclusions. First, TE are uniquely informative for disease heritability. Despite overall depletion for heritability (54% of SNPs, 39±2% of heritability; enrichment of 0.72±0.03; 0.38-1.23 enrichment across four main TE classes), TE explain substantially more heritability than expected based on their depletion for known functional annotations (expected enrichment of 0.35±0.03; 2.11x ratio of true vs. expected enrichment). This implies that TE acquire function in ways that differ from known functional annotations. Second, older TE contribute more to disease heritability, consistent with acquiring biological function; SNPs inside the oldest 20% of TE explain 2.45x more heritability than SNPs inside the youngest 20% of TE. Third, Short Interspersed Nuclear Elements (SINE; one of the four main TE classes) are far more enriched for blood traits (2.05±0.30) than for other traits (0.96±0.09); this difference is far greater than expected based on the weaker depletion of SINEs for regulatory annotations in blood compared to other tissues. Our results elucidate the biological roles that TE play in the genetic architecture of diseases and complex traits.
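
The enrichment figure quoted in this abstract follows the usual definition: the proportion of heritability explained by an annotation divided by the proportion of SNPs it contains, with values below 1 indicating depletion. A minimal check using the rounded values from the abstract (54% of SNPs, 39% of heritability) is sketched below; the function name is illustrative.

```python
def heritability_enrichment(prop_h2: float, prop_snps: float) -> float:
    """Return the share of heritability divided by the share of SNPs."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Rounded values taken from the abstract: 39% of heritability, 54% of SNPs.
te_enrichment = heritability_enrichment(0.39, 0.54)
print(round(te_enrichment, 2))
```

Because the inputs are rounded, ratios derived from them may differ slightly from figures the authors computed from unrounded estimates.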


2019
Vol 36 (8)
pp. 2506-2514
Author(s):
Jingsi Ming
Tao Wang
Can Yang

Abstract Motivation: Much effort has been made toward understanding the genetic architecture of complex traits and diseases. In the past decade, fruitful GWAS findings have highlighted the important role of regulatory variants and pervasive pleiotropy. Because of the accumulation of GWAS data on a wide range of phenotypes and high-quality functional annotations in different cell types, it is timely to develop a statistical framework to explore the genetic architecture of human complex traits by integrating rich data resources. Results: In this study, we propose a unified statistical approach that aims to characterize the relationships among complex traits and to prioritize risk variants by leveraging regulatory information collected in functional annotations. Specifically, we consider a latent probit model (LPM) to integrate summary-level GWAS data and functional annotations. The developed computational framework not only makes LPM scalable to hundreds of annotations and phenotypes but also ensures its statistically guaranteed accuracy. Through comprehensive simulation studies, we evaluated LPM’s performance and compared it with related methods. Then, we applied it to analyze 44 GWASs with 9 genic category annotations and 127 cell-type specific functional annotations. The results demonstrate the benefits of LPM and provide insights into the genetic architecture of complex traits. Availability and implementation: The LPM package, all simulation codes and real datasets in this study are available at https://github.com/mingjingsi/LPM. Supplementary information: Supplementary data are available at Bioinformatics online.

