Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits

2021 ◽  
Vol 17 (10) ◽  
pp. e1009483
Author(s):  
Ruth Johnson ◽  
Kathryn S. Burch ◽  
Kangcheng Hou ◽  
Mario Paciuc ◽  
Bogdan Pasaniuc ◽  
...  

The number of variants that have a non-zero effect on a trait (i.e. polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs, and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.

2021 ◽  
Author(s):  
Davide Marnetto ◽  
Vasili Pankratov ◽  
Mayukh Mondal ◽  
Francesco Montinaro ◽  
Katri Pärna ◽  
...  

The contemporary European genetic makeup formed in the last 8000 years as the combination of three main genetic components: the local Western Hunter-Gatherers, the incoming Neolithic Farmers from Anatolia and the Bronze Age component from the Pontic Steppes. On meeting in the post-Neolithic European environment, the genetic variants accumulated during these three distinct evolutionary histories mixed and came into contact with new environmental challenges. Here we investigate how this genetic legacy reflects on the complex trait landscape of contemporary European populations, using the Estonian Biobank as a case study. For the first time we directly connect the phenotypic information available from biobank samples with the genetic similarity to these ancestral groups, both at a genome-wide level and focusing on genomic regions associated with each of the 27 complex traits we investigated. We also found that SNPs connected to pigmentation, cholesterol, sleep, diastolic blood pressure, and body mass index (BMI) show signals of selection following the post-Neolithic admixture events. We recapitulate existing knowledge about pigmentation traits, corroborate the connection between Steppe ancestry and height and highlight novel associations. Among others, we report the contribution of Hunter-Gatherer ancestry towards high BMI and low blood cholesterol levels. Our results show that the ancient components that form the contemporary European genome were differentiated enough to contribute ancestry-specific signatures to the phenotypic variability displayed by contemporary individuals in at least 11 out of 27 of the complex traits investigated here.


2021 ◽  
Author(s):  
Richard F Oppong ◽  
Pau Navarro ◽  
Chris S Haley ◽  
Sara Knott

We describe a genome-wide analytical approach, SNP and Haplotype Regional Heritability Mapping (SNHap-RHM), that provides regional estimates of the heritability across locally defined regions in the genome. This approach utilises relationship matrices that are based on sharing of SNP and haplotype alleles at local haplotype blocks delimited by recombination boundaries in the genome. We implemented the approach on simulated data and show that the haplotype-based regional GRMs capture variation that is complementary to that captured by SNP-based regional GRMs, thus justifying the fitting of the two GRMs jointly in a single analysis (SNHap-RHM). SNHap-RHM captures regions in the genome contributing to the phenotypic variation that existing genome-wide analysis methods may fail to capture. We further demonstrate that there are real benefits to be gained from this approach by applying it to real data from about 20,000 individuals from the Generation Scotland: Scottish Family Health Study. We analysed height and major depressive disorder (MDD). We identified seven genomic regions that are genome-wide significant for height, and three regions significant at a suggestive threshold (p-value < 1 × 10^(-5)) for MDD. These significant regions have genes mapped to within 400 kb of them. The genes mapped for height have been reported to be associated with height in humans, while those mapped for MDD have been reported to be associated with major depressive disorder and other psychiatry phenotypes. The results show that SNHap-RHM presents an exciting new opportunity to analyse complex traits by allowing the joint mapping of novel genomic regions tagged by either SNPs or haplotypes, potentially leading to the recovery of some of the "missing" heritability.


2019 ◽  
Author(s):  
Huwenbo Shi ◽  
Kathryn S. Burch ◽  
Ruth Johnson ◽  
Malika K. Freund ◽  
Gleb Kichaev ◽  
...  

Abstract Despite strong transethnic genetic correlations reported in the literature for many complex traits, the non-transferability of polygenic risk scores across populations suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific/shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and slight upward-bias with out-of-sample LD. We analyze 9 complex traits in individuals of East Asian and European ancestry, restricting to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping and observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.8x enrichment of shared high-posterior SNPs, suggesting that population-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other GWAS due to differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.


2021 ◽  
Author(s):  
Brian C Zhang ◽  
Arjun Biddanda ◽  
Pier Francesco Palamara

Accurate inference of gene genealogies from genetic data has the potential to facilitate a wide range of analyses. We introduce a method for accurately inferring biobank-scale genome-wide genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies within linear mixed models to perform association and other complex trait analyses. We use these new methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and to detect associations in 7 complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 133, frequency range 0.0004% - 0.1%) than genotype imputation from ~65,000 sequenced haplotypes (N = 65). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants, which are enriched for missense (2.3×) and loss-of-function (4.5×) variation. Inferred genealogies also capture additional association signals in higher frequency variants. These results demonstrate that large-scale inference of gene genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.


Heredity ◽  
2019 ◽  
Vol 123 (6) ◽  
pp. 746-758 ◽  
Author(s):  
Juliane Friedrich ◽  
Erling Strandberg ◽  
Per Arvelius ◽  
E. Sánchez-Molano ◽  
Ricardo Pong-Wong ◽  
...  

Abstract A favourable genetic structure and diversity of behavioural features highlights the potential of dogs for studying the genetic architecture of behaviour traits. However, behaviours are complex traits, which have been shown to be influenced by numerous genetic and non-genetic factors, complicating their analysis. In this study, the genetic contribution to behaviour variation in German Shepherd dogs (GSDs) was analysed using genomic approaches. GSDs were phenotyped for behaviour traits using the established Canine Behavioural Assessment and Research Questionnaire (C-BARQ). Genome-wide association study (GWAS) and regional heritability mapping (RHM) approaches were employed to identify associations between behaviour traits and genetic variants, while accounting for relevant non-genetic factors. By combining these complementary methods we endeavoured to increase the power to detect loci with small effects. Several behavioural traits exhibited moderate heritabilities, with the highest identified for Human-directed playfulness, a trait characterised by positive interactions with humans. We identified several genomic regions associated with one or more of the analysed behaviour traits. Some candidate genes located in these regions were previously linked to behavioural disorders in humans, suggesting a new context for their influence on behaviour characteristics. Overall, the results support dogs as a valuable resource to dissect the genetic architecture of behaviour traits and also highlight the value of focusing on a single breed in order to control for background genetic effects and thus avoid limitations of between-breed analyses.


2020 ◽  
Author(s):  
Zhien Pu ◽  
Xueling Ye ◽  
Yang Li ◽  
Zehou Liu ◽  
Bingxin Shi ◽  
...  

Abstract Background: Grain protein concentration (GPC), grain starch concentration (GSC), and wet gluten concentration (WGC) are complex traits that determine nutrient concentration, end-use quality, and yield in wheat. To identify the elite and stable loci or genomic regions conferring high GPC, GSC, and WGC, a genome-wide association study (GWAS) based on a mixed linear model (MLM) was performed using a 55K single nucleotide polymorphism (SNP) array in a panel of 236 wheat accessions, including 160 commercial varieties and 76 landraces, derived from Sichuan Province, China. The panel was evaluated for GPC, GSC, and WGC at four different fields. Results: Phenotypic analysis showed variation in GPC, GSC, and WGC among the different genotypes and environments. GWAS identified 12 quantitative trait loci (QTL) (-log10(P) > 2.5) associated with these three quality traits in at least two environments and located on chromosomes 1B, 1D, 2A, 2B, 2D, 3B, 3D, 5D, and 7D; the phenotypic variation explained (PVE) by these QTL ranged from 4.2% to 10.7%. Among these, three, seven, and two QTL are associated with GPC, GSC, and WGC, respectively. Compared with the QTLs/genes previously reported by linkage or association mapping, five QTL (QGsc.sicau-1BL, QGsc.sicau-1DS, QGsc.sicau-2DL.1, QGsc.sicau-2DL.2, QWgc.sicau-5DL) were potentially novel. Furthermore, 21 presumptive candidate genes, which are involved in the metabolism or transportation of all kinds of carbohydrates, photosynthesis, programmed cell death, and the balance of abscisic acid and ethylene, were predicted within these potentially novel genomic regions. Conclusions: This study provided new genetic resources and valuable genetic information of nutritional quality to broaden the genetic background and laid the molecular foundation for marker-assisted selection in wheat quality breeding.


2016 ◽  
Author(s):  
Andrew Anand Brown ◽  
Ana Viñuela ◽  
Olivier Delaneau ◽  
Tim Spector ◽  
Kerrin Small ◽  
...  

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but is currently financially prohibitive for well powered GWAS studies. To explore the advantages of WGS in a well powered setting, we performed eQTL mapping using WGS and RNA-seq, and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies, and used these to propose a method for implicating likely causal SNPs. This method predicts that 25% - 70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identify a set of high confidence causal variants and show that they are more enriched in GWAS associations than other eQTL. Of these, we find 65 associations with GWAS traits and show examples where the gene implicated by expression has been functionally validated as relevant for complex traits.


PLoS Genetics ◽  
2020 ◽  
Vol 16 (11) ◽  
pp. e1009163
Author(s):  
Yunpeng Wang ◽  
Ron Nudel ◽  
Michael E. Benros ◽  
Kristin Skogstrand ◽  
Simon Fishilevich ◽  
...  

Circulating inflammatory markers are essential to human health and disease, and they are often dysregulated or malfunctioning in cancers as well as in cardiovascular, metabolic, immunologic and neuropsychiatric disorders. However, the genetic contribution to the physiological variation of levels of circulating inflammatory markers is largely unknown. Here we report the results of a genome-wide genetic study of blood concentration of ten cytokines, including the hitherto unexplored calcium-binding protein (S100B). The study leverages a unique sample of neonatal blood spots from 9,459 Danish subjects from the iPSYCH initiative. We estimate the SNP-heritability of marker levels as ranging from essentially zero for Erythropoietin (EPO) up to 73% for S100B. We identify and replicate 16 associated genomic regions (p < 5 × 10^(-9)), of which four are novel. We show that the associated variants map to enhancer elements, suggesting a possible transcriptional effect of genomic variants on the cytokine levels. The identification of the genetic architecture underlying the basic levels of cytokines is likely to prompt studies investigating the relationship between cytokines and complex disease. Our results also suggest that the genetic architecture of cytokines is stable from neonatal to adult life.


2022 ◽  
Vol 12 ◽  
Author(s):  
Richard F. Oppong ◽  
Thibaud Boutin ◽  
Archie Campbell ◽  
Andrew M. McIntosh ◽  
David Porteous ◽  
...  

We describe a genome-wide analytical approach, SNP and Haplotype Regional Heritability Mapping (SNHap-RHM), that provides regional estimates of the heritability across locally defined regions in the genome. This approach utilises relationship matrices that are based on sharing of SNP and haplotype alleles at local haplotype blocks delimited by recombination boundaries in the genome. We implemented the approach on simulated data and show that the haplotype-based regional GRMs capture variation that is complementary to that captured by SNP-based regional GRMs, thus justifying the fitting of the two GRMs jointly in a single analysis (SNHap-RHM). SNHap-RHM captures regions in the genome contributing to the phenotypic variation that existing genome-wide analysis methods may fail to capture. We further demonstrate that there are real benefits to be gained from this approach by applying it to real data from about 20,000 individuals from the Generation Scotland: Scottish Family Health Study. We analysed height and major depressive disorder (MDD). We identified seven genomic regions that are genome-wide significant for height, and three regions significant at a suggestive threshold (p-value < 1 × 10^(-5)) for MDD. These significant regions have genes mapped to within 400 kb of them. The genes mapped for height have been reported to be associated with height in humans. Similarly, those mapped for MDD have been reported to be associated with major depressive disorder and other psychiatry phenotypes. The results show that SNHap-RHM presents an exciting new opportunity to analyse complex traits by allowing the joint mapping of novel genomic regions tagged by either SNPs or haplotypes, potentially leading to the recovery of some of the “missing” heritability.


2021 ◽  
Vol 12 ◽  
Author(s):  
Claire P. Prowse-Wilkins ◽  
Jianghui Wang ◽  
Ruidong Xiang ◽  
Josie B. Garner ◽  
Michael E. Goddard ◽  
...  

Genetic variants which affect complex traits (causal variants) are thought to be found in functional regions of the genome. Identifying causal variants would be useful for predicting complex trait phenotypes in dairy cows; however, functional regions are poorly annotated in the bovine genome. Functional regions can be identified on a genome-wide scale by assaying for post-translational modifications to histone proteins (histone modifications) and proteins interacting with the genome (e.g., transcription factors) using a method called chromatin immunoprecipitation followed by sequencing (ChIP-seq). In this study ChIP-seq was performed to find functional regions in the bovine genome by assaying for four histone modifications (H3K4Me1, H3K4Me3, H3K27ac, and H3K27Me3) and one transcription factor (CTCF) in 6 tissues (heart, kidney, liver, lung, mammary and spleen) from 2 to 3 lactating dairy cows. Eighty-six ChIP-seq samples were generated in this study, identifying millions of functional regions in the bovine genome. Combinations of histone modifications and CTCF were found using ChromHMM and annotated by comparing with active and inactive genes across the genome. Functional marks differed between tissues, highlighting areas which might be particularly important to tissue-specific regulation. Supporting the cis-regulatory role of functional regions, the read counts in some ChIP peaks correlated with nearby gene expression. The functional regions identified in this study were enriched for putative causal variants, as seen in other species. Interestingly, regions which correlated with gene expression were particularly enriched for potential causal variants. This supports the hypothesis that complex traits are regulated by variants that alter gene expression. This study provides one of the largest ChIP-seq annotation resources in cattle including, for the first time, in the mammary gland of lactating cows. By linking regulatory regions to expression QTL and trait QTL we demonstrate a new strategy for identifying causal variants in cattle.

