scholarly journals Partitioning heritability by functional category using GWAS summary statistics

2015 ◽  
Author(s):  
Hilary Kiyo Finucane ◽  
Brendan Bulik-Sullivan ◽  
Alexander Gusev ◽  
Gosia Trynka ◽  
Yakir Reshef ◽  
...  

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.


2021 ◽  
Author(s):  
Rujin Wang ◽  
Danyu Lin ◽  
Yuchao Jiang

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.



2021 ◽  
pp. gr.275723.121
Author(s):  
Jill E Moore ◽  
Xiao-Ou Zhang ◽  
Shaimae I Elhajjajy ◽  
Kaili Fan ◽  
Henry E Pratt ◽  
...  

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated, nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks were primarily proximal to GENCODE-annotated TSSs and were concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations were supported by epigenomic and other transcriptomic datasets. To demonstrate the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association studies (GWAS) catalog and identified new candidate GWAS genes. Overall, our work demonstrates the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.



2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Lin Jiang ◽  
Chao Xue ◽  
Sheng Dai ◽  
Shangzhen Chen ◽  
Peikai Chen ◽  
...  

Abstract The driver tissues or cell types in which susceptibility genes initiate diseases remain elusive. We develop a unified framework to detect the causal tissues of complex diseases or traits according to selective expression of disease-associated genes in genome-wide association studies (GWASs). This framework consists of three components which run iteratively to produce a converged prioritization list of driver tissues. Additionally, this framework also outputs a list of prioritized genes as a byproduct. We apply the framework to six representative complex diseases or traits with GWAS summary statistics, which leads to the estimation of the lung as an associated tissue of rheumatoid arthritis.



2017 ◽  
Author(s):  
Tongwu Zhang ◽  
Jiyeon Choi ◽  
Michael A. Kovacs ◽  
Jianxin Shi ◽  
Mai Xu ◽  
...  

ABSTRACTMost expression quantitative trait loci (eQTL) studies to date have been performed in heterogeneous tissues as opposed to specific cell types. To better understand the cell-type specific regulatory landscape of human melanocytes, which give rise to melanoma but account for <5% of typical human skin biopsies, we performed an eQTL analysis in primary melanocyte cultures from 106 newborn males. We identified 597,335 cis-eQTL SNPs prior to LD-pruning and 4,997 eGenes (FDR<0.05), which are higher numbers than in any GTEx tissue type with a similar sample size. Melanocyte eQTLs differed considerably from those identified in the 44 GTEx tissues, including skin. Over a third of melanocyte eGenes, including key genes in melanin synthesis pathways, were not observed to be eGenes in two types of GTEx skin tissues or TCGA melanoma samples. The melanocyte dataset also identified cell-type specific trans-eQTLs with a pigmentation-associated SNP for four genes, likely through its cis-regulation of IRF4, encoding a transcription factor implicated in human pigmentation phenotypes. Melanocyte eQTLs are enriched in cis-regulatory signatures found in melanocytes as well as melanoma-associated variants identified through genome-wide association studies (GWAS). Co-localization of melanoma GWAS variants and eQTLs from melanocyte and skin eQTL datasets identified candidate melanoma susceptibility genes for six known GWAS loci including unique genes identified by the melanocyte dataset. Further, a transcriptome-wide association study using published melanoma GWAS data uncovered four new loci, where imputed expression levels of five genes (ZFP90, HEBP1, MSC, CBWD1, and RP11-383H13.1) were associated with melanoma at genome-wide significant P-values. Our data highlight the utility of lineage-specific eQTL resources for annotating GWAS findings and present a robust database for genomic research of melanoma risk and melanocyte biology.



Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.



2015 ◽  
Author(s):  
Dominic Holland ◽  
Yunpeng Wang ◽  
Wesley K Thompson ◽  
Andrew Schork ◽  
Chi-Hua Chen ◽  
...  

Genome-wide Association Studies (GWAS) result in millions of summary statistics (``z-scores'') for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities that does not require raw genotype data, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype -- the proportion of SNPs (after uniform pruning, so that large LD blocks are not over-represented) likely to be in strong LD with causal/mechanistically associated SNPs -- and predicting the proportion of chip heritability explainable by genome wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N=82,315) and additionally, for purposes of illustration, putamen volume (N=12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We estimate the degree to which effect sizes are over-estimated when based on linear regression association coefficients. We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 106and 105. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.



2018 ◽  
Author(s):  
Doug Speed ◽  
David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.



Author(s):  
Ian J. Deary

‘What are the contributions of environments and genes to intelligence differences?’ asks whether genetic inheritance and the environments people experience affect intelligence differences. Researchers use two main resources to answer this question: twins and samples of DNA. Studies of identical and non-identical twins are used to show the contributions of genes, shared environment, and non-shared environment to people’s differences in traits. Twin studies tell us that by adulthood, about two-thirds of intelligence differences are caused by how people vary in their genetic inheritance, and that both shared and non-shared environments make significant contributions to intelligence differences. The introduction of genome-wide association studies in 2011 has provided a new method of estimating the heritability of intelligence.



Sign in / Sign up

Export Citation Format

Share Document