Integration of high-resolution promoter profiling assays reveals novel, cell type-specific transcription start sites across 115 human cell and tissue types

2021 ◽  
pp. gr.275723.121
Author(s):  
Jill E Moore ◽  
Xiao-Ou Zhang ◽  
Shaimae I Elhajjajy ◽  
Kaili Fan ◽  
Henry E Pratt ◽  
...  

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated, nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks were primarily proximal to GENCODE-annotated TSSs and were concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations were supported by epigenomic and other transcriptomic datasets. To demonstrate the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association studies (GWAS) catalog and identified new candidate GWAS genes. Overall, our work demonstrates the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.
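The paired-end linking step described above can be pictured with a toy sketch. It assumes hypothetical inputs: the 3' end coordinates of mates paired with reads whose 5' ends fall in one peak, plus transcript exon intervals. It also assumes a hypothetical rule in which a peak counts as a verified TSS when at least `min_support` mates land in one transcript's exons. The class, function names and threshold are illustrative only. They are not the authors' pipeline, and the sketch ignores strand and chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model: an ID plus half-open exon intervals.

    Strand and chromosome are ignored here for simplicity.
    """
    transcript_id: str
    exons: tuple

    def contains(self, pos: int) -> bool:
        # True if the genomic position falls inside any exon [start, end)
        return any(start <= pos < end for start, end in self.exons)


def link_peak_to_transcripts(mate_ends, transcripts, min_support=3):
    """Count read mates whose 3' ends land in each transcript's exons.

    `mate_ends` holds the 3' end coordinates of mates paired with reads whose
    5' ends fall in one peak. Only transcripts supported by at least
    `min_support` mates are returned (an illustrative threshold).
    """
    support = {}
    for tx in transcripts:
        count = sum(1 for pos in mate_ends if tx.contains(pos))
        if count >= min_support:
            support[tx.transcript_id] = count
    return support


def classify_peak(mate_ends, transcripts, min_support=3):
    """Label a peak as a verified TSS if any transcript has enough mate support."""
    linked = link_peak_to_transcripts(mate_ends, transcripts, min_support)
    return "verified TSS" if linked else "unlinked"


if __name__ == "__main__":
    txs = [
        Transcript("TX1", ((1000, 1200), (1500, 1700))),
        Transcript("TX2", ((5000, 5300),)),
    ]
    mates = [1550, 1600, 1690, 1100, 9000]
    print(link_peak_to_transcripts(mates, txs))  # {'TX1': 4}
    print(classify_peak(mates, txs))             # verified TSS
```

Counting mate support per transcript, rather than only asking whether any mate lands in an exon, keeps the evidence for each peak-to-transcript link visible. A real analysis would also need strand-aware and splice-aware matching.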

2021 ◽  
Author(s):  
Rujin Wang ◽  
Danyu Lin ◽  
Yuchao Jiang

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that trait-associated variants act in a tissue- or cell-type-specific fashion. Yet it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.
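As a rough illustration of the generalized least squares step, the sketch below regresses hypothetical gene-level association statistics on one cell type's expression-specificity scores. It accounts for an assumed gene-gene covariance matrix. The sketch uses the textbook GLS estimator, not EPIC's exact model. The function name, the known-covariance assumption, the one-sided test and the simulated inputs are all assumptions.

```python
import math

import numpy as np


def gls_enrichment(y, x, sigma):
    """Textbook GLS slope test of gene-level statistics y on specificity scores x.

    y     : (G,) gene-level association statistics (hypothetical input)
    x     : (G,) expression-specificity scores of one cell type
    sigma : (G, G) covariance of y across genes, assumed known
    Returns (slope, standard error, one-sided p-value for slope > 0).
    """
    X = np.column_stack([np.ones_like(x), x])       # intercept + specificity
    sinv_X = np.linalg.solve(sigma, X)              # Sigma^-1 X
    sinv_y = np.linalg.solve(sigma, y)              # Sigma^-1 y
    xtsx = X.T @ sinv_X                             # X' Sigma^-1 X
    beta = np.linalg.solve(xtsx, X.T @ sinv_y)      # GLS estimate
    cov_beta = np.linalg.inv(xtsx)                  # Var(beta) when Sigma is exact
    se = math.sqrt(cov_beta[1, 1])
    z = beta[1] / se
    p = 0.5 * math.erfc(z / math.sqrt(2))           # upper-tail normal p-value
    return float(beta[1]), se, float(p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = 500
    idx = np.arange(G)
    sigma = 0.3 ** np.abs(idx[:, None] - idx[None, :])  # toy AR(1)-style correlation
    x = rng.normal(size=G)
    y = 0.5 * x + np.linalg.cholesky(sigma) @ rng.normal(size=G)
    print(gls_enrichment(y, x, sigma))
```

The model whitens by the covariance instead of treating genes as independent. This is the point of using GLS here: correlated neighbouring genes would otherwise inflate the apparent evidence for enrichment.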


2021 ◽  
Author(s):  
Jill E Moore ◽  
Xiao-Ou Zhang ◽  
Shaimae I Elhajjajy ◽  
Kaili Fan ◽  
Fairlie Reese ◽  
...  



2016 ◽  
Vol 45 (D1) ◽  
pp. D896-D901 ◽  
Author(s):  
Jacqueline MacArthur ◽  
Emily Bowler ◽  
Maria Cerezo ◽  
Laurent Gil ◽  
Peggy Hall ◽  
...  

2015 ◽  
Author(s):  
Hilary Kiyo Finucane ◽  
Brendan Bulik-Sullivan ◽  
Alexander Gusev ◽  
Gosia Trynka ◽  
Yakir Reshef ◽  
...  

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This method is computationally tractable at very large sample sizes and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments, including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.
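The method introduced here is stratified LD score regression. Its core model relates each SNP's association chi-square statistic to its LD with each functional category C: E[chi2_j] = N * sum_C tau_C * l(j, C) + N*a + 1. The sketch below fits that equation by plain least squares on simulated data, as an illustration only. The published estimator uses regression weights, block-jackknife standard errors and LD scores from a reference panel, and none of these are reproduced here. The function name and simulated values are hypothetical.

```python
import numpy as np


def fit_stratified_ldsc(chi2, ld_scores, n):
    """Unweighted OLS fit of E[chi2_j] = N * sum_C tau_C * l(j, C) + N*a + 1.

    chi2      : (M,) per-SNP association chi-square statistics
    ld_scores : (M, C) LD score of each SNP with respect to each category C
    n         : GWAS sample size N
    Returns (tau, intercept); the intercept estimates N*a + 1.
    """
    m = chi2.shape[0]
    design = np.column_stack([n * ld_scores, np.ones(m)])  # N*l(j,C) columns + intercept
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef[:-1], float(coef[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M, N = 200_000, 100_000
    ld = rng.uniform(0, 100, size=(M, 2))          # toy LD scores for two categories
    true_tau = np.array([1e-7, 5e-7])              # simulated per-SNP contributions
    expected = N * ld @ true_tau + 1.0             # a = 0, so the intercept is 1
    chi2 = expected * rng.chisquare(1, size=M)     # noisy statistics with that mean
    print(fit_stratified_ldsc(chi2, ld, N))
```

A category is enriched when its share of the estimated heritability exceeds its share of SNPs. Because every category's LD scores enter the same regression, the fit for one category already controls for the others.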


2019 ◽  
Author(s):  
K.A.B. Gawronski ◽  
W. Bone ◽  
Y. Park ◽  
E. Pashos ◽  
X. Wang ◽  
...  

Background: Genome-wide association studies have identified more than 150 loci associated with lipid levels. However, the genetic mechanisms underlying most of these loci are not well understood. Recent work indicates that changes in the abundance of alternatively spliced transcripts contribute to complex trait variation. Consequently, identifying genetic loci that associate with alternative splicing in disease-relevant cell types, and determining the degree to which these loci are informative for lipid biology, is of broad interest.
Methods and Results: We analyze gene splicing in 83 sample-matched induced pluripotent stem cell (iPSC) and hepatocyte-like cell (HLC) lines (n=166), as well as in an independent collection of primary liver tissues (n=96). We observe that transcript splicing is highly cell-type-specific, and the genes that are differentially spliced between iPSCs and HLCs are enriched for metabolism pathway annotations. We identify 1,381 HLC splicing quantitative trait loci (sQTLs) and 1,462 iPSC sQTLs and find that sQTLs are often shared across cell types. To evaluate the contribution of sQTLs to variation in lipid levels, we conduct colocalization analysis using lipid genome-wide association data. We identify 19 lipid-associated loci that colocalize with either an HLC expression quantitative trait locus (eQTL) or sQTL. Only one locus colocalizes with both an sQTL and an eQTL, indicating that sQTLs contribute information about GWAS loci that cannot be obtained by analysis of steady-state gene expression alone.
Conclusions: These results provide an important foundation for future efforts that use iPSC and iPSC-derived cells to evaluate genetic mechanisms influencing both cardiovascular disease risk and complex traits in general.
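The abstract does not say which colocalization method was used. As an illustration only, the sketch below implements the approximate Bayes factor model behind `coloc.abf` in the R package coloc. It takes per-SNP effect sizes and standard errors for two traits measured over the same SNPs. The priors (p1 = p2 = 1e-4, p12 = 1e-5) and the prior effect SD of 0.15 are assumed, commonly used values. The function names are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP (prior_sd is assumed)."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf when b >= a (e.g. a single SNP)."""
    if b >= a:
        return -math.inf
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(betas1, ses1, betas2, ses2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP0..PP4] for two traits over the same SNPs."""
    l1 = [log_abf(b, s) for b, s in zip(betas1, ses1)]
    l2 = [log_abf(b, s) for b, s in zip(betas2, ses2)]
    l12 = [a + b for a, b in zip(l1, l2)]
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(l12)
    lh = [
        0.0,                                                  # H0: no association
        math.log(p1) + s1,                                    # H1: trait 1 only
        math.log(p2) + s2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: distinct causal SNPs
        math.log(p12) + s12,                                  # H4: one shared causal SNP
    ]
    total = logsumexp(lh)
    return [math.exp(x - total) for x in lh]
```

A high PP4 supports a single shared causal variant, for example between a lipid GWAS signal and an sQTL. A high PP3 indicates two nearby but distinct signals.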


2021 ◽  
Author(s):  
Derek W Linskey ◽  
David C Linskey ◽  
Howard L McLeod ◽  
Jasmine A Luzum

The primary research approach in pharmacogenetics has been candidate gene association studies (CGAS), but pharmacogenomic genome-wide association studies (GWAS) are becoming more common. We are now at a critical juncture when the results of these two research approaches, CGAS and GWAS, can be compared in pharmacogenetics. We analyzed publicly available databases of pharmacogenetic CGAS and GWAS (i.e., the Pharmacogenomics Knowledgebase [PharmGKB®] and the NHGRI-EBI GWAS catalog) and found that the vast majority of variants (98%) and genes (94%) discovered in pharmacogenomic GWAS were novel (i.e., not previously studied in CGAS). Therefore, pharmacogenetic researchers are not selecting the right candidate genes in the vast majority of CGAS, highlighting a need to shift pharmacogenetic research efforts from CGAS to GWAS.
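The novelty percentages reported above amount to a set difference divided by a set size. The sketch below assumes that variant or gene identifiers from the two databases have already been harmonized, for example to the same rsID or gene-symbol conventions. The function name and the placeholder gene names are hypothetical.

```python
def novelty_fraction(gwas_items, cgas_items):
    """Fraction of items discovered in GWAS that were never studied in CGAS.

    Both inputs are iterables of identifiers (e.g. rsIDs or gene symbols)
    that are assumed to be harmonized already.
    """
    gwas = set(gwas_items)
    if not gwas:
        raise ValueError("no GWAS items to compare")
    novel = gwas - set(cgas_items)      # discovered by GWAS, absent from CGAS
    return len(novel) / len(gwas)


if __name__ == "__main__":
    gwas_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]   # placeholder names
    cgas_genes = ["GENE_A", "GENE_Z"]
    print(novelty_fraction(gwas_genes, cgas_genes))          # 0.75
```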


2017 ◽  
Author(s):  
Tongwu Zhang ◽  
Jiyeon Choi ◽  
Michael A. Kovacs ◽  
Jianxin Shi ◽  
Mai Xu ◽  
...  

Most expression quantitative trait locus (eQTL) studies to date have been performed in heterogeneous tissues rather than in specific cell types. To better understand the cell-type-specific regulatory landscape of human melanocytes, which give rise to melanoma but account for <5% of typical human skin biopsies, we performed an eQTL analysis in primary melanocyte cultures from 106 newborn males. We identified 597,335 cis-eQTL SNPs prior to LD-pruning and 4,997 eGenes (FDR<0.05), more than in any GTEx tissue type with a similar sample size. Melanocyte eQTLs differed considerably from those identified in the 44 GTEx tissues, including skin. Over a third of melanocyte eGenes, including key genes in melanin synthesis pathways, were not observed to be eGenes in two types of GTEx skin tissues or in TCGA melanoma samples. The melanocyte dataset also identified cell-type-specific trans-eQTLs with a pigmentation-associated SNP for four genes, likely through its cis-regulation of IRF4, which encodes a transcription factor implicated in human pigmentation phenotypes. Melanocyte eQTLs are enriched in cis-regulatory signatures found in melanocytes, as well as in melanoma-associated variants identified through genome-wide association studies (GWAS). Colocalization of melanoma GWAS variants and eQTLs from melanocyte and skin eQTL datasets identified candidate melanoma susceptibility genes for six known GWAS loci, including genes identified only by the melanocyte dataset. Further, a transcriptome-wide association study using published melanoma GWAS data uncovered four new loci, where imputed expression levels of five genes (ZFP90, HEBP1, MSC, CBWD1, and RP11-383H13.1) were associated with melanoma at genome-wide significance. Our data highlight the utility of lineage-specific eQTL resources for annotating GWAS findings and provide a robust database for genomic research on melanoma risk and melanocyte biology.
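The abstract does not specify the TWAS method. A common summary-statistic formulation, used here as an assumption, combines a gene's expression weights w with GWAS z-scores z for the same cis SNPs and an LD matrix R: z_TWAS = w'z / sqrt(w'Rw). The sketch below computes that statistic and a two-sided normal p-value. The function names and toy numbers are hypothetical.

```python
import math

import numpy as np


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score for one gene: w'z / sqrt(w' R w).

    weights : expression weights for the gene's cis SNPs (hypothetical input)
    gwas_z  : GWAS z-scores for the same SNPs, same order and allele coding
    ld      : SNP-by-SNP LD correlation matrix R from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    r = np.asarray(ld, dtype=float)
    return float(w @ z / math.sqrt(w @ r @ w))


def twas_p(z):
    """Two-sided normal p-value for a TWAS z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    z = twas_z([0.5, 0.3], [4.0, 3.0], [[1.0, 0.2], [0.2, 1.0]])
    print(z)          # approximately 4.585
    print(twas_p(z))
```

The LD term in the denominator rescales the weighted sum so that the statistic stays standard normal under the null even when the cis SNPs are correlated.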


2018 ◽  
Vol 47 (D1) ◽  
pp. D1005-D1012 ◽  
Author(s):  
Annalisa Buniello ◽  
Jacqueline A L MacArthur ◽  
Maria Cerezo ◽  
Laura W Harris ◽  
James Hayhurst ◽  
...  
