Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability

2019 ◽  
Vol 29 (7) ◽  
pp. 1057-1067 ◽  
Author(s):  
Bryce van de Geijn ◽  
Hilary Finucane ◽  
Steven Gazal ◽  
Farhad Hormozdiari ◽  
Tiffany Amariuta ◽  
...  

Abstract Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TF) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.
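Constructing the best-performing annotation (windows around sequence-based motif predictions, intersected with the union of several chromatin marks) reduces to interval arithmetic. The Python sketch below is illustrative only: it assumes half-open intervals on a single chromosome and a 100 bp window centered on each predicted motif site, and the function names and toy coordinates are hypothetical rather than taken from the authors' pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def motif_windows(centers: List[int], width: int = 100) -> List[Interval]:
    """A window of `width` bp centered on each predicted motif site (assumption)."""
    half = width // 2
    return merge([(max(0, c - half), c + half) for c in centers])


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, merged interval lists with a two-pointer sweep."""
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def annotate_snps(positions: List[int], regions: List[Interval]) -> List[int]:
    """Binary annotation: 1 if a SNP lies inside any region, else 0."""
    return [int(any(s <= p < e for s, e in regions)) for p in positions]


if __name__ == "__main__":
    # Hypothetical toy data: two motif predictions and three chromatin-mark peaks.
    marks = merge([(900, 1020), (1010, 1200), (7000, 8000)])  # union of marks
    annotation = intersect(motif_windows([1000, 5000]), marks)
    print(annotation)                                          # [(950, 1050)]
    print(annotate_snps([960, 1100, 5000, 7500], annotation))  # [1, 0, 0, 0]
```

Only the motif at position 1000 survives, because the second motif window falls outside every chromatin mark; this is the sense in which the chromatin data makes a sequence-based prediction cell type-specific.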

2018 ◽  
Author(s):  
Bryce van de Geijn ◽  
Hilary Finucane ◽  
Steven Gazal ◽  
Farhad Hormozdiari ◽  
Tiffany Amariuta ◽  
...  

Abstract It is widely known that regulatory variation plays a major role in complex disease and that cell-type-specific binding of transcription factors (TF) is critical to gene regulation, but genomic annotations from directly measured TF binding information are not currently available for most cell-type-TF pairs. Here, we construct cell-type-specific TF binding annotations by intersecting sequence-based TF binding predictions with cell-type-specific chromatin data; this strategy addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context, and the limitation that sequence-based predictions are generally not cell-type-specific. We evaluated different combinations of sequence-based TF predictions and chromatin data by partitioning the heritability of 49 diseases and complex traits (average N = 320K) using stratified LD score regression with the baseline-LD model (which is not cell-type-specific). We determined that 100 bp windows around MotifMap sequence-based TF binding predictions intersected with a union of six cell-type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6x vs 7.3x; P = 9 × 10⁻¹⁴ for difference) and a 12% increase in cell-type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that intersecting sequence-based TF predictions with cell-type-specific chromatin information can help refine genome-wide association signals.
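The enrichment figures quoted above (11.6x vs 7.3x) are fold enrichments: the proportion of heritability explained by an annotation divided by the proportion of SNPs it contains. A minimal sketch of that ratio under the stratified LD score regression model, where each SNP's heritability is the sum of its annotation values weighted by per-annotation coefficients (tau), might look as follows; the toy matrix and coefficients are invented for illustration, and a real analysis would take tau from the regression fit.

```python
import numpy as np


def annotation_enrichment(annot: np.ndarray, tau: np.ndarray, c: int) -> float:
    """Fold enrichment of heritability in binary annotation column `c`.

    annot: (M, C) matrix; annot[j, k] is the value of annotation k at SNP j.
    tau:   (C,) per-annotation coefficients, e.g. as estimated by S-LDSC.
    """
    per_snp_h2 = annot @ tau            # modeled h2 contributed by each SNP
    in_c = annot[:, c] > 0              # SNPs belonging to annotation c
    prop_h2 = per_snp_h2[in_c].sum() / per_snp_h2.sum()
    prop_snps = in_c.mean()
    return float(prop_h2 / prop_snps)


if __name__ == "__main__":
    # Toy example: column 0 is an all-ones base annotation, column 1 marks SNP 0.
    annot = np.array([[1, 1], [1, 0], [1, 0], [1, 0]], dtype=float)
    tau = np.array([1.0, 3.0])
    print(annotation_enrichment(annot, tau, c=1))  # (4/7) / (1/4), about 2.286
```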


2021 ◽  
Author(s):  
Rujin Wang ◽  
Danyu Lin ◽  
Yuchao Jiang

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that trait-associated variants likely act in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.
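Generalized least squares is what lets a gene-level regression account for correlation between genes. The sketch below shows the generic GLS estimator, assuming the gene-gene correlation matrix is known; the covariates (an intercept plus a cell type specificity score), the toy numbers, and the function name are hypothetical and do not reproduce EPIC's exact model.

```python
import numpy as np


def gls(X: np.ndarray, y: np.ndarray, sigma: np.ndarray):
    """Generalized least squares with a known covariance matrix `sigma`.

    Returns beta = (X' S^-1 X)^-1 X' S^-1 y and its covariance (X' S^-1 X)^-1,
    which is valid when `sigma` is the true covariance of y.
    """
    sinv_x = np.linalg.solve(sigma, X)   # S^-1 X without forming the inverse
    sinv_y = np.linalg.solve(sigma, y)   # S^-1 y
    xtsx = X.T @ sinv_x
    beta = np.linalg.solve(xtsx, X.T @ sinv_y)
    return beta, np.linalg.inv(xtsx)


if __name__ == "__main__":
    # Hypothetical: 4 genes, intercept + cell type specificity score as covariates,
    # y = gene-level association statistics, sigma = gene-gene correlation.
    X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
    y = np.array([0.2, 1.1, 2.3, 2.9])
    sigma = np.array([[1.0, 0.5, 0.0, 0.0],
                      [0.5, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.3],
                      [0.0, 0.0, 0.3, 1.0]])
    beta, cov = gls(X, y, sigma)
    # Statistic for testing whether the specificity coefficient is positive.
    t_stat = beta[1] / np.sqrt(cov[1, 1])
    print(beta, t_stat)
```

With `sigma` set to the identity matrix the estimator reduces to ordinary least squares; correlated genes (for example, neighbors sharing LD) are down-weighted relative to independent ones.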


2019 ◽  
Author(s):  
K.A.B. Gawronski ◽  
W. Bone ◽  
Y. Park ◽  
E. Pashos ◽  
X. Wang ◽  
...  

Abstract Background: Genome-wide association studies have identified 150+ loci associated with lipid levels. However, the genetic mechanisms underlying most of these loci are not well understood. Recent work indicates that changes in the abundance of alternatively spliced transcripts contribute to complex trait variation. Consequently, identifying genetic loci that associate with alternative splicing in disease-relevant cell types and determining the degree to which these loci are informative for lipid biology are of broad interest. Methods and Results: We analyze gene splicing in 83 sample-matched induced pluripotent stem cell (iPSC) and hepatocyte-like cell (HLC) lines (n = 166), as well as in an independent collection of primary liver tissues (n = 96). We observe that transcript splicing is highly cell-type specific, and the genes that are differentially spliced between iPSCs and HLCs are enriched for metabolism pathway annotations. We identify 1,381 HLC splicing quantitative trait loci (sQTLs) and 1,462 iPSC sQTLs and find that sQTLs are often shared across cell types. To evaluate the contribution of sQTLs to variation in lipid levels, we conduct colocalization analysis using lipid genome-wide association data. We identify 19 lipid-associated loci that colocalize with either an HLC expression quantitative trait locus (eQTL) or an sQTL. Only one locus colocalizes with both an sQTL and an eQTL, indicating that sQTLs contribute information about GWAS loci that cannot be obtained by analysis of steady-state gene expression alone. Conclusions: These results provide an important foundation for future efforts that use iPSC and iPSC-derived cells to evaluate genetic mechanisms influencing both cardiovascular disease risk and complex traits in general.
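The abstract does not name the colocalization method. As an illustration only, the sketch below implements the widely used approximate Bayes factor approach associated with the coloc R package: per-SNP Wakefield Bayes factors for each trait are combined into posterior probabilities that neither trait, one trait, both traits with distinct causal SNPs (H3), or both traits with a shared causal SNP (H4) are associated. The prior standard deviation of 0.15 and the priors p1 = p2 = 1e-4 and p12 = 1e-5 are assumed values, and the z-scores are toy data.

```python
import numpy as np


def log_abf(z, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for each SNP (assumed prior sd)."""
    z = np.asarray(z, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    r = prior_sd ** 2 / (v + prior_sd ** 2)
    return 0.5 * (np.log(1 - r) + r * z ** 2)


def logsumexp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def coloc_posteriors(z1, se1, z2, se2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits over the same SNPs.

    H0: no association; H1/H2: only trait 1/2; H3: both, distinct causal SNPs;
    H4: both traits share one causal SNP.
    """
    l1, l2 = log_abf(z1, se1), log_abf(z2, se2)
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(l1 + l2)
    lh = np.array([
        0.0,
        np.log(p1) + s1,
        np.log(p2) + s2,
        # H3 sums over SNP pairs j != k: log(S1*S2 - S12), computed stably.
        np.log(p1) + np.log(p2) + s1 + s2 + np.log(-np.expm1(s12 - s1 - s2)),
        np.log(p12) + s12,
    ])
    return np.exp(lh - logsumexp(lh))


if __name__ == "__main__":
    se = [0.05] * 5
    shared = coloc_posteriors([0, 0, 8, 0, 0], se, [0, 0, 8, 0, 0], se)
    distinct = coloc_posteriors([0, 8, 0, 0, 0], se, [0, 0, 0, 8, 0], se)
    print(shared.round(3), distinct.round(3))  # H4 dominates vs. H3 dominates
```

In this framing, a locus that "colocalizes" with an eQTL or sQTL is one where H4 receives most of the posterior mass.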


2021 ◽  
Author(s):  
John M Rouhana ◽  
Jiali Wang ◽  
Gokcen Eraslan ◽  
Shankara Anand ◽  
Andrew R Hamel ◽  
...  

Summary: ECLIPSER was developed to identify pathogenic cell types and cell type-specific genes that may affect complex disease susceptibility and trait variation by integrating single cell data with known GWAS loci. ECLIPSER maps genes to GWAS loci for a given complex trait based on expression and splicing quantitative trait loci (e/sQTLs) and other functional data, and tests whether the mapped genes are enriched for cell type-specific expression in particular cell types using single-cell/nucleus RNA-seq data from one or more tissues of interest. A Bayesian Fisher's exact test is used to compute fold-enrichment significance. We demonstrate the application of ECLIPSER on various skin diseases and traits using snRNA-seq of healthy human skin samples. Availability and Implementation: The Python source code and documentation for ECLIPSER and a Jupyter notebook for generating output tables and figures are available at https://github.com/segrelabgenomics/ECLIPSER. The source code for GWASvar2gene that maps genes to GWAS loci based on e/sQTLs is available at https://github.com/segrelabgenomics/GWASvar2gene. The analysis presented here used data from GTEx (https://gtexportal.org/home/datasets) and Open Targets Genetics (https://genetics-docs.opentargets.org/data-access/graphql-api), but can also be applied to other GWAS variant lists and QTL studies. Data used to reproduce the results of the paper are available in Supplementary data.
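The Bayesian variant of Fisher's exact test used by ECLIPSER is not described here, so the sketch below substitutes the classical one-sided Fisher (hypergeometric) test to show how a fold-enrichment and its significance can be computed from the overlap between GWAS-mapped genes and cell type-specific genes. The argument names and toy counts are hypothetical; see the ECLIPSER repository for the actual implementation.

```python
from math import comb


def enrichment_test(n_genes: int, n_specific: int, n_mapped: int, n_overlap: int):
    """Fold-enrichment and one-sided Fisher's exact (hypergeometric) p-value.

    n_genes:    background genes tested
    n_specific: genes with cell type-specific expression
    n_mapped:   genes mapped to GWAS loci
    n_overlap:  mapped genes that are also cell type-specific
    """
    fold = (n_overlap / n_mapped) / (n_specific / n_genes)
    # P(X >= n_overlap) when drawing n_mapped genes without replacement.
    upper = min(n_specific, n_mapped)
    tail = sum(comb(n_specific, k) * comb(n_genes - n_specific, n_mapped - k)
               for k in range(n_overlap, upper + 1))
    return fold, tail / comb(n_genes, n_mapped)


if __name__ == "__main__":
    # All 5 mapped genes are among the 5 cell type-specific genes out of 10.
    print(enrichment_test(n_genes=10, n_specific=5, n_mapped=5, n_overlap=5))
    # fold = 2.0, p = 1/252
```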


2017 ◽  
Author(s):  
Jimmy Vandel ◽  
Océane Cassan ◽  
Sophie Lèbre ◽  
Charles-Henri Lecellier ◽  
Laurent Bréhélin

In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types. We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type. TFcoop aims to predict the binding sites of the target TF based on the binding affinity of all identified cooperating TFs. The set of cooperating TFs and model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 TFs in 41 cell types and in four regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. We first show that TFcoop is accurate and outperforms simple PWM methods for predicting TF binding sites. Next, analysis of the learned models sheds light on important properties of TF combinations in different promoter classes and in enhancers. First, we show that combinations governing TF binding in enhancers are more cell-type specific than those governing binding in promoters. Second, for a given TF and cell type, we observe that TF combinations are different between promoters and enhancers, but similar for promoters of mRNAs, lncRNAs and pri-miRNAs. Analysis of the TFs cooperating with the different targets shows over-representation of pioneer TFs and a clear preference for TFs with binding motif composition similar to that of the target. Lastly, our models accurately distinguish promoters associated with specific biological processes.
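One way to picture "predicting binding of the target TF from the affinities of cooperating TFs" is a PWM affinity score per motif fed into a logistic model. The sketch below assumes that form: affinity is the sum, over every window on the forward strand, of the motif-versus-uniform-background likelihood ratio, and the weights and intercept are hypothetical stand-ins for parameters that TFcoop learns from ChIP-seq data. TFcoop's exact scoring and fitting procedure may differ.

```python
import math
from typing import Dict, List

BASES = "ACGT"
PWM = List[List[float]]  # rows = motif positions, columns = P(A), P(C), P(G), P(T)


def affinity(seq: str, pwm: PWM, background: float = 0.25) -> float:
    """Sum over forward-strand windows of the motif/background likelihood ratio."""
    width = len(pwm)
    total = 0.0
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for offset, base in enumerate(seq[start:start + width]):
            ratio *= pwm[offset][BASES.index(base)] / background  # raises on non-ACGT
        total += ratio
    return total


def binding_probability(seq: str, pwms: Dict[str, PWM],
                        weights: Dict[str, float], intercept: float) -> float:
    """Logistic model over the affinities of the target TF and its partners."""
    z = intercept + sum(weights[name] * affinity(seq, pwm) for name, pwm in pwms.items())
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    pwm_ac = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # toy motif matching "AC"
    print(affinity("ACAC", pwm_ac))                        # two "AC" windows x 16 = 32.0
    print(binding_probability("ACAC", {"toy": pwm_ac}, {"toy": 0.1}, -2.0))
```

In this formulation, a plain PWM method corresponds to using only the target TF's own affinity; adding partner motifs with learned weights is what encodes the TF combinations.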


2019 ◽  
Author(s):  
Alexi Nott ◽  
Inge R. Holtman ◽  
Nicole G. Coufal ◽  
Johannes C.M. Schlachetzki ◽  
Miao Yu ◽  
...  

Abstract Unique cell type-specific patterns of activated enhancers can be leveraged to interpret non-coding genetic variation associated with complex traits and diseases such as neurological and psychiatric disorders. Here, we have defined active promoters and enhancers for major cell types of the human brain. Whereas psychiatric disorders were primarily associated with regulatory regions in neurons, idiopathic Alzheimer’s disease (AD) variants were largely confined to microglia enhancers. Interactome maps connecting GWAS variants in cell type-specific enhancers to gene promoters revealed an extended microglia gene network in AD. Deletion of a microglia-specific enhancer harboring AD-risk variants ablated BIN1 expression in microglia but not in neurons or astrocytes. These findings revise and expand the genes likely to be influenced by non-coding variants in AD and suggest the probable brain cell types in which they function. One Sentence Summary: Identification of cell type-specific regulatory elements in the human brain enables interpretation of non-coding GWAS risk variants.
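Connecting GWAS variants to genes through cell type-specific enhancers and interactome maps can be sketched as a two-step lookup: keep a variant for a cell type only if it lies in that cell type's active enhancers, then follow chromatin loops whose anchor contains the variant to the linked promoter's gene. All identifiers, coordinates, and gene names below are hypothetical placeholders, and the data layout is an assumption for illustration.

```python
from typing import Dict, List, Set, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open


def contains(region: Region, chrom: str, pos: int) -> bool:
    return region[0] == chrom and region[1] <= pos < region[2]


def link_variants(variants: Dict[str, Tuple[str, int]],
                  enhancers: Dict[str, List[Region]],
                  loops: Dict[str, List[Tuple[Region, str]]]) -> Dict[str, Dict[str, Set[str]]]:
    """Map risk variants to genes per cell type.

    A variant is kept for a cell type if it lies in one of that cell type's active
    enhancers and in a loop anchor that the interactome links to a gene promoter.
    """
    result: Dict[str, Dict[str, Set[str]]] = {}
    for cell, regions in enhancers.items():
        hits: Dict[str, Set[str]] = {}
        for rsid, (chrom, pos) in variants.items():
            if not any(contains(r, chrom, pos) for r in regions):
                continue
            genes = {gene for anchor, gene in loops.get(cell, []) if contains(anchor, chrom, pos)}
            if genes:
                hits[rsid] = genes
        result[cell] = hits
    return result


if __name__ == "__main__":
    variants = {"rsA": ("chr2", 1500), "rsB": ("chr2", 9000)}
    enhancers = {"microglia": [("chr2", 1000, 2000)], "neuron": [("chr2", 8500, 9500)]}
    loops = {"microglia": [(("chr2", 900, 2100), "GENE_X")]}
    print(link_variants(variants, enhancers, loops))
    # microglia: rsA -> GENE_X; neuron: no loop-supported links
```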


2021 ◽  
Author(s):  
Meghana Kshirsagar ◽  
Han Yuan ◽  
Juan Lavista Ferres ◽  
Christina Leslie

Abstract Determining the cell type-specific and genome-wide binding locations of transcription factors (TFs) is an important step towards decoding gene regulatory programs. Profiling by the assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals open chromatin sites that are potential binding sites for TFs but does not identify which TFs occupy a given site. We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. Our approach automatically learns distinct groups of k-mer patterns that correspond to cell type-specific in vivo binding signals. Latent factors found by BindVAE generally map to TFs that are expressed in the input cell type. BindVAE finds different TF binding sites in different cell types and can learn composite patterns for TFs involved in cooperative binding. BindVAE therefore provides a novel unsupervised approach to deconvolve the complex TF binding signals in chromatin accessible sites.
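Models that learn "groups of k-mer patterns" from open chromatin typically see each accessible region as a bag of k-mer counts. The sketch below builds such a bag under stated assumptions: contiguous k-mers only, each collapsed with its reverse complement, and k-mers containing N skipped. The choice of k = 8 is a placeholder, and BindVAE's actual featurization may differ.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Collapse a k-mer with its reverse complement (lexicographically smaller one)."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_bag(seq: str, k: int = 8) -> Counter:
    """Count canonical k-mers in one accessible region, skipping k-mers with N."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    print(kmer_bag("ACGTAC", k=3))  # ACG and CGT collapse; GTA and TAC collapse
```

Collapsing reverse complements reflects the fact that a TF can bind either strand, so the two orientations of a motif should count as one feature.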


Blood ◽  
2020 ◽  
Author(s):  
Qian Qi ◽  
Li Cheng ◽  
Xing Tang ◽  
Yanghua He ◽  
Yichao Li ◽  
...  

While constitutive CTCF-binding sites are needed to maintain relatively invariant chromatin structures, such as topologically associating domains, the precise roles of CTCF in controlling cell type-specific transcriptional regulation remain poorly explored. We examined CTCF occupancy in different types of primary blood cells derived from the same donor to elucidate a new role for CTCF in gene regulation during blood cell development. We identified dynamic, cell type-specific binding sites for CTCF that colocalize with lineage-specific transcription factors. These dynamic sites are enriched for single nucleotide polymorphisms that are associated with blood cell traits in different lineages, and they coincide with the key regulatory elements governing hematopoiesis. CRISPR/Cas9-based perturbation experiments demonstrated that these dynamic CTCF-binding sites play a critical role in red blood cell development. Furthermore, precise deletion of CTCF-binding motifs in dynamic sites abolished interactions of erythroid genes, such as RBM38, with their associated enhancers and led to abnormal erythropoiesis. These results suggest a novel, cell type-specific function for CTCF in which it may serve to facilitate interaction of distal regulatory elements with target promoters. Our study of the dynamic, cell type-specific binding and function of CTCF provides new insights into transcriptional regulation during hematopoiesis.


2020 ◽  
Author(s):  
Julie A Prost ◽  
Christopher JF Cameron ◽  
Mathieu Blanchette

Genomic organization is critical for proper gene regulation and follows a hierarchical model in which chromosomes are segmented into megabase-sized, cell-type-specific transcriptionally active (A) and inactive (B) compartments. Here, we describe SACSANN, a machine learning pipeline consisting of stacked artificial neural networks that predicts compartment annotation solely from genomic sequence-based features such as predicted transcription factor binding sites and transposable elements. SACSANN provides accurate and cell-type-specific compartment predictions, while identifying key genomic sequence determinants that associate with A/B compartments. Models are shown to be largely transferable across analogous human and mouse cell types. By enabling the study of chromosome compartmentalization in species for which no Hi-C data are available, SACSANN paves the way toward the study of 3D genome evolution. SACSANN is publicly available on GitHub: https://github.com/BlanchetteLab/SACSANN


2016 ◽  
Author(s):  
Nicholas E. Banovich ◽  
Yang I. Li ◽  
Anil Raj ◽  
Michelle C. Ward ◽  
Peyton Greenside ◽  
...  

AbstractInduced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation across different cell types and as models for studies of complex disease. We established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression, chromatin accessibility and DNA methylation. Regulatory variation between individuals is lower in iPSCs than in the differentiated cell types, consistent with the intuition that developmental processes are generally canalized. While most cell type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell type-specific QTLs are in shared open chromatin. Finally, we developed a deep neural network to predict open chromatin regions from DNA sequence alone and were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell type-specific chromatin accessibility.
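Predicting a SNP's effect with a sequence-based network amounts to scoring the reference and alternate haplotypes and taking the difference. The sketch below replaces the authors' deep network with a deliberately tiny toy model (one convolutional filter, global max-pooling, logistic output) so the ref-versus-alt logic is visible; the GATA-matching filter, bias, and sequence are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x


def toy_model(x: np.ndarray, filt: np.ndarray, bias: float) -> float:
    """One convolutional filter, global max-pooling, logistic output (toy stand-in)."""
    width = filt.shape[0]
    scores = [float(np.sum(x[i:i + width] * filt)) for i in range(len(x) - width + 1)]
    return float(1.0 / (1.0 + np.exp(-(max(scores) + bias))))


def snp_effect(seq: str, pos: int, alt: str, filt: np.ndarray, bias: float) -> float:
    """Predicted change in accessibility when the base at `pos` is replaced by `alt`."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return toy_model(one_hot(alt_seq), filt, bias) - toy_model(one_hot(seq), filt, bias)


if __name__ == "__main__":
    filt = one_hot("GATA")  # hypothetical filter that rewards an exact GATA match
    print(snp_effect("CCGATACC", pos=4, alt="C", filt=filt, bias=-3.0))  # negative
```

A negative value means the alternate allele is predicted to reduce accessibility; training separate models (or output heads) per cell type is what makes such predictions cell type-specific.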

