Probing transcription factor combinatorics in different promoter classes and in enhancers

2017 ◽  
Author(s):  
Jimmy Vandel ◽  
Océane Cassan ◽  
Sophie Lèbre ◽  
Charles-Henri Lecellier ◽  
Laurent Bréhélin

In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types. We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type. TFcoop aims to predict the binding sites of the target TF based on the binding affinity of all identified cooperating TFs. The set of cooperating TFs and model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 TFs on 41 cell types and in four regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. We first show that TFcoop is accurate and outperforms simple PWM methods for predicting TF binding sites. Next, analysis of the learned models sheds light on important properties of TF combinations in different promoter classes and in enhancers. First, we show that combinations governing TF binding on enhancers are more cell-type specific than those governing binding in promoters. Second, for a given TF and cell type, we observe that TF combinations are different between promoters and enhancers, but similar for promoters of mRNAs, lncRNAs and pri-miRNAs. Analysis of the TFs cooperating with the different targets shows over-representation of pioneer TFs and a clear preference for TFs with binding motif composition similar to that of the target. Lastly, our models accurately distinguish promoters associated with specific biological processes.
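As a rough illustration of the idea described in this abstract (scoring a sequence with the position weight matrices, PWMs, of several TFs and combining those affinities to predict binding of a target TF), here is a minimal Python sketch. The logistic form, the toy PWMs, the TF names `TF_A` and `TF_B`, the helper `consensus_pwm`, the weights, and the intercept are all assumptions made for illustration. The abstract does not specify the model, and in TFcoop the parameters are learned from ChIP-seq data rather than set by hand.

```python
import math

def best_pwm_score(seq, pwm, background=0.25):
    """Return the highest log-odds score of `pwm` over all windows of `seq`.

    `pwm` is a list of columns; each column maps A/C/G/T to a probability > 0.
    """
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(
            math.log(column[seq[start + offset]] / background)
            for offset, column in enumerate(pwm)
        )
        best = max(best, score)
    return best

def binding_probability(seq, pwms, weights, intercept):
    """Combine per-TF affinity scores with a logistic function (assumed form)."""
    z = intercept + sum(
        weights[tf] * best_pwm_score(seq, pwm) for tf, pwm in pwms.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def consensus_pwm(consensus, high=0.7, low=0.1):
    """Build a toy PWM that favours `consensus` (hypothetical helper)."""
    return [{base: (high if base == c else low) for base in "ACGT"} for c in consensus]

# Toy, hand-picked values; real parameters would be fitted to ChIP-seq peaks.
pwms = {"TF_A": consensus_pwm("ACG"), "TF_B": consensus_pwm("TTGA")}
weights = {"TF_A": 1.0, "TF_B": 0.5}
intercept = -2.0

with_motifs = "CCACGCCTTGACC"      # contains both toy consensus motifs
without_motifs = "CCCCCCCCCCCCC"   # contains neither

print(binding_probability(with_motifs, pwms, weights, intercept))
print(binding_probability(without_motifs, pwms, weights, intercept))
```

With these toy values, the sequence carrying both motifs receives a much higher predicted probability than the motif-free sequence. A non-zero weight on `TF_B` is how a cooperating factor would contribute to the prediction for the target in this simplified setting.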


2015 ◽  
Author(s):  
Yuchun Guo ◽  
David K. Gifford

The combinatorial binding of trans-acting factors (TFs) to regulatory genomic regions is an important basis for the spatial and temporal specificity of gene regulation. We present a new computational approach that reveals how TFs are organized into combinatorial regulatory programs. We define a regulatory program to be a set of TFs that bind together at a regulatory region. Unlike other approaches to characterizing TF binding, we permit a regulatory region to be bound by one or more regulatory programs. We have developed a method called regulatory program discovery (RPD) that produces compact and coherent regulatory programs from in vivo binding data using a topic model. Using RPD we find that the binding of 115 TFs in K562 cells can be organized into 49 interpretable regulatory programs that bind ~140,000 distinct regulatory regions in a modular manner. The discovered regulatory programs recapitulate many published protein-protein physical interactions and have consistent functional annotations of chromatin states. We found that, for certain TFs, direct (motif present) and indirect (motif absent) binding is characterized by distinct sets of binding partners and that the binding of other TFs can predict whether the TF binds directly or indirectly with high accuracy. Joint analysis across two cell types reveals both cell-type-specific and shared regulatory programs and that thousands of regulatory regions use different programs in different cell types. Overall, our results provide comprehensive cell-type-specific combinatorial binding maps and suggest a modular organization of binding programs in regulatory regions.



2019 ◽  
Author(s):  
Tianshun Gao ◽  
Jiang Qian

Abstract. Long-range regulation by distal enhancers is crucial for many biological processes. The existing methods for enhancer-target gene prediction often require many genomic features. This makes them difficult to apply to many cell types, in which the relevant datasets are not always available. Here, we design a tool, EAGLE, an enhancer and gene learning ensemble method for identification of Enhancer-Gene (EG) interactions. Unlike existing tools, EAGLE used only six features derived from the genomic features of enhancers and gene expression datasets. Cross-validation revealed that EAGLE outperformed other existing methods. Enrichment analyses on special transcriptional factors, epigenetic modifications, and eQTLs demonstrated that EAGLE could distinguish the interacting pairs from non-interacting ones. Finally, EAGLE was applied to mouse and human genomes and identified 7,680,203 and 7,437,255 EG interactions involving 31,375 and 43,724 genes, 138,547 and 177,062 enhancers across 89 and 110 tissue/cell types in mouse and human, respectively. The obtained interactions are accessible through an interactive database enhanceratlas.org. The EAGLE method is available at https://github.com/EvansGao/EAGLE and the predicted datasets are available in http://www.enhanceratlas.org/.

Author summary. Enhancers are DNA sequences that interact with promoters and activate target genes. Since enhancers are often located far from the target genes and the nearest genes are not always the targets of the enhancers, the prediction of enhancer-target gene relationships is a big challenge. Although a few computational tools are designed for the prediction of enhancer-target genes, it is difficult to apply them in most tissue/cell types due to a lack of sufficient genomic datasets. Here we proposed a new method, EAGLE, which utilizes a small number of genomic features to predict tissue/cell type-specific enhancer-gene interactions. Compared with other existing tools, EAGLE displayed a better performance in the 10-fold cross-validation and cross-sample test. Moreover, the predictions by EAGLE were validated by other independent evidence such as the enrichment of relevant transcriptional factors, epigenetic modifications, and eQTLs. Finally, we integrated the enhancer-target relationships obtained from human and mouse genomes into an interactive database EnhancerAtlas, http://www.enhanceratlas.org/.



2019 ◽  
Author(s):  
Tom Aharon Hait ◽  
Ran Elkon ◽  
Ron Shamir

Abstract. Spatiotemporal gene expression patterns are governed to a large extent by enhancer elements, typically located distally from their target genes. Identification of enhancer-promoter (EP) links that are specific and functional in individual cell types is a key challenge in understanding gene regulation. We introduce CT-FOCS, a new statistical inference method that utilizes multiple replicates per cell type to infer cell type-specific EP links. Computationally predicted EP links are usually benchmarked against experimentally determined chromatin interactions measured by ChIA-PET and promoter-capture HiC techniques. We expand this validation scheme by also using loops that overlap in their anchor sites. In analyzing 1,366 samples from ENCODE, Roadmap epigenomics and FANTOM5, CT-FOCS inferred highly cell type-specific EP links more accurately than state-of-the-art methods. We illustrate how our inferred EP links drive cell type-specific gene expression and regulation.



2018 ◽  
Author(s):  
Aziz Khan ◽  
Anthony Mathelier ◽  
Xuegong Zhang

Abstract. Background: Super-enhancers and stretch enhancers represent classes of transcriptional enhancers that have been shown to control the expression of cell identity genes and carry disease- and trait-associated variants. Specifically, super-enhancers are clusters of enhancers defined based on the binding occupancy of master transcription factors (TFs), chromatin regulators, or chromatin marks, while stretch enhancers are large chromatin-defined regulatory regions of at least 3,000 base pairs. Several studies have characterized these regulatory regions in numerous cell types and tissues to decipher their functional importance. However, the differences and similarities between these regulatory regions have not been fully assessed.

Results: We integrated genomic, epigenomic, and transcriptomic data from ten human cell types to perform a comparative analysis of super and stretch enhancers with respect to their chromatin profiles, cell-type-specificity, and ability to control gene expression. We found that stretch enhancers are more abundant, more distal to transcription start sites, cover twice as much of the genome and are significantly less conserved than super-enhancers. In contrast, super-enhancers are significantly more enriched for active chromatin marks and cohesin complex, and more transcriptionally active, than stretch enhancers. Importantly, a vast majority of super-enhancers (85%) overlap with only a small subset of stretch enhancers (13%), which are enriched for cell-type-specific biological functions, and control cell identity genes.

Conclusions: These results suggest that super-enhancers are transcriptionally more active and cell-type-specific than stretch enhancers, and importantly, most of the stretch enhancers that are distinct from super-enhancers do not show an association with cell identity genes, are less active, and more likely to be poised enhancers.
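The length threshold and overlap percentages quoted in this abstract lend themselves to a small worked example. The Python sketch below filters regions by the 3,000 bp minimum and computes what fraction of one region set overlaps another. The tuple layout `(chrom, start, end)` with half-open coordinates, the function names, and the toy coordinates are assumptions for illustration and are not taken from the study.

```python
STRETCH_MIN_LENGTH = 3000  # minimum stretch-enhancer length stated in the abstract

def is_stretch_length(region):
    """True if a (chrom, start, end) region spans at least the minimum length."""
    chrom, start, end = region
    return end - start >= STRETCH_MIN_LENGTH

def overlaps(a, b):
    """True if two half-open regions on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(queries, targets):
    """Fraction of `queries` that overlap at least one region in `targets`."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if any(overlaps(q, t) for t in targets))
    return hits / len(queries)

# Toy coordinates, invented for this example.
super_enhancers = [("chr1", 1000, 9000), ("chr1", 50000, 62000), ("chr2", 500, 4000)]
candidates = [("chr1", 2000, 6500), ("chr1", 20000, 21000),
              ("chr1", 30000, 34000), ("chr2", 10000, 15000)]
stretch_enhancers = [r for r in candidates if is_stretch_length(r)]

# Fraction of super-enhancers touching a stretch enhancer, and the reverse.
print(fraction_overlapping(super_enhancers, stretch_enhancers))
print(fraction_overlapping(stretch_enhancers, super_enhancers))
```

The two fractions are computed separately because overlap is asymmetric at the set level, which is why the abstract can report 85% in one direction and 13% in the other. The all-pairs comparison here is quadratic; for genome-scale region sets a sorted sweep or an interval tree would be the usual choice.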



2019 ◽  
Vol 217 (1) ◽  
Author(s):  
Hiroyuki Hosokawa ◽  
Maile Romero-Wolf ◽  
Qi Yang ◽  
Yasutaka Motomura ◽  
Ditsa Levanon ◽  
...  

The zinc finger transcription factor, Bcl11b, is expressed in T cells and group 2 innate lymphoid cells (ILC2s) among hematopoietic cells. In early T-lineage cells, Bcl11b directly binds and represses the gene encoding the E protein antagonist, Id2, preventing pro-T cells from adopting innate-like fates. In contrast, ILC2s co-express both Bcl11b and Id2. To address this contradiction, we have directly compared Bcl11b action mechanisms in pro-T cells and ILC2s. We found that Bcl11b binding to regions across the genome shows distinct cell type–specific motif preferences. Bcl11b occupies functionally different sites in lineage-specific patterns and controls totally different sets of target genes in these cell types. In addition, Bcl11b bears cell type–specific post-translational modifications and organizes different cell type–specific protein complexes. However, both cell types use the same distal enhancer region to control timing of Bcl11b activation. Therefore, although pro-T cells and ILC2s both need Bcl11b for optimal development and function, Bcl11b works substantially differently in these two cell types.



2021 ◽  
Author(s):  
Meghana Kshirsagar ◽  
Han Yuan ◽  
Juan Lavista Ferres ◽  
Christina Leslie

Abstract. Determining the cell type-specific and genome-wide binding locations of transcription factors (TFs) is an important step towards decoding gene regulatory programs. Profiling by the assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals open chromatin sites that are potential binding sites for TFs but does not identify which TFs occupy a given site. We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. Our approach automatically learns distinct groups of k-mer patterns that correspond to cell type-specific in vivo binding signals. Latent factors found by BindVAE generally map to TFs that are expressed in the input cell type. BindVAE finds different TF binding sites in different cell types and can learn composite patterns for TFs involved in co-operative binding. BindVAE therefore provides a novel unsupervised approach to deconvolve the complex TF binding signals in chromatin accessible sites.
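Since the method learns groups of k-mer patterns from accessible regions, it may help to see what a k-mer representation of a sequence looks like. The following Python sketch counts overlapping k-mers in a DNA sequence. The function name, the choice of k = 3, the handling of non-ACGT characters, and the toy sequence are assumptions for illustration; the abstract does not state how BindVAE actually encodes its input.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

# Toy "peak" sequence containing one ambiguous base (N).
peak = "ACGTACGNAC"
counts = kmer_counts(peak, 3)
print(sorted(counts.items()))
```

Each region then becomes a bag of k-mer counts, a representation on which an unsupervised model could look for groups of co-occurring patterns.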



2019 ◽  
Author(s):  
Ashley G. Anderson ◽  
Ashwinikumar Kulkarni ◽  
Matthew Harper ◽  
Genevieve Konopka

Abstract. The striatum is a critical forebrain structure for integrating cognitive, sensory, and motor information from diverse brain regions into meaningful behavioral output. However, the transcriptional mechanisms that underlie striatal development and organization at single-cell resolution remain unknown. Here, we show that Foxp1, a transcription factor strongly linked to autism and intellectual disability, regulates organizational features of striatal circuitry in a cell-type-dependent fashion. Using single-cell RNA-sequencing, we examine the cellular diversity of the early postnatal striatum and find that cell-type-specific deletion of Foxp1 in striatal projection neurons alters the cellular composition and neurochemical architecture of the striatum. Importantly, using this approach, we identify the non-cell autonomous effects produced by disrupting Foxp1 in one cell-type and the molecular compensation that occurs in other populations. Finally, we identify Foxp1-regulated target genes within distinct cell-types and connect these molecular changes to functional and behavioral deficits relevant to phenotypes described in patients with FOXP1 loss-of-function mutations. These data reveal cell-type-specific transcriptional mechanisms underlying distinct features of striatal circuitry and identify Foxp1 as a key regulator of striatal development.



2020 ◽  
Author(s):  
Julie A Prost ◽  
Christopher JF Cameron ◽  
Mathieu Blanchette

Genomic organization is critical for proper gene regulation and is based on a hierarchical model in which chromosomes are segmented into megabase-sized, cell-type-specific transcriptionally active (A) and inactive (B) compartments. Here, we describe SACSANN, a machine learning pipeline consisting of stacked artificial neural networks that predicts compartment annotation solely from genomic sequence-based features such as predicted transcription factor binding sites and transposable elements. SACSANN provides accurate and cell-type specific compartment predictions, while identifying key genomic sequence determinants that associate with A/B compartments. Models are shown to be largely transferable across analogous human and mouse cell types. By enabling the study of chromosome compartmentalization in species for which no Hi-C data is available, SACSANN paves the way toward the study of 3D genome evolution. SACSANN is publicly available on GitHub: https://github.com/BlanchetteLab/SACSANN



Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Suvi Linna Kuosmanen ◽  
Eloi Schmauch ◽  
Kyriakitsa Galani ◽  
Carles Boix ◽  
Yongjin P Park ◽  
...  

Genome-wide association studies have uncovered over 200 genetic loci underlying coronary artery disease (CAD), providing great hope for a deeper understanding of the causal mechanisms leading to this disease. However, in order to understand CAD at the molecular level, it is necessary to uncover cell-type-specific circuits and to use these circuits to dissect driver variants, genes, pathways, and cell types, in normal and diseased tissues. Here, we provide the most detailed single-cell dissection of human heart cell types, using cardiac biopsies collected during open-heart surgery from healthy, CAD, and CAD-related heart failure donors, and profiling both transcriptional (scRNA-seq) and epigenomic (scATAC-seq) changes. Using this approach, we identify 12 major heart cell types, including typical cardiovascular cells (cardiomyocytes, endothelial cells, fibroblasts), rarer cell types (B cells, neurons, Schwann cells), and previously-unrecognized layer-specific epithelial and endothelial cell types. We define markers for each cell type, providing the first extensive reference set for the living human heart. In addition, we define differential gene expression patterns in CAD relative to control samples, revealing substantial differences in cell-type-specific expression of disease-related genes, emphasizing, for example, the importance of the vascular endothelium in the pathogenesis of CAD. Strikingly, further clustering of the cell types based on specific subtypes revealed important differences in their expression patterns of disease-associated genes. These changes enrich in known CAD genetic loci, enabling us to recognize their likely target genes from scRNA-seq expression changes, candidate driver variants based on scATAC-seq localization and differential DNA accessibility, and candidate upstream regulators based on their enriched motif occurrences in scATAC loci. 
Overall, our results highlight the relevance and potential of single-cell transcriptional and epigenomic analyses to gain new biological insights into cardiovascular disease, and to recognize novel therapeutic target genes, pathways, and the cell types where they act.



2021 ◽  
Author(s):  
Ren Yi ◽  
Kyunghyun Cho ◽  
Richard Bonneau

Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly accurate thanks to the increased availability of next-generation sequencing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models because available binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here we propose NetTIME, a multitask learning framework for predicting cell-type-specific transcription factor binding sites with base-pair resolution. We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach due to the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need for setting a probability threshold and reduces classification noise. We compare our method's predictive performance with several state-of-the-art methods, including DeepBind, BindSpace, and Catchitt, and show that our method outperforms previous methods under both supervised and transfer learning settings.


