An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced TF binding

2019 ◽  
Author(s):  
Divyanshi Srivastava ◽  
Begüm Aydin ◽  
Esteban O. Mazzoni ◽  
Shaun Mahony

Abstract Transcription factor (TF) binding specificity is determined via a complex interplay between the TF’s DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with TF binding in a given cell type have been well characterized. For instance, the binding sites for a majority of TFs display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the TF itself, and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of TF binding specificity, we therefore need to examine how newly activated TFs interact with sequence and preexisting chromatin landscapes. Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of TFs that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced TFs. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding; incorporating preexisting chromatin features improves our ability to explain the binding specificity of some TFs substantially, but not others. Furthermore, by analyzing site-level predictors, we show that TF binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences. Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.
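The bimodal idea in this abstract can be illustrated with a toy forward pass. This is only a sketch under loud assumptions, not Bichrom's architecture: the sequence branch is reduced to a single hand-set motif filter with max pooling, the chromatin branch to a weighted mean of preexisting signal tracks, and the two scalar scores are combined by one logistic unit. All names, weights, and inputs below are hypothetical.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as one-hot rows; unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def sequence_score(seq, motif_filter):
    """Scan one filter along the sequence and keep the maximum activation."""
    encoded = one_hot(seq)
    width = len(motif_filter)
    best = -math.inf
    for start in range(len(encoded) - width + 1):
        activation = sum(
            encoded[start + i][j] * motif_filter[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, activation)
    return best

def chromatin_score(tracks, track_weights):
    """Weighted sum of the mean signal of each preexisting chromatin track."""
    return sum(w * (sum(t) / len(t)) for t, w in zip(tracks, track_weights))

def bimodal_probability(seq, tracks, motif_filter, track_weights,
                        w_seq, w_chrom, bias):
    """Combine the two scalar branch scores with a single logistic unit."""
    logit = (w_seq * sequence_score(seq, motif_filter)
             + w_chrom * chromatin_score(tracks, track_weights)
             + bias)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical filter that matches the 3-mer "CAG" (columns are A, C, G, T).
CAG_FILTER = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]]

closed = bimodal_probability("TTCAGTT", [[0, 0, 0, 0]], CAG_FILTER, [1.0],
                             w_seq=1.0, w_chrom=1.0, bias=-3.0)
opened = bimodal_probability("TTCAGTT", [[2, 2, 2, 2]], CAG_FILTER, [1.0],
                             w_seq=1.0, w_chrom=1.0, bias=-3.0)
```

Keeping each branch as one scalar makes it easy to read off how much of a prediction comes from sequence and how much from preexisting chromatin, which is the kind of interpretation the abstract describes.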

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Divyanshi Srivastava ◽  
Begüm Aydin ◽  
Esteban O. Mazzoni ◽  
Shaun Mahony

Abstract Background: Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor’s DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes. Results: Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding; incorporating preexisting chromatin features improves our ability to explain the binding specificity of some transcription factors substantially, but not others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences. Conclusions: Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.


2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario B. Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Abstract We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers, including gkm-SVM and DanQ, at distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
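A rough sketch of what "sequential k-mer fold changes" could look like follows. The abstract does not give the exact formula, so this assumes the fold change is the pseudocount-smoothed frequency of a k-mer in an enhancer set divided by its frequency in a background set, read off position by position along the query sequence. Function names and the toy sequences are hypothetical and are not taken from the SeqEnhDL repository.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def fold_change_table(enhancers, background, k, pseudocount=1.0):
    """Map each observed k-mer to its enhancer/background frequency ratio."""
    fg = kmer_counts(enhancers, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k  # size of the k-mer vocabulary, used for smoothing
    table = {}
    for kmer in set(fg) | set(bg):
        fg_freq = (fg[kmer] + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        table[kmer] = fg_freq / bg_freq
    return table

def sequential_features(seq, table, k):
    """Fold change at each position; unseen k-mers get a neutral 1.0
    (a simplification chosen for this sketch)."""
    seq = seq.upper()
    return [table.get(seq[i:i + k], 1.0) for i in range(len(seq) - k + 1)]

# Toy data: one "enhancer" and one "background" sequence, k = 3.
table = fold_change_table(["ACGACG"], ["TTTTTT"], k=3)
features = sequential_features("ACGTTT", table, k=3)
```

Because the features keep their positional order, they can be fed to an MLP as a flat vector or to a CNN or RNN as a one-dimensional signal.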


Blood ◽  
2021 ◽  
Author(s):  
Bon Q Trinh ◽  
Simone Ummarino ◽  
Yanzhou Zhang ◽  
Alexander K Ebralidze ◽  
Mahmoud A Bassal ◽  
...  

The mechanism underlying cell type-specific gene induction conferred by ubiquitous transcription factors as well as disruptions caused by their chimeric derivatives in leukemia is not well understood. Here we investigate whether RNAs coordinate with transcription factors to drive myeloid gene transcription. In an integrated genome-wide approach surveying for gene loci exhibiting concurrent RNA- and DNA-interactions with the broadly expressed transcription factor RUNX1, we identified the long noncoding RNA LOUP. This myeloid-specific and polyadenylated lncRNA induces myeloid differentiation and inhibits cell growth, acting as a transcriptional inducer of the myeloid master regulator PU.1. Mechanistically, LOUP recruits RUNX1 to both the PU.1 enhancer and the promoter, leading to the formation of an active chromatin loop. In t(8;21) acute myeloid leukemia, wherein RUNX1 is fused to ETO, the resulting oncogenic fusion protein RUNX1-ETO limits chromatin accessibility at the LOUP locus, causing inhibition of LOUP and PU.1 expression. These findings highlight the important role of the interplay between cell type-specific RNAs and transcription factors as well as their oncogenic derivatives in modulating lineage-gene activation and raise the possibility that RNA regulators of transcription factors represent alternative targets for therapeutic development.


2021 ◽  
Author(s):  
David A Gallegos ◽  
Melyssa Minto ◽  
Fang Liu ◽  
Mariah F Hazlett ◽  
S Aryana Yousefzadeh ◽  
...  

Parvalbumin-expressing (PV+) interneurons of the nucleus accumbens (NAc) play an essential role in the addictive-like behaviors induced by psychostimulant exposure. To identify molecular mechanisms of PV+ neuron plasticity, we isolated interneuron nuclei from the NAc of male and female mice following acute or repeated exposure to amphetamine (AMPH) and sequenced for cell type-specific RNA expression and chromatin accessibility. AMPH regulated the transcription of hundreds of genes in PV+ interneurons, and this program was largely distinct from that regulated in other NAc GABAergic neurons. Chromatin accessibility at enhancers predicted cell-type specific gene regulation, identifying transcriptional mechanisms of differential AMPH responses. Finally, we observed dysregulation of multiple PV-specific, AMPH-regulated genes in an Mecp2 mutant mouse strain that shows heightened behavioral sensitivity to psychostimulants, suggesting the functional importance of this transcriptional program. Together these data provide novel insight into the cell-type specific programs of transcriptional plasticity in NAc neurons that underlie addictive-like behaviors.


2021 ◽  
Vol 22 (9) ◽  
pp. 4959
Author(s):  
Lilas Courtot ◽  
Elodie Bournique ◽  
Chrystelle Maric ◽  
Laure Guitton-Sert ◽  
Miguel Madrid-Mencía ◽  
...  

DNA replication timing (RT), reflecting the temporal order of origin activation, is known as a robust and conserved cell-type specific process. Upon low replication stress, the slowing of replication forks induces well-documented RT delays associated with genetic instability, but it can also generate RT advances that are still uncharacterized. To characterize these advanced initiation events, we monitored the whole genome RT from six independent human cell lines treated with low doses of aphidicolin. We report that RT advances are cell-type-specific and involve large heterochromatin domains. Importantly, we found that some major late to early RT advances can be inherited by the unstressed next cellular generation, which is a unique process that correlates with enhanced chromatin accessibility, as well as a modified replication origin landscape and gene expression in daughter cells. Collectively, this work highlights how low replication stress may impact cellular identity through RT advance events at a subset of chromosomal domains.


2015 ◽  
Author(s):  
Hilary Kiyo Finucane ◽  
Brendan Bulik-Sullivan ◽  
Alexander Gusev ◽  
Gosia Trynka ◽  
Yakir Reshef ◽  
...  

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.
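The "enrichment of heritability" reported above can be made concrete with a small calculation. The abstract does not define the quantity, so this sketch assumes a commonly used definition: the share of heritability attributed to a category divided by the share of SNPs in that category. All numbers are hypothetical and are not results from the study.

```python
def heritability_enrichment(h2_category, h2_total, snps_category, snps_total):
    """Proportion of heritability in a category divided by its proportion of SNPs."""
    prop_h2 = h2_category / h2_total
    prop_snps = snps_category / snps_total
    return prop_h2 / prop_snps

# Hypothetical category holding 2.5% of SNPs but 25% of the heritability.
example = heritability_enrichment(
    h2_category=0.1, h2_total=0.4,
    snps_category=25_000, snps_total=1_000_000,
)
```

A value above 1 means the category explains more heritability than its size alone would predict; a value of 1 means no enrichment.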


2017 ◽  
Vol 3 (2) ◽  
pp. 54
Author(s):  
Uwe Benary ◽  
Elmar Wolf ◽  
Jana Wolf

The human MYC proto-oncogene protein (MYC) is a transcription factor that plays a major role in the regulation of cell proliferation. Deregulation of MYC expression is often found in cancer. In recent years, several hypotheses have been proposed to explain cell type-specific MYC target gene expression patterns despite genome-wide DNA binding of MYC. In a recent publication, a mathematical modelling approach in combination with experimental data demonstrated that differences in MYC-DNA-binding affinity are sufficient to explain distinct promoter occupancies and allow stratification of distinct MYC-regulated biological processes at different MYC concentrations. Here, we extend the analysis of the published mathematical model of DNA-binding behaviour of MYC to demonstrate that the insights gained in the investigation of the human osteosarcoma cell line U2OS can be generalized to other human cell types.
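The claim that affinity differences alone can produce distinct promoter occupancies at different MYC concentrations can be illustrated with a simple equilibrium binding curve. This is not the published model, which is not specified here; it assumes one-site binding with occupancy c / (c + Kd), and the promoter names, Kd values, and threshold are hypothetical.

```python
def occupancy(myc_concentration, kd):
    """Fraction of time a site is bound under simple one-site equilibrium binding."""
    return myc_concentration / (myc_concentration + kd)

def bound_promoters(myc_concentration, kds, threshold=0.5):
    """Names of promoters whose occupancy reaches the threshold, sorted."""
    return sorted(
        name for name, kd in kds.items()
        if occupancy(myc_concentration, kd) >= threshold
    )

# Hypothetical dissociation constants in arbitrary concentration units.
KDS = {"high_affinity": 1.0, "low_affinity": 100.0}

low_myc = bound_promoters(10.0, KDS)     # only the high-affinity promoter
high_myc = bound_promoters(1000.0, KDS)  # both promoters
```

In this toy setting, raising the concentration recruits low-affinity promoters while high-affinity ones are already near saturation, which is the qualitative behaviour the abstract attributes to affinity differences.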


2019 ◽  
Author(s):  
Hyeon-Jin Kim ◽  
Galip Gürkan Yardımcı ◽  
Giancarlo Bonora ◽  
Vijay Ramani ◽  
Jie Liu ◽  
...  

Abstract Single-cell Hi-C (scHi-C) interrogates genome-wide chromatin interaction in individual cells, allowing us to gain insights into 3D genome organization. However, the extremely sparse nature of scHi-C data poses a significant barrier to analysis, limiting our ability to tease out hidden biological information. In this work, we approach this problem by applying topic modeling to scHi-C data. Topic modeling is well-suited for discovering latent topics in a collection of discrete data. For our analysis, we generate twelve different single-cell combinatorial indexed Hi-C (sciHi-C) libraries from five human cell lines (GM12878, H1Esc, HFF, IMR90, and HAP1), comprising over 25,000 cells. We demonstrate that topic modeling is able to successfully capture cell type differences from sciHi-C data in the form of “chromatin topics.” We further show enrichment of particular compartment structures associated with locus pairs in these topics.
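To make the mapping from scHi-C data to topic modeling concrete, the sketch below treats each cell as a "document" and each observed locus pair as a "word", then fits topics with a small collapsed Gibbs sampler for latent Dirichlet allocation. The abstract does not say which topic model or inference method was used, so LDA, the hyperparameters, and the toy input are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_words, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions.

    docs: one list per cell, containing integer locus-pair IDs in [0, n_words).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every observed locus pair.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * n_words)
                    for t in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic proportions for each cell.
    return [
        [(count + alpha) / (len(doc) + alpha * n_topics) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]

# Toy input: two cells with disjoint sets of locus-pair IDs.
cells = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5]]
theta = lda_gibbs(cells, n_topics=2, n_words=6, n_iter=20)
```

Each row of `theta` is a low-dimensional summary of one cell, which is one way sparse contact data could be compared across cells.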


2019 ◽  
Author(s):  
Igor Mačinković ◽  
Ina Theofel ◽  
Tim Hundertmark ◽  
Kristina Kovač ◽  
Stephan Awe ◽  
...  

Abstract CoREST has been identified as a subunit of several protein complexes that generate transcriptionally repressive chromatin structures during development. However, a comprehensive analysis of the CoREST interactome has not been carried out. We use proteomic approaches to define the interactomes of two dCoREST isoforms, dCoREST-L and dCoREST-M, in Drosophila. We identify three distinct histone deacetylase complexes built around a common dCoREST/dRPD3 core: A dLSD1/dCoREST complex, the LINT complex and a dG9a/dCoREST complex. The latter two complexes can incorporate both dCoREST isoforms. By contrast, the dLSD1/dCoREST complex exclusively assembles with the dCoREST-L isoform. Genome-wide studies show that the three dCoREST complexes associate with chromatin predominantly at promoters. Transcriptome analyses in S2 cells and testes reveal that different cell lineages utilize distinct dCoREST complexes to maintain cell-type-specific gene expression programmes: In macrophage-like S2 cells, LINT represses germ line-related genes whereas other dCoREST complexes are largely dispensable. By contrast, in testes, the dLSD1/dCoREST complex prevents transcription of germ line-inappropriate genes and is essential for spermatogenesis and fertility, whereas depletion of other dCoREST complexes has no effect. Our study uncovers three distinct dCoREST complexes that function in a lineage-restricted fashion to repress specific sets of genes thereby maintaining cell-type-specific gene expression programmes.


Author(s):  
Xiangyu Luo ◽  
Joel Schwartz ◽  
Andrea Baccarelli ◽  
Zhonghua Liu

Abstract Epigenome-wide mediation analysis aims to identify DNA methylation CpG sites that mediate the causal effects of genetic/environmental exposures on health outcomes. However, DNA methylation levels in peripheral blood tissues are usually measured at the bulk level based on a heterogeneous population of white blood cells. Using the bulk-level DNA methylation data in mediation analysis might cause confounding bias and reduce study power. Therefore, it is crucial to get fine-grained results by detecting mediation CpG sites in a cell-type-specific way. However, there is a lack of methods and software to achieve this goal. We propose a novel method (Mediation In a Cell-type-Specific fashion, MICS) to identify cell-type-specific mediation effects in genome-wide epigenetic studies using only the bulk-level DNA methylation data. MICS follows the standard mediation analysis paradigm and consists of three key steps. In step 1, we assess the exposure-mediator association for each cell type; in step 2, we assess the mediator-outcome association for each cell type; in step 3, we combine the cell-type-specific exposure-mediator and mediator-outcome associations using a multiple testing procedure named MultiMed [Sampson JN, Boca SM, Moore SC, et al. FWER and FDR control when testing multiple mediators. Bioinformatics 2018;34:2418–24] to identify significant CpGs with cell-type-specific mediation effects. We conduct simulation studies to demonstrate that our method has correct FDR control. We also apply the MICS procedure to the Normative Aging Study and identify nine DNA methylation CpG sites in the lymphocytes that might mediate the effect of cigarette smoking on lung function.
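The three-step structure described above can be sketched for a single cell type as follows. This is not MICS or MultiMed: the combination step is replaced by a plainly different and simpler technique, the joint-significance (max-P) test followed by the Benjamini-Hochberg procedure, and the p-values from steps 1 and 2 are taken as given. All names and numbers are hypothetical.

```python
def max_p(p_exposure_mediator, p_mediator_outcome):
    """Joint-significance p-value per CpG: the larger of the two path p-values."""
    return [max(a, b) for a, b in zip(p_exposure_mediator, p_mediator_outcome)]

def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])

def mediating_cpgs(p_exposure_mediator, p_mediator_outcome, alpha=0.05):
    """Step 3 stand-in: combine the two paths, then control the FDR."""
    combined = max_p(p_exposure_mediator, p_mediator_outcome)
    return benjamini_hochberg(combined, alpha)

# Hypothetical p-values for three CpGs in one cell type.
p_em = [0.001, 0.5, 0.002]   # step 1: exposure -> mediator
p_mo = [0.002, 0.001, 0.9]   # step 2: mediator -> outcome
hits = mediating_cpgs(p_em, p_mo)
```

Only a CpG with evidence on both paths survives, which reflects the general logic of mediation testing; repeating the call per cell type gives cell-type-specific hit lists.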

