ChromWave: Deciphering the DNA-encoded competition between transcription factors and nucleosomes with deep neural networks

2021 ◽  
Author(s):  
Sera Aylin Cakiroglu ◽  
Sebastian Steinhauser ◽  
Jon Smith ◽  
Wei Xing ◽  
Nicholas M. Luscombe

Summary: Transcription factors (TFs) regulate gene expression by recognising and binding specific DNA sequences. At times, these regulatory elements may be occluded by nucleosomes, making them inaccessible for TF binding. The competition for DNA occupancy between TFs and nucleosomes, and the associated gene regulatory outputs, are important consequences of the cis-regulatory information encoded in the genome. However, these sequence patterns are subtle and remain difficult to interpret. Here, we introduce ChromWave, a deep-learning model that, for the first time, predicts the competing profiles of TF and nucleosome occupancies with remarkable accuracy. Models trained using short- and long-fragment MNase-Seq data successfully learn the sequence preferences underlying TF and nucleosome occupancies across the entire yeast genome. They recapitulate nucleosome evictions from regions containing “strong” TF binding sites, and knock-out simulations show nucleosomes gaining occupancy in the absence of these TFs, accompanied by lateral rearrangement of adjacent nucleosomes. At a local level, the models anticipate with high accuracy the outcomes of detailed experimental analysis of partially unwrapped nucleosomes at the GAL4 UAS locus. Finally, we trained a ChromWave model that successfully predicts nucleosome positions at promoters in the human genome. We find that human promoters generally contain few sites at which simple sequence changes can alter nucleosome occupancies, and that these positions align well with causal variants linked to DNase hypersensitivity. ChromWave is readily combined with diverse genomic datasets and can be trained to predict any output that is linked to the underlying genomic sequence. ChromWave’s application is limited only by the user’s imagination and the availability of training data.
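The human-promoter finding rests on asking how predicted occupancy changes when a single base changes. A minimal sketch of generic in silico saturation mutagenesis is shown below: every single-nucleotide substitution is scored by the total absolute change it causes in a predicted per-position profile. This is not ChromWave's code. The model `toy_occupancy` is a hypothetical placeholder (GC content smoothed over a small window), and the effect metric is our own assumption; a trained network would be passed in its place.

```python
import numpy as np

BASES = "ACGT"
BASE_INDEX = {b: i for i, b in enumerate(BASES)}


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as an (L, 4) one-hot matrix in A, C, G, T order; other symbols become all-zero rows."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in BASE_INDEX:
            mat[pos, BASE_INDEX[base]] = 1.0
    return mat


def toy_occupancy(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Hypothetical stand-in for a trained model: GC content smoothed over `window` bp."""
    gc = x[:, BASE_INDEX["C"]] + x[:, BASE_INDEX["G"]]
    return np.convolve(gc, np.ones(window) / window, mode="same")


def saturation_mutagenesis(seq: str, model=toy_occupancy) -> np.ndarray:
    """Return an (L, 4) matrix of total |change| in the predicted profile per substitution.

    The entry for the reference base at each position is left at 0.
    """
    seq = seq.upper()
    ref = model(one_hot(seq))
    effects = np.zeros((len(seq), 4))
    for pos, ref_base in enumerate(seq):
        for j, alt in enumerate(BASES):
            if alt == ref_base:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[pos, j] = np.abs(model(one_hot(mutant)) - ref).sum()
    return effects
```

Positions whose maximum effect exceeds some chosen threshold would be the candidate "sensitive" sites; the threshold itself is a free parameter of this sketch.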

1991 ◽  
Vol 11 (3) ◽  
pp. 1488-1499 ◽  
Author(s):  
H J Roth ◽  
G C Das ◽  
J Piatigorsky

Expression of the chicken beta B1-crystallin gene was examined. Northern (RNA) blot and primer extension analyses showed that while abundant in the lens, the beta B1 mRNA is absent from the liver, brain, heart, skeletal muscle, and fibroblasts of the chicken embryo, suggesting lens specificity. Promoter fragments ranging from 434 to 126 bp of 5'-flanking sequence (plus 30 bp of exon 1) of the beta B1 gene fused to the bacterial chloramphenicol acetyltransferase gene functioned much more efficiently in transfected embryonic chicken lens epithelial cells than in transfected primary muscle fibroblasts or HeLa cells. Transient expression of recombinant plasmids in cultured lens cells, DNase I footprinting, in vitro transcription in a HeLa cell extract, and gel mobility shift assays were used to identify putative functional promoter elements of the beta B1-crystallin gene. Sequence analysis revealed a number of potential regulatory elements between positions -126 and -53 of the beta B1 promoter, including two Sp1 sites, two octamer binding sequence-like sites (OL-1 and OL-2), and two polyomavirus enhancer-like sites (PL-1 and PL-2). Deletion and site-specific mutation experiments established the functional importance of PL-1 (-116 to -102), PL-2 (-90 to -76), and OL-2 (-75 to -68). DNase I footprinting using a lens or a HeLa cell nuclear extract and gel mobility shifts using a lens nuclear extract indicated the presence of putative lens transcription factors binding to these DNA sequences. Competition experiments provided evidence that PL-1 and PL-2 recognize the same or very similar factors, while OL-2 recognizes a different factor. Our data suggest that the same or closely related transcription factors found in many tissues are used for expression of the chicken beta B1-crystallin gene in the lens.


2006 ◽  
Vol 17 (2) ◽  
pp. 585-597 ◽  
Author(s):  
Fang Liu ◽  
Nabendu Pore ◽  
Mijin Kim ◽  
K. Ranh Voong ◽  
Melissa Dowling ◽  
...  

Histone deacetylases mediate critical cellular functions, but relatively little is known about the mechanisms controlling their expression, including expression of HDAC4, a class II HDAC implicated in the modulation of cellular differentiation and viability. Endogenous HDAC4 mRNA and protein levels and promoter activity were all readily repressed by mithramycin, suggesting regulation by GC-rich DNA sequences. We validated consensus binding sites for Sp1/Sp3 transcription factors in the HDAC4 promoter through truncation studies and targeted mutagenesis. Specific and functional binding by Sp1/Sp3 at these sites was confirmed with chromatin immunoprecipitation (ChIP) and electrophoretic mobility shift assays (EMSA). Cotransfection of either Sp1 or Sp3 with a reporter driven by the HDAC4 promoter led to high reporter activity in SL2 insect cells (which lack endogenous Sp1/Sp3). In human cells, restored expression of Sp1 and Sp3 up-regulated HDAC4 protein levels, whereas levels were decreased by RNA-interference-mediated knockdown of either protein. Finally, Sp1 levels varied in concordance with those of HDAC4 across a number of human tissues and cancer cell lines. Together, these studies characterize for the first time the activity of the HDAC4 promoter, through which Sp1 and Sp3 modulate expression of HDAC4 and which may contribute to tissue- or cell-line-specific expression of HDAC4.


Plants ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 441 ◽  
Author(s):  
Ho ◽  
Geisler

The interactions between transcription factors (TFs) and cis-acting regulatory elements (CREs) provide crucial information on the regulation of gene expression. Experimentally determining TF-binding sites and CREs is costly and time-intensive. The availability of whole-genome sequence and transcriptome data makes it possible to identify and annotate rice TFs and to predict rice CREs in silico. In this study, we tested the applicability of two algorithms developed for other model systems for the identification of biologically significant CREs of co-expressed genes from rice. CREs were identified from the DNA sequences located upstream of the transcription start sites, in untranslated regions (UTRs) and introns, and downstream of the translational stop codons of co-expressed genes. The biological significance of each CRE was determined by correlating its presence or absence in each gene with that gene’s expression profile, using a meta-database constructed from 50 rice microarray data sets. The reliability of these methods in predicting CREs and their corresponding TFs was supported by previous wet-lab experimental data and a literature review. New CREs corresponding to abiotic stresses, biotic stresses, specific tissues, and developmental stages were identified from rice, revealing new information for future experimental testing. The effectiveness of some, but not all, CREs was found to be affected by copy number, position, and orientation. The corresponding TFs that were most likely correlated with each CRE were also identified. These findings not only help prioritize candidates for further analysis but also contribute to the understanding of the gene regulatory network.
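The core test described above, correlating a CRE's presence or absence in each gene with that gene's expression, can be sketched as follows. This is a simplification under our own assumptions, not either of the algorithms the authors evaluated: exact string matching on both strands, a single expression value per gene (the study used a meta-database of 50 microarray data sets), and a difference in mean expression as the effect size. `count_cre` also returns the copy number, which the abstract notes can matter. The motif `GCCGCC` in the test is an arbitrary placeholder, not a CRE reported in the paper.

```python
from statistics import mean
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_cre(seq: str, motif: str) -> int:
    """Count (possibly overlapping) motif matches on both strands.

    A palindromic motif is counted once per site, not twice.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif or window == rc:
            hits += 1
    return hits


def presence_effect(regions: Dict[str, str], expression: Dict[str, float],
                    motif: str) -> Optional[float]:
    """Mean expression of genes whose region contains the CRE minus that of genes lacking it.

    Returns None when either group is empty.
    """
    with_cre = [expression[g] for g, s in regions.items() if count_cre(s, motif) > 0]
    without = [expression[g] for g, s in regions.items() if count_cre(s, motif) == 0]
    if not with_cre or not without:
        return None
    return mean(with_cre) - mean(without)
```

In practice a statistical test across many conditions would replace the raw difference in means; it is kept here only to make the presence/absence logic visible.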


Open Medicine ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. 640-650
Author(s):  
Maria Araceli Diaz Cruz ◽  
Dan Lund ◽  
Ferenc Szekeres ◽  
Sandra Karlsson ◽  
Maria Faresjö ◽  
...  

Abstract: Nuclear receptors (NRs) are ligand-activated transcription factors that regulate gene expression when bound to specific DNA sequences. Crosstalk between steroid NR systems has been studied to understand the development of hormone-driven cancers, but not extensively at the genetic level. This study aimed to investigate crosstalk between steroid NRs in conserved intron and exon sequences, with a focus on steroid NRs involved in prostate cancer etiology. For this purpose, we evaluated conserved intron and exon sequences among all 49 members of the NR Superfamily (NRS) and their relevance as regulatory sequences and NR-binding sequences. Sequence conservation was higher in the first intron (35%) than in downstream introns. Seventy-nine percent of the conserved regions in the NRS contained putative transcription factor binding sites (TFBS), and a large fraction of these sequences contained splicing sites (SS). Analysis of transcription factors binding to putative intronic and exonic TFBS revealed that 5% and 16%, respectively, were NRs. The present study suggests crosstalk between steroid NRs, e.g., the vitamin D, estrogen, progesterone, and retinoic acid endocrine systems, through cis-regulatory elements in conserved sequences of introns and exons. This investigation provides evidence for crosstalk between steroid hormones and points to novel targets for steroid NR regulation.


2018 ◽  
Author(s):  
Arya Zandvakili ◽  
Juli Uhl ◽  
Ian Campbell ◽  
Yuntao Charlie Song ◽  
Brian Gebelein

Abstract: Hox genes encode a family of transcription factors that, despite having similar in vitro DNA binding preferences, regulate distinct genetic programs along the metazoan anterior-posterior axis. To better define mechanisms of Hox specificity, we compared and contrasted the ability of abdominal Hox factors to regulate two cis-regulatory elements within the Drosophila embryo. Both the Ultrabithorax (Ubx) and Abdominal-A (Abd-A) Hox factors form cooperative complexes with the Extradenticle (Exd) and Homothorax (Hth) transcription factors to repress the distal-less leg selector gene via the DCRE, whereas only Abd-A interacts with Exd and Hth on the RhoA element to activate a rhomboid serine protease gene that stimulates Epidermal Growth Factor secretion. By swapping binding sites between these elements, we found that the RhoA Exd/Hth/Hox site configuration that mediates Abd-A-specific activation can also convey transcriptional repression by both Ubx and Abd-A when placed into the DCRE, but only in one orientation. We further show that the orientation and spacing of Hox sites relative to additional transcription factor binding sites within the RhoA and DCRE elements are critical for appropriate cell- and segment-specific output. These results indicate that the interaction between Hox, Exd, and Hth neither determines activation vs. repression specificity nor defines Ubx vs. Abd-A specificity. Instead, the precise integration of Hox sites with additional TF inputs is required for accurate transcriptional output. Taken together, these studies provide new insight into the mechanisms of Hox target and regulatory specificity, as well as the constraints placed on regulatory elements to convey appropriate outputs.
Author Summary: The Hox genes encode a family of transcription factors that give cells within each region along the developing body plan a unique identity in animals from worms to mammals. Surprisingly, however, most of the Hox factors bind the same or highly similar DNA sequences. These findings raise a paradox: how can proteins with highly similar DNA binding properties perform different functions in the animal by regulating different sets of target genes? In this study, we address this question by examining how two Hox factors regulate the expression of target genes that specify leg development and the formation of liver-like cells in the developing fly. By comparing how Hox target genes are activated and/or repressed, we found that the same Hox binding sites can mediate either activation or repression, depending on context. In addition, we found that a Hox binding site normally regulated by only one Hox factor can also be used by more than one Hox factor when swapped into another target gene. These findings indicate that the ability of a Hox factor to regulate specific target genes does not rely solely upon DNA binding specificity but also requires regulatory specificity.


2019 ◽  
Vol 20 (24) ◽  
pp. 6324 ◽  
Author(s):  
Hironori Hojo ◽  
Shinsuke Ohba

Chondrogenesis is a key developmental process that molds the framework of our body and generates the skeletal tissues by coupling with osteogenesis. These developmental processes are coordinated by spatiotemporal gene expression, which is hardwired into gene regulatory elements. Those elements exist as thousands of modules of DNA sequences on the genome. Transcription factors function as key regulatory proteins by binding to regulatory elements and recruiting cofactors. Over the past 30 years, extensive attempts have been made to identify gene regulatory mechanisms in chondrogenesis, mainly through biochemical and genetic approaches. More recently, next-generation sequencing (NGS) technologies have identified thousands of gene regulatory elements on a genome scale and provided novel insights into the multiple layers of gene regulatory mechanisms, including the modes of action of transcription factors, post-translational histone modifications, chromatin accessibility, the concept of pioneer factors, and three-dimensional chromatin architecture. In this review, we summarize the studies that have improved our understanding of the gene regulatory mechanisms in chondrogenesis, from the historical studies to the more recent works using NGS. Finally, we consider future perspectives, including efforts to improve our understanding of the gene regulatory landscape in chondrogenesis and potential applications to the treatment of chondrocyte-related diseases.


2018 ◽  
Author(s):  
Dikla Cohn ◽  
Or Zuk ◽  
Tommy Kaplan

Abstract: Enhancer sequences regulate the expression of genes from afar by providing a binding platform for transcription factors, often in a tissue-specific or context-specific manner. Despite their importance in health and disease, our understanding of these DNA sequences, and their regulatory grammar, is limited. This impairs our ability to identify new enhancers along the genome, or to understand the effect of enhancer mutations and their role in genetic diseases.
We trained deep Convolutional Neural Networks (CNN) to identify enhancer sequences in multiple species. We used multiple biological datasets, including simulated sequences, in vivo binding data of single transcription factors and genome-wide chromatin maps of active enhancers in 17 mammalian species. Our deep networks obtained high classification accuracy by combining two training strategies. First, by training on enhancers vs. non-enhancer background sequences, we identified short (1-4 bp) low-complexity motifs. Second, by replacing the negative training set with adversarial k-order random shuffles of enhancer sequences (thus maintaining base composition while shattering longer motifs, including transcription factor binding sites), we identified a set of biologically meaningful motifs, unique to enhancers. In addition, classification performance improved when positive data from all species were combined, showing a shared mammalian regulatory architecture.
Our results demonstrate that the design of adversarial training data, and the transfer of learned parameters between networks trained on different species/datasets, improve overall performance and capture biologically meaningful information in the parameters of the learned network.
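The adversarial negatives described above are shuffles that keep low-order composition while destroying longer motifs. The abstract does not specify the exact procedure, so the sketch below shows one standard instance for k = 2: a dinucleotide-preserving shuffle in the spirit of the Altschul-Erickson method, which treats each adjacent base pair as a directed edge and walks a random Eulerian path through the same multigraph. The function names are hypothetical, the "last exit" edges are chosen by simple rejection sampling, and we make no claim that the result is exactly uniform over all valid shuffles.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Optional


def _reaches_root(v: str, root: str, parent: Dict[str, str]) -> bool:
    """True if following `parent` links from v arrives at root without revisiting a symbol."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = parent[v]
    return True


def dinucleotide_shuffle(seq: str, seed: Optional[int] = None) -> str:
    """Shuffle seq while keeping its exact dinucleotide counts (and hence base composition).

    Each adjacent pair a->b is a directed edge; a random Eulerian path through the
    same multigraph yields a sequence with identical dinucleotide counts.
    """
    if len(seq) < 3:
        return seq
    rng = random.Random(seed)
    edges = defaultdict(list)  # symbol -> successor symbols, one entry per occurrence
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    last = seq[-1]

    # Pick one "last exit" edge per symbol (except the final one); retry until these
    # edges form a tree pointing into `last`, which guarantees the walk uses every edge.
    while True:
        last_exit = {v: rng.choice(succ) for v, succ in edges.items() if v != last}
        if all(_reaches_root(v, last, last_exit) for v in last_exit):
            break

    # Shuffle each successor list, forcing the chosen last-exit edge to be used last.
    order = {}
    for v, succ in edges.items():
        succ = list(succ)
        if v in last_exit:
            succ.remove(last_exit[v])
            rng.shuffle(succ)
            succ.append(last_exit[v])
        else:
            rng.shuffle(succ)
        order[v] = succ

    # Walk from the original first symbol, consuming successors in shuffled order.
    out, cur = [seq[0]], seq[0]
    used = defaultdict(int)
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)


def dinucleotide_counts(seq: str) -> Counter:
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))
```

Because the shuffle preserves both base and dinucleotide counts, a classifier can no longer separate positives from negatives on those statistics alone and is pushed toward longer features such as TF binding sites, which is the rationale the abstract gives for this kind of negative set.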


2018 ◽  
Author(s):  
Benjamin T. James ◽  
Hani Z. Girgis

Abstract: Grouping sequences into similar clusters is an important part of sequence analysis. Widely used clustering tools sacrifice quality for speed. Previously, we developed MeShClust, which uses k-mer counts in an alignment-assisted classifier and the mean-shift algorithm to cluster DNA sequences. Although MeShClust outperformed related tools in terms of cluster quality, the alignment algorithm used to generate training data for the classifier did not scale to longer sequences. In contrast, MeShClust2 generates semi-synthetic sequence pairs with known mutation rates, avoiding alignment algorithms. MeShClust2 clustered 3600 bacterial genomes, providing, for the first time, a utility for clustering long sequences using identity scores.
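The key change in MeShClust2, per the abstract, is generating classifier training pairs with known mutation rates rather than aligning sequences. The sketch below illustrates that idea under simplifying assumptions of our own: substitutions only (so true identity can be read off position by position, with no alignment), and a single hypothetical k-mer feature (`kmer_similarity`) standing in for whatever features and classifier the tool actually uses.

```python
import random
from collections import Counter
from typing import List, Tuple


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with probability `rate` (substitutions only; no indels)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared k-mer count (sum of per-k-mer minima) divided by the smaller total k-mer count."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    denom = min(sum(ca.values()), sum(cb.values()))
    return sum((ca & cb).values()) / denom if denom else 0.0


def semi_synthetic_pairs(template: str, rates: List[float], k: int = 3,
                         seed: int = 0) -> List[Tuple[float, float]]:
    """Return (k-mer similarity feature, true identity label) for mutated copies of `template`."""
    rng = random.Random(seed)
    pairs = []
    for rate in rates:
        mutant = mutate(template, rate, rng)
        # With substitutions only, identity is simply the fraction of matching positions.
        identity = sum(x == y for x, y in zip(template, mutant)) / len(template)
        pairs.append((kmer_similarity(template, mutant, k), identity))
    return pairs
```

The resulting (feature, label) pairs could then train a regressor or a thresholded classifier that predicts identity from k-mer statistics alone, which is what lets the approach scale to genome-length sequences.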


Author(s):  
Alex M. Tseng ◽  
Avanti Shrikumar ◽  
Anshul Kundaje

Abstract: Deep learning models can accurately map genomic DNA sequences to associated functional molecular readouts such as protein–DNA binding data. Base-resolution importance (i.e. “attribution”) scores inferred from these models can highlight predictive sequence motifs and syntax. Unfortunately, these models are prone to overfitting and are sensitive to random initializations, often resulting in noisy and irreproducible attributions that obscure underlying motifs. To address these shortcomings, we propose a novel attribution prior, in which the Fourier transform of input-level attribution scores is computed at training time and high-frequency components of the Fourier spectrum are penalized. We evaluate different model architectures with and without attribution priors trained on genome-wide binary or continuous molecular profiles. We show that our attribution prior dramatically improves models’ stability, interpretability, and performance on held-out data, especially when training data is severely limited. Our attribution prior also allows models to identify biologically meaningful sequence motifs more sensitively and precisely within individual regulatory elements. The prior is agnostic to the model architecture or predicted experimental assay, yet provides similar gains across all experiments. This work represents an important advancement in improving the reliability of deep learning models for deciphering the regulatory code of the genome.
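The prior described above can be illustrated with a simplified, non-differentiable NumPy sketch: collapse the attribution scores to a per-position magnitude track, take its real FFT, and report the fraction of (non-DC) spectral magnitude at or above a cutoff frequency. The abstract does not state how attributions are computed, how the spectrum is normalized, or whether the cutoff is hard or smooth, so `cutoff_fraction`, the dropped DC term, and the hard cutoff are all our assumptions.

```python
import numpy as np


def fourier_attribution_penalty(attributions: np.ndarray, cutoff_fraction: float = 0.2,
                                eps: float = 1e-8) -> float:
    """Fraction of non-DC Fourier magnitude at or above a frequency cutoff.

    attributions: (L, 4) input-level attribution scores for one sequence.
    A smooth attribution track scores near 0; position-to-position noise
    pushes the score toward 1.
    """
    track = np.abs(attributions).sum(axis=1)      # per-position attribution magnitude
    spectrum = np.abs(np.fft.rfft(track))[1:]     # drop the DC (mean) component
    total = spectrum.sum()
    if total < eps:                               # flat track: nothing to penalize
        return 0.0
    cutoff = int(np.ceil(cutoff_fraction * len(spectrum)))
    return float(spectrum[cutoff:].sum() / total)
```

During training, a term like this would be added to the task loss with a weight and would need to be written with the framework's differentiable FFT (for example `torch.fft.rfft` in PyTorch) so that gradients flow back through the attributions.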

