Modelling the transcription factor DNA-binding affinity using genome-wide ChIP-based data

2016 ◽  
Author(s):  
Monther Alhamdoosh ◽  
Dianhui Wang

Protein-DNA binding affinity is still poorly understood for many transcription factors (TFs). Although several approaches have been proposed in the literature to model the DNA-binding specificity of TFs, they still have limitations. Most of the methods require a cut-off threshold in order to classify a K-mer as a binding site (BS), and such a threshold is usually chosen by hand rather than derived in a principled way. Other approaches combine prior knowledge of the biological context of regulatory elements in the genome with machine learning algorithms to build classifier models for TFBSs. Notably, these methods deliberately select the training and testing datasets so that they are highly separable. Hence, the current methods do not actually capture the TF-DNA binding relationship. In this paper, we present a threshold-free framework based on a novel ensemble learning algorithm to locate TFBSs in DNA sequences. Our proposed approach creates TF-specific classifier models using genome-wide DNA-binding experiments and prior biological knowledge of DNA sequences and TF binding preferences. Systematic background filtering algorithms are used to remove non-functional K-mers from the training and testing datasets. To reduce the complexity of the classifier models, a fast feature selection algorithm is employed. Finally, the resulting classifier models are used to scan new DNA sequences and identify potential binding sites. The analysis results show that our proposed approach is able to identify novel binding sites in the Saccharomyces cerevisiae genome. Availability: http://homepage.cs.latrobe.edu.au/dwang/DNNESCANweb
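The scanning step described in this abstract can be illustrated with a minimal sketch. It assumes a sliding window of fixed length K and a binary classifier that labels each K-mer directly, so no score cut-off is involved. The names `kmers`, `scan`, and `toy_classifier` are hypothetical, and the toy rule is only a stand-in for a trained model; it is not the authors' ensemble algorithm.

```python
from typing import Callable, Iterator, List, Tuple


def kmers(sequence: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, k-mer) for every window of length k in the sequence."""
    for start in range(len(sequence) - k + 1):
        yield start, sequence[start:start + k]


def scan(sequence: str, k: int,
         is_binding_site: Callable[[str], bool]) -> List[Tuple[int, str]]:
    """Return every window that the classifier labels as a binding site.

    The classifier returns a yes/no label, so the caller never has to
    choose a score threshold.
    """
    return [(start, window)
            for start, window in kmers(sequence.upper(), k)
            if is_binding_site(window)]


def toy_classifier(kmer: str) -> bool:
    """Hypothetical stand-in for a trained TF-specific classifier model."""
    return kmer.startswith("TGA") and kmer.endswith("TCA")


hits = scan("ccTGACTCAggTGAGTCAtt", 7, toy_classifier)
print(hits)
```

In practice the toy rule would be replaced by a model trained on filtered K-mers from genome-wide binding data, and both DNA strands would normally be scanned.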

1994 ◽  
Vol 14 (5) ◽  
pp. 3469-3483 ◽  
Author(s):  
I J Davis ◽  
L F Lau

nur77 and nurr-1 are growth factor-inducible members of the steroid/thyroid hormone receptor gene superfamily. In order to gain insight into the potential roles of nur77 in the living organism, we used pharmacologic treatments to examine the expression of nur77 in the mouse adrenal gland. We found that nur77 and nurr-1 are induced in the adrenal gland upon treatment with pentylene tetrazole (Ptz; Metrazole). This induction is separable into distinct endocrine and neurogenic mechanisms. In situ hybridization analysis demonstrates that nur77 expression upon Ptz treatment in the adrenal cortex is localized primarily to the inner cortical region, the zona fasciculata-reticularis, with minimal induction in the zona glomerulosa. This induction is inhibitable by pretreatment with dexamethasone, indicating involvement of the hypothalamic-pituitary-adrenal axis in the activation of adrenal cortical expression. When mice were injected with adrenocorticotrophic hormone (ACTH), nur77 expression in the adrenal gland spanned all cortical layers including the zona glomerulosa, but medullary expression was not induced. Ptz also induces expression of both nur77 and nurr-1 in the adrenal medulla. Medullary induction is likely to have a neurogenic origin, as nur77 expression was not inhibitable by dexamethasone pretreatment and induction was seen after treatment with the cholinergic neurotransmitter nicotine. nur77 is also inducible by ACTH, forskolin, and the second messenger analog dibutyryl cyclic AMP in the ACTH-responsive adrenal cortical cell line Y-1. Significantly, Nur77 isolated from ACTH-stimulated Y-1 cells bound to its response element whereas Nur77 present in unstimulated cells did not. Moreover, Nur77 in ACTH-treated Y-1 cells was hypophosphorylated at serine 354 compared with that in untreated cells.
These results, taken together with the previous observation that dephosphorylation of serine 354 affects DNA binding affinity in vitro, show for the first time that phosphorylation of Nur77 at serine 354 is under hormonal regulation, modulating its DNA binding affinity. Thus, ACTH regulates Nur77 in two ways: activation of its gene and posttranslational modification. A promoter analysis of nur77 induction in Y-1 cells indicates that the regulatory elements mediating ACTH induction differ from those required for induction in the adrenal medullary tumor cell line PC12 and in 3T3 fibroblasts.


2018 ◽  
Author(s):  
Arya Zandvakili ◽  
Juli Uhl ◽  
Ian Campbell ◽  
Yuntao Charlie Song ◽  
Brian Gebelein

Abstract: Hox genes encode a family of transcription factors that, despite having similar in vitro DNA binding preferences, regulate distinct genetic programs along the metazoan anterior-posterior axis. To better define mechanisms of Hox specificity, we compared and contrasted the ability of abdominal Hox factors to regulate two cis-regulatory elements within the Drosophila embryo. Both the Ultrabithorax (Ubx) and Abdominal-A (Abd-A) Hox factors form cooperative complexes with the Extradenticle (Exd) and Homothorax (Hth) transcription factors to repress the distal-less leg selector gene via the DCRE, whereas only Abd-A interacts with Exd and Hth on the RhoA element to activate a rhomboid serine protease gene that stimulates Epidermal Growth Factor secretion. By swapping binding sites between these elements, we found that the RhoA Exd/Hth/Hox site configuration that mediates Abd-A specific activation can also convey transcriptional repression by both Ubx and Abd-A when placed into the DCRE, but only in one orientation. We further show that the orientation and spacing of Hox sites relative to additional transcription factor binding sites within the RhoA and DCRE elements are critical to mediate appropriate cell- and segment-specific output. These results indicate that the interaction between Hox, Exd, and Hth neither determines activation vs repression specificity nor defines Ubx vs Abd-A specificity. Instead, the precise integration of Hox sites with additional TF inputs is required for accurate transcriptional output. Taken together, these studies provide new insight into the mechanisms of Hox target and regulatory specificity as well as the constraints placed on regulatory elements to convey appropriate outputs.

Author Summary: The Hox genes encode a family of transcription factors that give cells within each region along the developing body plan a unique identity in animals from worms to mammals. Surprisingly, however, most of the Hox factors bind the same or highly similar DNA sequences. These findings raise a paradox: How can proteins that have highly similar DNA binding properties perform different functions in the animal by regulating different sets of target genes? In this study, we address this question by studying how two Hox factors regulate the expression of target genes that specify leg development and the making of liver-like cells in the developing fly. By comparing and contrasting how Hox target genes are activated and/or repressed, we found that the same Hox binding sites can mediate either activation or repression in a manner that depends upon context. In addition, we found that a Hox binding site that is normally regulated by only one Hox factor can also be used by more than one Hox factor when swapped into another target gene. These findings indicate that the specificity of a Hox factor to regulate target genes does not rely solely upon DNA binding specificity but also requires regulatory specificity.


2019 ◽  
Author(s):  
Anvita Gupta ◽  
Anshul Kundaje

Abstract: Targeted optimization of existing DNA sequences for useful properties has the potential to enable several synthetic biology applications, from modifying DNA to treat genetic disorders to designing regulatory elements that fine-tune context-specific gene expression. Current approaches for targeted genome editing are largely based on prior biological knowledge or ad hoc rules. Few if any machine learning approaches exist for targeted optimization of regulatory DNA sequences. Here, we propose a novel generative neural network architecture for targeted DNA sequence editing, the EDA architecture, consisting of an encoder, decoder, and analyzer. We showcase the use of EDA to optimize regulatory DNA sequences to bind to the transcription factor SPI1. Compared to other state-of-the-art approaches such as a textual variational autoencoder and rule-based editing, EDA significantly improves the predicted SPI1 binding of genomic sequences with a minimal set of edits. We also use EDA to design regulatory elements with optimized grammars of CREB1 binding sites that can tune reporter expression levels as measured by massively parallel reporter assays (MPRA). We analyze the properties of the binding sites in the edited sequences and find patterns that are consistent with previously reported grammatical rules which tie gene expression to CRE binding site density, spacing and affinity.


Biochemistry ◽  
2008 ◽  
Vol 47 (26) ◽  
pp. 6809-6818 ◽  
Author(s):  
Alpa Sidhu ◽  
Patrick J. Miller ◽  
Kelly E. Johanson ◽  
Andrew D. Hollenbach

Author(s):  
Yanrong Ji ◽  
Zhihan Zhou ◽  
Han Liu ◽  
Ramana V Davuluri

Abstract. Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory element prediction and demonstrate its ease of use, accuracy and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. Availability and implementation: The source code and the pretrained and fine-tuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Krystyna Ślaska-Kiss ◽  
Nikolett Zsibrita ◽  
Mihály Koncz ◽  
Pál Albert ◽  
Ákos Csábrádi ◽  
...  

Abstract: Targeted DNA methylation is a technique that aims to methylate cytosines in selected genomic loci. In the most widely used approach a CG-specific DNA methyltransferase (MTase) is fused to a sequence-specific DNA binding protein, which binds in the vicinity of the targeted CG site(s). Although the technique has high potential for studying the role of DNA methylation in higher eukaryotes, its usefulness is hampered by insufficient methylation specificity. One of the approaches proposed to suppress methylation at unwanted sites is to use MTase variants with reduced DNA binding affinity. In this work we investigated how methylation specificity of chimeric MTases containing variants of the CG-specific prokaryotic MTase M.SssI fused to zinc finger or dCas9 targeting domains is influenced by mutations affecting catalytic activity and/or DNA binding affinity of the MTase domain. Specificity of targeted DNA methylation was assayed in E. coli harboring a plasmid with the target site. Digestions of the isolated plasmids with methylation-sensitive restriction enzymes revealed that specificity of targeted DNA methylation was dependent on the activity but not on the DNA binding affinity of the MTase. These results have implications for the design of strategies of targeted DNA methylation.

