DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins

2017 ◽  
Author(s):  
Hamid Reza Hassanzadeh ◽  
May D. Wang

Abstract Transcription factors (TFs) are macromolecules that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription. Finding the exact location of these binding sites (aka motifs) is important in a variety of domains such as drug design and development. To address this need, several in vivo and in vitro techniques have been developed so far that try to characterize and predict the binding specificity of a protein to different DNA loci. The major problem with these techniques is that they are not accurate enough in prediction of the binding affinity and characterization of the corresponding motifs. As a result, downstream analysis is required to uncover the locations where proteins of interest bind. Here, we propose DeeperBind, a long short-term recurrent convolutional network for prediction of protein binding specificities with respect to DNA probes. DeeperBind can model the positional dynamics of probe sequences and hence reckons with the contributions made by individual sub-regions in DNA sequences in an effective way. Moreover, it can be trained and tested on datasets containing varying-length sequences. We apply our pipeline to the datasets derived from protein binding microarrays (PBMs), an in vitro high-throughput technology for quantification of protein-DNA binding preferences, and present promising results. To the best of our knowledge, this is the most accurate pipeline that can predict binding specificities of DNA sequences from the data produced by high-throughput technologies through utilization of the power of deep learning for feature generation and positional dynamics modeling.
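
The abstract does not describe how probes are fed to the network. As a minimal sketch of one common input representation for sequence models of this kind, and not the authors' implementation, the Python below one-hot encodes DNA probes and zero-pads a batch of varying-length probes while keeping their true lengths. The function names `one_hot` and `pad_batch` are hypothetical and chosen only for illustration.

```python
BASES = "ACGT"  # column order of the one-hot encoding (an assumption)


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def pad_batch(seqs):
    """Zero-pad encoded probes to a common length.

    Returns (batch, lengths); the lengths let a downstream recurrent
    layer ignore the padded positions.
    """
    encoded = [one_hot(s) for s in seqs]
    max_len = max(len(e) for e in encoded)
    batch = [
        e + [[0, 0, 0, 0] for _ in range(max_len - len(e))]
        for e in encoded
    ]
    return batch, [len(e) for e in encoded]


if __name__ == "__main__":
    batch, lengths = pad_batch(["ACG", "T"])
    print(lengths)   # true probe lengths before padding
    print(batch[1])  # the shorter probe, padded with all-zero rows
```

Keeping the original lengths alongside the padded batch is one simple way to handle varying-length sequences; masking or packing schemes in a deep learning framework would serve the same purpose.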


1989 ◽  
Vol 9 (6) ◽  
pp. 2464-2476
Author(s):  
M Cockell ◽  
B J Stevenson ◽  
M Strubin ◽  
O Hagenbüchle ◽  
P K Wellauer

Footprint analysis of the 5'-flanking regions of the alpha-amylase 2, elastase 2, and trypsin a genes, which are expressed in the acinar pancreas, showed multiple sites of protein-DNA interaction for each gene. Competition experiments demonstrated that a region from each 5'-flanking region interacted with the same cell-specific DNA-binding activity. We show by in vitro binding assays that this DNA-binding activity also recognizes a sequence within the 5'-flanking regions of elastase 1, chymotrypsinogen B, carboxypeptidase A, and trypsin d genes. Methylation interference and protection studies showed that the DNA-binding activity recognized a bipartite motif, the subelements of which were separated by integral helical turns of DNA. The alpha-amylase 2 cognate sequence was found to enhance in vivo transcription of its own promoter in a cell-specific manner, which identified the DNA-binding activity as a transcription factor (PTF 1). The observation that PTF 1 bound to DNA sequences that have been defined as transcriptional enhancers by others suggests that this factor is involved in the coordinate expression of genes transcribed in the acinar pancreas.



2000 ◽  
Vol 20 (15) ◽  
pp. 5540-5553 ◽  
Author(s):  
Yue Liu ◽  
April L. Colosimo ◽  
Xiang-Jiao Yang ◽  
Daiqing Liao

ABSTRACT The adenovirus E1B 55-kDa protein binds to cellular tumor suppressor p53 and inactivates its transcriptional transactivation function. p53 transactivation activity is dependent upon its ability to bind to specific DNA sequences near the promoters of its target genes. It was shown recently that p53 is acetylated by transcriptional coactivators p300, CREB-binding protein (CBP), and PCAF and that acetylation of p53 by these proteins enhances p53 sequence-specific DNA binding. Here we show that the E1B 55-kDa protein specifically inhibits p53 acetylation by PCAF in vivo and in vitro, while acetylation of histones and PCAF autoacetylation are not affected. Furthermore, the DNA-binding activity of p53 is diminished in cells expressing the E1B 55-kDa protein. PCAF binds to the E1B 55-kDa protein and to a region near the C terminus of p53 encompassing Lys-320, the specific PCAF acetylation site. We further show that the E1B 55-kDa protein interferes with the physical interaction between PCAF and p53, suggesting that the E1B 55-kDa protein inhibits PCAF acetylase function on p53 by preventing enzyme-substrate interaction. These results underscore the importance of p53 acetylation for its function and suggest that inhibition of p53 acetylation by viral oncoproteins prevents its activation, thereby contributing to viral transformation.



Genetics ◽  
1994 ◽  
Vol 137 (3) ◽  
pp. 715-722 ◽  
Author(s):  
M L Philley ◽  
C Staben

Abstract The Neurospora crassa mt a-1 gene, encoding the MT a-1 polypeptide, determines a mating type properties: sexual compatibility and vegetative incompatibility with A mating type. We characterized in vivo and in vitro functions of the MT a-1 polypeptide and specific mutant derivatives. MT a-1 polypeptide produced in Escherichia coli bound to specific DNA sequences whose core was 5'-CTTTG-3'. DNA binding was a function of the MT a-1 HMG box domain (a DNA binding motif found in high mobility group proteins and a diverse set of regulatory proteins). Mutation within the HMG box eliminated DNA binding in vitro and eliminated mating in vivo, but did not interfere with vegetative incompatibility function in vivo. Conversely, deletion of amino acids 216-220 of MT a-1 eliminated vegetative incompatibility, but did not affect mating or DNA binding. Deletion of the carboxyl terminal half of MT a-1 eliminated both mating and vegetative incompatibility in vivo, but not DNA binding in vitro. These results suggest that mating depends upon the ability of MT a-1 polypeptide to bind to, and presumably to regulate the activity of, specific DNA sequences. However, the separation of vegetative incompatibility from both mating and DNA binding indicates that vegetative incompatibility functions by a biochemically distinct mechanism.



2019 ◽  
Vol 47 (19) ◽  
pp. 9967-9989 ◽  
Author(s):  
Maria Carmen Mulero ◽  
Vivien Ya-Fan Wang ◽  
Tom Huxford ◽  
Gourisankar Ghosh

Abstract The NF-κB family of dimeric transcription factors regulates transcription by selectively binding to DNA response elements present within promoters or enhancers of target genes. The DNA response elements, collectively known as κB sites or κB DNA, share the consensus 5′-GGGRNNNYCC-3′ (where R, Y and N are purine, pyrimidine and any nucleotide base, respectively). In addition, several DNA sequences that deviate significantly from the consensus have been shown to accommodate binding by NF-κB dimers. X-ray crystal structures of NF-κB in complex with diverse κB DNA have helped elucidate the chemical principles that underlie target selection in vitro. However, NF-κB dimers encounter additional impediments to selective DNA binding in vivo. Work carried out during the past decades has identified some of the barriers to sequence selective DNA target binding within the context of chromatin and suggests possible mechanisms by which NF-κB might overcome these obstacles. In this review, we first highlight structural features of NF-κB:DNA complexes and how distinctive features of NF-κB proteins and DNA sequences contribute to specific complex formation. We then discuss how native NF-κB dimers identify DNA binding targets in the nucleus with support from additional factors and how post-translational modifications enable NF-κB to selectively bind κB sites in vivo.
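
The consensus quoted above uses IUPAC ambiguity codes, so it can be turned into a regular expression mechanically. The Python below is a minimal sketch of that idea; the helper names `consensus_to_regex` and `find_sites` are hypothetical. It scans only the strand it is given and reports strict consensus matches, so it would miss the deviating sequences that the review says NF-κB dimers can also bind.

```python
import re

# IUPAC codes needed for the consensus: R = purine, Y = pyrimidine, N = any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[code] for code in consensus.upper())


# Pattern for the kB consensus 5'-GGGRNNNYCC-3'. Wrapping it in a lookahead
# lets overlapping candidate sites be reported as well.
KB_PATTERN = re.compile("(?=(" + consensus_to_regex("GGGRNNNYCC") + "))")


def find_sites(seq):
    """Return (start, site) pairs for every consensus match in seq."""
    return [(m.start(), m.group(1)) for m in KB_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    # Illustrative sequence only, not taken from the review.
    print(find_sites("TTGGGACTTTCCAA"))
```

To cover both strands, the same function could be applied to the reverse complement of the input; that step is omitted here to keep the sketch short.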



Development ◽  
1996 ◽  
Vol 122 (9) ◽  
pp. 2639-2650 ◽  
Author(s):  
S. Jun ◽  
C. Desplan

The Pax proteins are a family of transcriptional regulators involved in many developmental processes in all higher eukaryotes. They are characterized by the presence of a paired domain (PD), a bipartite DNA binding domain composed of two helix-turn-helix (HTH) motifs, the PAI and RED domains. The PD is also often associated with a homeodomain (HD) which is itself able to form homo- and hetero-dimers on DNA. Many of these proteins therefore contain three HTH motifs each able to recognize DNA. However, all PDs recognize highly related DNA sequences, and most HDs also recognize almost identical sites. We show here that different Pax proteins use multiple combinations of their HTHs to recognize several types of target sites. For instance, the Drosophila Paired protein can bind, in vitro, exclusively through its PAI domain, or through a dimer of its HD, or through cooperative interaction between PAI domain and HD. However, prd function in vivo requires the synergistic action of both the PAI domain and the HD. Pax proteins with only a PD appear to require both PAI and RED domains, while a Pax-6 isoform and a new Pax protein, Lune, may rely on the RED domain and HD. We propose a model by which Pax proteins recognize different target genes in vivo through various combinations of their DNA binding domains, thus expanding their recognition repertoire.



1997 ◽  
Vol 17 (10) ◽  
pp. 5679-5687 ◽  
Author(s):  
C P Chang ◽  
Y Jacobs ◽  
T Nakamura ◽  
N A Jenkins ◽  
N G Copeland ◽  
...  

The Pbx1 and Meis1 proto-oncogenes code for divergent homeodomain proteins that are targets for oncogenic mutations in human and murine leukemias, respectively, and implicated by genetic analyses to functionally collaborate with Hox proteins during embryonic development and/or oncogenesis. Although Pbx proteins have been shown to dimerize with Hox proteins and modulate their DNA binding properties in vitro, the biochemical compositions of endogenous Pbx-containing complexes have not been determined. In the present study, we demonstrate that Pbx and Meis proteins form abundant complexes that comprise a major Pbx-containing DNA binding activity in nuclear extracts of cultured cells and mouse embryos. Pbx1 and Meis1 dimerize in solution and cooperatively bind bipartite DNA sequences consisting of directly adjacent Pbx and Meis half sites. Pbx1-Meis1 heterodimers display distinctive DNA binding specificities and cross-bind to a subset of Pbx-Hox sites, including those previously implicated as response elements for the execution of Pbx-dependent Hox programs in vivo. Chimeric oncoprotein E2a-Pbx1 is unable to bind DNA with Meis1, due to the deletion of amino-terminal Pbx1 sequences following fusion with E2a. We conclude that Meis proteins are preferred in vivo DNA binding partners for wild-type Pbx1, a relationship that is circumvented by its oncogenic counterpart E2a-Pbx1.



PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243905
Author(s):  
Paul B. Finn ◽  
Devesh Bhimsaria ◽  
Asfa Ali ◽  
Asuka Eguchi ◽  
Aseem Z. Ansari ◽  
...  

Pyrrole–imidazole (Py–Im) polyamides are synthetic molecules that can be rationally designed to target specific DNA sequences to both disrupt and recruit transcriptional machinery. While in vitro binding has been extensively studied, in vivo effects are often difficult to predict using current models of DNA binding. Determining the impact of genomic architecture and the local chromatin landscape on polyamide-DNA sequence specificity remains an unresolved question that impedes their effective deployment in vivo. In this report we identified polyamide–DNA interaction sites across the entire genome by covalently crosslinking and capturing these events in the nuclei of human LNCaP cells. This technique confirms the ability of two eight-ring hairpin polyamides, with similar architectures but differing at a single ring position (Py to Im), to retain in vitro specificities and display distinct genome-wide binding profiles.



2017 ◽  
Author(s):  
Hamid Reza Hassanzadeh ◽  
Pushkar Kolhe ◽  
Charles L. Isbell ◽  
May D. Wang

Abstract The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning that can find binding sites on candidate probes and rank their specificity with regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motifs from protein binding microarrays and possibly other related high-throughput techniques.



Development ◽  
1999 ◽  
Vol 126 (5) ◽  
pp. 873-881 ◽  
Author(s):  
W. Yi ◽  
D. Zarkower

Although most animals occur in two sexes, the molecular pathways they employ to control sexual development vary considerably. The only known molecular similarity between phyla in sex determination is between two genes, mab-3 from C. elegans, and doublesex (dsx) from Drosophila. Both genes contain a DNA binding motif called a DM domain and they regulate similar aspects of sexual development, including yolk protein synthesis and peripheral nervous system differentiation. Here we show that MAB-3, like the DSX proteins, is a direct regulator of yolk protein gene transcription. We show that despite containing different numbers of DM domains MAB-3 and DSX bind to similar DNA sequences. mab-3 mutations deregulate vitellogenin synthesis at the level of transcription, resulting in expression in both sexes, and the vitellogenin genes have potential MAB-3 binding sites upstream of their transcriptional start sites. MAB-3 binds to a site in the vit-2 promoter in vitro, and this site is required in vivo to prevent transcription of a vit-2 reporter construct in males, suggesting that MAB-3 is a direct repressor of vitellogenin transcription. This is the first direct link between the sex determination regulatory pathway and sex-specific structural genes in C. elegans, and it suggests that nematodes and insects use at least some of the same mechanisms to control sexual development.


