SiteOut: an online tool to design binding site-free DNA sequences

2015 ◽  
Author(s):  
Javier Estrada ◽  
Teresa Ruiz-Herrero ◽  
Clarissa Scholes ◽  
Zeba Wunderlich ◽  
Angela DePace

DNA-binding proteins control many fundamental biological processes such as transcription, recombination and replication. A major goal is to decipher the role that DNA sequence plays in orchestrating the binding and activity of such regulatory proteins. To address this goal, it is useful to rationally design DNA sequences with desired numbers, affinities and arrangements of protein binding sites. However, removing binding sites from DNA is computationally non-trivial since one risks creating new sites in the process of deleting or moving others. Here we present an online binding site removal tool, SiteOut, that enables users to design arbitrary DNA sequences that entirely lack binding sites for factors of interest. SiteOut can also be used to delete sites from a specific sequence, or to introduce site-free spacers between functional sequences without creating new sites at the junctions. In combination with commercial DNA synthesis services, SiteOut provides a powerful and flexible platform for synthetic projects that interrogate regulatory DNA. Here we describe the algorithm and illustrate the ways in which SiteOut can be used; it is publicly available at https://depace.med.harvard.edu/siteout/
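The difficulty described above, that editing one site can create another, can be illustrated with a small mutate-and-recheck loop. This is a minimal sketch and not the SiteOut algorithm itself: it assumes sites are exact matches to short consensus strings on either strand, and the names `find_sites` and `remove_sites` are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motifs):
    """Return (start, length) for every exact motif hit on either strand."""
    hits = []
    for motif in motifs:
        # dict.fromkeys de-duplicates palindromic motifs in a stable order.
        for pattern in dict.fromkeys([motif, revcomp(motif)]):
            start = seq.find(pattern)
            while start != -1:
                hits.append((start, len(pattern)))
                start = seq.find(pattern, start + 1)
    return hits


def remove_sites(seq, motifs, max_steps=10_000, seed=0):
    """Mutate one base inside a remaining hit until no hits are left.

    The whole sequence is rescanned after every mutation, so a site
    created by an earlier edit (or at a junction) is caught and removed.
    """
    rng = random.Random(seed)
    bases = list(seq)
    for _ in range(max_steps):
        hits = find_sites("".join(bases), motifs)
        if not hits:
            return "".join(bases)
        start, length = rng.choice(hits)
        pos = start + rng.randrange(length)
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    raise RuntimeError("no site-free sequence found within max_steps")
```

Under these assumptions, calling `remove_sites(sequence, ["GTTACGTAAT"])` returns a sequence of the same length with no exact hit on either strand. A tool working with position weight matrices would replace the exact-match test with a score threshold.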

1994 ◽  
Vol 14 (9) ◽  
pp. 5986-5996
Author(s):  
S P Hunger ◽  
R Brown ◽  
M L Cleary

The t(17;19) translocation in acute lymphoblastic leukemias results in creation of E2A-hepatic leukemia factor (HLF) chimeric proteins that contain the DNA-binding and protein dimerization domains of the basic leucine zipper (bZIP) protein HLF fused to a portion of E2A proteins with transcriptional activation properties. An in vitro binding site selection procedure was used to determine DNA sequences preferentially bound by wild-type HLF and chimeric E2A-HLF proteins isolated from various t(17;19)-bearing leukemias. All were found to selectively bind the consensus sequence 5'-GTTACGTAAT-3' with high affinity. Wild-type and chimeric HLF proteins also bound closely related sites identified previously for bZIP proteins of both the proline- and acidic amino acid-rich (PAR) and C/EBP subfamilies; however, E2A-HLF proteins were significantly less tolerant of certain deviations from the HLF consensus binding site. These differences were directly attributable to loss of an HLF ancillary DNA-binding domain in all E2A-HLF chimeras and were further exacerbated by a zipper mutation in one isolate. Both wild-type and chimeric HLF proteins displayed transcriptional activator properties in lymphoid and nonlymphoid cells on reporter genes containing HLF or C/EBP consensus binding sites. But on reporter genes with nonoptimal binding sites, their transcriptional properties diverged and E2A-HLF competitively inhibited activation by wild-type PAR proteins. These findings establish a spectrum of binding site-specific transcriptional properties for E2A-HLF which may preferentially activate expression of select subordinate genes as a homodimer and potentially antagonize expression of others through heteromeric interactions.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xiaoyong Pan ◽  
Yi Fang ◽  
Xianfeng Li ◽  
Yang Yang ◽  
Hong-Bin Shen

Abstract. Background: RNA-binding proteins (RBPs) play crucial roles in various biological processes. Deep learning-based methods have proven powerful for predicting RBP binding sites on RNAs. However, training deep learning models is time-consuming and computationally intensive. Results: Here we present RBPsuite, an easy-to-use deep learning-based webserver for predicting RBP binding sites on linear and circular RNAs. For linear RNAs, RBPsuite predicts RBP binding scores using our updated iDeepS. For circular RNAs (circRNAs), RBPsuite predicts RBP binding scores using our CRIP method. RBPsuite first breaks the input RNA sequence into segments of 101 nucleotides and scores the interaction between each segment and the RBPs. RBPsuite then detects the verified motifs on the binding segments and gives the distribution of binding scores along the full-length sequence. Conclusions: RBPsuite is an easy-to-use online webserver for predicting RBP binding sites and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/.
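The segmentation step can be sketched as follows. The abstract gives only the segment length of 101 nucleotides; the non-overlapping stride, the "N" padding of the final segment, and the function name `segment_rna` are assumptions made for illustration and are not taken from RBPsuite.

```python
def segment_rna(seq, width=101, step=101, pad="N"):
    """Split an RNA sequence into fixed-width windows.

    Returns a list of (start, segment) pairs. A trailing window shorter
    than `width` is right-padded with `pad` (an assumption of this sketch).
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    segments = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + width]
        segments.append((start, chunk.ljust(width, pad)))
    return segments
```

Each returned segment would then be scored against the RBP models, and the per-segment scores placed back along the full-length sequence using the `start` offsets.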


2000 ◽  
Vol 20 (1) ◽  
pp. 389-401 ◽  
Author(s):  
Elisabetta Soldaini ◽  
Susan John ◽  
Stefano Moro ◽  
Julie Bollenbacher ◽  
Ulrike Schindler ◽  
...  

ABSTRACT We have defined the optimal binding sites for Stat5a and Stat5b homodimers and found that they share similar core TTC(T/C)N(G/A)GAA interferon gamma-activated sequence (GAS) motifs. Stat5a tetramers can bind to tandemly linked GAS motifs, but the binding site selection revealed that tetrameric binding also can be seen with a wide range of nonconsensus motifs, which in many cases did not allow Stat5a binding as a dimer. This indicates a greater degree of flexibility in the DNA sequences that allow binding of Stat5a tetramers than dimers. Indeed, in an oligonucleotide that could bind both dimers and tetramers, it was possible to design mutants that affected dimer binding without affecting tetramer binding. A spacing of 6 bp between the GAS sites was most frequently selected, demonstrating that this distance is favorable for Stat5a tetramer binding. These data provide insights into tetramer formation by Stat5a and indicate that the repertoire of potential binding sites for this transcription factor is broader than expected.
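The core motif TTC(T/C)N(G/A)GAA can be written as a regular expression to scan a sequence for candidate GAS sites and tandem pairs. This is a minimal sketch: the function names are hypothetical, and measuring the 6 bp spacing from the end of one site to the start of the next is an assumption, since the abstract does not say how the spacing is counted.

```python
import re

# TTC(T/C)N(G/A)GAA; the lookahead lets overlapping sites be reported.
# The motif equals its own reverse complement, so one strand suffices.
GAS = re.compile(r"(?=(TTC[TC][ACGT][GA]GAA))")
SITE_LEN = 9


def gas_sites(seq):
    """Start positions (0-based) of every core GAS motif in seq."""
    return [m.start() for m in GAS.finditer(seq.upper())]


def tandem_pairs(seq, gap=6):
    """Pairs of site starts separated by `gap` bases between the sites."""
    starts = gas_sites(seq)
    wanted = set(starts)
    offset = SITE_LEN + gap
    return [(s, s + offset) for s in starts if s + offset in wanted]
```

For example, a sequence carrying TTCTGAGAA followed six bases later by TTCCAGGAA yields one tandem pair. Nonconsensus sites that support only tetramer binding would not match this pattern and would need a looser scoring scheme.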


2018 ◽  
Author(s):  
Nathalie Lagarde ◽  
Alessandra Carbone ◽  
Sophie Sacquin-Mora

Abstract

Protein-protein interactions control a large range of biological processes and their identification is essential to understand the underlying biological mechanisms. To complement experimental approaches, in silico methods are available to investigate protein-protein interactions. Cross-docking methods, in particular, can be used to predict protein binding sites. However, proteins can interact with numerous partners and can present multiple binding sites on their surface, which may alter the binding site prediction quality. We evaluate the binding site predictions obtained using complete cross-docking simulations of 358 proteins with two different scoring schemes accounting for multiple binding sites. Despite overall good binding site prediction performance, 68 cases were still associated with very low prediction quality, presenting individual area under the specificity-sensitivity ROC curve (AUC) values below the random AUC threshold of 0.5, since cross-docking calculations can lead to the identification of alternate protein binding sites (that are different from the reference experimental sites). For the large majority of these proteins, we show that the predicted alternate binding sites correspond to interaction sites with hidden partners, i.e. partners not included in the original cross-docking dataset. Among those new partners, we find proteins, but also nucleic acid molecules. Finally, for proteins with multiple binding sites on their surface, we investigated the structural determinants associated with the binding sites most targeted by the docking partners.

Abbreviations

ANOVA: ANalysis Of Variance; AUC: Area Under the Curve; BI: Best Interface; CAPRI: Critical Assessment of Prediction of Interactions; CC-D: Complete Cross-Docking; DNA: DesoxyriboNucleic Acid; FDR: False Discovery Rate; FRIres(type): Fraction of each Residue type in the Interface; FP: False Positives; GI: Global Interface; HCMD: Help Cure Muscular Dystrophy; JET: Joint Evolutionary Tree; MAXDo: Molecular Association via Cross Docking; NAI: Nucleic Acid Interface; NPV: Negative Predicted Value; PDB: Protein Data Bank; PIP: Protein Interface Propensity; PiQSi: Protein Quaternary Structure investigation; PPIs: Protein-Protein Interactions; PPV: Positive Predicted Value; Prec.: Precision; PrimI: Primary Interface; RNA: RiboNucleic Acid; ROC: Receiver Operating Characteristic; SecI: Secondary Interface; Sen.: Sensitivity; Spe.: Specificity; TN: True Negatives; TP: True Positives; WCG: World Community Grid.
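The AUC criterion mentioned in the abstract can be made concrete with a small calculation. This is a minimal sketch, assuming each surface residue has a predicted score and a binary label marking membership in the reference experimental site; the function name `roc_auc` and the pairwise formulation are illustrative and are not the authors' code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen interface residue
    scores higher than a randomly chosen non-interface residue, with
    ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value near 1.0 means the highest scores fall on the reference site, 0.5 matches random ranking, and a value below 0.5 means the scores systematically favor residues outside the reference site, which is the situation the abstract attributes to alternate binding sites.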


Author(s):  
Igor Kozlovskii ◽  
Petr Popov

Identification of novel protein binding sites expands the «druggable genome» and opens new opportunities for drug discovery. Generally, the presence or absence of a binding site depends on the three-dimensional conformation of a protein, which makes binding site identification resemble an object detection problem in computer vision. Here we introduce a computational approach for the large-scale detection of protein binding sites, named BiteNet, that considers protein conformations as 3D images, binding sites as the objects to detect in these images, and conformational ensembles of proteins as 3D videos to analyze. BiteNet is suitable for spatiotemporal detection of hard-to-spot allosteric binding sites, as we showed for a conformation-specific binding site of the epidermal growth factor receptor, an oligomer-specific binding site of an ion channel, and binding sites in G protein-coupled receptors. BiteNet outperforms state-of-the-art methods in both accuracy and speed, taking about 1.5 minutes to analyze 1000 conformations of a protein with 2000 atoms. BiteNet is available at https://github.com/i-Molecule/bitenet.


2019 ◽  
Author(s):  
Shubhada R. Kulkarni ◽  
D. Marc Jones ◽  
Klaas Vandepoele

ABSTRACT Determining where transcription factors (TFs) bind in genomes provides insights into which transcriptional programs are active across organs, tissue types, and environmental conditions. Recent advances in high-throughput profiling of regulatory DNA have yielded large amounts of information about chromatin accessibility. Interpreting the functional significance of these datasets requires knowledge of which regulators are likely to bind these regions. This can be achieved by using information about TF binding preferences, or motifs, to identify TF binding events that are likely to be functional. Although different approaches exist to map motifs to DNA sequences, a systematic evaluation of these tools in plants is missing. Here we compare four motif mapping tools widely used in the Arabidopsis research community and evaluate their performance using chromatin immunoprecipitation datasets for 40 TFs. Downstream gene regulatory network (GRN) reconstruction was found to be sensitive to the motif mapper used. We further show that the low recall of FIMO, one of the most frequently used motif mapping tools, can be overcome by using an Ensemble approach, which combines results from different mapping tools. Several examples are provided demonstrating how the Ensemble approach extends our view on transcriptional control for TFs active in different biological processes. Finally, a new protocol is presented to efficiently derive more complete cell type-specific GRNs through the integrative analysis of open chromatin regions, known binding site information, and expression datasets.


2019 ◽  
Author(s):  
Edmond R. Watson ◽  
Christy R. R. Grace ◽  
Wei Zhang ◽  
Darcie J. Miller ◽  
Iain F. Davidson ◽  
...  

ABSTRACT Ubiquitin-mediated proteolysis is a fundamental mechanism used by eukaryotic cells to maintain homeostasis and protein quality, and to control timing in biological processes. Two essential aspects of ubiquitin regulation are conjugation through E1-E2-E3 enzymatic cascades, and recognition by ubiquitin-binding domains. An emerging theme in the ubiquitin field is that these two properties are often amalgamated in conjugation enzymes. In addition to covalent thioester linkage to ubiquitin’s C-terminus for ubiquitin transfer reactions, conjugation enzymes often bind non-covalently and weakly to ubiquitin at “exosites”. However, identification of such sites is typically empirical and particularly challenging in large molecular machines. Here, studying the 1.2 MDa E3 ligase Anaphase-Promoting Complex/Cyclosome (APC/C), which controls cell division and many aspects of neurobiology, we discover a method for identifying unexpected ubiquitin-binding sites. Using a panel of ubiquitin variants (UbVs) we identify a protein-based inhibitor that blocks ubiquitin ligation to APC/C substrates in vitro and ex vivo. Biochemistry, NMR, and cryo-EM structurally define the UbV interaction, explain its inhibitory activity through binding the surface on the APC2 subunit that recruits the E2 enzyme UBE2C, and ultimately reveal that this APC2 surface is also a ubiquitin-binding exosite with preference for K48-linked chains. The results provide a new tool for probing APC/C activity, have implications for the coordination of K48-linked Ub chain binding by APC/C with the multistep process of substrate polyubiquitylation, and demonstrate the power of UbV technology for identifying cryptic ubiquitin binding sites within large multiprotein complexes.

SIGNIFICANCE STATEMENT Ubiquitin-mediated interactions influence numerous biological processes. These are often transient or a part of multivalent interactions. Therefore, unmasking these interactions remains a significant challenge for large, complicated enzymes such as the Anaphase-Promoting Complex/Cyclosome (APC/C), a multisubunit RING E3 ubiquitin (Ub) ligase. APC/C activity regulates numerous facets of biology by targeting key regulatory proteins for Ub-mediated degradation. Using a series of Ub variants (UbVs), we identified a new Ub-binding site on the APC/C that preferentially binds to K48-linked Ub chains. More broadly, we demonstrate a workflow that can be exploited to uncover Ub-binding sites within ubiquitylation machinery and other associated regulatory proteins to interrogate the complexity of the Ub code in biology.


2021 ◽  
Author(s):  
Chen Chen ◽  
Jie Hou ◽  
Xiaowen Shi ◽  
Hua Yang ◽  
James A. Birchler ◽  
...  

Abstract. Background: Due to the complexity of biological systems, the prediction of potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide information about the affinity and accessibility of the genome and are commonly used features in binding site prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach to binding site inference from massively parallel sequencing data. The successful applications of the attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors. Results: In this study, we propose a novel tool (named DeepGRN) for transcription factor binding site prediction based on the combination of two components: a single attention module and a pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potentially informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN. Conclusions: DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.

