DeepDISE: DNA Binding Site Prediction Using a Deep Learning Method

2021 ◽  
Vol 22 (11) ◽  
pp. 5510
Author(s):  
Samuel Godfrey Hendrix ◽  
Kuan Y. Chang ◽  
Zeezoo Ryu ◽  
Zhong-Ru Xie

Developing a new, reliable method for predicting DNA binding sites is essential for future research because the DNA binding sites on DNA-binding proteins provide critical clues about protein function and drug discovery. However, current prediction methods for DNA binding sites have relatively poor accuracy. Using the 3D coordinates and atom types of protein surface atoms as input, we trained and tested a deep learning model to predict how likely a voxel on the protein surface is to be a DNA-binding site. Across three evaluation datasets, our model not only outperforms several previous methods on two commonly used datasets but also performs consistently on all three. The visualized predictions show that the predicted binding sites are mostly located in the correct regions. We successfully built a deep learning model to predict the DNA binding sites on target proteins. It demonstrates that 3D protein structures plus atom-type information on protein surfaces can be used to predict the potential binding sites on a protein. This approach should be further extended to predict the binding sites of other important biological molecules.
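
The input representation described in this abstract (3D coordinates plus atom types mapped onto voxels) can be illustrated with a minimal sketch. The channel set `ATOM_CHANNELS`, the box size, the resolution, and the function name `voxelize` are all assumptions made for illustration; the abstract does not state the settings DeepDISE actually uses.

```python
from collections import defaultdict

# Hypothetical atom-type channels; the typing scheme used by the paper is not given here.
ATOM_CHANNELS = {"C": 0, "N": 1, "O": 2, "S": 3}


def voxelize(atoms, center, box_size=8, resolution=1.0):
    """Map (element, x, y, z) atoms into a sparse occupancy grid.

    Returns {(channel, i, j, k): count} for atoms that fall inside a cube of
    `box_size` voxels per side, centered on `center`.
    """
    half = box_size * resolution / 2.0
    grid = defaultdict(int)
    for element, x, y, z in atoms:
        channel = ATOM_CHANNELS.get(element)
        if channel is None:
            continue  # skip atom types outside the assumed channel set
        # Voxel index along each axis, measured from the lower corner of the box.
        idx = [int((coord - (c - half)) // resolution)
               for coord, c in zip((x, y, z), center)]
        if all(0 <= i < box_size for i in idx):
            grid[(channel, *idx)] += 1
    return dict(grid)


# Toy atoms (made-up coordinates, in angstroms); the third lies outside the box.
atoms = [("C", 0.2, 0.1, 0.0), ("N", 1.4, 0.0, -0.3), ("O", 9.0, 9.0, 9.0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
```

A sparse dictionary keeps the sketch dependency-free; a real pipeline would more likely fill a dense tensor of shape (channels, box, box, box) for a 3D convolutional network.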

RNA Biology ◽  
2018 ◽  
Vol 15 (12) ◽  
pp. 1468-1476 ◽  
Author(s):  
Fan Wang ◽  
Pranik Chainani ◽  
Tommy White ◽  
Jin Yang ◽  
Yu Liu ◽  
...  

2021 ◽  
Author(s):  
Canbiao Wu ◽  
Xiaofang Guo ◽  
Mengyuan Li ◽  
Xiayu Fu ◽  
Zeliang Hou ◽  
...  

Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. Previous studies showed that HBV can integrate into the host genome and further promote malignant transformation. In this study, we developed an attention-based deep learning model, DeepHBV, to predict HBV integration sites by learning local genomic features automatically. We trained and tested DeepHBV using HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 on the dataset. Adding repeat peaks and TCGA Pan Cancer peaks significantly improved model performance, with AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. On an independent validation dataset of HBV integration sites from VISDB, DeepHBV with HBV integration sequences plus TCGA Pan Cancer peaks (AUROC of 0.7603 and AUPR of 0.6189) performed better than DeepHBV with HBV integration sequences plus repeat peaks (AUROC of 0.6657 and AUPR of 0.5737). Next, we found that transcription factor binding sites (TFBS) were significantly enriched near the genomic positions that received attention from the convolutional neural network. The binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra and Foxo3 were highlighted by the DeepHBV attention mechanism in both the dsVIS dataset and the VISDB dataset, revealing the HBV integration preference. In summary, DeepHBV is a robust and explainable deep learning model not only for predicting HBV integration sites but also for further mechanistic study of HBV-induced cancer.
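
For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows one common way to compute them from labels and predicted scores. The toy labels and scores are made up, and AUPR is estimated here as average precision, which is an assumption: the abstract does not say which estimator the authors used.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at each positive, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example (made-up values): 1 = integration site, 0 = background.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
roc = auroc(labels, scores)              # 3 of 4 positive/negative pairs are ordered correctly
ap = average_precision(labels, scores)   # (1/1 + 2/3) / 2
```

Both functions assume at least one positive and one negative label; the pairwise AUROC is quadratic in the number of examples and is meant only to make the definition concrete.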


2021 ◽  
Author(s):  
Rishal Aggarwal ◽  
Akash Gupta ◽  
Vineeth Chelur ◽  
C. V. Jawahar ◽  
U. Deva Priyakumar

A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site. A prerequisite to this is finding druggable and functionally relevant binding sites on the 3D structure of the protein. Although several methods for detecting binding sites have been developed, a majority of them surprisingly fail to identify and rank binding sites accurately. The rapid adoption and success of deep learning algorithms in various areas of structural biology invite the use of such algorithms for accurate binding site detection. Combining geometry-based software with deep learning, we report a novel framework, DeepPocket, that utilises 3D convolutional neural networks to rescore pockets identified by Fpocket and to further segment these identified cavities on the protein surface. In addition, we propose a new dataset, SC6K, containing protein structures submitted to the Protein Data Bank (PDB) from January 2018 to February 2020, for ligand binding site (LBS) detection. DeepPocket's results on various binding site datasets and SC6K highlight its better performance over current state-of-the-art methods and its good generalization to novel structures.


2021 ◽  
Author(s):  
Qianmu Yuan ◽  
Sheng Chen ◽  
Jiahua Rao ◽  
Shuangjia Zheng ◽  
Huiying Zhao ◽  
...  

Abstract Motivation Protein-DNA interactions play crucial roles in biological systems, and identifying protein-DNA binding sites is the first step toward a mechanistic understanding of various biological activities (such as transcription and repair) and the design of novel drugs. Accurately identifying DNA-binding residues from protein sequence alone remains a challenging task. Currently, most existing sequence-based methods only consider contextual features of the sequential neighbors, which limits their ability to capture spatial information. Results Based on the recent breakthrough in protein structure prediction by AlphaFold2, we propose an accurate predictor, GraphSite, for identifying DNA-binding residues based on the structural models predicted by AlphaFold2. Here, we convert the binding site prediction problem into a graph node classification task and employ a transformer-based variant model to take the protein structural information into account. By leveraging predicted protein structures and a graph transformer, GraphSite substantially improves over the latest sequence-based and structure-based methods. The algorithm was further confirmed on an independent test set of 196 proteins, where GraphSite surpasses the state-of-the-art structure-based method by 12.3% in AUPR and 9.3% in MCC.
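
To make the "graph node classification" framing concrete, the sketch below builds a residue-level graph in which each residue is a node and edges link residues that are close in space. The C-alpha distance criterion, the cutoff value, and the function name `contact_graph` are assumptions for illustration; the abstract does not describe how GraphSite constructs its graph.

```python
import math


def contact_graph(ca_coords, cutoff=10.0):
    """Return adjacency lists linking residues whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(ca_coords)
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Toy chain (made-up coordinates): three consecutive residues and one distant residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = contact_graph(coords, cutoff=5.0)
```

A node classifier would then assign each residue a binding or non-binding label using its own features together with those of its graph neighbors, which is how spatial context beyond the sequence neighbors enters the prediction.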


2021 ◽  
Author(s):  
Jérôme Tubiana ◽  
Dina Schneidman-Duhovny ◽  
Haim J. Wolfson

Predicting the functional sites of a protein from its structure, such as the binding sites of small molecules, other proteins or antibodies, sheds light on its function in vivo. Currently, two classes of methods prevail: Machine Learning (ML) models built on top of handcrafted features and comparative modeling. They are respectively limited by the expressivity of the handcrafted features and the availability of similar proteins. Here, we introduce ScanNet, an end-to-end, interpretable geometric deep learning model that learns features directly from 3D structures. ScanNet builds representations of atoms and amino acids based on the spatio-chemical arrangement of their neighbors. We train ScanNet for detecting protein-protein and protein-antibody binding sites, demonstrate its accuracy (including for unseen protein folds), and interpret the filters learned. Finally, we predict epitopes of the SARS-CoV-2 spike protein, validating known antigenic regions and predicting previously uncharacterized ones. Overall, ScanNet is a versatile, powerful, and interpretable model suitable for functional site prediction tasks. A webserver for ScanNet is available from http://bioinfo3d.cs.tau.ac.il/ScanNet/


1994 ◽  
Vol 14 (11) ◽  
pp. 7592-7603
Author(s):  
P E Kroeger ◽  
R I Morimoto

Multiple heat shock transcription factors (HSFs) have been discovered in several higher eukaryotes, raising questions about their respective functions in the cellular stress response. Previously, we had demonstrated that the two mouse HSFs (mHSF1 and mHSF2) interacted differently with the HSP70 heat shock element (HSE). To further address the issues of cooperativity and the interaction of multiple HSFs with the HSE, we selected new mHSF1 and mHSF2 DNA-binding sites through protein binding and PCR amplification. The selected sequences, isolated from a random population, were composed primarily of alternating inverted arrays of the pentameric consensus 5'-nGAAn-3', and the nucleotides flanking the core GAA motif were nonrandom. The average number of pentamers selected in each binding site was four to five for mHSF1 and two to three for mHSF2, suggesting differences in the potential for cooperative interactions between adjacent trimers. Our comparison of mHSF1 and mHSF2 binding to selected sequences further substantiated these differences in cooperativity as mHSF1, unlike mHSF2, was able to bind to extended HSE sequences, confirming previous observations on the HSP70 HSE. Certain selected sequences that exhibited preferential binding of mHSF1 or mHSF2 were mutagenized, and these studies demonstrated that the affinity of an HSE for a particular HSF and the extent of HSF interaction could be altered by single base substitutions. The domain of mHSF1 utilized for cooperative interactions was transferable, as chimeric mHSF1/mHSF2 proteins demonstrated that sequences within or adjacent to the mHSF1 DNA-binding domain were responsible. We have demonstrated that HSEs can have a greater affinity for a specific HSF and that in mice, mHSF1 utilizes a higher degree of cooperativity in DNA binding. 
This suggests two ways in which cells have developed to regulate the activity of closely related transcription factors: developing the ability to fully occupy the target binding site and alteration of the target site to favor interaction with a specific factor.
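
The "alternating inverted arrays" of the pentamer nGAAn described in this abstract can be pictured with a small scanning sketch. It treats nTTCn as the inverted unit and counts only contiguous, strictly alternating pentamers with perfect cores; that strictness, the function name `longest_hse_array`, and the example sequences are simplifying assumptions, not the selection procedure used in the paper.

```python
def longest_hse_array(seq):
    """Length of the longest run of contiguous pentamers alternating nGAAn / nTTCn."""
    seq = seq.upper()
    best = 0
    for start in range(len(seq) - 4):
        run, previous = 0, None
        pos = start
        while pos + 5 <= len(seq):
            core = seq[pos + 1:pos + 4]  # the three central bases of the pentamer
            if core not in ("GAA", "TTC") or core == previous:
                break  # not a pentamer unit, or same orientation as the previous unit
            run += 1
            previous = core
            pos += 5
        best = max(best, run)
    return best


# Made-up sequence with three alternating units: AGAAC, GTTCT, AGAAC.
site = "AGAACGTTCTAGAAC"
units = longest_hse_array(site)
```

Under the counts reported in the abstract, a sequence selected by mHSF1 would typically score four to five with this function and one selected by mHSF2 two to three, assuming perfect cores.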


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Canbiao Wu ◽  
Xiaofang Guo ◽  
Mengyuan Li ◽  
Jingxian Shen ◽  
Xiayu Fu ◽  
...  

Abstract Background The hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. HBV integration is one of the key steps in the virus-promoted malignant transformation. Results An attention-based deep learning model, DeepHBV, was developed to predict HBV integration sites. By learning local genomic features automatically, DeepHBV was trained and tested using HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 for the dataset. The integration of genomic features of repeat peaks and TCGA Pan-Cancer peaks significantly improved model performance, with AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. The transcription factor binding sites (TFBS) were significantly enriched near the genomic positions that were considered. The binding sites of the AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra, and Foxo3 were highlighted by DeepHBV in both the dsVIS and VISDB datasets, revealing a novel integration preference for HBV. Conclusions DeepHBV is a useful tool for predicting HBV integration sites, revealing novel insights into HBV integration-related carcinogenesis.


2020 ◽  
Vol 56 (98) ◽  
pp. 15454-15457
Author(s):  
Jan Zaucha ◽  
Charlotte A. Softley ◽  
Michael Sattler ◽  
Dmitrij Frishman ◽  
Grzegorz M. Popowicz

Deep learning model ‘hotWater’ scans the surface of proteins to identify the most likely water binding sites.

