CRDS: Consensus Reverse Docking System for target fishing

Author(s):  
Aeri Lee ◽  
Dongsup Kim

Abstract Motivation Identification of putative drug targets is a critical step for explaining the mechanism of drug action against multiple targets, finding new therapeutic indications for existing drugs and unveiling the adverse drug reactions. One important approach is molecular docking. However, its widespread utilization has been hindered by the lack of easy-to-use public servers. Therefore, it is vital to develop a streamlined computational tool for target prediction by molecular docking on a large scale. Results We present a fully automated web tool named Consensus Reverse Docking System (CRDS), which predicts potential interaction sites for a given drug. To improve hit rates, we developed a strategy of consensus scoring. CRDS carries out reverse docking against 5254 candidate protein structures using three different scoring functions (GoldScore, Vina and LeDock from GOLD version 5.7.1, AutoDock Vina version 1.1.2 and LeDock version 1.0, respectively), and those scores are combined into a single score named Consensus Docking Score (CDS). The web server provides the list of top 50 predicted interaction sites, docking conformations, 10 most significant pathways and the distribution of consensus scores. Availability and implementation The web server is available at http://pbil.kaist.ac.kr/CRDS. Supplementary information Supplementary data are available at Bioinformatics online.
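The abstract does not state how the three scores are combined into the CDS. The sketch below shows one common way to build a consensus score, and it is an assumption, not the CRDS formula: each scoring function is standardized to z-scores across all targets, the sign is flipped for energy-like scores where lower is better (assumed here for Vina and LeDock), and the z-scores are averaged. The function names and the toy target IDs are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores, lower_is_better=False):
    """Standardize raw scores so that a higher z-score always means a better target."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        return {target: 0.0 for target in scores}
    sign = -1.0 if lower_is_better else 1.0
    return {target: sign * (value - mu) / sigma for target, value in scores.items()}


def consensus_scores(gold, vina, ledock):
    """Average the per-function z-scores (an assumed combination rule, not the CDS definition)."""
    standardized = [
        z_scores(gold),                         # fitness score: higher is better
        z_scores(vina, lower_is_better=True),   # energy-like: lower is better
        z_scores(ledock, lower_is_better=True), # energy-like: lower is better
    ]
    return {target: mean(z[target] for z in standardized) for target in gold}


def top_targets(consensus, n=50):
    """Return the n target IDs with the highest consensus score."""
    return sorted(consensus, key=consensus.get, reverse=True)[:n]


# Toy scores for three hypothetical targets.
gold = {"T1": 60.0, "T2": 45.0, "T3": 30.0}
vina = {"T1": -9.0, "T2": -7.0, "T3": -5.0}
ledock = {"T1": -8.0, "T2": -6.0, "T3": -4.0}
ranking = top_targets(consensus_scores(gold, vina, ledock), n=2)
```

Standardizing before averaging keeps one scoring function from dominating only because its numeric range is wider.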

Author(s):  
Yiwei Li ◽  
G Brian Golding ◽  
Lucian Ilie

Abstract Motivation Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein–protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods. Results We propose DEep Learning Prediction of Highly probable protein Interaction sites (DELPHI), a new sequence-based deep learning suite for PPI-binding sites prediction. DELPHI has an ensemble structure which combines a CNN and an RNN component with a fine-tuning technique. Three novel features (HSP, position information and ProtVec) are used in addition to nine existing ones. We comprehensively compare DELPHI to nine state-of-the-art programmes on five datasets, and DELPHI outperforms the competing methods in all metrics even though its training dataset shares the least similarities with the testing datasets. In the most important metrics, AUPRC and MCC, it surpasses the second best programmes by as much as 18.5% and 27.7%, respectively. We also demonstrate that the improvement is essentially due to using the ensemble model and, especially, the three new features. Using DELPHI, we show that there is a strong correlation between protein-binding residues (PBRs) and sites with strong evolutionary conservation. In addition, DELPHI’s predicted PBR sites closely match known data from Pfam. DELPHI is available as open-source standalone software and web server. Availability and implementation The DELPHI web server can be found at delphi.csd.uwo.ca/, with all datasets and results in this study. The trained models, the DELPHI standalone source code, and the feature computation pipeline are freely available at github.com/lucian-ilie/DELPHI. Supplementary information Supplementary data are available at Bioinformatics online.
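MCC, one of the two metrics highlighted above, suits binding-site prediction because interface residues are usually a small minority of all residues. The sketch below is the standard Matthews correlation coefficient computed from per-residue labels. The helper names and the 0.5 threshold are illustrative assumptions and are not taken from the DELPHI code.

```python
import math


def confusion_counts(labels, probabilities, threshold=0.5):
    """Count TP, TN, FP, FN for per-residue binary labels and predicted probabilities."""
    tp = tn = fp = fn = 0
    for label, prob in zip(labels, probabilities):
        predicted = prob >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy example: six residues, two of them true binding residues.
labels = [1, 0, 0, 1, 0, 0]
probabilities = [0.9, 0.2, 0.1, 0.8, 0.3, 0.4]
score = mcc(*confusion_counts(labels, probabilities))
```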


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Tanvir Hossain ◽  
Mohammad Kamruzzaman ◽  
Talita Zahin Choudhury ◽  
Hamida Nooreen Mahmood ◽  
A. H. M. Nurun Nabi ◽  
...  

The emergence of novel pathogenic strains with increased antibacterial resistance patterns poses a significant threat to the management of infectious diseases. In this study, we aimed at utilizing the subtractive genomic approach to identify novel drug targets against Salmonella enterica subsp. enterica serovar Poona strain ATCC BAA-1673. We employed in silico bioinformatics tools to subtract the strain-specific paralogous and host-specific homologous sequences from the bacterial proteome. The sorted proteome was further refined to identify the essential genes in the pathogenic bacterium using the database of essential genes (DEG). We carried out metabolic pathway and subcellular location analysis of the essential proteins of the pathogen to elucidate the involvement of these proteins in important cellular processes. We found 52 unique essential proteins in the target proteome that could be utilized as novel targets to design newer drugs. Further, we investigated these proteins in the DrugBank database, and 11 of the unique essential proteins showed druggability according to its FDA-approved drug entries, with diverse broad-spectrum properties. Molecular docking analyses of the novel druggable targets with the drugs were carried out with AutoDock Vina and evaluated by its scoring function. The results showed promising candidates for novel drugs against Salmonella infections.
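The subtractive steps described above amount to a sequence of set operations once the homology searches have been run. The sketch below shows only that filtering logic. It assumes the paralog, host-homolog and essential-gene hits have already been computed elsewhere, and every protein ID is hypothetical.

```python
def subtract_proteome(proteome, paralogs, host_homologs, essential):
    """Keep proteins that are non-paralogous, absent from the host, and essential."""
    non_redundant = proteome - paralogs      # drop strain-specific paralogous sequences
    non_host = non_redundant - host_homologs # drop sequences with host homologs
    return non_host & essential              # keep only hits in the essential-gene set


# Hypothetical IDs standing in for real search results.
proteome = {"p1", "p2", "p3", "p4", "p5"}
paralogs = {"p2"}
host_homologs = {"p3"}
essential = {"p1", "p3", "p5"}
candidates = subtract_proteome(proteome, paralogs, host_homologs, essential)
```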


2020 ◽  
Author(s):  
Baldomero Imbernón ◽  
Antonio Serrano ◽  
Andrés Bueno-Crespo ◽  
José L Abellán ◽  
Horacio Pérez-Sánchez ◽  
...  

Abstract Motivation Molecular docking methods are extensively used to predict the interaction between protein–ligand systems in terms of structure and binding affinity, through the optimization of a physics-based scoring function. However, the computational requirements of these simulations grow exponentially with: (i) the global optimization procedure, (ii) the number and degrees of freedom of molecular conformations generated and (iii) the mathematical complexity of the scoring function. Results In this work, we introduce a novel molecular docking method named METADOCK 2, which incorporates several novel features, such as (i) a ligand-dependent blind docking approach that exhaustively scans the whole protein surface to detect novel allosteric sites, (ii) an optimization method to enable the use of a wide range of metaheuristics and (iii) a heterogeneous implementation based on multicore CPUs and multiple graphics processing units. Two representative scoring functions implemented in METADOCK 2 are extensively evaluated in terms of computational performance and accuracy using several benchmarks (such as the well-known DUD) against AutoDock 4.2 and AutoDock Vina. Results place METADOCK 2 as an efficient and accurate docking methodology that can deal with complex systems with very high computational demands and that outperforms both AutoDock Vina and AutoDock 4. Availability and implementation https://[email protected]/Baldoimbernon/metadock_2.git. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (16) ◽  
pp. 4490-4497
Author(s):  
Siqi Liang ◽  
Haiyuan Yu

Abstract Motivation In silico drug target prediction provides valuable information for drug repurposing, understanding of side effects as well as expansion of the druggable genome. In particular, discovery of actionable drug targets is critical to developing targeted therapies for diseases. Results Here, we develop a robust method for drug target prediction by leveraging a class imbalance-tolerant machine learning framework with a novel training scheme. We incorporate novel features, including drug–gene phenotype similarity and gene expression profile similarity that capture information orthogonal to other features. We show that our classifier achieves robust performance and is able to predict gene targets for new drugs as well as drugs that potentially target unexplored genes. By providing newly predicted drug–target associations, we uncover novel opportunities of drug repurposing that may benefit cancer treatment through action on either known drug targets or currently undrugged genes. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Rudolf A. Römer ◽  
Navodya S. Römer ◽  
A. Katrine Wallis

Abstract The worldwide CoVid-19 pandemic has led to an unprecedented push across the whole of the scientific community to develop a potent antiviral drug and vaccine as soon as possible. Existing academic, governmental and industrial institutions and companies have engaged in large-scale screening of existing drugs, in vitro, in vivo and in silico. Here, we use in silico modelling of possible SARS-CoV-2 drug targets, as deposited in the Protein Data Bank (PDB), and ascertain their dynamics, flexibility and rigidity. For example, for the SARS-CoV-2 spike protein, using its complete homo-trimer configuration with 2905 residues, our method identifies a large-scale opening and closing of the S1 subunit through movement of the S$^\text{B}$ domain. We compute the full structural information of this process, allowing for docking studies with possible drug structures. In a dedicated database, we present similarly detailed results for the further, nearly 300, thus far resolved SARS-CoV-2-related protein structures in the PDB.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Joaquim Aguirre-Plans ◽  
Alberto Meseguer ◽  
Ruben Molina-Fernandez ◽  
Manuel Alejandro Marín-López ◽  
Gaurav Jumde ◽  
...  

Abstract Background Statistical potentials, also named knowledge-based potentials, are scoring functions derived from empirical data that can be used to evaluate the quality of protein folds and protein–protein interaction (PPI) structures. In previous works we decomposed the statistical potentials into different terms, named Split-Statistical Potentials, accounting for the type of amino acid pairs, their hydrophobicity, solvent accessibility and type of secondary structure. These potentials have been successfully used to identify near-native structures in protein structure prediction, rank protein docking poses, and predict PPI binding affinities. Results Here, we present the SPServer, a web server that applies the Split-Statistical Potentials to analyze protein folds and protein interfaces. SPServer provides global scores as well as residue/residue-pair profiles presented as score plots and maps. This level of detail allows users to: (1) identify potentially problematic regions on protein structures; (2) identify disrupting amino acid pairs in protein interfaces; and (3) compare and analyze the quality of tertiary and quaternary structural models. Conclusions While there are many web servers that provide scoring functions to assess the quality of either protein folds or PPI structures, SPServer integrates both aspects in a single easy-to-use web server. Moreover, the server allows local assessment of the quality of structures and interfaces at the residue level and provides tools to compare the local assessment between structures. Server address https://sbi.upf.edu/spserver/.


Author(s):  
Yusuke Matsui ◽  
Yuichi Abe ◽  
Kohei Uno ◽  
Satoru Miyano

Abstract Motivation The full spectrum of abnormalities in cancer-associated protein complexes remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and healthy cells may provide insights regarding cancer-specific protein dysfunction. However, the technical limitations of mass spectrometry-based proteomics, including contamination with biological protein variants, cause noise that leads to non-negligible over- (or under-) estimation of co-expression. Results We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our copula-based method improves identification accuracy with noisy data compared to conventional linear correlation-based approaches. As an application, we use large-scale proteomic data from renal cancer to show that important protein complexes, regulatory signaling pathways and drug targets can be identified. The proposed approach surpasses traditional linear correlations to provide insights into higher-order differential co-expression structures. Availability and implementation https://github.com/ymatts/RoDiCE. Supplementary information Supplementary data are available at Bioinformatics online.
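To illustrate why a copula-based view is less sensitive to noisy measurements than linear correlation, the sketch below maps each protein's abundances to scaled ranks (the pseudo-observations on which an empirical copula is built) and correlates those instead of the raw values. This is a generic rank-based illustration and not the RoDiCE algorithm. The function names and toy data are hypothetical, and ties are not handled.

```python
from statistics import mean


def pseudo_observations(values):
    """Map values to ranks scaled into (0, 1); assumes there are no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    scaled = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        scaled[index] = rank / (len(values) + 1)
    return scaled


def pearson(x, y):
    """Plain linear (Pearson) correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return covariance / (var_x * var_y) ** 0.5


def rank_correlation(x, y):
    """Correlation computed on pseudo-observations instead of raw abundances."""
    return pearson(pseudo_observations(x), pseudo_observations(y))


# Toy abundances: protein_b rises with protein_a, but one sample is an extreme outlier.
protein_a = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_b = [1.0, 4.0, 9.0, 16.0, 1000.0]
linear = pearson(protein_a, protein_b)
ranked = rank_correlation(protein_a, protein_b)
```

In this toy case the outlier pulls the linear correlation well below 1, while the rank-based value still reflects the perfectly monotonic relationship.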


Author(s):  
Blaž Škrlj ◽  
Nika Eržen ◽  
Nada Lavrač ◽  
Tanja Kunej ◽  
Janez Konc

Abstract Motivation Causal biological interaction networks represent cellular regulatory pathways. Their fusion with other biological data enables insights into disease mechanisms and novel opportunities for drug discovery. Results We developed Causal Network of Diseases (CaNDis), a web server for the exploration of a human causal interaction network, which we expanded with data on diseases and FDA-approved drugs, on the basis of which we constructed a disease–disease network in which the links represent the similarity between diseases. We show how CaNDis can be used to identify candidate genes with known and novel roles in disease co-occurrence and drug–drug interactions. Availability and implementation CaNDis is freely available to academic users at http://candis.ijs.si and http://candis.insilab.org. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (20) ◽  
pp. 3989-3995 ◽  
Author(s):  
Hongjian Li ◽  
Jiangjun Peng ◽  
Pavel Sidorov ◽  
Yee Leung ◽  
Kwong-Sak Leung ◽  
...  

Abstract Motivation Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. Results We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes into the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from training complexes that are dissimilar to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of the SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. Availability and implementation https://github.com/HongjianLi/MLSF Supplementary information Supplementary data are available at Bioinformatics online.
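One way to picture the experimental design above is as a family of nested training sets, each admitting complexes up to a given similarity to the test set. The sketch below shows only that selection step. It assumes each training complex already has a precomputed maximum similarity to any test complex, which could come from any of the three metric types. The function name, IDs and cutoffs are hypothetical.

```python
def training_subset(max_similarity_to_test, cutoff):
    """Return training complex IDs whose highest similarity to the test set is <= cutoff."""
    return sorted(
        complex_id
        for complex_id, similarity in max_similarity_to_test.items()
        if similarity <= cutoff
    )


# Hypothetical precomputed similarities (0 = unrelated, 1 = identical to a test complex).
similarities = {"c1": 0.20, "c2": 0.50, "c3": 0.90, "c4": 0.35}

# Raising the cutoff yields nested training sets that include ever more similar complexes.
nested_sets = {cutoff: training_subset(similarities, cutoff) for cutoff in (0.3, 0.6, 1.0)}
```

Training a scoring function on each subset in turn and scoring the same test set would then show how accuracy changes as more similar complexes are admitted.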


2019 ◽  
Vol 20 (13) ◽  
pp. 3174
Author(s):  
Alejandro Valdés-Jiménez ◽  
Josep-L. Larriba-Pey ◽  
Gabriel Núñez-Vivanco ◽  
Miguel Reyes-Parada

Discovering conserved three-dimensional (3D) patterns among protein structures may provide valuable insights into protein classification, functional annotations or the rational design of multi-target drugs. Thus, several computational tools have been developed to discover and compare protein 3D-patterns. However, most of them only consider previously known 3D-patterns such as orthosteric binding sites or structural motifs. This makes it necessary to develop new methods for identifying all possible 3D-patterns that exist in protein structures (allosteric sites, enzyme-cofactor interaction motifs, among others). In this work, we present 3D-PP, a new free access web server for the discovery and recognition of all similar 3D amino acid patterns among a set of protein structures (independent of their sequence similarity). This new tool does not require any previous structural knowledge about ligands, and all data are organized in a high-performance graph database. The input can be a text file with the PDB access codes or a zip file of PDB coordinates regardless of the origin of the structural data: X-ray crystallographic experiments or in silico homology modeling. The results are presented as lists of sequence patterns that can be further analyzed within the web page. We tested the accuracy and suitability of 3D-PP using two sets of proteins coming from the Protein Data Bank: (a) Zinc finger containing and (b) Serotonin target proteins. We also evaluated its usefulness for discovering new 3D-patterns, using a set of protein structures coming from in silico homology modeling methodologies, all of which are overexpressed in different types of cancer. Results indicate that 3D-PP is a reliable, flexible and user-friendly tool to identify conserved structural motifs, which could be relevant to improve the knowledge about protein function or classification. The web server can be freely utilized at https://appsbio.utalca.cl/3d-pp/.

