MetaScore: A novel machine-learning based approach to improve traditional scoring functions for scoring protein-protein docking conformations

2021 ◽  
Author(s):  
Yong Jung ◽  
Cunliang Geng ◽  
Alexandre M. J. J. Bonvin ◽  
Li C Xue ◽  
Vasant G Honavar

Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking - the so-called scoring problem - still has considerable room for improvement. We present here MetaScore, a new machine-learning based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using a rich set of features extracted from the respective protein-protein interfaces. These include physico-chemical properties, energy terms, interaction propensity-based features, geometric properties, interface topology features, evolutionary conservation and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of nine traditional SFs included in this work in terms of success rate and hit rate evaluated over the top 10 predicted conformations; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by judiciously leveraging machine-learning.
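As a rough illustration of the averaging step described in this abstract, the following Python sketch combines a classifier probability with a traditional scoring-function value for each docked conformation. The min-max normalization, the sign flip for energy-like scores (lower is better), the toy data and the names `min_max` and `meta_score` are all assumptions made for this example; the abstract only states that the two scores are averaged.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def meta_score(rf_probs, sf_scores, lower_is_better=True):
    """Average an RF near-native probability with a normalized SF score.

    Assumption: SF scores are energy-like (lower is better), so they are
    negated before normalization so that higher always means better.
    """
    if len(rf_probs) != len(sf_scores):
        raise ValueError("one RF probability and one SF score per model")
    oriented = [-s for s in sf_scores] if lower_is_better else list(sf_scores)
    sf_norm = min_max(oriented)
    return [(p + s) / 2.0 for p, s in zip(rf_probs, sf_norm)]


# Hypothetical data: three docked models of one complex.
rf_probs = [0.9, 0.2, 0.6]
sf_scores = [-120.0, -80.0, -100.0]
combined = meta_score(rf_probs, sf_scores)

# Indices of the models, best combined score first.
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

Putting both scores on a common scale before averaging keeps either term from dominating simply because of its units; other normalizations (for example z-scores) would serve the same purpose.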

2018 ◽  
Author(s):  
Cunliang Geng ◽  
Yong Jung ◽  
Nicolas Renaud ◽  
Vasant Honavar ◽  
Alexandre M.J.J. Bonvin ◽  
...  

Abstract Protein complexes play a central role in many aspects of biological function. Knowledge of the three-dimensional (3D) structures of protein complexes is critical for gaining insights into the structural basis of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determination of 3D structures of protein complexes, computational docking has evolved as a valuable tool to predict the 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein-protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of the state-of-the-art scoring functions on independent data sets consisting of docking software-specific data sets and the CAPRI score set built from a wide variety of docking approaches. iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and physicochemical information for scoring docked conformations. This work represents the first successful demonstration of graph kernels applied to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. It paves the way for the further development of computational methods for predicting the structure of protein complexes.


2019 ◽  
Vol 36 (1) ◽  
pp. 112-121 ◽  
Author(s):  
Cunliang Geng ◽  
Yong Jung ◽  
Nicolas Renaud ◽  
Vasant Honavar ◽  
Alexandre M J J Bonvin ◽  
...  

Abstract Motivation Protein complexes play critical roles in many aspects of biological functions. Three-dimensional (3D) structures of protein complexes are critical for gaining insights into structural bases of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determinations of 3D protein complex structures, computational docking has evolved as a valuable tool to predict 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. Results Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein–protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of state-of-the-art scoring functions on two independent datasets: (i) docking software-specific models and (ii) the CAPRI score set generated by a wide variety of docking approaches (i.e. docking software-non-specific). iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and energetic information for scoring docked conformations. This work represents the first successful demonstration of graph kernels applied to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. Availability and implementation The iScore code is freely available from Github: https://github.com/DeepRank/iScore (DOI: 10.5281/zenodo.2630567). The docking models used are available from SBGrid: https://data.sbgrid.org/dataset/684.
Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 1 ◽  
Author(s):  
Ye Han ◽  
Fei He ◽  
Yongbing Chen ◽  
Wenyuan Qin ◽  
Helong Yu ◽  
...  

Protein docking provides a structural basis for the design of drugs and vaccines. Among the processes of protein docking, quality assessment (QA) is utilized to pick near-native models from numerous protein docking candidate conformations, and it directly determines the final docking results. Although extensive efforts have been made to improve QA accuracy, it is still the bottleneck of current protein docking systems. In this paper, we present a Deep Graph Attention Neural Network (DGANN) to evaluate and rank protein docking candidate models. DGANN learns inter-residue physico-chemical properties and structural fitness across the two protein monomers in a docking model and outputs the probability that the model is near-native. On the ZDOCK decoy benchmark, DGANN outperformed the ranking provided by ZDOCK in terms of ranking good models into the top selections. Furthermore, we conducted comparative experiments on an independent testing dataset, and the results also demonstrated the superiority and generalization of our proposed method.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Isabella A. Guedes ◽  
André M. S. Barreto ◽  
Diogo Marinho ◽  
Eduardo Krempser ◽  
Mélaine A. Kuenemann ◽  
...  

Abstract Scoring functions are essential for modern in silico drug discovery. However, the accurate prediction of binding affinity by scoring functions remains a challenging task. The performance of scoring functions is very heterogeneous across different target classes. Scoring functions based on precise physics-based descriptors that better represent the protein–ligand recognition process are strongly needed. We developed a set of new empirical scoring functions, named DockTScore, by explicitly accounting for physics-based terms combined with machine learning. Target-specific scoring functions were developed for two important drug targets, proteases and protein–protein interactions, representing an original class of molecules for drug discovery. Multiple linear regression (MLR), support vector machine and random forest algorithms were employed to derive general and target-specific scoring functions involving optimized MMFF94S force-field terms, solvation and lipophilic interaction terms, and an improved term accounting for ligand torsional entropy contribution to ligand binding. DockTScore scoring functions proved competitive with the current best-evaluated scoring functions in terms of binding energy prediction and ranking on four DUD-E datasets and will be useful for in silico drug design for diverse proteins as well as for specific targets such as proteases and protein–protein interactions. Currently, the MLR DockTScore is available at www.dockthor.lncc.br.


2019 ◽  
Author(s):  
Abhilesh S. Dhawanjewar ◽  
Ankit Roy ◽  
M.S. Madhusudhan

Abstract Motivation Elucidation of protein-protein interactions is a necessary step towards understanding the complete repertoire of cellular biochemistry. Given the enormity of the problem, the expenses and limitations of experimental methods, it is imperative that this problem is tackled computationally. In silico predictions of protein interactions entail sampling different conformations of the purported complex and then scoring these to assess for interaction viability. In this study we have devised a new scheme for scoring protein-protein interactions. Results Our method, PIZSA (Protein Interaction Z Score Assessment) is a binary classification scheme for identification of stable protein quaternary assemblies (binders/non-binders) based on statistical potentials. The scoring scheme incorporates residue-residue contact preference on the interface with per residue-pair atomic contributions and accounts for clashes. PIZSA can accurately discriminate between native and non-native structural conformations from protein docking experiments and outperform other recently published scoring functions, demonstrated through testing on a benchmark set and the CAPRI Score_set. Though not explicitly trained for this purpose, PIZSA potentials can identify spurious interactions that are artefacts of the crystallization process. Availability PIZSA is implemented as a web server at http://cospi.iiserpune.ac.in/pizsa/


Author(s):  
Chloé Quignot ◽  
Pierre Granger ◽  
Pablo Chacón ◽  
Raphael Guerois ◽  
Jessica Andreani

Abstract Motivation The crucial role of protein interactions and the difficulty in characterising them experimentally strongly motivates the development of computational approaches for structural prediction. Even when protein-protein docking samples correct models, current scoring functions struggle to discriminate them from incorrect decoys. The previous incorporation of conservation and coevolution information has shown promise for improving protein-protein scoring. Here, we present a novel strategy to integrate atomic-level evolutionary information into different types of scoring functions to improve their docking discrimination. Results We applied this general strategy to our residue-level statistical potential from InterEvScore and to two atomic-level scores, SOAP-PP and Rosetta interface score (ISC). Including evolutionary information from as few as ten homologous sequences improves the top 10 success rates of individual atomic-level scores SOAP-PP and Rosetta ISC by respectively 6 and 13.5 percentage points, on a large benchmark of 752 docking cases. The best individual homology-enriched score reaches a top 10 success rate of 34.4%. A consensus approach based on the complementarity between different homology-enriched scores further increases the top 10 success rate to 40%. Availability All data used for benchmarking and scoring results, as well as a Singularity container of the pipeline, are available at http://biodev.cea.fr/interevol/interevdata/ Supplementary information Supplementary data are available at Bioinformatics online.
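The top 10 success rate quoted in this abstract is commonly understood as the fraction of docking cases for which at least one acceptable model appears among the ten best-scored decoys. The sketch below computes that quantity under this assumed definition; the function name `top_n_success_rate`, the input layout and the toy data are hypothetical and are not taken from the authors' pipeline.

```python
def top_n_success_rate(cases, n=10):
    """Fraction of docking cases with an acceptable model in the top n.

    `cases` maps a case identifier to a list of (score, is_acceptable)
    tuples, one per decoy. Assumption: higher scores rank first.
    """
    if not cases:
        raise ValueError("no docking cases given")
    successes = 0
    for decoys in cases.values():
        ranked = sorted(decoys, key=lambda d: d[0], reverse=True)
        if any(ok for _, ok in ranked[:n]):
            successes += 1
    return successes / len(cases)


# Hypothetical toy benchmark: two cases, evaluated with n=2.
cases = {
    "caseA": [(0.9, False), (0.8, True), (0.1, False)],
    "caseB": [(0.7, False), (0.6, False), (0.5, True)],
}
rate = top_n_success_rate(cases, n=2)
```

Here `caseA` counts as a success because its second-ranked decoy is acceptable, while `caseB` does not, since its only acceptable decoy is ranked third.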


Author(s):  
Varsha D Badal ◽  
Petras J Kundrotas ◽  
Ilya A Vakser

Abstract Motivation Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, absence of post-processing of the spotted residues reduced usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins. Results We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full text articles. Availability The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2284-2285 ◽  
Author(s):  
Miguel Romero-Durana ◽  
Brian Jiménez-García ◽  
Juan Fernández-Recio

Abstract Motivation Protein–protein interactions are key to understanding biological processes at the molecular level. As a complement to experimental characterization of protein interactions, computational docking methods have become useful tools for the structural and energetics modeling of protein–protein complexes. A key aspect of such algorithms is the use of scoring functions to evaluate the generated docking poses and try to identify the best models. When the scoring functions are based on energetic considerations, they can help not only to provide a reliable structural model for the complex, but also to describe energetic aspects of the interaction. This is the case of the scoring function used in pyDock, a combination of electrostatics, desolvation and van der Waals energy terms. Its correlation with experimental binding affinity values of protein–protein complexes was explored in the past, but the per-residue decomposition of the docking energy was never systematically analyzed. Results Here, we present pyDockEneRes (pyDock Energy per-Residue), a web server that provides pyDock docking energy partitioned at the residue level, giving a much more detailed description of the docking energy landscape. Additionally, pyDockEneRes computes the contribution to the docking energy of the side-chain atoms. This fast approach can be applied to characterize a complex structure in order to identify energetically relevant residues (hot-spots) and estimate binding affinity changes upon mutation to alanine. Availability and implementation The server does not require registration by the user and is freely accessible for academics at https://life.bsc.es/pid/pydockeneres. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Chloé Quignot ◽  
Pierre Granger ◽  
Pablo Chacón ◽  
Raphael Guerois ◽  
Jessica Andreani

Abstract The crucial role of protein interactions and the difficulty in characterising them experimentally strongly motivates the development of computational approaches for structural prediction. Even when protein-protein docking samples correct models, current scoring functions struggle to discriminate them from incorrect decoys. The previous incorporation of conservation and coevolution information has shown promise for improving protein-protein scoring. Here, we present a novel strategy to integrate atomic-level evolutionary information into different types of scoring functions to improve their docking discrimination. We applied this general strategy to our residue-level statistical potential from InterEvScore and to two atomic-level scores, SOAP-PP and Rosetta interface score (ISC). Including evolutionary information from as few as ten homologous sequences improves the top 10 success rates of these individual scores by respectively 6.5, 6 and 13.5 percentage points, on a large benchmark of 752 docking cases. The best individual homology-enriched score reaches a top 10 success rate of 34.4%. A consensus approach based on the complementarity between different homology-enriched scores further increases the top 10 success rate to 40%. All data used for benchmarking and scoring results, as well as pipelining scripts, are available at http://biodev.cea.fr/interevol/interevdata/


2021 ◽  
Author(s):  
Didier Barradas-Bautista ◽  
Zhen Cao ◽  
Anna Vangone ◽  
Romina Oliva ◽  
Luigi Cavallo

Herein, we present the results of a machine learning approach we developed to single out correct 3D docking models of protein-protein complexes obtained by popular docking software. To this aim, we generated a set of ~7 × 10^6 docking models with three different docking programs (HADDOCK, FTDock and ZDOCK) for the 230 complexes in the protein-protein interaction benchmark, version 5 (BM5). Three different machine-learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions (features). The Random Forest algorithm outperformed the other two algorithms and was selected for further optimization. Applying a feature selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named CoDES (COnservation Driven Expert System). Testing of CoDES on independent datasets, as well as results of its comparative performance with machine-learning methods recently developed in the field for the scoring of docking decoys, confirm its state-of-the-art ability to discriminate correct from incorrect decoys both in terms of global parameters and in terms of decoys ranked at the top positions.

