iScore: a novel graph kernel-based function for scoring protein–protein docking models

2019 ◽  
Vol 36 (1) ◽  
pp. 112-121 ◽  
Author(s):  
Cunliang Geng ◽  
Yong Jung ◽  
Nicolas Renaud ◽  
Vasant Honavar ◽  
Alexandre M J J Bonvin ◽  
...  

Abstract. Motivation: Protein complexes play critical roles in many aspects of biological function. Three-dimensional (3D) structures of protein complexes are critical for gaining insights into the structural bases of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determination of 3D protein complex structures, computational docking has evolved as a valuable tool to predict 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. Results: Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein–protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of state-of-the-art scoring functions on two independent datasets: (i) docking software-specific models and (ii) the CAPRI score set generated by a wide variety of docking approaches (i.e. docking software-non-specific). iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and energetic information for scoring docked conformations. This work represents the first successful application of graph kernels to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. Availability and implementation: The iScore code is freely available from GitHub: https://github.com/DeepRank/iScore (DOI: 10.5281/zenodo.2630567). The docking models used are available from SBGrid: https://data.sbgrid.org/dataset/684.
Supplementary information: Supplementary data are available at Bioinformatics online.

2018 ◽  
Author(s):  
Cunliang Geng ◽  
Yong Jung ◽  
Nicolas Renaud ◽  
Vasant Honavar ◽  
Alexandre M.J.J. Bonvin ◽  
...  

Abstract. Protein complexes play a central role in many aspects of biological function. Knowledge of the three-dimensional (3D) structures of protein complexes is critical for gaining insights into the structural basis of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determination of 3D structures of protein complexes, computational docking has evolved as a valuable tool to predict the 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein-protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of state-of-the-art scoring functions on independent data sets consisting of docking software-specific data sets and the CAPRI score set built from a wide variety of docking approaches. iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and physicochemical information for scoring docked conformations. This work represents the first successful application of graph kernels to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. It paves the way for the further development of computational methods for predicting the structure of protein complexes.


2021 ◽  
Author(s):  
Yong Jung ◽  
Cunliang Geng ◽  
Alexandre M. J. J. Bonvin ◽  
Li C Xue ◽  
Vasant G Honavar

Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking (the so-called scoring problem) still has considerable room for improvement. We present here MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using a rich set of features extracted from the respective protein-protein interfaces. These include physico-chemical properties, energy terms, interaction propensity-based features, geometric properties, interface topology features, evolutionary conservation and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of nine traditional SFs included in this work in terms of success rate and hit rate evaluated over the top 10 predicted conformations; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by judiciously leveraging machine learning.
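
The averaging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two scores are put on a common scale, so the min-max rescaling, the `lower_is_better` flag, and the function names `min_max`, `meta_score` and `top_n` are all assumptions made for illustration.

```python
def min_max(values):
    """Rescale a list of scores to the range [0, 1] (assumed normalisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores identical: no ranking information, use a neutral value.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def meta_score(rf_probs, sf_scores, lower_is_better=True):
    """Average a classifier probability with a rescaled traditional score.

    rf_probs: per-model probability of being near-native (higher is better).
    sf_scores: per-model score from a traditional scoring function.
    lower_is_better: assumed True for energy-like scores, so they are flipped.
    """
    scaled = min_max(sf_scores)
    if lower_is_better:
        scaled = [1.0 - s for s in scaled]
    return [(p + s) / 2.0 for p, s in zip(rf_probs, scaled)]


def top_n(scores, n):
    """Return the indices of the n highest-scoring models."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


if __name__ == "__main__":
    rf_probs = [0.9, 0.2, 0.5]           # hypothetical classifier outputs
    sf_scores = [-10.0, -30.0, -20.0]    # hypothetical energy-like scores
    combined = meta_score(rf_probs, sf_scores)
    print(combined, top_n(combined, 2))
```

In this toy case the second model has the weakest classifier probability but the most favourable energy-like score, and it ends up ranked first after averaging, which shows how the two sources of evidence can pull the ranking in different directions.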


2019 ◽  
Vol 36 (7) ◽  
pp. 2284-2285 ◽  
Author(s):  
Miguel Romero-Durana ◽  
Brian Jiménez-García ◽  
Juan Fernández-Recio

Abstract. Motivation: Protein–protein interactions are key to understanding biological processes at the molecular level. As a complement to experimental characterization of protein interactions, computational docking methods have become useful tools for the structural and energetics modeling of protein–protein complexes. A key aspect of such algorithms is the use of scoring functions to evaluate the generated docking poses and try to identify the best models. When the scoring functions are based on energetic considerations, they can help not only to provide a reliable structural model for the complex, but also to describe energetic aspects of the interaction. This is the case for the scoring function used in pyDock, a combination of electrostatics, desolvation and van der Waals energy terms. Its correlation with experimental binding affinity values of protein–protein complexes was explored in the past, but the per-residue decomposition of the docking energy was never systematically analyzed. Results: Here, we present pyDockEneRes (pyDock Energy per-Residue), a web server that provides pyDock docking energy partitioned at the residue level, giving a much more detailed description of the docking energy landscape. Additionally, pyDockEneRes computes the contribution to the docking energy of the side-chain atoms. This fast approach can be applied to characterize a complex structure in order to identify energetically relevant residues (hot-spots) and estimate binding affinity changes upon mutation to alanine. Availability and implementation: The server does not require registration by the user and is freely accessible for academics at https://life.bsc.es/pid/pydockeneres. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Nicolas Renaud ◽  
Yong Jung ◽  
Vasant Honavar ◽  
Cunliang Geng ◽  
Alexandre M.J.J. Bonvin ◽  
...  

Abstract. Computational docking is a promising tool to model three-dimensional (3D) structures of protein-protein complexes, which provide fundamental insights into protein function in the cell. Singling out near-native models from the huge pool of generated docking models (referred to as the scoring problem) remains a major challenge in computational docking. We recently published iScore, a novel graph kernel-based scoring function. iScore ranks docking models based on their interface graph similarities to the training interface graph set. iScore uses a support vector machine approach with random-walk graph kernels to classify and rank protein-protein interfaces. Here, we present the software for iScore. The software provides executable scripts that fully automate the computational workflow. In addition, the creation and analysis of the interface graph can be distributed across different processes using the Message Passing Interface (MPI) and can be offloaded to GPUs thanks to dedicated CUDA kernels.
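
To make the idea of a random-walk graph kernel concrete, here is a minimal sketch of a generic fixed-length variant computed on the direct product of two node-labelled graphs. It is not the iScore code: the walk-length cutoff `max_len`, the decay factor `decay`, the exact-label matching rule and the function names are assumptions chosen for illustration.

```python
def product_adjacency(adj_a, labels_a, adj_b, labels_b):
    """Adjacency matrix of the direct product graph.

    Product nodes are pairs (i, j) whose labels match; two product nodes are
    connected when both underlying node pairs are connected.
    """
    pairs = [(i, j)
             for i in range(len(labels_a))
             for j in range(len(labels_b))
             if labels_a[i] == labels_b[j]]
    return [[adj_a[i][k] * adj_b[j][l] for (k, l) in pairs] for (i, j) in pairs]


def random_walk_kernel(adj_a, labels_a, adj_b, labels_b, max_len=3, decay=0.5):
    """Sum of decay**length * (number of matching walks) for lengths 0..max_len."""
    adj = product_adjacency(adj_a, labels_a, adj_b, labels_b)
    n = len(adj)
    if n == 0:
        return 0.0  # no matching node labels, so no common walks
    walks = [1.0] * n  # walks of length 0 ending at each product node
    total = 0.0
    for step in range(max_len + 1):
        total += (decay ** step) * sum(walks)
        # Extend every walk by one edge of the product graph.
        walks = [sum(row[q] * walks[q] for q in range(n)) for row in adj]
    return total


if __name__ == "__main__":
    # Hypothetical toy graphs: one labelled edge and a three-node path.
    edge_adj, edge_labels = [[0, 1], [1, 0]], ["A", "B"]
    path_adj, path_labels = [[0, 1, 0], [1, 0, 1], [0, 1, 0]], ["A", "B", "A"]
    print(random_walk_kernel(edge_adj, edge_labels, edge_adj, edge_labels))
    print(random_walk_kernel(edge_adj, edge_labels, path_adj, path_labels))
```

A kernel value of this kind can be placed in a precomputed Gram matrix and passed to a support vector machine, which is the general pattern the abstract describes; how iScore builds its interface graphs and labels is not specified here.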


2019 ◽  
Author(s):  
Sambit K. Mishra ◽  
Sarah J. Cooper ◽  
Jerry M. Parks ◽  
Julie C. Mitchell

Abstract. Protein-protein interactions play a key role in mediating numerous biological functions, with more than half the proteins in living organisms existing as either homo- or hetero-oligomeric assemblies. Protein subunits that form oligomers minimize the free energy of the complex, but exhaustive computational search-based docking methods have not comprehensively addressed the protein docking challenge of distinguishing a natively bound complex from non-native forms. In this study, we propose a scoring function, KFC-E, that accounts for both conservation and coevolution of putative binding hotspot residues at protein-protein interfaces. For a benchmark set of 53 bound complexes, KFC-E identifies a near-native binding mode as the top-scoring pose in 38% and in the top 5 in 55% of the complexes. For a set of 17 unbound complexes, KFC-E identifies a near-native pose in the top 10 ranked poses in more than 50% of the cases. By contrast, a scoring function that incorporates information on coevolution at predicted non-hotspots performs poorly. Our study highlights the importance of coevolution at hotspot residues in forming natively bound complexes and suggests a novel approach for coevolutionary scoring in protein docking.
Author Summary. A fundamental problem in biology is to distinguish between the native and non-native bound forms of protein-protein complexes. Experimental methods are often used to detect the native bound forms of proteins but are demanding in terms of time and resources. Computational approaches have proven to be a useful alternative; they sample the different binding configurations for a pair of interacting proteins and then use a heuristic or physical model to score them. In this study we propose a new scoring approach, KFC-E, which focuses on the evolutionary contributions from a subset of key interface residues (hotspots) to identify native bound complexes.
KFC-E capitalizes on the wealth of information in protein sequence databases by incorporating residue-level conservation and coevolution of putative binding hotspots. As hotspot residues mediate the binding energetics of protein-protein interactions, we hypothesize that the knowledge of putative hotspots coupled with their evolutionary information should be helpful in the identification of native bound protein-protein complexes.


2019 ◽  
Author(s):  
Georgy Derevyanko ◽  
Guillaume Lamoureux

Abstract. Protein-protein interactions are determined by a number of hard-to-capture features related to shape complementarity, electrostatics, and hydrophobicity. These features may be intrinsic to the protein or induced by the presence of a partner. A conventional approach to protein-protein docking consists in engineering a small number of spatial features for each protein, and in minimizing the sum of their correlations with respect to the spatial arrangement of the two proteins. To generalize this approach, we introduce a deep neural network architecture that transforms the raw atomic densities of each protein into complex three-dimensional representations. Each point in the volume containing the protein is described by 48 learned features, which are correlated and combined with the features of a second protein to produce a score dependent on the relative position and orientation of the two proteins. The architecture is based on multiple layers of SE(3)-equivariant convolutional neural networks, which provide built-in rotational and translational invariance of the score with respect to the structure of the complex. The model is trained end-to-end on a set of decoy conformations generated from 851 nonredundant protein-protein complexes and is tested on data from the Protein-Protein Docking Benchmark Version 4.0.
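
The conventional correlation-based scoring that this abstract generalizes can be sketched on a toy grid. The example below is only an illustration under stated assumptions: the sparse dictionary representation, the per-feature `weights`, and the function names `correlation_score` and `best_translation` are hypothetical, rotations are ignored, and practical docking codes evaluate such correlations over all translations with fast Fourier transforms rather than an explicit loop.

```python
def correlation_score(receptor, ligand, shift, weights):
    """Weighted sum of feature correlations for one ligand translation.

    receptor, ligand: dicts mapping integer grid points (x, y, z) to a tuple
    of feature values at that point (sparse grids; missing points count as 0).
    shift: integer translation (dx, dy, dz) applied to the ligand.
    weights: one weight per feature; negative weights reward overlap.
    """
    total = 0.0
    for (x, y, z), rec_feats in receptor.items():
        # Ligand point that lands on (x, y, z) after the translation.
        lig_feats = ligand.get((x - shift[0], y - shift[1], z - shift[2]))
        if lig_feats is None:
            continue
        total += sum(w * r * l for w, r, l in zip(weights, rec_feats, lig_feats))
    return total


def best_translation(receptor, ligand, shifts, weights):
    """Return the candidate translation with the lowest (most favourable) score."""
    return min(shifts, key=lambda s: correlation_score(receptor, ligand, s, weights))


if __name__ == "__main__":
    # Hypothetical single-feature grids: a two-point receptor, a one-point ligand.
    receptor = {(0, 0, 0): (1.0,), (1, 0, 0): (1.0,)}
    ligand = {(0, 0, 0): (1.0,)}
    weights = (-1.0,)
    candidates = [(5, 0, 0), (1, 0, 0)]
    print(best_translation(receptor, ligand, candidates, weights))
```

In the learned variant described in the abstract, the hand-engineered feature tuples would be replaced by the 48 features produced by the network at each grid point.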


2019 ◽  
Vol 36 (1) ◽  
pp. 96-103 ◽  
Author(s):  
Jinfang Zheng ◽  
Xu Hong ◽  
Juan Xie ◽  
Xiaoxue Tong ◽  
Shiyong Liu

Abstract. Motivation: The main function of protein–RNA interaction is to regulate the expression of genes. Therefore, studying protein–RNA interactions is of great significance. Three-dimensional (3D) structural information reveals that atomic interactions are particularly important. Computational modeling of the 3D structure of a complex mainly follows two strategies: free docking and template-based docking. These two methods are complementary in protein–protein docking. Therefore, integrating these two methods may improve the prediction accuracy. Results: In this article, we compare the free docking and the template-based algorithms. Then we show the complementarity of these two methods. Based on the analysis of the calculation results, the transition point is confirmed and used to integrate the two docking algorithms to develop P3DOCK. P3DOCK combines the advantages of both algorithms. The results on the three docking benchmarks show that P3DOCK outperforms both non-hybrid docking algorithms. The success rate of P3DOCK is also higher (3–20%) than that of state-of-the-art hybrid and non-hybrid methods. Finally, a hierarchical clustering algorithm is used to cluster P3DOCK's decoys. The clustering algorithm improves the success rate of P3DOCK. For ease of use, we provide a P3DOCK webserver, which can be accessed at www.rnabinding.com/P3DOCK/P3DOCK.html. An integrated protein–RNA docking benchmark can be downloaded from http://rnabinding.com/P3DOCK/benchmark.html. Availability and implementation: www.rnabinding.com/P3DOCK/P3DOCK.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Paweł Krupa ◽  
Agnieszka S Karczyńska ◽  
Magdalena A Mozolewska ◽  
Adam Liwo ◽  
Cezary Czaplewski

Abstract. Motivation: The majority of the proteins in living organisms occur as homo- or hetero-multimeric structures. Although there are many tools to predict the structures of single-chain proteins or protein complexes with small ligands, peptide–protein and protein–protein docking is more challenging. In this work, we utilized multiplexed replica-exchange molecular dynamics (MREMD) simulations with the physics-based, heavily coarse-grained UNRES model, which provides more than a 1000-fold simulation speed-up compared with all-atom approaches, to predict structures of protein complexes. Results: We present a new protein–protein and peptide–protein docking functionality of the UNRES package, which includes a variable degree of conformational flexibility. The UNRES-Dock protocol was tested on a set of 55 complexes ranging in size from 43 to 587 amino-acid residues, showing that structures of the complexes can be predicted with good quality if the sampling of the conformational space is sufficient, especially for flexible peptide–protein systems. The developed automated protocol has been implemented in the standalone UNRES package and in the UNRES server. Availability and implementation: UNRES server: http://unres-server.chem.ug.edu.pl; UNRES package and data used in testing of UNRES-Dock: http://unres.pl. Supplementary information: Supplementary data are available at Bioinformatics online.
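
For readers unfamiliar with replica exchange, the standard Metropolis criterion for swapping configurations between two replicas at different temperatures can be sketched as follows. This shows only the textbook pairwise exchange test, not the UNRES or MREMD implementation; the function names and the use of inverse temperatures `beta` as inputs are assumptions for illustration.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Acceptance probability min(1, exp((beta_i - beta_j) * (E_i - E_j))).

    beta_i, beta_j: inverse temperatures 1 / (k_B * T) of the two replicas.
    energy_i, energy_j: current potential energies of the two replicas.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Avoid evaluating exp() for positive arguments, which could overflow.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random.random):
    """Return True if the exchange is accepted for a uniform draw from rng()."""
    return rng() < swap_probability(beta_i, beta_j, energy_i, energy_j)


if __name__ == "__main__":
    # Hypothetical values: the colder replica (larger beta) holds the higher
    # energy, so exchanging configurations is always accepted.
    print(swap_probability(1.0, 0.5, -5.0, -10.0))
    # The reverse situation is accepted only with probability exp(-2.5).
    print(swap_probability(1.0, 0.5, -10.0, -5.0))
```

Exchanges of this kind let low-temperature replicas escape local minima by borrowing configurations explored at higher temperatures, which is why the abstract ties prediction quality to sufficient conformational sampling.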

