Electrostatic Potential Energy in Protein-Drug Complexes

2021 ◽  
Vol 28 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Walter Filgueira de Azevedo Junior

Background: Electrostatic interactions are one of the forces guiding the binding of molecules to proteins. The assessment of this interaction through computational approaches makes it possible to evaluate the energy of protein-drug complexes. Objective: Our purpose here is to review some of the methods used to calculate the electrostatic energy of protein-drug complexes and to explore the capacity of these approaches to generate new computational tools for drug discovery, using the abstraction of scoring function space. Method: Here we present an overview of the AutoDock4 semi-empirical scoring function used to calculate binding affinity for protein-drug complexes. We focus our attention on electrostatic interactions and on how recently published results can be used to increase the predictive performance of computational models that estimate the energetics of protein-drug interactions. Public data available at Binding MOAD, BindingDB, and PDBbind were used to review the predictive performance of different approaches to predicting binding affinity. Results: A comprehensive outline of the scoring functions available in docking programs to evaluate potential energy is presented. Recent developments in computational models of protein-drug energetics made it possible to create targeted scoring functions that predict binding to specific protein targets. These targeted models outperform classical scoring functions and highlight the importance of electrostatic interactions in determining binding. Conclusion: Here, we reviewed the development of scoring functions to predict binding affinity through the application of a semi-empirical free energy scoring function. Our studies show the superior predictive performance of machine learning models when compared with classical scoring functions and the importance of electrostatic interactions for binding affinity.
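A minimal sketch of the kind of electrostatic term found in the AutoDock4 semi-empirical scoring function: a screened Coulomb sum over protein-ligand atom pairs with a sigmoidal, distance-dependent dielectric (Mehler-Solmajer form). The constants below are the values commonly quoted for AutoDock 4.2 and are assumptions to be checked against the AutoDock documentation; the real program also precomputes grid maps and applies cutoffs, which are omitted here.

```python
import math

# Assumed constants (commonly quoted for AutoDock 4.2; verify before use).
A = -8.5525          # Mehler-Solmajer parameter
EPS0 = 78.4          # dielectric constant of bulk water
B = EPS0 - A
LAMBDA = 0.003627
K = 7.7839
COULOMB = 332.06363  # converts e^2/Angstrom to kcal/mol
W_ELEC = 0.1406      # empirical free-energy weight of the electrostatic term


def dielectric(r: float) -> float:
    """Sigmoidal distance-dependent dielectric epsilon(r), r in Angstrom."""
    return A + B / (1.0 + K * math.exp(-LAMBDA * B * r))


def pair_elec(q_i: float, q_j: float, r: float) -> float:
    """Weighted screened Coulomb energy (kcal/mol) of one atom pair (r > 0)."""
    return W_ELEC * COULOMB * q_i * q_j / (dielectric(r) * r)


def elec_energy(protein, ligand) -> float:
    """Sum the pair term over all protein-ligand atom pairs.

    protein, ligand: lists of (x, y, z, partial_charge) tuples.
    No cutoff or grid interpolation is applied, unlike AutoDock4 itself.
    """
    total = 0.0
    for (px, py, pz, pq) in protein:
        for (lx, ly, lz, lq) in ligand:
            r = math.dist((px, py, pz), (lx, ly, lz))
            total += pair_elec(pq, lq, r)
    return total
```

For example, `elec_energy([(0, 0, 0, 0.5)], [(3, 0, 0, -0.5)])` is negative (attractive), and the dielectric rises from about 1 at contact toward 78.4 at long range, which damps distant interactions.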

2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.
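One simple way to build a targeted scoring function is to re-weight the energy terms of a classical scoring function against experimental affinities for a single target. The sketch below does this with ordinary least squares in NumPy as a stand-in for the scikit-learn regressors that SAnDReS uses; the column layout of the energy terms and the synthetic data are hypothetical and serve only to show the fitting step.

```python
import numpy as np


def fit_targeted_weights(terms: np.ndarray, pk: np.ndarray) -> np.ndarray:
    """Fit pK ~ b + sum_j w_j * term_j by ordinary least squares.

    terms: (n_complexes, n_terms) energy terms of a classical scoring function
           (hypothetical layout, e.g. vdW, H-bond, electrostatic, desolvation,
           torsional).
    pk:    (n_complexes,) experimental pKi or pKd values.
    Returns the coefficient vector [b, w_1, ..., w_m].
    """
    design = np.column_stack([np.ones(len(pk)), terms])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, pk, rcond=None)
    return coef


def predict(coef: np.ndarray, terms: np.ndarray) -> np.ndarray:
    """Apply the re-weighted scoring function to new complexes."""
    return coef[0] + terms @ coef[1:]


# Synthetic demonstration data (NOT from any real target): 50 complexes,
# 5 energy terms, generated from known weights plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_coef = np.array([6.0, -0.5, -0.3, -0.8, 0.2, 0.1])
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.05, size=50)

coef = fit_targeted_weights(X, y)
```

Because the weights are refitted per target, the same energy terms can acquire very different importance for, say, a kinase and a protease, which is the intuition behind targeted scoring functions.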


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with a significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. The computational models obtained through machine learning were able to capture essential structural features responsible for binding affinity for CDK2.
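Whether a scoring function correlates "significantly" with experiment is usually judged with Pearson's r, Spearman's rank correlation, and a p-value. The sketch below computes both coefficients in plain Python and uses a two-sided permutation test for significance (a substitute for the t-distribution test, chosen so no statistics library is needed). Note that energy-like scores are more negative for tighter binders, so a good classical score correlates negatively with pK.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, stat=pearson, n_perm=10000, seed=0):
    """Two-sided p-value: fraction of shuffles with |stat| >= observed."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(stat(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```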


2021 ◽  
Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett Morris

Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes.

We explore how the use of docked, rather than crystallographic, poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function fails to generalise to a new data set, demonstrating the need for improved scoring functions and additional validation benchmarks.

Code and data to reproduce our results are available from https://github.com/oxpig/learning-from-docked-poses.
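A hybrid feature vector can be assembled by concatenating structure-based features, which depend on the pose, with ligand-based descriptors, which do not. The sketch below uses RF-Score v1-style element-pair contact counts (4 protein elements × 9 ligand elements within 12 Å) as the structure-based part; this choice, and the ligand descriptors passed in, are illustrative assumptions rather than the exact features of the study above.

```python
import math
from itertools import product

# RF-Score v1-style element sets and cutoff (assumed representative choice).
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")
CUTOFF = 12.0  # Angstrom


def contact_counts(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Count protein-ligand atom pairs of each element pair within cutoff.

    *_atoms: lists of (element, x, y, z). Returns 36 integer counts.
    """
    pairs = product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)
    index = {pair: k for k, pair in enumerate(pairs)}
    counts = [0] * len(index)
    for pe, *pxyz in protein_atoms:
        for le, *lxyz in ligand_atoms:
            k = index.get((pe, le))  # None for elements outside the sets
            if k is not None and math.dist(pxyz, lxyz) < cutoff:
                counts[k] += 1
    return counts


def hybrid_features(protein_atoms, ligand_atoms, ligand_descriptors):
    """Concatenate pose-dependent counts with ligand-only descriptors.

    ligand_descriptors: hypothetical fixed-length list computed from the ligand
    alone (e.g. molecular weight, number of rotatable bonds).
    """
    return contact_counts(protein_atoms, ligand_atoms) + list(ligand_descriptors)
```

The same function applies whether the coordinates come from a crystal structure or a docked pose; only the structure-based block changes between the two, which is why ligand-based features can partly compensate for pose errors.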


2011 ◽  
Vol 09 (supp01) ◽  
pp. 1-14 ◽  
Author(s):  
Xuchang Ouyang ◽  
Stephanus Daniel Handoko ◽  
Chee Keong Kwoh

Protein–ligand docking is a computational method to identify the binding mode of a ligand and a target protein, and to predict the corresponding binding affinity using a scoring function. This method has great value in drug design. After decades of development, scoring functions can now typically identify the true binding mode, but the prediction of binding affinity remains a major problem. Here we present CScore, a data-driven scoring function using a modified Cerebellar Model Articulation Controller (CMAC) learning architecture, for accurate binding affinity prediction. The performance of CScore in terms of correlation between predicted and experimental binding affinities is benchmarked under different validation approaches. CScore achieves a prediction with R = 0.7668 and RMSE = 1.4540 when tested on an independent dataset. To the best of our knowledge, this result outperforms other scoring functions tested on the same dataset. The performance of CScore varies across clusters under the leave-cluster-out validation approach, but it still achieves competitive results. Lastly, the target-specific CScore achieves an even better result, with R = 0.8237 and RMSE = 1.0872, when trained on a much smaller but more relevant dataset for each target. Large datasets of protein–ligand structural information and advances in machine learning techniques enable the data-driven approach to binding affinity prediction. CScore is capable of accurate binding affinity prediction. It is also shown that CScore performs better when sufficient and relevant data are provided. As publicly available structural data grow, further improvement of this scoring scheme can be expected.
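For readers unfamiliar with CMAC, the sketch below is a generic textbook version (overlapping tilings plus an LMS update), not the modified architecture used by CScore, whose details are not given here. Each input activates one cell per tiling, the output is the sum of the active cells' weights, and training spreads the error equally over those cells. Inputs are assumed to be pre-scaled to [0, 1]; the tiling count, resolution and learning rate are illustrative defaults.

```python
import math
import random


class CMAC:
    """Minimal CMAC (tile-coding) regressor with an LMS update rule.

    Generic textbook variant, not the modified CMAC used by CScore.
    Every input dimension is assumed to be pre-scaled to [0, 1].
    """

    def __init__(self, n_tilings=8, resolution=10, lr=0.1):
        self.n_tilings = n_tilings    # number of overlapping tilings
        self.resolution = resolution  # tiles per dimension in each tiling
        self.lr = lr                  # learning rate of the LMS update
        self.weights = {}             # sparse memory: cell key -> weight

    def _active_cells(self, x):
        """One cell per tiling; tilings are shifted diagonally by 1/n_tilings of a tile."""
        cells = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings
            coords = tuple(math.floor(xi * self.resolution + offset) for xi in x)
            cells.append((t,) + coords)
        return cells

    def predict(self, x):
        """Output is the sum of the weights of the active cells."""
        return sum(self.weights.get(c, 0.0) for c in self._active_cells(x))

    def update(self, x, y):
        """Spread the prediction error equally over the active cells."""
        cells = self._active_cells(x)
        error = y - sum(self.weights.get(c, 0.0) for c in cells)
        step = self.lr * error / self.n_tilings
        for c in cells:
            self.weights[c] = self.weights.get(c, 0.0) + step

    def fit(self, xs, ys, epochs=50, seed=0):
        """Repeated shuffled passes of single-sample LMS updates."""
        rng = random.Random(seed)
        data = list(zip(xs, ys))
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                self.update(x, y)
        return self
```

Because nearby inputs share cells, an update generalises locally, while distant inputs stay independent; a target-specific model in this setting simply means training a separate memory on each target's complexes.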


2021 ◽  
Author(s):  
Prashant Kumar ◽  
Paulina Dominiak

Computational analysis of protein-ligand interactions is of crucial importance for drug discovery. Assessment of ligand binding energy gives a glimpse of the potential of a small organic molecule to act as a ligand for the binding site of a protein target. Available scoring functions, such as those in docking programs, all rely on equations that sum each type of protein-ligand interaction to model the binding affinity. Most scoring functions consider electrostatic interactions between the protein and the ligand. Electrostatic interactions are among the most important contributions to the total interaction energy between macromolecules; unlike dispersion forces, they are highly directional, and therefore they dominate the nature of molecular packing in crystals and biological complexes and contribute significantly to differences in inhibition strength among related enzyme inhibitors. In this paper, complexes of HIV-1 protease with the inhibitor molecules JE-2147 and Darunavir have been analysed using charge densities from a transferable aspherical-atom data bank. Moreover, we analyse the electrostatic interaction energy for an ensemble of structures from molecular dynamics simulations to highlight the main features related to the importance of this interaction for binding affinity.
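Averaging an interaction energy over an ensemble of MD snapshots can be sketched as below. This deliberately uses point charges and a constant dielectric, a much cruder model than the aspherical-atom (multipolar) charge densities described above, which also capture anisotropic features such as lone pairs; the snapshot input format is a hypothetical choice.

```python
import math
import statistics

COULOMB = 332.06363  # kcal*Angstrom/(mol*e^2)


def coulomb_energy(protein, ligand, eps=1.0):
    """Point-charge Coulomb energy (kcal/mol) between two atom sets.

    protein, ligand: lists of (x, y, z, q). Point charges only; no
    multipole terms and no penetration correction.
    """
    e = 0.0
    for *p, qp in protein:
        for *l, ql in ligand:
            e += qp * ql / math.dist(p, l)
    return COULOMB * e / eps


def ensemble_energy(frames, eps=1.0):
    """Mean and sample standard deviation over at least two MD frames.

    frames: iterable of (protein_atoms, ligand_atoms) snapshots extracted
    from a trajectory with a tool of your choice (hypothetical format).
    """
    energies = [coulomb_energy(p, l, eps) for p, l in frames]
    return statistics.mean(energies), statistics.stdev(energies)
```

Reporting the spread alongside the mean matters because a single crystal snapshot can over- or under-estimate the electrostatic contribution relative to the thermally sampled ensemble.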


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7362 ◽  
Author(s):  
Haiping Zhang ◽  
Linbu Liao ◽  
Konda Mani Saravanan ◽  
Peng Yin ◽  
Yanjie Wei

Proteins interact with small molecules to modulate several important cellular functions. Many acute diseases have been treated with small molecules that bind to the active site of a protein, either inhibiting or activating it. Several docking programs are currently available to estimate the binding position and orientation of a protein–ligand complex, and many scoring functions have been developed to estimate the binding strength and predict effective protein–ligand binding. The accuracy of current scoring functions is limited in several respects: solvent, entropy, and multibody effects are largely ignored in traditional machine learning methods. In this paper, we propose a new deep neural network-based model named DeepBindRG to predict the binding affinity of protein–ligand complexes, which learns all these effects, the binding mode, and specificity implicitly by learning protein–ligand interface contact information from a large protein–ligand dataset. During the initial data processing step, the critical interface information was preserved to make sure the input is suitable for the proposed deep learning model. When validated on three independent datasets, DeepBindRG achieves a root mean squared error (RMSE) of about 1.6–1.8 in pKa (−logKd or −logKi) and an R value of around 0.5–0.6, which is better than AutoDock Vina, whose RMSE is about 2.2–2.4 and R value 0.42–0.57. We also explored the detailed reasons for the performance of DeepBindRG, especially for several cases in which Vina failed. Furthermore, DeepBindRG performed better on four challenging datasets from the DUD-E database with no experimental protein–ligand complexes. The better performance of DeepBindRG compared with AutoDock Vina in predicting protein–ligand binding affinity indicates that deep learning approaches can greatly help the drug discovery process.
We also compare the performance of DeepBindRG with Pafnucy, a deep learning method based on a 4D representation; the advantages and limitations of both methods provide clues for improving deep learning-based protein–ligand binding affinity prediction models in the future.
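RMSE values like those above are expressed in pK units, so measured constants must first be converted with pK = −log10(K), with K in mol/L. To compare against a docking score reported as a free energy in kcal/mol, one common conversion (assumed here, not necessarily the one used in the study) is pK = −ΔG/(RT ln 10), roughly −ΔG/1.364 at 298 K.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def to_pk(value, unit="nM"):
    """pK = -log10(K / 1 M) for a dissociation or inhibition constant."""
    scale = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}
    return -math.log10(value * scale[unit])


def dg_to_pk(dg_kcal, temperature=298.15):
    """Convert a binding free energy estimate (kcal/mol) to pK.

    From dG = RT ln Kd: pK = -dG / (RT ln 10).
    """
    return -dg_kcal / (R_KCAL * temperature * math.log(10))


def rmse(predicted, experimental):
    """Root mean squared error, in the units of the inputs (here pK)."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)
```

For instance, a Ki of 10 µM corresponds to a pK of 5, and a docking score of about −6.8 kcal/mol maps to a pK of roughly 5 at room temperature.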


2021 ◽  
Vol 28 ◽  
Author(s):  
Martina Veit-Acosta ◽  
Walter Filgueira de Azevedo Junior

Background: CDK2 participates in the control of eukaryotic cell-cycle progression. Due to the great interest in CDK2 for drug development and the relative ease of crystallizing this enzyme, over 400 structural studies have focused on this protein target. These structural data are the basis for the development of computational models to estimate CDK2-ligand binding affinity. Objective: This work focuses on recent developments in the application of supervised machine learning modeling to develop scoring functions that predict the binding affinity of CDK2. Method: We employed the structures available at the Protein Data Bank and ligand information accessed from BindingDB, Binding MOAD, and PDBbind to evaluate the predictive performance of machine learning techniques combined with the physical modeling used to calculate binding affinity. We compared this hybrid methodology with classical scoring functions available in docking programs. Results: Our comparative analysis of previously published models indicated that a model created using a combination of a mass-spring system and cross-validated Elastic Net to predict the binding affinity of CDK2-inhibitor complexes outperformed classical scoring functions available in AutoDock4 and AutoDock Vina. Conclusion: All studies reviewed here suggest that targeted machine learning models are superior to classical scoring functions for calculating binding affinities. Specifically for CDK2, the combination of physical modeling with supervised machine learning techniques exhibits improved predictive performance in calculating protein-ligand binding affinity. These results find theoretical support in the concept of scoring function space.
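The cross-validated Elastic Net step can be sketched with coordinate descent in NumPy, choosing the regularisation strength by k-fold mean squared error. The objective follows the scikit-learn `ElasticNet` convention, (1/2n)‖y − b − Xw‖² + α(ρ‖w‖₁ + (1 − ρ)/2‖w‖²); the physics-based features (such as the mass-spring terms mentioned above) are not modelled here and would simply be columns of `X`. The α grid, ρ = 0.5 and k = 5 are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net(X, y, alpha, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent; returns (intercept, weights)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean       # centring absorbs the intercept
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    resid = yc.copy()                     # invariant: resid = yc - Xc @ w
    for _ in range(n_iter):
        for j in range(p):
            resid += Xc[:, j] * w[j]      # partial residual without feature j
            rho = Xc[:, j] @ resid / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1 - l1_ratio))
            resid -= Xc[:, j] * w[j]
    return y_mean - x_mean @ w, w


def cv_select_alpha(X, y, alphas, l1_ratio=0.5, k=5, seed=0):
    """Pick alpha by k-fold CV MSE, then refit on all data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for alpha in alphas:
        mse = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            b, w = elastic_net(X[train_idx], y[train_idx], alpha, l1_ratio)
            pred = b + X[test_idx] @ w
            mse += ((pred - y[test_idx]) ** 2).mean()
        scores.append(mse / k)
    best = alphas[int(np.argmin(scores))]
    return best, elastic_net(X, y, best, l1_ratio)
```

The L1 part can switch off uninformative energy terms entirely, while the L2 part keeps correlated terms (common among physics-based descriptors) from being selected arbitrarily.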


Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett M Morris

Motivation: Machine learning scoring functions for protein–ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein–ligand complex, with limited information about the chemical or topological properties of the ligand itself. Results: We demonstrate that the performance of machine learning scoring functions is consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest (RF) combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.836, 0.780 and 0.821 on the PDBbind 2007, 2013 and 2016 core sets, respectively, compared to 0.790, 0.746 and 0.814 when using the features of RF-Score v3 alone. Excluding proteins and/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore, an RF using only ligand-based features is predictive at a level similar to classical scoring functions and appears to be predicting the mean binding affinity of a ligand for its protein targets. Availability and implementation: Data and code to reproduce all the results are freely available at http://opig.stats.ox.ac.uk/resources. Supplementary information: Supplementary data are available at Bioinformatics online.
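Excluding training ligands that resemble test ligands is typically done with a fingerprint similarity threshold. The sketch below represents each fingerprint as a set of on-bit indices (computed with any cheminformatics toolkit, which is assumed) and drops training entries whose maximum Tanimoto similarity to any test ligand reaches a hypothetical threshold of 0.8. An analogous filter on protein sequence identity would handle the protein side.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def filter_training_set(train, test, threshold=0.8):
    """Keep only training ligands with max similarity to the test set < threshold.

    train, test: dicts mapping an identifier (e.g. a PDB code) to a set of
    fingerprint on-bits. The 0.8 threshold is illustrative, not prescribed.
    """
    kept = {}
    for name, fp in train.items():
        max_sim = max((tanimoto(fp, t) for t in test.values()), default=0.0)
        if max_sim < threshold:
            kept[name] = fp
    return kept
```

Lowering the threshold makes the benchmark stricter; comparing performance across several thresholds shows how much of a model's accuracy comes from near-duplicate ligands rather than transferable learning.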

