Development of homology model, docking protocol and Machine-Learning based scoring functions for identification of Equus caballus’s butyrylcholinesterase inhibitors

Author(s):  
Ankit Ganeshpurkar ◽  
Ravi Singh ◽  
Devendra Kumar ◽  
Gopichand Gutti ◽  
Divya Sardana ◽  
...  
Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
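
The abstract does not spell out how the ‘comparison’ concept is implemented. One plausible reading, sketched below as an assumption rather than as the authors' method, is to turn each native/decoy pair into two difference vectors with opposite labels, which gives a balanced training set even when there is one native pose and many decoys. The function name `build_comparison_pairs` and the feature values are hypothetical.

```python
def build_comparison_pairs(native, decoys):
    """Build labelled difference vectors from one native pose and its decoys.

    native: list of per-atom-pair feature values for the native pose.
    decoys: list of feature lists, one per decoy pose.
    Returns a list of (difference_vector, label) tuples, where label 1 means
    "first pose is the native one" and label 0 means the reverse ordering.
    """
    samples = []
    for decoy in decoys:
        if len(decoy) != len(native):
            raise ValueError("native and decoy feature vectors must match in length")
        diff = [n - d for n, d in zip(native, decoy)]
        samples.append((diff, 1))                # native minus decoy
        samples.append(([-v for v in diff], 0))  # decoy minus native
    return samples


# Hypothetical feature values for illustration only.
native_features = [1, 2]
decoy_features = [[0, 5], [3, 3]]
pairs = build_comparison_pairs(native_features, decoy_features)
```

The resulting vectors and labels could then be passed to any binary classifier, for example a random forest; every decoy contributes one positive and one negative sample, so the two classes are the same size by construction.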


2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Maciej Wójcikowski ◽  
Pedro J. Ballester ◽  
Pawel Siedlecki

Glycobiology ◽  
2018 ◽  
Vol 29 (2) ◽  
pp. 124-136 ◽  
Author(s):  
Juan I Blanco Capurro ◽  
Matias Di Paola ◽  
Marcelo Daniel Gamarra ◽  
Marcelo A Martí ◽  
Carlos P Modenutti

Abstract: Unraveling the structure of lectin–carbohydrate complexes is vital for understanding key biological recognition processes and for the development of glycomimetic drugs. Applying molecular docking to predict them is challenging due to their low affinity, hydrophilic nature and ligand conformational diversity. In the last decade several strategies, such as the inclusion of glycan conformation specific scoring functions or our developed solvent-site biased method, have improved carbohydrate docking performance, but significant challenges remain, in particular those related to receptor conformational diversity. In the present work we have analyzed conventional and solvent-site biased AutoDock4 performance concerning receptor conformational diversity as derived from different crystal structures (apo and holo), molecular dynamics snapshots and homology-based models, for 14 different lectin–monosaccharide complexes. Our results show that both conventional and biased docking yield accurate lectin–monosaccharide complexes, starting from either apo or homology-based structures, even when only moderate (45%) sequence identity templates are available. An essential element for success is a proper combination of a middle-sized (10–100 structures) conformational ensemble, derived either from molecular dynamics or multiple homology model building. Consistent with our previous works, results show that solvent-site biased methods improve overall performance, but that results are still highly system dependent. Finally, our results also show that docking can select the correct receptor structure within the ensemble, underscoring the relevance of joint evaluation of both ligand pose and receptor conformation.


2021 ◽  
Vol 1 (1) ◽  
pp. 40-47
Author(s):  
Emilio Viktorov Mateev ◽  
Iva Valkova ◽  
Maya Georgieva ◽  
Alexander Zlatkov

Recently, the application of molecular docking has increased drastically due to the rapid growth in the number of resolved crystallographic receptors with co-crystallized ligands. However, the inability of docking software to correctly score the interactions between ligands and receptors is still a relevant issue. This study examined the Pearson’s correlation coefficient between the experimental monoamine oxidase-B (MAO-B) inhibitory activity of 44 novel coumarins and the obtained GOLD 5.3 docking scores. Subsequently, optimization of the docking protocol was carried out to achieve the best possible pairwise correlation. Numerous modifications of the docking settings were made, such as changes in the scoring functions, size of the grid space, presence of active waters, and side-chain flexibility. Furthermore, ensemble docking simulations into two superimposed complexes were performed. The model was validated with a test set. A significant Pearson’s correlation coefficient of 0.8217 was obtained for the latter. In the final stage of our work, we observed the major interactions between the top-scored ligands and the active site of 1S3B.
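
The quantity optimized in this protocol, the Pearson correlation between docking scores and experimental activities, can be computed with a few lines of standard-library Python. The sketch below is illustrative only: the helper name `pearson_r` and the score and activity values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical docking scores and experimental activities (e.g. pIC50).
docking_scores = [52.1, 60.3, 47.8, 71.5, 65.0]
activities = [5.2, 6.1, 4.9, 7.0, 6.4]
r = pearson_r(docking_scores, activities)
```

Repeating this calculation for each variant of the docking settings, and keeping the variant with the highest coefficient, is one simple way to compare protocols; the coefficient on a held-out test set then indicates whether the chosen settings generalize.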


Author(s):  
Stefan Holderbach ◽  
Lukas Adam ◽  
Bhyravabhotla Jayaram ◽  
Rebecca Wade ◽  
Goutam Mukherjee

The virtual screening of large numbers of compounds against target protein binding sites has become an integral component of drug discovery workflows. This screening is often done by computationally docking ligands into a protein binding site of interest, but this has the drawback that a large number of poses must be evaluated to obtain accurate estimates of protein-ligand binding affinity. Here we introduce a fast prefiltering method for ligand prioritization that is based on a set of machine learning models and uses simple pose-invariant physicochemical descriptors of the ligands and the protein binding pocket. Our method, Rapid Screening with Physicochemical Descriptors + machine learning (RASPD+), is trained on PDBbind data and achieves a regression performance better than that of the original RASPD method and comparable to that of traditional scoring functions on a range of different test sets, without the need for generating ligand poses. Additionally, we use RASPD+ to identify molecular features important for binding affinity and assess the ability of RASPD+ to enrich active molecules from decoys.
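
The abstract does not say which metric is used to assess enrichment of actives over decoys. A common choice is the enrichment factor, sketched here under that assumption: the ratio of the active fraction in the top-ranked subset to the active fraction in the whole library. The function name, parameters and example values are hypothetical.

```python
def enrichment_factor(scores, is_active, fraction=0.01, higher_is_better=True):
    """Enrichment factor of actives in the top `fraction` of a ranked library.

    scores: predicted score per compound.
    is_active: matching booleans, True for known actives and False for decoys.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    # Rank compounds from best to worst predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0],
                    reverse=higher_is_better)
    n_top = max(1, round(fraction * len(ranked)))
    top_actives = sum(1 for _, a in ranked[:n_top] if a)
    # Active rate in the top subset divided by the active rate overall.
    return (top_actives / n_top) / (total_actives / len(ranked))


# Hypothetical library: 2 actives among 10 compounds, both ranked at the top.
predicted = [9.1, 8.7, 6.0, 5.5, 5.1, 4.8, 4.2, 3.9, 3.0, 2.5]
labels = [True, True, False, False, False, False, False, False, False, False]
ef_20 = enrichment_factor(predicted, labels, fraction=0.2)
```

An enrichment factor of 1 corresponds to random selection, and larger values mean the prefilter concentrates actives near the top of the ranking.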


2021 ◽  
Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett Morris

Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes.

We explore how the use of docked, rather than crystallographic, poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function fails to generalise to a new data set, demonstrating the need for improved scoring functions and additional validation benchmarks.

Code and data to reproduce our results are available from https://github.com/oxpig/learning-from-docked-poses.

