Using diverse potentials and scoring functions for the development of improved machine-learned models for protein–ligand affinity and docking pose prediction

Author(s):  
Omar N. A. Demerdash
2020
Vol 21 (15)
pp. 5183
Author(s):
Eric D. Boittier
Yat Yin Tang
McKenna E. Buckley
Zachariah P. Schuurs
Derek J. Richard
...  

A promising protein target for computational drug development, human cluster of differentiation 38 (CD38) plays a crucial role in many physiological and pathological processes, primarily through the upstream regulation of factors that control cytoplasmic Ca²⁺ concentrations. Recently, a small-molecule inhibitor of CD38 was shown to slow down pathways relating to aging and DNA damage. We examined the performance of seven docking programs for their ability to model protein–ligand interactions with CD38. A test set of twelve CD38 crystal structures containing crystallized, biologically relevant substrates was used to assess pose prediction. Ranked by the median RMSD between the native and predicted poses, the programs performed as follows: Vina, AutoDock 4 (AD4) > PLANTS, Gold, Glide, Molegro > rDock. Forty-two compounds with known affinities were docked to assess the accuracy of the programs at affinity and ranking prediction. The rankings based on scoring power were: Vina, PLANTS > Glide, Gold > Molegro >> AD4 >> rDock. Of the top four performing programs, Glide was the only one whose scoring function did not appear biased towards overpredicting ligand affinity based on ligand size. Factors that affect the reliability of pose prediction and scoring are discussed. General limitations and known biases of scoring functions are examined, aided in part by molecular fingerprints and Random Forest classifiers. This machine learning approach may be used to systematically diagnose molecular features that are correlated with poor scoring accuracy.
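The pose-prediction ranking above rests on the RMSD between each native ligand pose and its predicted counterpart, summarized by a median over the test set. A minimal Python sketch of that bookkeeping follows, assuming both poses list the same heavy atoms in the same order and already share the receptor's coordinate frame; it applies no symmetry correction, which a real benchmark may need. The function names and example coordinates are hypothetical, not taken from the study.

```python
import math
from statistics import median


def pose_rmsd(native, predicted):
    """RMSD (in Å) between two poses given as atom-matched lists of (x, y, z).

    Assumes atom i of `native` corresponds to atom i of `predicted` and that
    both poses share the receptor's coordinate frame, so no superposition and
    no symmetry correction are applied.
    """
    if len(native) != len(predicted) or not native:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_n, atom_p in zip(native, predicted)
        for a, b in zip(atom_n, atom_p)
    )
    return math.sqrt(squared / len(native))


def rank_by_median_rmsd(rmsds_by_program):
    """Order programs from lowest (best) to highest median RMSD over the test set."""
    medians = {program: median(values) for program, values in rmsds_by_program.items()}
    return sorted(medians, key=medians.get)


# Hypothetical two-atom pose shifted by 1 Å along y.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]
print(pose_rmsd(native, predicted))  # 1.0
```

For instance, `rank_by_median_rmsd({"A": [1.0, 3.0, 2.0], "B": [0.5, 0.7, 4.0]})` returns `["B", "A"]`, because B's median (0.7 Å) is lower than A's (2.0 Å). The median is less sensitive than the mean to an occasional pose that lands far from the binding site.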


2021
Vol 13 (1)
Author(s):
Chao Shen
Xueping Hu
Junbo Gao
Xujun Zhang
Haiyang Zhong
...  

Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein–ligand binding complexes, but accurate prediction of ligand-binding poses remains a major challenge for molecular docking because of deficiencies in scoring functions (SFs) and the neglect of protein flexibility upon ligand binding. In this study, based on a cross-docking dataset constructed specifically from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys, and systematically assessed their performance with and without cross-docked poses in the training and test sets. The results show that using Extended Connectivity Interaction Features (ECIF), Vina energy terms, and docking pose ranks as features achieves the best performance, according to validation through random splitting or refined-core splitting and testing on re-docked or cross-docked poses. In addition, although performance decreases significantly under threefold clustered cross-validation, including the Vina energy terms effectively ensures a lower limit on model performance and thus improves generalization capability. Our results also highlight the importance of incorporating cross-docked poses into the training of SFs intended to have a wide application domain and high robustness for binding pose prediction. The source code and the newly developed cross-docking datasets are freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936, respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for the prediction of protein–ligand binding poses.
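The classifiers above are trained on poses labeled near-native or decoy, and their generalization is probed with clustered cross-validation, in which whole clusters of similar complexes are kept out of the training folds. The Python sketch below shows only that data preparation, under two assumptions not stated in the abstract: an RMSD cutoff of 2 Å for "near-native", and a precomputed cluster ID per complex (for example from sequence or ligand similarity). The greedy fold assignment and all names are hypothetical, and the XGBoost model itself is left out.

```python
from collections import defaultdict

NEAR_NATIVE_CUTOFF = 2.0  # Å; assumed threshold, not taken from the abstract


def label_poses(pose_rmsds, cutoff=NEAR_NATIVE_CUTOFF):
    """Label each pose 1 (near-native, RMSD <= cutoff) or 0 (decoy)."""
    return [1 if rmsd <= cutoff else 0 for rmsd in pose_rmsds]


def clustered_folds(cluster_of, k=3):
    """Split complexes into k folds so that each cluster stays in one fold.

    `cluster_of` maps a complex ID to a precomputed cluster ID. Clusters are
    placed greedily, largest first, into the currently smallest fold.
    """
    members = defaultdict(list)
    for complex_id, cluster_id in cluster_of.items():
        members[cluster_id].append(complex_id)
    folds = [[] for _ in range(k)]
    for cluster_id in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(sorted(members[cluster_id]))
    return folds


def cv_splits(folds):
    """Yield (train, test) lists of complex IDs, one pair per fold."""
    for i, test in enumerate(folds):
        train = [cid for j, fold in enumerate(folds) if j != i for cid in fold]
        yield train, test
```

Keeping clusters intact is what makes this split harder than a random one: a model cannot score well on a test complex simply because a near-identical complex was in its training data.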


Author(s):  
Maria Kadukova
Karina dos Santos Machado
Pablo Chacón
Sergei Grudinin

Motivation: Despite the progress made in studying protein–ligand interactions and the widespread application of docking and affinity prediction tools, improving their precision and efficiency remains a challenge. Computational approaches based on scoring docking conformations with statistical potentials are a popular alternative to more accurate but costly physics-based thermodynamic sampling methods. In this context, a minimalist, fast, sidechain-free knowledge-based potential with high docking and screening power can be very useful when screening a large number of putative docking conformations. Results: Here, we present a novel coarse-grained potential defined by a 3D joint probability distribution function that depends only on the pairwise orientation and position between protein backbone and ligand atoms. Despite its extreme simplicity, our approach yields results very competitive with state-of-the-art scoring functions, especially in docking and screening tasks. For example, we observed a twofold improvement in the median 5% enrichment factor on the DUD-E benchmark compared to AutoDock Vina. Moreover, our results show that a coarse sidechain-free potential is sufficient for very successful docking pose prediction. Availability and implementation: The standalone version of KORP-PL, with the corresponding tests and benchmarks, is available at https://team.inria.fr/nano-d/korp-pl/ and https://chaconlab.org/modeling/korp-pl. Supplementary information: Supplementary data are available at Bioinformatics online.
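The 5% enrichment factor quoted above compares the fraction of actives among the top-ranked 5% of a screened library with the fraction of actives in the whole library, so a value of 1 means no better than random selection. A minimal Python sketch follows, assuming lower scores mean better predicted binding (as with Vina's energies in kcal/mol); rounding the top-5% count to the nearest integer is one of several conventions in use, and the function name is hypothetical.

```python
def enrichment_factor(scores, is_active, fraction=0.05, lower_is_better=True):
    """Enrichment factor EF_x = (actives_top / n_top) / (actives_total / n_total).

    `scores` are per-compound docking scores; `is_active` flags known actives
    (e.g. DUD-E actives vs. decoys). The top-x count is rounded to the nearest
    integer and is at least 1.
    """
    n_total = len(scores)
    if n_total == 0 or n_total != len(is_active):
        raise ValueError("need one activity flag per score")
    actives_total = sum(1 for active in is_active if active)
    if actives_total == 0:
        raise ValueError("the library contains no actives")
    n_top = max(1, round(fraction * n_total))
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=not lower_is_better)
    actives_top = sum(1 for i in order[:n_top] if is_active[i])
    # Written as a single division to avoid compounding floating-point error.
    return (actives_top * n_total) / (n_top * actives_total)
```

With 100 compounds, 10 actives, and 3 actives among the 5 best-scored compounds, the function returns (3/5)/(10/100) = 6.0. A median over targets of such per-target values is one way to summarize a multi-target benchmark like DUD-E.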


BMC Chemistry
2020
Vol 14 (1)
Author(s):
Shuai Wang
Jun-Hao Jiang
Ruo-Yu Li
Ping Deng

2020
Vol 16 (3)
pp. 182-190
Author(s):
Giulio Poli
Tiziano Tuccinardi

Background: Molecular docking is probably the most popular and profitable approach in computer-aided drug design, being the staple technique for predicting the binding mode of bioactive compounds and for performing receptor-based virtual screening studies. The growing attention received by docking, together with the need to improve its reliability in pose prediction and its virtual screening performance, has led to the development of a wide range of new docking algorithms and scoring functions. Nevertheless, no single procedure is likely to outperform all others in reliability and accuracy, or to prove generally suitable for all kinds of protein targets. Methods: In this context, consensus docking approaches are taking hold in computer-aided drug design. These computational protocols consist of docking ligands with multiple docking methods and then comparing the binding poses predicted for the same ligand by the different methods. This analysis is usually carried out by calculating the root-mean-square deviation (RMSD) among the docking results obtained for each ligand, in order to identify the number of docking methods producing the same binding pose. Results: Consensus docking approaches have been shown to improve the quality of docking and virtual screening results compared with single docking methods. From a qualitative point of view, pose prediction accuracy improved when ligand binding poses produced by a high number of docking methods were prioritized, whereas in virtual screening studies, high hit rates were obtained by prioritizing compounds showing a high level of pose consensus. Conclusion: In this review, we provide an overview of the results obtained from the performance assessment of various consensus docking protocols, and we illustrate successful case studies in which consensus docking has been applied in virtual screening.
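The consensus step described above can be sketched in Python as follows: for each method's pose of a ligand, count how many methods (itself included) produced a pose within an RMSD cutoff of it, then keep the pose with the most agreement. The 2 Å cutoff, the atom-matched RMSD without symmetry correction, and the tie-breaking rule (the method listed first wins) are assumptions for illustration, not the protocol of any specific study covered by the review; the method names are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """Atom-matched RMSD (Å) between two poses of the same ligand, no symmetry correction."""
    squared = sum((u - v) ** 2 for p, q in zip(pose_a, pose_b) for u, v in zip(p, q))
    return math.sqrt(squared / len(pose_a))


def consensus_pose(poses_by_method, cutoff=2.0):
    """Return (method, agreement) for the pose reproduced by the most methods.

    `agreement` counts every method (including the pose's own) whose pose lies
    within `cutoff` Å RMSD; ties go to the method listed first.
    """
    best_method, best_agreement = None, 0
    for method, pose in poses_by_method.items():
        agreement = sum(
            1 for other in poses_by_method.values() if rmsd(pose, other) <= cutoff
        )
        if agreement > best_agreement:
            best_method, best_agreement = method, agreement
    return best_method, best_agreement


poses = {
    "method_A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "method_B": [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "method_C": [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
}
print(consensus_pose(poses))  # ('method_A', 2)
```

In a virtual screening setting, the returned agreement count can serve as the level of pose consensus by which compounds are prioritized.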


2020
Author(s):
Jessie L. Gan
Dhruv Kumar
Cynthia Chen
Bryn C. Taylor
Benjamin R. Jagger
...  

The discovery of new drugs is a time-consuming and expensive process. Methods such as virtual screening, which can filter out ineffective compounds from drug libraries prior to expensive experimental study, have become popular research topics. As the computational drug discovery community has grown, organizations such as the Drug Design Data Resource have begun hosting blinded grand challenges to benchmark advances in methodology and to identify the best methods for ligand pose prediction, ligand affinity ranking, and free energy calculations. Such open challenges offer a unique opportunity for researchers to partner with junior students (e.g., high school and undergraduate students) to validate basic yet fundamental hypotheses that domain experts consider uninteresting. Here, we, a group of high school-aged students and their mentors, present the results of our participation in Grand Challenge 4, in which we predicted ligand affinity rankings for the Cathepsin S (CatS) protease, an important protein target for autoimmune diseases. To investigate the effect of incorporating receptor dynamics on ligand affinity rankings, we employed the Relaxed Complex Scheme, a molecular docking method paired with receptor conformations generated by molecular dynamics. We found that CatS is a difficult target for molecular docking, and we explored more advanced methods, such as distance-restrained docking, to try to improve the correlation with experiments. This project exemplifies the capabilities of high school students when supported with a rigorous curriculum and demonstrates the value of community-driven competitions for beginners in computational drug discovery.
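In the Relaxed Complex Scheme, each ligand is docked against many receptor snapshots from molecular dynamics, and the per-snapshot scores must be collapsed into one value before ligands can be ranked. The Python sketch below shows two common ways of doing that (mean or best score) and compares the resulting ranking with experiment using Kendall's τ (the tau-a variant). Which aggregation the authors used is not stated here, so both modes, the function names, and the example data are assumptions.

```python
from itertools import combinations
from statistics import mean


def ensemble_score(snapshot_scores, mode="mean"):
    """Collapse one ligand's docking scores over MD snapshots (lower = better)."""
    if mode == "mean":
        return mean(snapshot_scores)
    if mode == "best":
        return min(snapshot_scores)
    raise ValueError(f"unknown mode: {mode!r}")


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs; ties score 0."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    s = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        s += (product > 0) - (product < 0)
    return s / (n * (n - 1) / 2)


# Hypothetical docking scores (kcal/mol) per receptor snapshot and measured IC50s (nM).
docking = {"lig1": [-9.0, -8.0], "lig2": [-7.0, -7.5], "lig3": [-6.0, -5.0]}
ic50_nm = {"lig1": 10.0, "lig2": 100.0, "lig3": 1000.0}
ligands = sorted(docking)
predicted = [ensemble_score(docking[lig]) for lig in ligands]
measured = [ic50_nm[lig] for lig in ligands]
print(kendall_tau_a(predicted, measured))  # 1.0
```

Because both the docking score and IC50 are "lower is better" here, a τ of +1 means the predicted ranking matches the experimental one exactly, and −1 means it is fully reversed.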


2021
Vol 22 (19)
pp. 10801
Author(s):
Jonathan Dickerhoff
Kassandra R. Warnecke
Kaibo Wang
Nanjie Deng
Danzhou Yang

G-quadruplexes (G4s) are four-stranded nucleic acid secondary structures of biological significance and have emerged as attractive drug targets. The G4 formed in the MYC promoter (MycG4) is one of the most studied small-molecule targets and a model system for the parallel structures that are prevalent in promoter DNA G4s and RNA G4s. Molecular docking has become an essential tool in structure-based drug discovery for protein targets and is increasingly applied to G4 DNA. However, binding sites in DNA, and in G4s in particular, differ significantly from those of protein targets. Here we perform the first systematic evaluation of four commonly used docking programs (AutoDock Vina, DOCK 6, Glide, and RxDock) for G4 DNA–ligand binding pose prediction, using four small molecules whose complex structures with MycG4 have been experimentally determined in solution. The results indicate considerable differences in the performance of the docking programs, with DOCK 6 combined with GB/SA rescoring performing better than the other programs. We found that docking accuracy is limited mainly by the scoring functions. The study shows that current docking programs should be used with caution to predict G4 DNA–small molecule binding modes.
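Rescoring, as in the DOCK 6 plus GB/SA protocol above, keeps the docked poses but re-ranks them with a different energy function; its benefit shows up as a higher fraction of systems whose top-ranked pose lies close to the experimental one. A minimal Python sketch of that comparison follows, assuming each pose carries its RMSD to the experimental structure and one value per scoring scheme (lower is better). The 2 Å success cutoff, the dictionary keys, and the numbers are hypothetical.

```python
def top1_success(poses, score_key, cutoff=2.0):
    """True if the pose ranked best (lowest) by `score_key` is within `cutoff` Å of experiment."""
    best = min(poses, key=lambda pose: pose[score_key])
    return best["rmsd"] <= cutoff


def success_rate(systems, score_key, cutoff=2.0):
    """Fraction of ligand systems whose top-ranked pose is a success."""
    return sum(top1_success(poses, score_key, cutoff) for poses in systems) / len(systems)


# Hypothetical poses: RMSD to the experimental pose plus two scores per pose.
systems = [
    [{"rmsd": 4.0, "dock": -10.0, "rescore": -20.0},
     {"rmsd": 1.0, "dock": -9.0, "rescore": -30.0}],
    [{"rmsd": 1.5, "dock": -8.0, "rescore": -25.0},
     {"rmsd": 6.0, "dock": -7.0, "rescore": -22.0}],
]
print(success_rate(systems, "dock"))     # 0.5
print(success_rate(systems, "rescore"))  # 1.0
```

In the first hypothetical system, the native-like pose (1.0 Å) is generated by docking but ranked second; rescoring moves it to the top without any new sampling. When no near-native pose is generated at all, rescoring cannot help, which is one reason to report sampling and scoring failures separately.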


2019
Author(s):
Sahil Chhabra
Jingru Xie
Aaron T. Frank

Determining the three-dimensional (3D) structures of ribonucleic acid (RNA)–small molecule complexes is critical to understanding molecular recognition in RNA. Computer docking can, in principle, be used to predict the 3D structure of RNA–small molecule complexes. Unfortunately, retrospective analysis has shown that the scoring functions typically used to rank poses tend to misclassify non-native poses as native, and vice versa. This misclassification severely limits the utility of computer docking for pose prediction as well as for virtual screening. Here, we use machine learning to train a set of pose classifiers that estimate the relative “nativeness” of a set of RNA–ligand poses. At the heart of our approach is a pose “fingerprint” that is a composite of a set of atomic fingerprints, each of which encodes the local “RNA environment” around a ligand atom. We found that ranking poses by the classification scores from our machine learning classifiers recovered native-like poses better than ranking them by their docking scores. With a leave-one-out training and testing approach, one of our classifiers recovered poses within 2.5 Å of the native pose in ∼80% of the 88 cases we examined; similarly, on a separate validation set, such poses were recovered in ∼70% of cases. Our set of classifiers, which we refer to as RNAPosers, should be useful as a tool to aid RNA–ligand pose prediction, and we have made RNAPosers available to the academic community via https://github.com/atfrank/RNAPosers.
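The abstract above describes a pose fingerprint assembled from atomic fingerprints of the RNA environment around each ligand atom. The Python sketch below is one hypothetical way to build such a feature: it counts RNA atoms of each type in radial distance shells around every ligand atom and sums those counts per ligand atom type. The shell edges, the atom typing, the aggregation, and all names are illustrative assumptions, not the published RNAPosers featurization.

```python
import math
from collections import Counter

SHELL_EDGES = (2.0, 4.0, 6.0)  # Å; hypothetical outer radii of three distance shells


def shell_index(distance, edges=SHELL_EDGES):
    """Index of the first shell whose outer radius contains `distance`, else None."""
    for index, edge in enumerate(edges):
        if distance <= edge:
            return index
    return None


def atomic_fingerprint(ligand_xyz, rna_atoms, edges=SHELL_EDGES):
    """Count RNA atoms of each type in each shell around one ligand atom."""
    counts = Counter()
    for rna_type, rna_xyz in rna_atoms:
        shell = shell_index(math.dist(ligand_xyz, rna_xyz), edges)
        if shell is not None:
            counts[(rna_type, shell)] += 1
    return counts


def pose_fingerprint(ligand_atoms, rna_atoms, edges=SHELL_EDGES):
    """Sum atomic fingerprints per ligand atom type into one pose-level fingerprint."""
    pose = Counter()
    for ligand_type, ligand_xyz in ligand_atoms:
        for (rna_type, shell), n in atomic_fingerprint(ligand_xyz, rna_atoms, edges).items():
            pose[(ligand_type, rna_type, shell)] += n
    return pose
```

Collecting the keys seen across all poses and sorting them gives a fixed-length vector per pose, which a classifier could then score for “nativeness”; ranking poses by that score rather than by docking score is the comparison the abstract reports.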


2020
Author(s):
Kin Meng Wong
Shirley Siu

Protein–ligand docking programs are indispensable tools for predicting the binding pose of a ligand to its receptor protein in current structure-based drug design. In this paper, we evaluate the performance of grey wolf optimization (GWO) in protein–ligand docking. Two versions of the GWO docking program (the original GWO and a modified version with random walk) were implemented based on AutoDock Vina. Our rigid docking experiments show that the GWO programs have enhanced exploration capability, leading to a significant speedup in the search while maintaining binding pose prediction accuracy comparable to that of AutoDock Vina. For flexible receptor docking, the GWO methods are competitive in pose ranking but have lower success rates than AutoDockFR. Successful redocking of all the flexible cases into their holo structures reveals that an inaccurate scoring function and the lack of proper treatment of the protein backbone are the major causes of docking failures.
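Grey wolf optimization moves every candidate solution toward the three best solutions in the current pack (alpha, beta, delta), with a coefficient `a` that shrinks linearly from 2 to 0 so that the search shifts from exploration to exploitation. The Python sketch below implements that basic update on a toy sphere function standing in for a docking score; in a Vina-based docking program the vector would instead encode ligand translation, orientation, and torsions. The random-walk variant mentioned above is not included, and all names and parameter values are hypothetical.

```python
import random


def sphere(x):
    """Toy objective standing in for a docking score (minimum 0 at the origin)."""
    return sum(v * v for v in x)


def gwo_minimize(f, dim, lo, hi, n_wolves=30, n_iter=200, seed=0):
    """Minimize f over the box [lo, hi]^dim with the basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_val = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=f)
        alpha, beta, delta = ranked[:3]  # the three leaders of the current pack
        if f(alpha) < best_val:
            best, best_val = list(alpha), f(alpha)
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for x in wolves:
            position = []
            for d in range(dim):
                guided = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a  # |A| > 1 favours exploration
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    guided.append(leader[d] - A * D)
                # Average the three leader-guided moves and clip to the search box.
                position.append(min(hi, max(lo, sum(guided) / 3.0)))
            new_wolves.append(position)
        wolves = new_wolves
    for x in wolves:  # also check the final pack
        if f(x) < best_val:
            best, best_val = list(x), f(x)
    return best, best_val


best, value = gwo_minimize(sphere, dim=3, lo=-5.0, hi=5.0)
print(value < 1e-3)
```

Tracking the best solution seen so far, rather than only the final pack, guarantees that the returned value never gets worse as iterations proceed, even though the whole pack moves each step.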

