Improving structure-based virtual screening performance via learning from scoring function components

Abstract Scoring functions (SFs) based on complex machine learning (ML) algorithms have gradually emerged as a promising alternative to overcome the weaknesses of classical SFs. However, extensive efforts have been devoted to developing SFs based on new protein–ligand interaction representations and advanced alternative ML algorithms rather than on the energy components obtained by decomposing existing SFs. Here, we propose a new method named energy auxiliary terms learning (EATL), in which the scoring components are extracted and used as the input for developing three levels of ML SFs, namely EATL SFs, docking-EATL SFs and comprehensive SFs, with ascending VS performance. The EATL approach not only outperforms classical SFs in absolute performance (ROC) and initial enrichment (BEDROC) but also performs comparably to other advanced ML-based methods on the diverse subset of Directory of Useful Decoys: Enhanced (DUD-E). The test on the relatively unbiased actives as decoys (AD) dataset also proved the effectiveness of EATL. Furthermore, the idea of learning from SF components to yield improved screening power can also be extended to other available docking programs and SFs.
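
The two metrics named in the abstract, ROC (overall ranking quality) and BEDROC (early enrichment), can be computed from a ranked list of active/decoy labels. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes the list is sorted best-scored first with no ties, and it uses alpha = 20 as the BEDROC early-recognition parameter, which is a common choice but an assumption here. The function names are hypothetical.

```python
import math


def roc_auc(ranked_labels):
    """ROC AUC for labels sorted best-first (1 = active, 0 = decoy)."""
    n_act = sum(ranked_labels)
    n_dec = len(ranked_labels) - n_act
    if n_act == 0 or n_dec == 0:
        raise ValueError("need at least one active and one decoy")
    decoys_seen = 0
    correct_pairs = 0  # (active, decoy) pairs with the active ranked first
    for label in ranked_labels:
        if label:
            correct_pairs += n_dec - decoys_seen
        else:
            decoys_seen += 1
    return correct_pairs / (n_act * n_dec)


def bedroc(ranked_labels, alpha=20.0):
    """BEDROC: robust initial enhancement (RIE) rescaled to the range [0, 1]."""
    n = len(ranked_labels)
    n_act = sum(ranked_labels)
    if n_act == 0 or n_act == n:
        raise ValueError("need at least one active and one decoy")
    ratio = n_act / n
    # Exponentially weighted sum over the 1-based ranks of the actives.
    sum_exp = sum(
        math.exp(-alpha * rank / n)
        for rank, label in enumerate(ranked_labels, start=1)
        if label
    )
    # Expected weight of a single active under a uniformly random ranking.
    random_weight = (1 - math.exp(-alpha)) / (n * (math.exp(alpha / n) - 1))
    rie = sum_exp / (n_act * random_weight)
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Toy rankings (made up): 5 actives among 100 compounds.
perfect = [1] * 5 + [0] * 95
worst = [0] * 95 + [1] * 5
print(roc_auc(perfect), bedroc(perfect))
print(roc_auc(worst), bedroc(worst))
```

Because BEDROC weights early ranks exponentially, two rankings with the same ROC AUC can differ sharply in BEDROC, which is why abstracts like this one report both.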

CompScore: boosting structure-based virtual screening performance by incorporating docking scoring functions components into consensus scoring

10.1101/550590 ◽ 2019

Author(s): Yunierkis Perez-Castillo ◽ Stellamaris Sotomayor-Burneo ◽ Karina Jimenes-Vargas ◽ Mario Gonzalez-Rodriguez ◽ Maykel Cruz-Monteagudo ◽ ...

Keyword(s): Genetic Algorithms ◽ Virtual Screening ◽ High Performance ◽ Scoring Function ◽ Scoring Functions ◽ Traditional Use ◽ Screening Performance ◽ Consensus Scoring ◽ Improved Performance ◽ Virtual Screening Performance

Abstract Consensus scoring has become a commonly used strategy within structure-based virtual screening (VS) workflows, with improved performance compared to workflows based on a single scoring function. However, no research has been devoted to analyzing the value of docking scoring function components in consensus scoring. We implemented and tested a method that incorporates docking scoring function components into the setup of high-performance VS workflows. This method uses genetic algorithms to find the combination of scoring components that maximizes the VS enrichment for any target. Our methodology was validated using a dataset, widely used in VS validation studies, that contains ligands and decoys for 102 targets. Results show that our approach outperforms other methods for all targets. It also boosts the initial enrichment performance of the traditional use of whole scoring functions in consensus scoring by an average of 45%. CompScore is freely available at: http://bioquimio.udla.edu.ec/compscore/
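
The idea of searching for an enrichment-maximizing subset of scoring components with a genetic algorithm can be sketched as follows. This is a toy illustration under stated assumptions, not the CompScore implementation: the fitness function (actives among the top-ranked compounds), the consensus rule (plain sum of the selected components, lower is better), the GA operators and all names are hypothetical, and the data are synthetic.

```python
import random


def actives_in_top(rows, labels, mask, top_n):
    """Fitness: actives among the top_n compounds ranked by the sum of the
    selected components (lower sum = better, as for docking energies)."""
    if not any(mask):
        return -1  # an empty selection cannot rank anything
    totals = [sum(v for v, keep in zip(row, mask) if keep) for row in rows]
    order = sorted(range(len(totals)), key=totals.__getitem__)
    return sum(labels[i] for i in order[:top_n])


def select_components(rows, labels, top_n, pop_size=20, generations=30,
                      mutation_rate=0.1, seed=0):
    """Tiny GA over bit masks: tournament selection, uniform crossover,
    bit-flip mutation, and elitism (the best mask always survives)."""
    rng = random.Random(seed)
    n_comp = len(rows[0])

    def fitness(mask):
        return actives_in_top(rows, labels, mask, top_n)

    # Start from the full component set plus random subsets.
    population = [[1] * n_comp] + [
        [rng.randint(0, 1) for _ in range(n_comp)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = [population[0][:]]  # elitism
        while len(next_gen) < pop_size:
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


# Synthetic toy data (not from the paper): component 0 separates actives
# from decoys, components 1-3 are pure noise.
data_rng = random.Random(1)
labels = [1] * 10 + [0] * 190
rows = []
for is_active in labels:
    informative = data_rng.gauss(-2.0 if is_active else 0.0, 1.0)
    noise = [data_rng.gauss(0.0, 3.0) for _ in range(3)]
    rows.append([informative] + noise)

baseline = actives_in_top(rows, labels, [1, 1, 1, 1], top_n=20)
best_mask, best_fit = select_components(rows, labels, top_n=20)
print("all components:", baseline, "| GA selection:", best_mask, best_fit)
```

Because the all-components mask is in the initial population and the best mask is carried over each generation, the result can never score below the whole-function baseline on the data it was optimized on. A fair comparison would evaluate the selected mask on compounds not used during the search.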

SMPLIP-Score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors

Journal of Cheminformatics ◽ 10.1186/s13321-021-00507-1 ◽ 2021 ◽ Vol 13 (1)

Author(s): Surendra Kumar ◽ Mi-hyun Kim

Keyword(s): Ligand Binding ◽ Binding Affinity ◽ Scoring Functions ◽ Binding Affinities ◽ Ligand Interaction ◽ Fingerprint Pattern ◽ Comparable Performance ◽ Direct Interpretation ◽ Benchmark Datasets ◽ Complex Features

Abstract In drug discovery, rapid and accurate prediction of protein–ligand binding affinities is a pivotal task for lead optimization with acceptable on-target potency as well as pharmacological efficacy. Furthermore, researchers hope for a high correlation between docking score and pose with key interactive residues, although scoring functions as free energy surrogates of protein–ligand complexes have failed to provide collinearity. Recently, various machine learning or deep learning methods have been proposed to overcome the drawbacks of scoring functions. Although these methods are highly accurate, their featurization process is complex, and the meaning of the embedded features cannot be interpreted directly by humans without additional feature analysis. Here, we propose SMPLIP-Score (Substructural Molecular and Protein–Ligand Interaction Pattern Score), a directly interpretable predictor of absolute binding affinity. Our simple featurization embeds the interaction fingerprint pattern on the ligand-binding site environment and molecular fragments of ligands into an input vectorized matrix for learning layers (random forest or deep neural network). Despite using less complex features than other state-of-the-art models, SMPLIP-Score achieved comparable performance, a Pearson’s correlation coefficient up to 0.80, and a root mean square error up to 1.18 in pK units with several benchmark datasets (PDBbind v.2015, Astex Diverse Set, CSAR NRC HiQ, FEP, PDBbind NMR, and CASF-2016). For this model, generality, predictive power, ranking power, and robustness were examined using direct interpretation of feature matrices for specific targets.
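
The two figures of merit quoted above, Pearson's correlation coefficient and root mean square error in pK units, are straightforward to compute. The sketch below shows how, using made-up pK values rather than any data from the paper. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root mean square error, in the same units as the inputs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical experimental and predicted affinities in pK units.
experimental = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.5, 7.5, 8.5]
print(pearson_r(experimental, predicted), rmse(experimental, predicted))
```

The toy example is deliberately chosen to show why both numbers are reported: a constant offset of 0.5 pK units leaves the correlation perfect while still producing a nonzero RMSE.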

Nonfitting protein-ligand interaction scoring function based on first-principles theoretical chemistry methods: Development and application on kinase inhibitors

Journal of Computational Chemistry ◽ 10.1002/jcc.23303 ◽ 2013 ◽ Vol 34 (19) ◽ pp. 1636-1646 ◽ Cited By ~ 31

Author(s): Li Rao ◽ Igor Ying Zhang ◽ Wenping Guo ◽ Li Feng ◽ Eric Meggers ◽ ...

Keyword(s): First Principles ◽ Kinase Inhibitors ◽ Scoring Function ◽ Theoretical Chemistry ◽ Ligand Interaction ◽ Methods Development ◽ Protein Ligand Interaction

Incorporating structural similarity into a scoring function to enhance the prediction of binding affinities

Journal of Cheminformatics ◽ 10.1186/s13321-021-00493-4 ◽ 2021 ◽ Vol 13 (1)

Author(s): Beihong Ji ◽ Xibing He ◽ Yuzhao Zhang ◽ Jingchen Zhai ◽ Viet Hoang Man ◽ ...

Keyword(s): Computational Cost ◽ Scoring Function ◽ Structural Similarity ◽ Scoring Functions ◽ Binding Affinities ◽ Autodock Vina ◽ Predictive Index ◽ Drug Lead ◽ Screening Performance ◽ Calibration Algorithm

Abstract In this study, we developed a novel algorithm to improve the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with our new algorithm, and similar improvements were achieved for both. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets, especially when CSE = S^4 (S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to reference ligands. Encouragingly, we found that our hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated using external datasets for two systems, and the PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively. In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and identification phases at negligible computational cost.
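
Two quantities in this abstract lend themselves to a short illustration: the Tanimoto similarity S between fingerprints and the predictive index (PI). The sketch below represents a fingerprint as a set of on-bit positions and assumes the usual pairwise, weight-by-experimental-difference definition of PI, with experimental and predicted values oriented the same way (higher means stronger binding). The recalibration formula itself is not given in the abstract and is not reproduced here. All names and values are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def predictive_index(experimental, predicted):
    """Pairwise rank agreement weighted by the experimental difference.

    Ranges from -1 (always wrong order) to +1 (always right order).
    """
    numerator = 0.0
    denominator = 0.0
    n = len(experimental)
    for i in range(n):
        for j in range(i + 1, n):
            d_exp = experimental[j] - experimental[i]
            d_pred = predicted[j] - predicted[i]
            weight = abs(d_exp)
            product = d_exp * d_pred
            # +1 if the pair is ordered correctly, -1 if reversed, 0 if tied.
            agreement = (product > 0) - (product < 0)
            numerator += weight * agreement
            denominator += weight
    if denominator == 0:
        raise ValueError("experimental values are all identical")
    return numerator / denominator


# Made-up fingerprints and affinities for illustration only.
s = tanimoto({1, 2, 3}, {2, 3, 4})
print(s, s ** 4)  # similarity and one possible similarity-effect value
print(predictive_index([5.0, 6.0, 7.0], [0.1, 0.4, 0.9]))
```

Raising S to a power such as 4 suppresses the contribution of weakly similar training compounds much more than that of close analogues, which is one plausible reading of why a steep CSE function performed well.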

CompScore: Boosting Structure-Based Virtual Screening Performance by Incorporating Docking Scoring Function Components into Consensus Scoring

Journal of Chemical Information and Modeling ◽ 10.1021/acs.jcim.9b00343 ◽ 2019 ◽ Vol 59 (9) ◽ pp. 3655-3666 ◽ Cited By ~ 7

Author(s): Yunierkis Perez-Castillo ◽ Stellamaris Sotomayor-Burneo ◽ Karina Jimenes-Vargas ◽ Mario Gonzalez-Rodriguez ◽ Maykel Cruz-Monteagudo ◽ ...

Keyword(s): Virtual Screening ◽ Scoring Function ◽ Screening Performance ◽ Consensus Scoring ◽ Virtual Screening Performance

Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark

Nature Protocols ◽ 10.1038/nprot.2017.114 ◽ 2018 ◽ Vol 13 (4) ◽ pp. 666-680 ◽ Cited By ~ 26

Author(s): Yan Li ◽ Minyi Su ◽ Zhihai Liu ◽ Jie Li ◽ Jie Liu ◽ ...

Keyword(s): Scoring Functions ◽ Ligand Interaction ◽ Protein Ligand Interaction

Protein Ligand Interaction Fingerprints

Pharmaceutical Sciences ◽ 10.4018/978-1-5225-1762-7.ch041 ◽ 2017 ◽ pp. 1072-1091

Author(s): Ali HajiEbrahimi ◽ Hamidreza Ghafouri ◽ Mohsen Ranjbar ◽ Amirhossein Sakhteman

Keyword(s): Virtual Screening ◽ Docking Studies ◽ Scoring Functions ◽ Ligand Interaction ◽ Advantages And Disadvantages ◽ Protein Affinity ◽ Binding Cavity ◽ Protein Ligand Interaction ◽ Interaction Fingerprints ◽ Activity Information

One of the most challenging parts of docking-based virtual screening is the scoring functions implemented in various docking programs to evaluate different poses of the ligands inside the binding cavity of the receptor. Precise and reliable measurement of ligand-protein affinity for Structure-Based Virtual Screening (SB-VS) is therefore an outstanding problem in docking studies. Empirical post-docking filters can be helpful as a way to provide various types of structure-activity information. Different types of interaction between the ligands and the receptor have been described so far. Given the diversity and importance of protein-ligand interaction fingerprint (PLIF) methods, this chapter focuses on the comparison of different protocols. The advantages and disadvantages of all methods are discussed explicitly, as well as future prospects for further progress in this field. Different classification approaches for protein-ligand interaction fingerprints are also discussed.

Enhance the performance of current scoring functions with the aid of 3D protein-ligand interaction fingerprints

BMC Bioinformatics ◽ 10.1186/s12859-017-1750-5 ◽ 2017 ◽ Vol 18 (1) ◽ Cited By ~ 7

Author(s): Jie Liu ◽ Minyi Su ◽ Zhihai Liu ◽ Jie Li ◽ Yan Li ◽ ...

Keyword(s): Scoring Functions ◽ Ligand Interaction ◽ Protein Ligand Interaction ◽ Interaction Fingerprints

Virtual Screening with Gnina 1.0

10.20944/preprints202111.0329.v1 ◽ 2021

Author(s): Jocelyn Sunseri ◽ David Koes

Keyword(s): Virtual Screening ◽ Scoring Function ◽ Compound Library ◽ Autodock Vina ◽ Convolutional Networks ◽ Development Costs ◽ Screening Performance ◽ Computationally Intensive ◽ Speed And Accuracy ◽ Virtual Screening Performance

Virtual screening - predicting which compounds within a specified compound library bind to a target molecule, typically a protein - is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
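
The "1% early enrichment factor" used in this comparison measures how many more actives appear in the top 1% of a ranked library than would be expected by chance. The sketch below is a minimal illustration on made-up data. The function name is hypothetical, and the choice to round the cutoff to the nearest whole compound is an assumption, since benchmark implementations differ on this detail.

```python
def enrichment_factor(ranked_labels, fraction=0.01):
    """Enrichment factor for labels sorted best-first (1 = active, 0 = decoy).

    EF = (hit rate in the top fraction) / (hit rate in the whole library).
    """
    n = len(ranked_labels)
    n_act = sum(ranked_labels)
    if n_act == 0:
        raise ValueError("need at least one active")
    n_top = max(1, round(n * fraction))  # size of the top slice
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (n_act / n)


# Toy library (made up): 1000 compounds, 10 actives,
# 5 of which are ranked within the top 10 positions.
ranking = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] + [0] * 985 + [1] * 5
print(enrichment_factor(ranking, fraction=0.01))
```

A random ranking gives an enrichment factor close to 1, and the maximum attainable value depends on the active-to-decoy ratio of the benchmark, so values are best compared within a single dataset.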

SMPLIP-Score: Predicting the Ligand Binding Affinity from Simple and Interpretable On-The-Fly Interaction Fingerprint Pattern Descriptors

10.21203/rs.3.rs-74202/v1 ◽ 2020

Author(s): Surendra Kumar ◽ Mi-hyun Kim

Keyword(s): Ligand Binding ◽ Binding Affinity ◽ Scoring Functions ◽ Ligand Complex ◽ Ligand Interaction ◽ Fingerprint Pattern ◽ Comparable Performance ◽ Direct Interpretation ◽ Benchmark Datasets ◽ Lower Complexity

Abstract In drug discovery, rapid and accurate prediction of protein-ligand binding affinities is a pivotal task for lead optimization with acceptable on-target potency as well as pharmacological efficacy. Furthermore, researchers hope for a high correlation between a docking score and a pose with key interactive residues, though scoring functions as a free energy surrogate of a protein-ligand complex have failed to provide this collinearity. Recently, various machine learning or deep learning methods have been proposed to overcome the drawbacks of scoring functions. Despite their high accuracy, their featurization process is complex and costly to interpret (the features are not readily understood by humans). Here, we propose SMPLIP-Score (Substructural Molecular and Protein-Ligand Interaction Pattern Score), a simple interpretable predictor of absolute binding affinity. Our simple featurization embeds the interaction fingerprint pattern on the ligand-binding site environment and molecular fragments of ligands into an input vectorized matrix for learning layers (random forest or deep neural network). Despite lower complexity than state-of-the-art models, SMPLIP-Score achieved comparable performance, a Pearson’s correlation coefficient up to 0.80 and an RMSE up to 1.18 in pK units on several benchmark datasets (PDBbind v.2015, Astex Diverse Set, CSAR NRC HiQ, FEP, PDBbind NMR, and CASF-2016). For this model, generality, predictive power, ranking power, and robustness were also examined with direct interpretation of feature matrices for specific targets.
