Virtual Screening with Gnina 1.0

Virtual screening, predicting which compounds within a specified compound library bind to a target molecule (typically a protein), is a fundamental task in drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find that, on average, Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks, with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when they are not used directly to train models, and this bias obscures the extent to which machine learning models achieve their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative, simplistic property distributions.
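The "1% early enrichment factor" quoted in this abstract is a standard virtual screening metric: the fraction of actives found in the top 1% of the ranked library, divided by the fraction of actives in the whole library. The sketch below is a minimal illustration of that definition and not code from Gnina or the paper; the function name, the `higher_is_better` flag, and the rounding of the cutoff to the nearest whole compound are assumptions made for the example.

```python
def enrichment_factor(scores, labels, fraction=0.01, higher_is_better=True):
    """Enrichment factor at the top `fraction` of a ranked compound library.

    scores: one docking/scoring value per compound.
    labels: 1 for a known active, 0 for a decoy/inactive.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Assumption: the cutoff is rounded to the nearest compound, minimum 1.
    n_top = max(1, round(fraction * n_total))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0],
                    reverse=higher_is_better)
    hits_top = sum(label for _, label in ranked[:n_top])

    # (hit rate in the top slice) / (hit rate in the whole library)
    return (hits_top / n_top) / (n_actives / n_total)
```

The `higher_is_better` flag exists because some scoring functions report energies, where lower values are better; in that case pass `higher_is_better=False`. An enrichment factor of 1 corresponds to random ranking.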

10.20944/preprints202111.0329.v1 ◽ 2021

Author(s): Jocelyn Sunseri ◽ David Koes

Keyword(s): Virtual Screening ◽ Scoring Function ◽ Compound Library ◽ Autodock Vina ◽ Convolutional Networks ◽ Development Costs ◽ Screening Performance ◽ Computationally Intensive ◽ Speed And Accuracy ◽ Virtual Screening Performance

CompScore: Boosting Structure-Based Virtual Screening Performance by Incorporating Docking Scoring Function Components into Consensus Scoring

Journal of Chemical Information and Modeling ◽ 10.1021/acs.jcim.9b00343 ◽ 2019 ◽ Vol 59 (9) ◽ pp. 3655-3666 ◽ Cited By ~ 7

Author(s): Yunierkis Perez-Castillo ◽ Stellamaris Sotomayor-Burneo ◽ Karina Jimenes-Vargas ◽ Mario Gonzalez-Rodriguez ◽ Maykel Cruz-Monteagudo ◽ ...

Keyword(s): Virtual Screening ◽ Scoring Function ◽ Screening Performance ◽ Consensus Scoring ◽ Virtual Screening Performance

CompScore: boosting structure-based virtual screening performance by incorporating docking scoring functions components into consensus scoring

10.1101/550590 ◽ 2019

Author(s): Yunierkis Perez-Castillo ◽ Stellamaris Sotomayor-Burneo ◽ Karina Jimenes-Vargas ◽ Mario Gonzalez-Rodriguez ◽ Maykel Cruz-Monteagudo ◽ ...

Keyword(s): Genetic Algorithms ◽ Virtual Screening ◽ High Performance ◽ Scoring Function ◽ Scoring Functions ◽ Traditional Use ◽ Screening Performance ◽ Consensus Scoring ◽ Improved Performance ◽ Virtual Screening Performance

Consensus scoring has become a commonly used strategy within structure-based virtual screening (VS) workflows, with improved performance compared to workflows based on a single scoring function. However, no research has been devoted to analyzing the value of docking scoring function components in consensus scoring. We implemented and tested a method that incorporates docking scoring function components into high-performance VS workflows. This method uses genetic algorithms to find the combination of scoring components that maximizes the VS enrichment for any target. Our methodology was validated using a dataset, widely used in VS validation studies, that contains ligands and decoys for 102 targets. Results show that our approach outperforms other methods for all targets. It also boosts the initial enrichment performance of the traditional use of whole scoring functions in consensus scoring by an average of 45%. CompScore is freely available at: http://bioquimio.udla.edu.ec/compscore/
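To make the idea of "a genetic algorithm that searches for the combination of scoring components maximizing enrichment" concrete, here is a small sketch. It is not the CompScore implementation: the bit-mask encoding, the tournament selection, the one-point crossover, the elitism step, and the fitness function (number of actives in the top-scoring fraction of the library) are all assumptions chosen for illustration. Component values are assumed to be pre-normalized so that higher means better.

```python
import random

def hits_in_top(combined, labels, fraction):
    """Count actives among the top-scoring `fraction` of compounds."""
    n_top = max(1, round(fraction * len(combined)))
    order = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sum(labels[i] for i in order[:n_top])

def fitness(mask, components, labels, fraction=0.1):
    """Early-enrichment proxy for one subset of components (0/1 mask)."""
    if not any(mask):
        return 0
    # Combined score = sum of the selected components for each compound.
    combined = [sum(v for v, keep in zip(row, mask) if keep) for row in components]
    return hits_in_top(combined, labels, fraction)

def evolve_mask(components, labels, fraction=0.1, pop_size=20,
                generations=30, mutation_rate=0.1, seed=0):
    """Search for a component subset with high early enrichment."""
    rng = random.Random(seed)
    n_comp = len(components[0])

    def score(mask):
        return fitness(mask, components, labels, fraction)

    # Start from the "use every component" mask plus random masks.
    population = [[1] * n_comp] + [
        [rng.randint(0, 1) for _ in range(n_comp)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = [population[0][:]]  # elitism: keep the best mask
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n_comp - 1) if n_comp > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return best, score(best)
```

Because the all-components mask is in the initial population and the best mask is always carried forward, the returned subset can never score worse on this fitness than using every component together. In practice the fitness should be evaluated on held-out data to avoid overfitting the mask to one benchmark.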

Correlation between Virtual Screening Performance and Binding Site Descriptors of Protein Targets

International Journal of Medicinal Chemistry ◽ 10.1155/2018/3829307 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-10 ◽ Cited By ~ 3

Author(s): Jamal Shamsara

Keyword(s): Virtual Screening ◽ Binding Site ◽ Center Of Mass ◽ Scoring Function ◽ Scoring Functions ◽ Autodock Vina ◽ Protein Targets ◽ Shape Complementarity ◽ Screening Performance ◽ Consensus Scoring

Rescoring is a simple approach that could, in theory, improve the original docking results. In this study, AutoDock Vina was used as the docking engine, and three other scoring functions besides the original Vina scoring function, as well as their combinations as consensus scoring functions, were employed to explore the effect of rescoring on virtual screens performed on diverse targets. Rescoring by DrugScore produced the largest number of cases with significant changes in screening power. Thus, the DrugScore results were used to build a simple model, based on two binding site descriptors, that could predict possible improvement by DrugScore rescoring. Furthermore, the screening power of all rescoring approaches, as well as that of the original AutoDock Vina docking results, generally correlated with the Maximum Theoretical Shape Complementarity (MTSC) and the Maximum Distance from Center of Mass and all Alpha spheres (MDCMA). It was therefore suggested that, with a more complete set of binding site descriptors, it could be possible to find a robust relationship between binding site descriptors and the response to particular molecular docking programs and scoring functions. The results could be helpful for future research that aims to perform virtual screening using AutoDock Vina and/or rescoring using DrugScore.
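One common way to combine several scoring functions into a consensus score is to average each compound's rank across the functions. The abstract does not say which consensus scheme the study used, so the sketch below is only an assumed rank-averaging example; the function names, the dictionary layout, and the `"vina"` / `"rescore"` labels are hypothetical.

```python
def ranks(scores, higher_is_better=True):
    """Return 1-based ranks (1 = best); ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def consensus_mean_rank(score_table, higher_is_better):
    """Average each compound's rank over all scoring functions.

    score_table: dict mapping a scoring-function name to a list of scores,
                 one per compound, in the same compound order.
    higher_is_better: dict mapping the same names to a bool.
    """
    names = list(score_table)
    n = len(score_table[names[0]])
    if any(len(score_table[name]) != n for name in names):
        raise ValueError("every scoring function must score every compound")
    per_fn = [ranks(score_table[name], higher_is_better[name]) for name in names]
    return [sum(r[i] for r in per_fn) / len(per_fn) for i in range(n)]

# Hypothetical values: docking energies (lower is better) and a rescoring
# function whose output is assumed to be higher-is-better.
table = {"vina": [-9.0, -7.5, -8.2], "rescore": [0.5, 0.2, 0.9]}
direction = {"vina": False, "rescore": True}
mean_ranks = consensus_mean_rank(table, direction)
```

A lower mean rank indicates a compound that the scoring functions collectively favor. Working with ranks avoids having to put scores with different units and sign conventions on a common scale.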

Faculty Opinions recommendation of Enhancing Virtual Screening Performance of Protein Kinases with Molecular Dynamics Simulations.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.726768955.793525021 ◽ 2016

Author(s): Jeremy C Smith

Keyword(s): Molecular Dynamics ◽ Virtual Screening ◽ Molecular Dynamics Simulations ◽ Protein Kinases ◽ Screening Performance ◽ Dynamics Simulations ◽ Virtual Screening Performance

Generating Property-Matched Decoy Molecules Using Deep Learning

10.1101/2020.08.26.268193 ◽ 2020

Author(s): Fergus Imrie ◽ Anthony R. Bradley ◽ Charlotte M. Deane

Keyword(s): Deep Learning ◽ Virtual Screening ◽ Method Development ◽ Screening Method ◽ Screening Methods ◽ Additional Risk ◽ Link Type ◽ Screening Performance ◽ And Training ◽ Virtual Screening Performance

An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys rather than learning how to perform molecular recognition. This fundamental issue prevents generalisation and hinders virtual screening method development. We have developed a deep learning method (DeepCoy) that generates decoys to a user’s preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules’ physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing it from 0.163 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with AutoDock Vina, with virtual screening performance falling from an AUC ROC of 0.71 to 0.63. The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources.
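The AUC ROC figures in this abstract measure how well a score separates actives from decoys: the value equals the probability that a randomly chosen active is scored better than a randomly chosen decoy, with 0.5 meaning no separation. The sketch below computes it directly from that pairwise definition. It is an illustration only, not code from DeepCoy, and it assumes that a higher score means "more likely active".

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise active/decoy comparisons.

    scores: one value per compound, higher meaning more likely active.
    labels: 1 for an active, 0 for a decoy.
    Ties between an active and a decoy count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")

    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))
```

For docking energies, where lower is better, negate the scores before calling the function. The quadratic loop is fine for a small illustration; a rank-based formulation is preferable for large libraries.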

Incorporating structural similarity into a scoring function to enhance the prediction of binding affinities

Journal of Cheminformatics ◽ 10.1186/s13321-021-00493-4 ◽ 2021 ◽ Vol 13 (1)

Author(s): Beihong Ji ◽ Xibing He ◽ Yuzhao Zhang ◽ Jingchen Zhai ◽ Viet Hoang Man ◽ ...

Keyword(s): Computational Cost ◽ Scoring Function ◽ Structural Similarity ◽ Scoring Functions ◽ Binding Affinities ◽ Autodock Vina ◽ Predictive Index ◽ Drug Lead ◽ Screening Performance ◽ Calibration Algorithm

In this study, we developed a novel algorithm to improve the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with our new algorithm, and similar performance improvements were achieved. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets, especially when CSE = S^4 (where S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to reference ligands. Encouragingly, we found that our hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated using external datasets for two systems, and we found that the PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively. In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and identification phases at negligible computational cost.
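The abstract names two ingredients: a Tanimoto similarity S between fingerprints, and a compound similarity effect (CSE) function of S that controls how strongly training compounds influence the recalibrated score. The sketch below shows the Tanimoto calculation on fingerprints represented as sets of on-bit indices, and a power-law weight of S. The `recalibrate` function is a purely hypothetical nearest-neighbor correction added to show how such a weight might be used; it is not the published algorithm, whose exact formula is not given in the abstract.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def cse_weight(similarity, power=4):
    """Power-law similarity weight; higher powers suppress dissimilar pairs."""
    return similarity ** power

def recalibrate(query_fp, query_dock, training, power=4):
    """HYPOTHETICAL correction, not the published formula.

    training: list of (fingerprint, docking_score, experimental_value) tuples.
    The docking error of the most similar training compound is added to the
    query's docking score, scaled by the similarity weight.
    """
    if not training:
        return query_dock
    best_fp, best_dock, best_exp = max(
        training, key=lambda t: tanimoto(query_fp, t[0])
    )
    weight = cse_weight(tanimoto(query_fp, best_fp), power)
    return query_dock + weight * (best_exp - best_dock)
```

With a power of 4, a training compound at S = 0.5 contributes only about 6% of its correction, so in this sketch the recalibration mostly affects queries that closely resemble a training compound.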