Generating property-matched decoy molecules using deep learning

Bioinformatics ◽

10.1093/bioinformatics/btab080 ◽

2021 ◽

Author(s):

Fergus Imrie ◽

Anthony R Bradley ◽

Charlotte M Deane

Keyword(s):

Deep Learning ◽

Virtual Screening ◽

Method Development ◽

Screening Method ◽

Supplementary Information ◽

Screening Methods ◽

Additional Risk ◽

Screening Performance ◽

And Training ◽

Virtual Screening Performance

Abstract

Motivation: An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys, and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalization and hinders virtual screening method development.

Results: We have developed a deep learning method (DeepCoy) that generates decoys to a user’s preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules’ physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with AutoDock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63.

Availability and implementation: The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources.

Supplementary information: Supplementary data are available at Bioinformatics online.
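As a reading aid for the AUC ROC values quoted in this abstract, the following is a minimal sketch of how such a value can be computed from a ranked list of actives and decoys. It is not taken from the DeepCoy code; the function name `roc_auc` and the toy scores are assumptions made for illustration only. It uses the pairwise definition: the fraction of (active, decoy) pairs in which the active receives the better score, with ties counted as half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: higher means "more likely active". For docking energies, where
            lower is better, negate the values before calling this function.
    labels: 1 for an active, 0 for a decoy.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")

    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0   # active ranked above decoy
            elif a == d:
                wins += 0.5   # tie counts as half a win
    return wins / (len(actives) * len(decoys))


# Hypothetical toy data: three actives followed by three decoys.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
print(f"AUC ROC = {auc:.3f}")
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect separation, so a lower value indicates that the decoys are harder to tell apart from the actives.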

Generating Property-Matched Decoy Molecules Using Deep Learning

10.1101/2020.08.26.268193 ◽

2020 ◽

Author(s):

Fergus Imrie ◽

Anthony R. Bradley ◽

Charlotte M. Deane

Keyword(s):

Deep Learning ◽

Virtual Screening ◽

Method Development ◽

Screening Method ◽

Screening Methods ◽

Additional Risk ◽

Link Type ◽

Screening Performance ◽

And Training ◽

Virtual Screening Performance

An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys, rather than learning how to perform molecular recognition. This fundamental issue prevents generalisation and hinders virtual screening method development. We have developed a deep learning method (DeepCoy) that generates decoys to a user’s preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules’ physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.163 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with AutoDock Vina, with virtual screening performance falling from an AUC ROC of 0.71 to 0.63. The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources.

Faculty Opinions recommendation of Enhancing Virtual Screening Performance of Protein Kinases with Molecular Dynamics Simulations.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726768955.793525021 ◽

2016 ◽

Author(s):

Jeremy C Smith

Keyword(s):

Molecular Dynamics ◽

Virtual Screening ◽

Molecular Dynamics Simulations ◽

Protein Kinases ◽

Screening Performance ◽

Dynamics Simulations ◽

Virtual Screening Performance

Practical Model Selection for Prospective Virtual Screening

10.1101/337956 ◽

2018 ◽

Cited By ~ 1

Author(s):

Shengchao Liu ◽

Moayad Alnammi ◽

Spencer S. Ericksen ◽

Andrew F. Voter ◽

Gene E. Ananiev ◽

...

Keyword(s):

Random Forest ◽

Virtual Screening ◽

Protein Interactions ◽

High Throughput Screening ◽

Screening Methods ◽

Protein Protein Interactions ◽

Screening Algorithm ◽

Screening Performance ◽

Wide Range ◽

Public Datasets

Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the dataset and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein-protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds from a library of 22,434 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well in public datasets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest.
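To put the reported counts in context, the short calculation below compares the hit rate among the top 250 predictions with the hit rate of the whole library. Only the four counts come from the abstract; the helper name `fold_enrichment` and the choice of summary statistics are assumptions made here for illustration, not quantities reported by the authors.

```python
def fold_enrichment(hits_selected, n_selected, hits_total, n_total):
    """Hit rate in the selected subset divided by the hit rate in the full library."""
    if min(n_selected, n_total, hits_total) <= 0:
        raise ValueError("counts must be positive")
    selected_rate = hits_selected / n_selected
    baseline_rate = hits_total / n_total
    return selected_rate / baseline_rate


# Counts quoted in the abstract.
HITS_IN_TOP = 37       # actives among the top-ranked predictions
TOP_N = 250            # number of top-ranked predictions
TOTAL_ACTIVES = 54     # actives in the assayed library
LIBRARY_SIZE = 22434   # molecules in the assayed library

recall = HITS_IN_TOP / TOTAL_ACTIVES          # fraction of all actives recovered
top_hit_rate = HITS_IN_TOP / TOP_N            # hit rate in the selection
base_hit_rate = TOTAL_ACTIVES / LIBRARY_SIZE  # hit rate expected from random picking
enrichment = fold_enrichment(HITS_IN_TOP, TOP_N, TOTAL_ACTIVES, LIBRARY_SIZE)

print(f"recall of actives:     {recall:.1%}")
print(f"hit rate in top {TOP_N}:   {top_hit_rate:.1%}")
print(f"library hit rate:      {base_hit_rate:.2%}")
print(f"fold enrichment:       {enrichment:.1f}")
```

The selection covers a little over 1% of the library yet contains about two thirds of the actives, which corresponds to a hit rate roughly sixty times higher than random selection would give.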

Beyond Generative Models: Superfast Traversal, Optimization, Novelty, Exploration and Discovery (STONED) Algorithm for Molecules using SELFIES

10.26434/chemrxiv.13383266.v2 ◽

2021 ◽

Author(s):

AkshatKumar Nigam ◽

Robert Pollice ◽

Mario Krenn ◽

Gabriel dos Passos Gomes ◽

Alan Aspuru-Guzik

Keyword(s):

Deep Learning ◽

Virtual Screening ◽

Chemical Space ◽

Generative Models ◽

Inverse Design ◽

Learning Models ◽

Structure Modification ◽

Design Models ◽

Comparable Performance ◽

And Training

Inverse design allows the design of molecules with desirable properties using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. We achieve comparable performance on typical benchmarks without any training. We demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. We anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wide adoption.

CompScore: Boosting Structure-Based Virtual Screening Performance by Incorporating Docking Scoring Function Components into Consensus Scoring

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.9b00343 ◽

2019 ◽

Vol 59 (9) ◽

pp. 3655-3666 ◽

Cited By ~ 7

Author(s):

Yunierkis Perez-Castillo ◽

Stellamaris Sotomayor-Burneo ◽

Karina Jimenes-Vargas ◽

Mario Gonzalez-Rodriguez ◽

Maykel Cruz-Monteagudo ◽

...

Keyword(s):

Virtual Screening ◽

Scoring Function ◽

Screening Performance ◽

Consensus Scoring ◽

Virtual Screening Performance

DUBS: A Framework for Developing Directory of Useful Benchmarking Sets for Virtual Screening

10.1101/2020.01.31.929679 ◽

2020 ◽

Author(s):

Jonathan Fine ◽

Matthew Muhoberac ◽

Guillaume Fraux ◽

Gaurav Chopra

Keyword(s):

Virtual Screening ◽

Software Package ◽

Screening Method ◽

Screening Methods ◽

Community Resource ◽

Flexible Tool ◽

Crucial Step ◽

Input Text ◽

Input Format ◽

Screening Software

Benchmarking is a crucial step in evaluating virtual screening methods for drug discovery. One major issue that arises among benchmarking datasets is the lack of a standardized format for representing the protein and ligand structures used to benchmark the virtual screening method. To address this, we introduce the Directory of Useful Benchmarking Sets (DUBS) framework as a simple and flexible tool to rapidly create benchmarking sets using the Protein Data Bank. DUBS uses a simple text-based input format along with the Lemon data mining framework to efficiently access and organize data from the Protein Data Bank and output commonly used inputs for virtual screening software. The simple input format used by DUBS allows users to define their own benchmarking datasets and access the corresponding information directly from the software package. Currently, DUBS takes less than 2 minutes to create a benchmark using this format. Since DUBS uses a simple Python script, users can easily modify it to create more complex benchmarks. We hope that DUBS will be a useful community resource to provide a standardized representation for benchmarking datasets in virtual screening.

Virtual Screening with Gnina 1.0

10.20944/preprints202111.0329.v1 ◽

2021 ◽

Author(s):

Jocelyn Sunseri ◽

David Koes

Keyword(s):

Virtual Screening ◽

Scoring Function ◽

Compound Library ◽

Autodock Vina ◽

Convolutional Networks ◽

Development Costs ◽

Screening Performance ◽

Computationally Intensive ◽

Speed And Accuracy ◽

Virtual Screening Performance

Virtual screening - predicting which compounds within a specified compound library bind to a target molecule, typically a protein - is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
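The "1% early enrichment factor" mentioned in this abstract is commonly defined as the hit rate among the top-scoring 1% of compounds divided by the hit rate of the whole set. The sketch below illustrates that common definition on made-up data; it is not Gnina's or the authors' evaluation code, and the function name `enrichment_factor_at`, the rounding of the cutoff, and the toy ranking are all assumptions.

```python
def enrichment_factor_at(scores, labels, fraction=0.01):
    """Enrichment factor in the top `fraction` of compounds ranked by score.

    scores: higher means "predicted more likely to bind".
    labels: 1 for an active, 0 for an inactive or decoy.
    """
    n = len(scores)
    total_hits = sum(labels)
    if n == 0 or total_hits == 0:
        raise ValueError("need a non-empty set with at least one active")

    # Size of the early window; at least one compound is always inspected.
    n_top = max(1, round(fraction * n))

    # Rank compounds from best to worst score and count actives in the window.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:n_top])

    return (top_hits / n_top) / (total_hits / n)


# Hypothetical toy set: 200 compounds, 4 actives, scores strictly decreasing
# with index so that compound 0 is ranked first.
toy_labels = [0] * 200
for idx in (0, 50, 100, 150):
    toy_labels[idx] = 1
toy_scores = [200.0 - i for i in range(200)]

ef1 = enrichment_factor_at(toy_scores, toy_labels, fraction=0.01)
print(f"EF1% = {ef1:.1f}")
```

In this toy set the top 1% is two compounds, one of which is active, against a background rate of 4 in 200. Because the metric depends only on the ranking, it can be inflated by any property that happens to separate actives from decoys, which is the bias concern the abstract raises.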

Impact of different protonation states on virtual screening performance against cruzain

Chemical Biology & Drug Design ◽

10.1111/cbdd.14008 ◽

2021 ◽

Author(s):

Viviane Corrêa Santos ◽

Augusto César Broilo Campos ◽

Birgit J. Waldner ◽

Klaus R. Liedl ◽

Rafaela Salgado Ferreira

Keyword(s):

Virtual Screening ◽

Screening Performance ◽

Virtual Screening Performance

Improved virtual screening performance through docking scoring fusion in the discovery of dual target ligands for Parkinson’s disease

10.3390/mol2net-1-b031 ◽

2015 ◽

Author(s):

Yunierkis Pérez-Castillo ◽

Aliuska Morales-Helguera ◽

M. Natália D. S. Cordeiro ◽

Eduardo Tejera ◽

Cesar Paz-y-Miño ◽

...

Keyword(s):

Virtual Screening ◽

Screening Performance ◽

Virtual Screening Performance ◽

Dual Target

A cross docking pipeline for improving pose prediction and virtual screening performance

Journal of Computer-Aided Molecular Design ◽

10.1007/s10822-017-0048-z ◽

2017 ◽

Vol 32 (1) ◽

pp. 163-173 ◽

Cited By ~ 13

Author(s):

Ashutosh Kumar ◽

Kam Y. J. Zhang

Keyword(s):

Virtual Screening ◽

Pose Prediction ◽

Cross Docking ◽

Screening Performance ◽

Virtual Screening Performance