Hidden Bias in the DUD-E Dataset Leads to Misleading Performance of Deep Learning in Structure-Based Virtual Screening

<p>Recently, much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the scarcity of reliable protein-ligand X-ray structures and binding affinity data has required the use of constructed datasets for training and evaluating CNN molecular recognition models. Here, we outline various sources of bias in one such widely used dataset, the Directory of Useful Decoys: Enhanced (DUD-E). We constructed and performed tests to investigate whether CNN models developed using DUD-E are properly learning the underlying physics of molecular recognition, as intended, or are instead learning biases inherent in the dataset itself. We find that the superior enrichment efficiency of CNN models can be attributed to the analogue and decoy bias hidden in the DUD-E dataset rather than to successful generalization of protein-ligand interaction patterns. When we compared additional deep learning models trained on PDBbind datasets, their enrichment performance on DUD-E was not superior to that of the docking program AutoDock Vina. Together, these results suggest that biases potentially present in constructed datasets should be thoroughly evaluated before such datasets are used to develop machine-learning-based methods.</p>
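Enrichment efficiency in virtual screening is commonly summarized as an enrichment factor (EF): the fraction of actives among the top-ranked x% of a library divided by the fraction of actives in the whole library. The sketch below assumes that higher scores mean stronger predicted binding and that every compound carries a binary active/decoy label; the function name `enrichment_factor` and the default 1% cutoff are illustrative choices, not details taken from the study.

```python
import math
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF at `fraction`: active rate in the top-ranked subset / overall active rate.

    Assumes higher scores mean stronger predicted binding (hypothetical convention).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")
    # Size of the top-ranked subset, never fewer than one compound.
    n_top = max(1, math.ceil(fraction * n_total))
    # Rank compound indices by score, best first.
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    actives_in_top = sum(1 for i in ranked[:n_top] if is_active[i])
    return (actives_in_top / n_top) / (n_actives / n_total)


# Usage: 2 actives among 10 compounds, both ranked in the top 20%.
scores = [9.1, 8.7, 5.0, 4.2, 3.9, 3.1, 2.5, 2.0, 1.1, 0.5]
labels = [True, True] + [False] * 8
print(enrichment_factor(scores, labels, fraction=0.2))  # 5.0
```

A high EF only says that actives were ranked early; it does not reveal which features drove the ranking, which is why the dataset-bias tests described in the abstract are needed to interpret it.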


Identification of a novel inhibitor of SARS-CoV-2 3CL-PRO through virtual screening and molecular dynamics simulation

PeerJ, 2021, Vol 9, pp. e11261. DOI: 10.7717/peerj.11261

Author(s): Asim Kumar Bepari, Hasan Mahmud Reza

Keyword(s): Molecular Dynamics, Hydrogen Bonding, Virtual Screening, Molecular Dynamics Simulations, Mean Square, Autodock Vina, Protein Ligand Interactions, Main Protease, Ligand Interactions, Dynamics Simulations

Background: The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has ravaged lives across the globe since December 2019, and new cases are still on the rise. People's ongoing suffering has spurred scientists to develop safe and effective remedies for this deadly viral disease. While repurposing existing FDA-approved drugs remains on the front line, exploring drug candidates among synthetic and natural compounds is also a viable alternative. This study employed a comprehensive computational approach to screen inhibitors of SARS-CoV-2 3CL-PRO (also known as the main protease), a prime molecular target for treating coronavirus diseases.

Methods: We performed 100 ns GROMACS molecular dynamics simulations of three high-resolution X-ray crystallographic structures of 3CL-PRO. We extracted frames at 10 ns intervals to mimic the conformational diversity of the target protein in biological environments. We then used AutoDock Vina molecular docking to virtually screen the Sigma–Aldrich MyriaScreen Diversity Library II, a rich collection of 10,000 druglike small molecules with diverse chemotypes. Subsequently, we computed physicochemical properties, pharmacokinetic parameters, and toxicity profiles in silico. Finally, we analyzed hydrogen bonding and other protein-ligand interactions for the short-listed compounds.

Results: Over the 100 ns molecular dynamics simulations, the 3CL-PRO crystal structures 6LZE, 6M0K, and 6YB7 maintained overall integrity, with mean Cα root-mean-square deviations (RMSD) of 1.96 (±0.35) Å, 1.98 (±0.21) Å, and 1.94 (±0.25) Å, respectively. Average root-mean-square fluctuation (RMSF) values were 1.21 ± 0.79 (6LZE), 1.12 ± 0.72 (6M0K), and 1.11 ± 0.60 (6YB7). After two phases of AutoDock Vina virtual screening of the MyriaScreen Diversity Library II, we compiled a list of the top 20 ligands. From these, we selected four promising leads based on predicted oral bioavailability, druglikeness, and toxicity profiles. These compounds also showed favorable protein-ligand interactions. We then ran 50 ns molecular dynamics simulations of the four selected molecules and of the reference ligand 11a bound in the crystallographic structure 6LZE. Analysis of RMSF, RMSD, and hydrogen bonding along the simulation trajectories indicated that S51765 would form a more stable protein-ligand complex with 3CL-PRO than the other molecules. Short-range Coulombic and Lennard-Jones interaction energies also indicated favorable binding of S51765 to 3CL-PRO.

Conclusion: We identified a potential lead for antiviral drug discovery against the SARS-CoV-2 main protease. Our results will aid global efforts to find safe and effective remedies for COVID-19.
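For reference, the RMSD of frame t and the RMSF of atom i are RMSD(t) = sqrt((1/N) Σ_i |x_i(t) − x_i^ref|²) and RMSF_i = sqrt((1/T) Σ_t |x_i(t) − ⟨x_i⟩|²), where N is the number of atoms (here the Cα atoms), T the number of frames, and ⟨x_i⟩ the time-averaged position of atom i. The NumPy sketch below assumes the trajectory has already been superposed onto the reference structure and is stored as an array of shape (frames, atoms, 3) in Å; in a GROMACS workflow, `gmx rms` and `gmx rmsf` perform these calculations and typically handle the fitting as well. The function names are hypothetical.

```python
import numpy as np


def rmsd_per_frame(traj: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """RMSD of each frame to a reference structure.

    traj: (n_frames, n_atoms, 3) coordinates, assumed already superposed on `ref`.
    ref:  (n_atoms, 3) reference coordinates.
    """
    diff = traj - ref[np.newaxis, :, :]        # per-atom displacement vectors
    sq_dist = np.sum(diff ** 2, axis=2)        # squared distance per frame and atom
    return np.sqrt(np.mean(sq_dist, axis=1))   # average over atoms, then root


def rmsf_per_atom(traj: np.ndarray) -> np.ndarray:
    """RMSF of each atom about its time-averaged position."""
    mean_pos = traj.mean(axis=0)               # (n_atoms, 3) average structure
    sq_dev = np.sum((traj - mean_pos) ** 2, axis=2)
    return np.sqrt(np.mean(sq_dev, axis=0))    # average over frames, then root


# Toy 2-frame, 2-atom trajectory (no fitting performed): in frame 1,
# atom 1 moves 2 A along y while atom 0 stays put.
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
moved = ref.copy()
moved[1, 1] += 2.0
traj = np.stack([ref, moved])
print(rmsd_per_frame(traj, ref))  # frame RMSDs: 0 and sqrt(2) ~ 1.414
print(rmsf_per_atom(traj))        # atom RMSFs: 0 and 1
```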


Molecular recognition principles in protein-ligand interactions as a prerequisite for the design of specific and selective leads

Acta Crystallographica Section A: Foundations of Crystallography, 2005, Vol 61 (a1), pp. c7-c7. DOI: 10.1107/s010876730509971x

Author(s): G. Klebe

Keyword(s): Molecular Recognition, Protein Ligand Interactions, Ligand Interactions


Protein–Ligand Interactions. From Molecular Recognition to Drug Design. Edited by Hans-Joachim Böhm and Gisbert Schneider.

Angewandte Chemie, 2004, Vol 116 (2), pp. 148-148. DOI: 10.1002/ange.200385027

Cited By ~ 1

Author(s): Gerhard Hessler

Keyword(s): Molecular Recognition, Drug Design, Protein Ligand Interactions, Ligand Interactions


Studying Protein–Ligand Interactions Using X-Ray Crystallography

Protein-Ligand Interactions (Methods in Molecular Biology), 2013, pp. 457-477. DOI: 10.1007/978-1-62703-398-5_17

Cited By ~ 5

Author(s): Andrew P. Turnbull, Paul Emsley

Keyword(s): X Ray, X Ray Crystallography, Protein Ligand Interactions, Ligand Interactions


Improving protein–ligand binding prediction by considering the bridging water molecules in Autodock

Journal of Theoretical and Computational Chemistry, 2019, Vol 18 (05), pp. 1950027. DOI: 10.1142/s0219633619500275

Cited By ~ 1

Author(s): Qiangna Lu, Lian-Wen Qi, Jinfeng Liu

Keyword(s): Ligand Binding, Considerable Improvement, Ligand Docking, Water Molecules, Binding Modes, Binding Prediction, Docking Simulations, Protein Ligand Interactions, Ligand Interactions, Docking Program

Water plays a significant role in determining protein–ligand binding modes, especially when water molecules mediate protein–ligand interactions, and such important water molecules have received increasing attention in recent years. Accounting for water molecules has gradually become a routine step in accurately describing protein–ligand interactions. Autodock, a free docking program, is among the most widely used tools for predicting protein–ligand binding modes. However, whether including water molecules in Autodock improves its docking performance has not been systematically investigated. Here, we incorporate important bridging water molecules into the Autodock program and systematically investigate their effectiveness in protein–ligand docking. The approach was evaluated on 18 structurally diverse protein–ligand complexes in which several water molecules bridge the protein–ligand interactions. Different treatments of the water molecules, fixed and rotatable, were tested, and including these water molecules considerably improved the rate of successful docking simulations. This study illustrates the necessity of including water molecules in Autodock docking and emphasizes the importance of treating water molecules properly in protein–ligand binding predictions.
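The abstract does not state how the bridging waters were identified. One simple geometric heuristic, assumed here purely for illustration, keeps a crystallographic water if its oxygen lies within hydrogen-bonding distance of at least one polar protein atom and at least one polar ligand atom. The 3.5 Å cutoff and the function name `find_bridging_waters` are hypothetical, and coordinates are taken to be NumPy arrays in Å.

```python
import numpy as np


def find_bridging_waters(water_o: np.ndarray,
                         protein_polar: np.ndarray,
                         ligand_polar: np.ndarray,
                         cutoff: float = 3.5) -> np.ndarray:
    """Indices of waters whose oxygen contacts both the protein and the ligand.

    All inputs are (n, 3) coordinate arrays in Angstrom. `cutoff` is an
    assumed hydrogen-bond distance, not a value taken from the study.
    """
    def within(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Pairwise distances between every row of a and every row of b.
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        # True for each row of a that has at least one partner within cutoff.
        return np.any(d <= cutoff, axis=1)

    bridging = within(water_o, protein_polar) & within(water_o, ligand_polar)
    return np.flatnonzero(bridging)


# Water 0 sits 2.8 A from a protein atom and 2.9 A from a ligand atom;
# water 1 is far from every protein atom, so only water 0 bridges.
waters = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
protein = np.array([[2.8, 0.0, 0.0]])
ligand = np.array([[-2.9, 0.0, 0.0], [12.0, 0.0, 0.0]])
print(find_bridging_waters(waters, protein, ligand))  # [0]
```

Waters selected this way could then be kept in the receptor as fixed or rotatable molecules, the two treatments compared in the abstract.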


Molecular Recognition in a Diverse Set of Protein–Ligand Interactions Studied with Molecular Dynamics Simulations and End-Point Free Energy Calculations

Journal of Chemical Information and Modeling, 2013, Vol 53 (10), pp. 2659-2670. DOI: 10.1021/ci400312v

Cited By ~ 26

Author(s): Bo Wang, Liwei Li, Thomas D. Hurley, Samy O. Meroueh

Keyword(s): Molecular Dynamics, Free Energy, Molecular Recognition, Molecular Dynamics Simulations, Free Energy Calculations, Energy Calculations, End Point, Protein Ligand Interactions, Ligand Interactions, Dynamics Simulations


Molecular recognition by dihydrofolate reductase: n.m.r. studies of protein-ligand interactions

Biochemical Society Transactions, 1988, Vol 16 (6), pp. 925-927. DOI: 10.1042/bst0160925

Cited By ~ 1

Author(s): Gordon C. K. Roberts

Keyword(s): Molecular Recognition, Dihydrofolate Reductase, Protein Ligand Interactions, Ligand Interactions


Book Review: Protein-Ligand Interactions-From Molecular Recognition to Drug Design Methods. Edited by Hans-Joachim Böhm and Gisbert Schneider

ChemBioChem, 2003, Vol 4 (11), pp. 1249-1250. DOI: 10.1002/cbic.200390116

Author(s): Andy Davis

Keyword(s): Molecular Recognition, Drug Design, Design Methods, Protein Ligand Interactions, Ligand Interactions


Footprinting of Inhibitor Interactions of In Silico Identified Inhibitors of Trypanothione Reductase of Leishmania Parasite

The Scientific World Journal, 2012, Vol 2012, pp. 1-13. DOI: 10.1100/2012/963658

Cited By ~ 10

Author(s): Santhosh K. Venkatesan, Vikash Kumar Dubey

Keyword(s): Virtual Screening, Leishmania Infantum, Trypanothione Reductase, Inhibition Kinetics, Screening Process, Protein Ligand Interactions, Enzymatic Reduction, Ligand Interactions, First Time

Structure-based virtual screening of the NCI Diversity Set II compounds was performed to identify novel inhibitor scaffolds of trypanothione reductase (TR) from Leishmania infantum. The top 50 ranked hits were clustered using the AuPoSOM tool. The majority of the top-ranked compounds were tricyclic. Clustering of the hits yielded four major clusters, each comprising a varying number of subclusters that differ in their binding mode and orientation in the active site. Moreover, we report for the first time selected alkaloids and dibenzothiazepines as inhibitors of Leishmania infantum TR. The binding modes observed among the clusters also point to the probable in vitro inhibition kinetics and help define key interactions that might contribute to inhibiting the enzymatic reduction of T[S]2. The method offers scope for automation and for integration into docking-based virtual screening workflows, where it can cluster small-molecule inhibitors based on their protein-ligand interactions.
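The study clustered its hits with AuPoSOM, which is based on self-organizing maps. The sketch below swaps in a different and simpler technique, greedy Butina-style clustering on the Tanimoto similarity of interaction fingerprints, only to illustrate the idea of grouping docked hits by the residues they contact. Each fingerprint is assumed to be the set of active-site residues a pose interacts with; the residue labels, the 0.6 similarity threshold, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two interaction fingerprints."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def butina_cluster(fps: Dict[str, Set[str]], threshold: float = 0.6) -> List[List[str]]:
    """Greedy Butina-style clustering; the first member of each cluster is its centroid."""
    names = list(fps)
    # Neighbour sets: compounds at least `threshold` similar (self included).
    neighbours = {n: {m for m in names if tanimoto(fps[n], fps[m]) >= threshold}
                  for n in names}
    unassigned = set(names)
    clusters: List[List[str]] = []
    while unassigned:
        # Next centroid: the unassigned compound with the most unassigned
        # neighbours (ties broken alphabetically for reproducibility).
        centroid = max(sorted(unassigned), key=lambda n: len(neighbours[n] & unassigned))
        members = neighbours[centroid] & unassigned
        clusters.append([centroid] + sorted(members - {centroid}))
        unassigned -= members
    return clusters


# Hypothetical fingerprints: hits 1-2 share one contact pattern, hits 3-4 another.
fps = {
    "hit1": {"res_A", "res_B", "res_C"},
    "hit2": {"res_A", "res_B", "res_C", "res_D"},
    "hit3": {"res_X", "res_Y"},
    "hit4": {"res_X", "res_Y", "res_Z"},
}
print(butina_cluster(fps))  # [['hit1', 'hit2'], ['hit3', 'hit4']]
```

A self-organizing map, as in AuPoSOM, instead projects the fingerprints onto a 2D grid and can reveal subclusters within a cluster; the greedy scheme above only produces flat clusters.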
