Enhanced Sampling of Chemical Space for High Throughput Screening Applications using Machine Learning

2021 ◽  
Author(s):  
Sarvesh Mehta ◽  
Siddhartha Laghuvarapu ◽  
Yashaswi Pathak ◽  
Aaftaab Sethi ◽  
Mallika Alvala ◽  
...  

<div>In drug discovery applications, high-throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules, referred to as "hits". In such an experiment, each molecule from a large small-molecule drug library is evaluated for a physical property, such as the binding affinity (docking score) against a target receptor. In real-life drug discovery experiments, the drug libraries are extremely large, yet still represent only a small part of the essentially infinite chemical space, and evaluating the physical property for every molecule in the library is not computationally feasible.</div><div>In the current study, a novel machine learning framework, "MEMES", based on Bayesian optimization is proposed for efficient sampling of chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of about 100 million molecules while calculating the docking score for only about 6% of the complete library. We believe that such a framework would substantially reduce the computational hours and resources required, not only in drug discovery but also in other areas that require such high-throughput experiments.</div>
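
The sampling idea in the abstract above can be illustrated with a small, self-contained sketch. This is not the MEMES implementation: the scoring function `toy_docking_score`, the one-number "descriptor" per molecule, and all parameter values are hypothetical, and a crude nearest-neighbour surrogate is swapped in for the probabilistic model (for example a Gaussian process) that a real Bayesian optimization framework would use. The loop only shows the general pattern: score a small random subset, fit a cheap surrogate, and spend the next batch of expensive evaluations where a lower confidence bound looks most promising.

```python
import math
import random


def toy_docking_score(x):
    """Hypothetical stand-in for an expensive docking run (lower is better)."""
    return (x - 0.7) ** 2 + 0.1 * math.sin(25 * x)


def surrogate(x, evaluated, k=3):
    """Crude surrogate (an assumption, not a Gaussian process): mean score of
    the k nearest evaluated molecules, with the distance to the nearest one
    used as an uncertainty proxy."""
    nearest = sorted(evaluated, key=lambda pair: abs(pair[0] - x))[:k]
    mean = sum(score for _, score in nearest) / len(nearest)
    uncertainty = abs(nearest[0][0] - x)
    return mean, uncertainty


def sample_library(library, score_fn, n_init=20, batch=20, rounds=9,
                   kappa=1.0, seed=0):
    """Score a random seed set, then repeatedly score the batch of unscored
    molecules with the lowest (most promising) lower confidence bound."""
    rng = random.Random(seed)
    scores = {i: score_fn(library[i])
              for i in rng.sample(range(len(library)), n_init)}
    for _ in range(rounds):
        evaluated = [(library[i], s) for i, s in scores.items()]

        def acquisition(i):
            mean, uncertainty = surrogate(library[i], evaluated)
            return mean - kappa * uncertainty

        candidates = [i for i in range(len(library)) if i not in scores]
        candidates.sort(key=acquisition)
        for i in candidates[:batch]:
            scores[i] = score_fn(library[i])
    return scores


# Toy "library": 2000 molecules, each reduced to one made-up descriptor.
library = [i / 1999 for i in range(2000)]
scores = sample_library(library, toy_docking_score)

# Compare against exhaustive scoring, which is only affordable for a toy.
true_top = set(sorted(range(len(library)),
                      key=lambda i: toy_docking_score(library[i]))[:20])
recovered = len(true_top & set(scores)) / len(true_top)
fraction_scored = len(scores) / len(library)
print(f"scored {fraction_scored:.0%} of library, "
      f"recovered {recovered:.0%} of the true top 20")
```

The two printed quantities mirror the metrics quoted in the abstract (fraction of the library scored and fraction of top molecules recovered), but the numbers from this toy say nothing about the performance reported for the actual framework.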


2020 ◽  
Author(s):  
Fang Liu ◽  
Chenru Duan ◽  
Heather Kulik

<p>Despite its widespread use in chemical discovery, approximate density functional theory (DFT) is poorly suited to many targets, such as those containing open-shell, 3<i>d</i> transition metals that can be expected to have strong multi-reference (MR) character. For discovery workflows to be predictive, we need automated, low-cost methods that can distinguish the regions of chemical space where DFT should be applied from those where it should not. We curate over 4,800 open-shell transition-metal complexes up to hundreds of atoms in size from prior high-throughput DFT studies and evaluate affordable, finite-temperature DFT evaluation of fractional occupation number (FON)-based MR diagnostics. We show that intuitive measures of strong correlation (i.e., the HOMO–LUMO gap) are not predictive of MR character as judged by FON-based diagnostics. Analysis of independently trained machine learning (ML) models to predict HOMO–LUMO gaps and FON-based diagnostics reveals differences in metal- and ligand-sensitivity of the two quantities. We use our trained ML models to rapidly evaluate MR character over a space of ca. 187,000 theoretical complexes, identifying large-scale trends in spin-state-dependent MR character and finding small HOMO–LUMO gap complexes while ensuring low MR character. </p>


2019 ◽  
Vol 63 (9) ◽  
Author(s):  
Filip Zmuda ◽  
Lalitha Sastry ◽  
Sharon M. Shepherd ◽  
Deuan Jones ◽  
Alison Scott ◽  
...  

ABSTRACT Chagas’ disease, caused by the protozoan parasite Trypanosoma cruzi, is a potentially life-threatening condition that has become a global issue. Current treatment is limited to two medicines that require prolonged dosing and are associated with multiple side effects, which often lead to treatment discontinuation and failure. One way to address these shortcomings is through target-based drug discovery on validated T. cruzi protein targets. One such target is the proteasome, which plays a crucial role in protein degradation and turnover through chymotrypsin-, trypsin-, and caspase-like catalytic activities. In order to initiate a proteasome drug discovery program, we isolated proteasomes from T. cruzi epimastigotes and characterized their activity using a commercially available glow-like luminescence-based assay. We developed a high-throughput biochemical assay for the chymotrypsin-like activity of the T. cruzi proteasome, which was found to be sensitive, specific, and robust but prone to luminescence technology interference. To mitigate this, we also developed a counterscreen assay that identifies potential interferers at the levels of both the luciferase enzyme reporter and the mechanism responsible for a glow-like response. Interestingly, we also found that the peptide substrate for chymotrypsin-like proteasome activity was not specific and was likely partially turned over by other catalytic sites of the protein. Finally, we utilized these biochemical tools to screen 18,098 compounds, exploring diverse drug-like chemical space, which allowed us to identify 39 hits that were active in the primary screening assay and inactive in the counterscreen assay.


2020 ◽  
Author(s):  
Fang Liu ◽  
Chenru Duan ◽  
Heather Kulik

<p>Despite its widespread use in chemical discovery, approximate density functional theory (DFT) is poorly suited to many materials targets, such as those containing open-shell, 3<i>d</i> transition metals that can be expected to have strong multireference (MR) character. For DFT workflows to be predictive, we must incorporate automated, low cost methods that can distinguish the regions of chemical space where DFT should be applied and where it should not. We curate over 4,800 open shell transition metal complexes up to hundreds of atoms in size from prior high-throughput DFT studies and evaluate affordable, finite-temperature DFT evaluation of fractional occupation number (FON)-based MR diagnostics. We show that intuitive measures of strong correlation (i.e., the HOMO-LUMO gap) are not predictive of MR character as judged by FON-based diagnostics. We train independent machine learning (ML) models to predict HOMO-LUMO gaps and FON-based diagnostics. ML model analysis reveals differences in metal- and ligand-sensitivity of the two quantities, suggesting opportunities to minimize MR character while tailoring the gap. We use our trained ML models to rapidly evaluate MR character over a space of ca. 187,000 theoretical complexes, identifying large-scale trends in spin-state-dependent MR character and discovering small HOMO-LUMO gap complexes with low MR character.</p>


2003 ◽  
Vol 9 (1) ◽  
pp. 49-58
Author(s):  
Margit Asmild ◽  
Nicholas Oswald ◽  
Karen M. Krzywkowski ◽  
Søren Friis ◽  
Rasmus B. Jacobsen ◽  
...  

2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered in the Chemical Abstracts Service. According to a recent study, there are around 10^60 molecules that are classified as new drug-like molecules. The library of such molecules is now considered ‘dark chemical space’ or ‘dark chemistry.’ To explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, ‘genomics’, ‘proteomics’ and ‘in-silico simulation’, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge and must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) chemo-genomics data collection from a database or the literature, (ii) calculation of the right descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) application of the QSAR model to predict the biological property of the molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights the web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
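
The four QSAR model-building steps listed in the abstract above can be sketched in a few lines. Everything here is a hypothetical illustration: the SMILES strings are real notation, but the activity values are invented, the single descriptor (`carbon_count`) is deliberately naive, and an ordinary least-squares line stands in for whatever model a real QSAR study would fit. A practical workflow would normally compute many descriptors with a cheminformatics toolkit and validate the model on held-out data.

```python
# Step (i): data collection. Hypothetical (SMILES, activity) pairs;
# the activity values are invented for illustration only.
training = [("CO", 1.5), ("CCO", 2.0), ("CCCO", 2.5), ("CCCCO", 3.0)]


def carbon_count(smiles):
    """Step (ii): a deliberately naive descriptor. It counts 'C' characters,
    so it would miscount atoms such as 'Cl' in real SMILES strings."""
    return smiles.count("C")


def fit_line(xs, ys):
    """Step (iii): ordinary least squares for one descriptor,
    returning (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


descriptors = [carbon_count(smiles) for smiles, _ in training]
activities = [activity for _, activity in training]
slope, intercept = fit_line(descriptors, activities)

# Step (iv): apply the model to a molecule that was not in the training set.
predicted = slope * carbon_count("CCCCCO") + intercept
print(f"activity = {slope:.2f} * n_carbon + {intercept:.2f}; "
      f"prediction for CCCCCO: {predicted:.2f}")
```

As the abstract stresses, a prediction like this is only a screening hint; any hit would still need experimental verification.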


2021 ◽  
pp. 247255522110232
Author(s):  
Michael D. Scholle ◽  
Doug McLaughlin ◽  
Zachary A. Gurard-Levin

Affinity selection mass spectrometry (ASMS) has emerged as a powerful high-throughput screening tool used in drug discovery to identify novel ligands against therapeutic targets. This report describes the first high-throughput screen using a novel self-assembled monolayer desorption ionization (SAMDI)–ASMS methodology to reveal ligands for the human rhinovirus 3C (HRV3C) protease. The approach combines self-assembled monolayers of alkanethiolates on gold with matrix-assisted laser desorption ionization time-of-flight (MALDI TOF) mass spectrometry (MS), a technique termed SAMDI-ASMS. The primary screen of more than 100,000 compounds in pools of 8 compounds per well was completed in less than 8 h, and informs on the binding potential and selectivity of each compound. Initial hits were confirmed in follow-up SAMDI-ASMS experiments in single-concentration and dose–response curves. The ligands identified by SAMDI-ASMS were further validated using differential scanning fluorimetry (DSF) and in functional protease assays against HRV3C and the related SARS-CoV-2 3CLpro enzyme. SAMDI-ASMS offers key benefits for drug discovery over traditional ASMS approaches, including the high-throughput workflow and readout, minimizing compound misbehavior by using smaller compound pools, and up to a 50-fold reduction in reagent consumption. The flexibility of this novel technology opens avenues for high-throughput ASMS assays of any target, thereby accelerating drug discovery for diverse diseases.

