Combining Cloud-Based Free Energy Calculations, Synthetically Aware Enumerations and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization

2020 ◽  
Author(s):  
Phani Ghanakota ◽  
Pieter Bos ◽  
Kyle Konze ◽  
Joshua Staker ◽  
Gabriel Marques ◽  
...  

The hit identification process usually involves profiling millions, and more recently billions, of compounds via either traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that by coupling reaction-based enumeration, active learning and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large-scale enumeration and cloud-based FEP profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large-scale enumeration alone, while simultaneously staying within the bounds of a predefined drug-like property space. We achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted-sum, QSAR-based multi-parameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can: (1) provide a 6.4-fold enrichment improvement in identifying < 10 nM compounds over random selection, and a 1.5-fold enrichment in identifying < 10 nM compounds over our previous method, (2) rapidly explore relevant chemical space outside the bounds of commercial reagents, (3) use generative ML approaches to “learn” the SAR from large-scale in silico enumerations and generate novel idea molecules for a flexible receptor site that are both potent and within relevant physicochemical space, and (4) produce over 3,000,000 idea molecules and run 2153 FEP simulations, identifying 69 ideas with a predicted IC50 < 10 nM and 358 ideas with a predicted IC50 < 100 nM.
The reported data suggest that combining reaction-based enumeration and generative machine learning for ideation results in a higher enrichment of potent compounds than previously described approaches, and can rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.
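
The abstract does not spell out the form of the weighted-sum multi-parameter optimization (MPO) function, so the following is only a minimal sketch of the general idea. The property names, weights, and desirability ranges are hypothetical and are not taken from the paper; each predicted property is mapped onto [0, 1] and the results are combined as a normalized weighted sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One property in the weighted sum (all values here are hypothetical)."""
    name: str      # key of the predicted property
    weight: float  # relative importance of this property
    low: float     # property value mapped to desirability 0
    high: float    # property value mapped to desirability 1


def desirability(value: float, low: float, high: float) -> float:
    """Linearly map value onto [0, 1], clipping outside the low/high range.

    Setting low > high expresses "smaller is better" properties.
    """
    frac = (value - low) / (high - low)
    return min(1.0, max(0.0, frac))


def mpo_score(predictions: dict, terms: list) -> float:
    """Weighted sum of per-property desirabilities, normalized to [0, 1]."""
    total_weight = sum(t.weight for t in terms)
    weighted = sum(
        t.weight * desirability(predictions[t.name], t.low, t.high)
        for t in terms
    )
    return weighted / total_weight


# Hypothetical setup: reward predicted potency, penalize high molecular weight.
TERMS = [
    Term("pIC50", weight=2.0, low=5.0, high=9.0),
    Term("mol_weight", weight=1.0, low=600.0, high=300.0),
]
```

A generative model optimized against such a score is pushed toward molecules that balance predicted potency against property constraints, rather than maximizing potency alone.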


2019 ◽  
Author(s):  
Kyle Konze ◽  
Pieter Bos ◽  
Markus Dahlgren ◽  
Karl Leswing ◽  
Ivan Tubert-Brohman ◽  
...  

We report a new computational technique, PathFinder, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. Coupling PathFinder with active learning and cloud-based free energy calculations allows for large-scale potency predictions of compounds on a timescale that impacts drug discovery. The process is further accelerated by using a combination of population-based statistics and active learning techniques. Using this approach, we rapidly optimized R-groups and core hops for inhibitors of cyclin-dependent kinase 2. We explored more than 300,000 ideas and identified 35 ligands with diverse commercially available R-groups and a predicted IC50 < 100 nM, and four unique cores with a predicted IC50 < 100 nM. The rapid turnaround time and the scale of chemical exploration suggest that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.


2021 ◽  
Author(s):  
Lauren Nelson ◽  
Sofia Bariami ◽  
Chris Ringrose ◽  
Joshua Horton ◽  
Vadiraj Kurdekar ◽  
...  

The quantum mechanical bespoke (QUBE) force field approach has been developed to facilitate the automated derivation of potential energy function parameters for modelling protein-ligand binding. To date, the approach has been validated in the context of Monte Carlo simulations of protein-ligand complexes. We describe here the implementation of the QUBE force field in the alchemical free energy calculation molecular dynamics simulation package SOMD. The implementation is validated by demonstrating the reproducibility of absolute hydration free energies computed with the QUBE force field across the SOMD and GROMACS software packages. We further demonstrate, by way of a case study involving two series of non-nucleoside inhibitors of HIV-1 reverse transcriptase, that the availability of QUBE in a modern simulation package that makes efficient use of GPU acceleration will facilitate high-throughput alchemical free energy calculations.


2021 ◽  
Author(s):  
Marcus Wieder ◽  
Josh Fass ◽  
John D. Chodera

Alchemical free energy calculations are an important tool in the computational chemistry toolbox, enabling the efficient calculation of quantities critical for drug discovery such as ligand binding affinities, selectivities, and partition coefficients. However, modern alchemical free energy calculations suffer from three significant limitations: (1) modern molecular mechanics force fields are limited in their ability to model complex molecular interactions, (2) classical force fields are unable to treat phenomena that involve rearrangements of chemical bonds, and (3) these calculations cannot easily learn to improve their performance when experimental data are readily available. Here, we show how all three limitations can be overcome through the use of quantum machine learning (QML) potentials capable of accurately modeling quantum chemical energetics even when chemical bonds are made and broken. Because these potentials are based on mathematically convenient deep learning architectures instead of traditional quantum chemical formulations, QML simulations can be run at a fraction of the cost of quantum chemical simulations using modern graphics processing units (GPUs) and machine learning frameworks. We demonstrate that alchemical free energy calculations in explicit solvent are especially simple to implement using QML potentials because these potentials lack singularities and other pathologies typical of molecular mechanics potentials, and that alchemical free energy calculations are highly effective even when bonds are broken or made. Finally, we show how a limited number of experimental free energy measurements can be used to significantly improve the accuracy of computed free energies for unrelated compounds with no significant generalization gap.
We illustrate these concepts on the prediction of aqueous tautomer free energies (related to tautomer ratios), which are highly relevant to drug discovery in that more than a quarter of all approved drugs exist as a mixture of tautomers.
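
The link between a tautomer free energy and a tautomer ratio mentioned above follows from the standard thermodynamic relation K = exp(-ΔG / RT). The sketch below illustrates that relation only; the function names and the default temperature are assumptions made for illustration and are not part of the authors' workflow.

```python
import math

# Gas constant in kcal/(mol K).
R_KCAL_PER_MOL_K = 0.0019872041


def tautomer_ratio(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Equilibrium ratio [B]/[A] for delta_g = G(B) - G(A) in kcal/mol."""
    return math.exp(-delta_g_kcal / (R_KCAL_PER_MOL_K * temperature_k))


def tautomer_fraction(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Fraction of the population present as tautomer B in a two-state mixture."""
    ratio = tautomer_ratio(delta_g_kcal, temperature_k)
    return ratio / (1.0 + ratio)
```

Because the ratio depends exponentially on the free energy difference, an error of about 1.4 kcal/mol at room temperature shifts the predicted ratio by roughly a factor of ten, which is why accurate tautomer free energies matter.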


2022 ◽  
Author(s):  
Shomik Verma ◽  
Miguel Rivera ◽  
David O. Scanlon ◽  
Aron Walsh

Understanding the excited state properties of molecules provides insights into how they interact with light. These interactions can be exploited to design compounds for photochemical applications, including enhanced spectral conversion of light to increase the efficiency of photovoltaic cells. While chemical discovery is time- and resource-intensive experimentally, computational chemistry can be used to screen large-scale databases for molecules of interest in a procedure known as high-throughput virtual screening. The first step usually involves a high-speed but low-accuracy method to screen large numbers of molecules (potentially millions) so only the best candidates are evaluated with expensive methods. However, use of a coarse first-pass screening method can potentially result in high false positive or false negative rates. Therefore, this study uses machine learning to calibrate a high-throughput technique (xTB-sTDA) against a higher accuracy one (TD-DFT). Testing the calibration model shows a ~5-fold decrease in error in-domain and a ~3-fold decrease out-of-domain. The resulting mean absolute error of ~0.14 eV is in line with previous work in machine learning calibrations and out-performs previous work in linear calibration of xTB-sTDA. We then apply the calibration model to screen a 250k molecule database and map inaccuracies of xTB-sTDA in chemical space. We also show generalizability of the workflow by calibrating against a higher-level technique (CC2), yielding a similarly low error. Overall, this work demonstrates machine learning can be used to develop a both cheap and accurate method for large-scale excited state screening, enabling accelerated molecular discovery across a variety of disciplines.


2020 ◽  
Author(s):  
Tomas Bucko ◽  
Monika Gešvandtnerová ◽  
Dario Rocca

While free energies are fundamental thermodynamic quantities to characterize chemical reactions, their calculation based on ab initio theory is usually limited by the high computational cost. This is particularly true if multiple levels of theory have to be tested to establish their relative accuracy, if highly expensive quantum mechanical approximations are of interest, and also if several different temperatures have to be considered. We present an ab initio approach that effectively couples perturbation theory and machine learning to make ab initio free energy calculations more affordable. Starting from results based on a certain production ab initio theory, perturbation theory is applied to obtain free energies. The large number of single point calculations required by a brute force application of this approach is here significantly decreased by applying machine learning techniques. Importantly, the training of the machine learning model requires only a small amount of data and does not need to be performed again when the temperature is decreased. The accuracy and efficiency of this method are demonstrated by computing the free energy of activation of the proton exchange reaction in the zeolite chabazite. Starting from an ab initio calculation based on a semilocal approximation of density functional theory, free energies based on significantly more expensive non-local van der Waals and hybrid functionals are obtained with only a few tens of additional single point calculations. In this way, this work paves the way to quick free energy calculations using different levels of theory or approximations that would be too computationally expensive to be directly employed in molecular dynamics or Monte Carlo simulations.
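
The abstract does not state which perturbative expression is used. For orientation, the standard free energy perturbation (Zwanzig) relation between a production level of theory 0 and a more expensive target level 1 can be written as below; the symbols are chosen here for illustration and are not taken from the paper.

```latex
\Delta A_{0 \to 1} = A_1 - A_0
  = -k_\mathrm{B} T \,
    \ln \left\langle
      \exp\!\left[ -\frac{E_1(\mathbf{r}) - E_0(\mathbf{r})}{k_\mathrm{B} T} \right]
    \right\rangle_0
```

Here the average runs over configurations sampled at level 0. Evaluated by brute force, every sampled configuration needs a single point energy at level 1, which is the cost that a machine learning model trained on a small subset of those energies is meant to reduce.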

