Beam Search Sampling for Molecular Design and Intrinsic Prioritization with Machine Intelligence

2021 ◽  
Author(s):  
Michael Moret ◽  
Moritz Helmstädter ◽  
Francesca Grisoni ◽  
Gisbert Schneider ◽  
Daniel Merk

Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. In this work, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
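The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of generic beam search over a next-token distribution, with the summed log-probability serving as the model-intrinsic score. The function `toy_model`, its three-token vocabulary, the end token `$`, and all probabilities are hypothetical stand-ins for a trained chemical language model, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "$"  # hypothetical end-of-sequence token


def toy_model(prefix: str) -> Dict[str, float]:
    """Hypothetical next-token distribution standing in for a trained CLM."""
    if prefix == "":
        return {"C": 0.7, "O": 0.3}
    if len(prefix) >= 3:
        return {END: 1.0}
    return {"C": 0.5, "O": 0.3, END: 0.2}


def beam_search(
    next_probs: Callable[[str], Dict[str, float]],
    width: int,
    max_len: int,
) -> List[Tuple[str, float]]:
    """Return finished sequences with their log-probabilities, best first."""
    beams: List[Tuple[str, float]] = [("", 0.0)]
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        # Extend every open beam by every token with non-zero probability.
        candidates = [
            (seq + tok, logp + math.log(p))
            for seq, logp in beams
            for tok, p in next_probs(seq).items()
            if p > 0.0
        ]
        # Keep only the `width` most probable extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:width]:
            if seq.endswith(END):
                finished.append((seq[: -len(END)], logp))
            else:
                beams.append((seq, logp))
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)


designs = beam_search(toy_model, width=3, max_len=5)
for sequence, log_prob in designs:
    print(sequence, round(math.exp(log_prob), 3))
```

Because the ranking uses only the probabilities the model itself assigns, no external scoring function is involved in this sketch; a real application would still need to check that each generated string is a valid molecule.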


2021 ◽  
Author(s):  
Michael Moret ◽  
Francesca Grisoni ◽  
Cyrill Brunner ◽  
Gisbert Schneider

Generative chemical language models (CLMs) can be used for de novo molecular structure generation. These CLMs learn from the structural information of known molecules to generate new ones. In this paper, we show that “hybrid” CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), we created a large collection of virtual molecules with a generative CLM. This primary virtual compound library was further refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ binders and non-binders by transfer learning. Several of the computer-generated molecular designs were commercially available, which allowed for fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design in low-data situations.


2020 ◽  
Author(s):  
Francesca Grisoni ◽  
Berend Huisman ◽  
Alexander Button ◽  
Michael Moret ◽  
Kenneth Atz ◽  
...  

Automation of the molecular design-make-test-analyze cycle speeds up the identification of hit and lead compounds for drug discovery. Using deep learning for computational molecular design and a customized microfluidics platform for on-chip compound synthesis, liver X receptor (LXR) agonists were generated from scratch. The computational pipeline was tuned to explore the chemical space defined by known LXRα agonists, and to suggest structural analogs of known ligands and novel molecular cores. To further the design of lead-like molecules and ensure compatibility with automated on-chip synthesis, this chemical space was confined to the set of virtual products obtainable from 17 different one-step reactions. Overall, 25 de novo generated compounds were successfully synthesized in flow via formation of sulfonamide, amide, and ester bonds. First-pass in vitro activity screening of the crude reaction products in hybrid Gal4 reporter gene assays revealed 17 (68%) hits, with up to 60-fold LXR activation. The batch re-synthesis, purification, and re-testing of 14 of these compounds confirmed that 12 of them were potent LXRα or LXRβ agonists. These results support the utilization of the proposed design-make-test-analyze framework as a blueprint for automated drug design with artificial intelligence and miniaturized bench-top synthesis.


2021 ◽  
Author(s):  
Michael Moret ◽  
Francesca Grisoni ◽  
Paul Katzberger ◽  
Gisbert Schneider

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular input line entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.
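The abstract does not give a formula, so the sketch below assumes the standard language-model definition of perplexity: the exponential of the negative mean log-probability of the tokens in a sequence, where lower values indicate sequences the model considers more likely. The SMILES strings and per-token probabilities are made-up illustrative values, not outputs of any published model.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of one sequence given the probability of each of its tokens."""
    if not token_probs:
        raise ValueError("at least one token probability is required")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


def rank_by_perplexity(designs: Dict[str, Sequence[float]]) -> List[str]:
    """Order designs from lowest (best) to highest perplexity."""
    return sorted(designs, key=lambda name: perplexity(designs[name]))


# Hypothetical per-token probabilities for three generated strings.
example_designs = {
    "CCO": [0.9, 0.8, 0.9],
    "c1ccccc1": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    "CC(=O)O": [0.9, 0.2, 0.3, 0.4, 0.3, 0.9, 0.5],
}

ranking = rank_by_perplexity(example_designs)
print(ranking)
```

Normalizing by sequence length is what separates perplexity from the raw sequence probability: a long string is not penalized merely for containing more tokens.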


2021 ◽  
Vol 7 (24) ◽  
pp. eabg3338
Author(s):  
Francesca Grisoni ◽  
Berend J. H. Huisman ◽  
Alexander L. Button ◽  
Michael Moret ◽  
Kenneth Atz ◽  
...  

Automating the molecular design-make-test-analyze cycle accelerates hit and lead finding for drug discovery. Using deep learning for molecular design and a microfluidics platform for on-chip chemical synthesis, liver X receptor (LXR) agonists were generated from scratch. The computational pipeline was tuned to explore the chemical space of known LXRα agonists and generate novel molecular candidates. To ensure compatibility with automated on-chip synthesis, the chemical space was confined to the virtual products obtainable from 17 one-step reactions. Twenty-five de novo designs were successfully synthesized in flow. In vitro screening of the crude reaction products revealed 17 (68%) hits, with up to 60-fold LXR activation. The batch resynthesis, purification, and retesting of 14 of these compounds confirmed that 12 of them were potent LXR agonists. These results support the suitability of the proposed design-make-test-analyze framework as a blueprint for automated drug design with artificial intelligence and miniaturized bench-top synthesis.


2021 ◽  
Author(s):  
Zhihong Liu ◽  
Jiewen Du ◽  
Bingdong Liu ◽  
Zongbin Cui ◽  
Jiansong Fang ◽  
...  

With advances in deep learning techniques, various architectures for molecular generation have been proposed for de novo drug design. Successful cases from academia and industry have demonstrated that deep learning-based de novo molecular design can efficiently accelerate the drug discovery process. The proliferation of de novo molecular generation methods and applications has created a great demand for the visualization and functional profiling of de novo generated molecules. The rise of publicly available chemogenomic databases lays a good foundation and creates good opportunities for comprehensive profiling of a de novo library. In this paper, we present DenovoProfiling, a webserver dedicated to de novo library visualization and functional profiling. Currently, DenovoProfiling contains six modules: (1) identification & visualization, (2) chemical space, (3) scaffold analysis, (4) molecular alignment, (5) drugs mapping, and (6) target & pathway. DenovoProfiling provides structural identification, chemical space exploration, drug mapping, and target & pathway information. The comprehensive annotated information can give users a clear picture of their de novo library and guide the further selection of candidates for synthesis and biological confirmation. DenovoProfiling is freely available at http://denovoprofiling.xielab.net.


2022 ◽  
Vol 14 (1) ◽  
Author(s):  
Alan Kerstjens ◽  
Hans De Winter

Given an objective function that predicts key properties of a molecule, goal-directed de novo molecular design is a useful tool to identify molecules that maximize or minimize said objective function. Nonetheless, a common drawback of these methods is that they tend to design synthetically unfeasible molecules. In this paper we describe a Lamarckian evolutionary algorithm for de novo drug design (LEADD). LEADD attempts to strike a balance between optimization power, synthetic accessibility of designed molecules and computational efficiency. To increase the likelihood of designing synthetically accessible molecules, LEADD represents molecules as graphs of molecular fragments, and limits the bonds that can be formed between them through knowledge-based pairwise atom type compatibility rules. A reference library of drug-like molecules is used to extract fragments, fragment preferences and compatibility rules. A novel set of genetic operators that enforce these rules in a computationally efficient manner is presented. To sample chemical space more efficiently we also explore a Lamarckian evolutionary mechanism that adapts the reproductive behavior of molecules. LEADD has been compared to both standard virtual screening and a comparable evolutionary algorithm using a standardized benchmark suite and was shown to be able to identify fitter molecules more efficiently. Moreover, the designed molecules are predicted to be easier to synthesize than those designed by other evolutionary algorithms.


2021 ◽  
Author(s):  
Zhihong Liu ◽  
Jiewen Du ◽  
Bingdong Liu ◽  
Zongbin Cui ◽  
Jiansong Fang ◽  
...  

With advances in deep learning techniques, various architectures for molecular generation have been proposed for de novo drug design. Successful cases from academia and industry have demonstrated that deep learning-based de novo molecular design can efficiently accelerate the drug discovery process. The proliferation of de novo molecular generation methods and applications has created great demand for the visualization and functional profiling of de novo generated molecules. The rise of publicly available chemogenomic databases lays a good foundation and creates good opportunities for comprehensive profiling of a de novo library. In this paper, we present DenovoProfiling, a web server dedicated to de novo library visualization and functional profiling. Currently, DenovoProfiling contains six modules: (1) identification & visualization, (2) chemical space, (3) scaffold analysis, (4) molecular alignment, (5) target & pathways, and (6) drugs mapping. DenovoProfiling provides structural identification, chemical space exploration, drugs mapping, and targets & pathways. The comprehensive annotated information can give users a clear picture of their de novo library and provide guidance in the further selection of candidates for synthesis and biological confirmation. DenovoProfiling is freely available at http://denovoprofiling.xielab.net.

