Leveraging molecular structure and bioactivity with chemical language models for drug design

Author(s):  
Michael Moret ◽  
Francesca Grisoni ◽  
Cyrill Brunner ◽  
Gisbert Schneider

Generative chemical language models (CLMs) can be used for de novo molecular structure generation. These CLMs learn from the structural information of known molecules to generate new ones. In this paper, we show that “hybrid” CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), we created a large collection of virtual molecules with a generative CLM. This primary virtual compound library was further refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ binders and non-binders by transfer learning. Several of the computer-generated molecular designs were commercially available, which allowed for fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design in low-data situations.

2021 ◽  
Author(s):  
Michael Moret ◽  
Moritz Helmstädter ◽  
Francesca Grisoni ◽  
Gisbert Schneider ◽  
Daniel Merk

Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. In this work, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
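As an illustration of how beam search can use a model's own token probabilities for both generation and scoring, here is a minimal Python sketch. It is not the authors' implementation: `toy_model`, its probabilities, and the end-of-sequence token `"$"` are hypothetical stand-ins for a trained chemical language model, and the variant shown (finished sequences leave the beam) is only one of several common choices.

```python
import math

def beam_search(next_token_probs, beam_width, max_len, end_token="$"):
    """Keep the `beam_width` most probable partial sequences at each step.

    `next_token_probs(prefix)` must return a dict mapping each possible next
    token to its probability given the prefix (a tuple of tokens).
    Returns finished sequences with their log-probabilities, best first.
    """
    beams = [((), 0.0)]  # (tokens so far, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in next_token_probs(tokens).items():
                if p > 0.0:
                    candidates.append((tokens + (tok,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))  # complete design, scored by the model
            else:
                beams.append((tokens, score))
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)

def toy_model(prefix):
    """Hypothetical next-token distribution; a real CLM would be a trained network."""
    if not prefix:
        return {"C": 0.6, "N": 0.4}
    if len(prefix) >= 3:
        return {"$": 1.0}  # force termination to keep the example small
    return {
        "C": {"C": 0.5, "O": 0.3, "$": 0.2},
        "O": {"C": 0.2, "$": 0.8},
        "N": {"C": 0.9, "$": 0.1},
    }[prefix[-1]]

results = beam_search(toy_model, beam_width=2, max_len=5)
for tokens, log_p in results:
    print("".join(tokens[:-1]), round(math.exp(log_p), 3))
```

In this toy setting, a purely greedy search would start with the most probable first token `C` and end at a sequence with probability 0.15, whereas a beam of width 2 also follows `N` and finds a sequence with probability 0.18. The returned log-probability doubles as a ranking score, so no external scoring function is needed.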


2021 ◽  
Author(s):  
Michael Moret ◽  
Francesca Grisoni ◽  
Paul Katzberger ◽  
Gisbert Schneider

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular input line entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score makes it possible to identify and rank the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is used to identify undesired biases introduced during model training, and it enables the development of a new ranking system that removes those biases.
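For readers unfamiliar with the metric, the following Python sketch shows the standard definition of perplexity for a single sequence: the exponential of the negative mean log-probability of its tokens. The per-token probabilities below are made-up illustrative values, not model outputs, and the ranking step is an assumption about how such a score might be used rather than the authors' procedure.

```python
import math

def perplexity(token_probs):
    """Perplexity of one sequence, given the probability the model
    assigned to each of its tokens. Lower means the model found the
    sequence more likely."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)

# Hypothetical per-token probabilities for two generated SMILES strings.
designs = {
    "CCO": [0.9, 0.8, 0.9],
    "C1CC1N": [0.5, 0.2, 0.4, 0.3, 0.1, 0.2],
}

# Rank designs from lowest to highest perplexity.
ranked = sorted(designs, key=lambda smiles: perplexity(designs[smiles]))
print(ranked)
```

Because the log-probabilities are averaged over the token count, perplexity is normalised for sequence length, which a raw sequence probability is not.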


2021 ◽  
Author(s):  
Jie Zhang ◽  
Rocío Mercado ◽  
Ola Engkvist ◽  
Hongming Chen

In recent years, deep molecular generative models have emerged as novel methods for de novo molecular design. Thanks to the rapid advance of deep learning techniques, deep learning architectures such as recurrent neural networks, generative autoencoders, and adversarial networks, to give a few examples, have been employed for constructing generative models. However, so far the metrics used to evaluate these deep generative models are not discriminative enough to separate the performance of various state-of-the-art generative models. This work presents a novel metric for evaluating deep molecular generative models; this new metric is based on the chemical space coverage of a reference database, and compares not only the molecular structures, but also the ring systems and functional groups, reproduced from a reference dataset of 1M structures. In this study, the performance of 7 different molecular generative models was compared by calculating their structure and substructure coverage of the GDB-13 database while using a 1M subset of GDB-13 for training. Our study shows that the performance of various generative models varies significantly using the benchmarking metrics introduced herein, such that generalization capability of the generative model can be clearly differentiated. Additionally, the coverage of ring systems and functional groups existing in GDB-13 was also compared between the models. Our study provides a useful new metric that can be used for evaluating and comparing generative models.
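The idea of coverage can be sketched in a few lines of Python, assuming it is computed as the fraction of a reference set that a model reproduces. This is an illustrative reading of the abstract, not the paper's exact protocol: the function name `coverage` and the example strings are hypothetical, and the sketch assumes all structures are already written as canonical SMILES so that string equality implies structural identity (in practice a cheminformatics toolkit would handle canonicalisation).

```python
def coverage(generated, reference):
    """Fraction of unique reference structures that also appear among
    the generated structures. Inputs are iterables of canonical strings."""
    reference_set = set(reference)
    if not reference_set:
        raise ValueError("reference set must not be empty")
    return len(set(generated) & reference_set) / len(reference_set)

# Hypothetical toy data: duplicates and out-of-reference designs are ignored.
reference = ["CCO", "CCN", "CCC", "CCCl"]
generated = ["CCO", "CCN", "CCO", "c1ccccc1"]

print(coverage(generated, reference))  # 0.5
```

The same function could be applied to sets of ring systems or functional groups extracted from the structures, which corresponds to the substructure-level comparison the abstract describes.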


10.29007/xj9p ◽  
2020 ◽  
Author(s):  
Thi Hien Nguyen ◽  
Ngoc Loan Phan Thi

It is well known that laser-induced electron diffraction (LIED) contains molecular structural information that can be extracted with a spatial resolution of ångströms and a time resolution of a few femtoseconds [1, 2]. The retrieval is based on the quantitative rescattering (QRS) method, which allows the LIED signal to be split into two components [3], one of which is a laser-free differential cross-section (DCS) containing the molecular structure. Fitting the experimental DCS extracted from the LIED spectra to the theoretical DCS calculated with assumed initial structure parameters then reveals the real molecular structure. The theoretical DCS of molecules is treated within the independent atoms model (IAM) [1, 4] or the more advanced model based on multiple scattering (MS) theory [2, 5]. In this report, we show how to incorporate molecular vibrations into the MS model and examine their effect on the DCS by comparing the oscillation component with the second-order MS component, which describes the interference of the scattering waves. We apply the developed theory to several diatomic molecules.


Author(s):  
Jiyong Park ◽  
Byungnam Kahng ◽  
Wonmuk Hwang

Self-assembly of β-sheet-forming peptides into filaments has drawn great interest for biomedical applications [1,2]; hydrogels formed by filaments self-assembled from de novo designed peptides have potential applications as cell culture scaffolds [3]. On the other hand, peptides derived from amyloidogenic proteins in neurodegenerative diseases such as Alzheimer’s and Parkinson’s also form similar β-sheet filaments in vitro. They share little sequence homology, yet filaments formed by these self-assembling peptides commonly have the cross-β structure, the key signature of the amyloid fibril. Detailed structural information on the self-assembled β-sheet filaments has been limited, partly due to the difficulty of preparing ordered filament samples, and only recently have solid-state nuclear magnetic resonance and x-ray techniques revealed their molecular structure at the atomic level [4,5]. Although molecular structures of amyloid fibrils are becoming available, the physical principles governing their self-assembly and the properties of the filaments are not well understood, for which computational as well as theoretical approaches are desirable [6].


2019 ◽  
Vol 73 (12) ◽  
pp. 1006-1011 ◽  
Author(s):  
Francesca Grisoni ◽  
Gisbert Schneider

Drug discovery benefits from computational models aiding the identification of new chemical matter with bespoke properties. The field of de novo drug design has been particularly revitalized by adaptation of generative machine learning models from the field of natural language processing. These deep neural network models are trained on recognizing molecular structures and generate new molecular entities without relying on pre-determined sets of molecular building blocks and chemical transformations for virtual molecule construction. Implicit representation of chemical knowledge provides an alternative to formulating the molecular design task in terms of the established, explicit chemical vocabulary. Here, we review de novo molecular design approaches from the field of 'artificial intelligence', focusing on instances of deep generative models, and highlight the prospective application of long short-term memory models to hit and lead finding in medicinal chemistry.

