Deep Generative Models Enable Navigation in Sparsely Populated Chemical Space

2021 ◽  
Author(s):  
Michael A. Skinnider ◽  
R. Greg Stacey ◽  
David S. Wishart ◽  
Leonard J. Foster

Deep generative models are powerful tools for the exploration of chemical space, enabling the on-demand generation of molecules with desired physical, chemical, or biological properties. However, these models are typically thought to require training datasets comprising hundreds of thousands, or even millions, of molecules. This perception limits the application of deep generative models in regions of chemical space populated by only a small number of examples. Here, we systematically evaluate and optimize generative models of molecules for low-data settings. We carry out a series of systematic benchmarks, training more than 5,000 deep generative models and evaluating over 2.6 billion generated molecules. We find that robust models can be learned from far fewer examples than has been widely assumed. We further identify strategies that dramatically reduce the number of molecules required to learn a model of equivalent quality, and demonstrate the application of these principles by learning models of chemical structures found in bacterial, plant, and fungal metabolomes. The structure of our experiments also allows us to benchmark the metrics used to evaluate generative models themselves. We find that many of the most widely used metrics in the field fail to capture model quality, but identify a subset of well-behaved metrics that provide a sound basis for model development. Collectively, our work provides a foundation for directly learning generative models in sparsely populated regions of chemical space.


2021 ◽  
Author(s):  
AkshatKumar Nigam ◽  
Robert Pollice ◽  
Mario Krenn ◽  
Gabriel dos Passos Gomes ◽  
Alan Aspuru-Guzik

Inverse design allows the design of molecules with desirable properties using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. We achieve comparable performance on typical benchmarks without any training. We demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. We anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wide adoption.
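The string-modification idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' STONED implementation: the token alphabet, the function names `tokenize` and `mutate`, and the choice of three mutation types are assumptions made for illustration, and decoding the mutated string back to a molecule (for example with the `selfies` package) is deliberately left out.

```python
import random
import re

# Hypothetical toy alphabet of SELFIES-style tokens (assumption, for illustration only).
ALPHABET = ["[C]", "[N]", "[O]", "[F]", "[=C]", "[=O]", "[Branch1]", "[Ring1]"]


def tokenize(selfies_string):
    """Split a SELFIES-style string into its bracketed tokens."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def mutate(selfies_string, rng):
    """Apply one random point mutation: replace, insert, or delete a token."""
    tokens = tokenize(selfies_string)
    operation = rng.choice(["replace", "insert", "delete"])
    if operation == "delete" and len(tokens) > 1:
        # Remove one token, but never empty the string entirely.
        del tokens[rng.randrange(len(tokens))]
    elif operation == "insert" or not tokens:
        # Insert a random token at a random position (also used for empty input).
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(ALPHABET))
    else:
        # Replace one token (also the fallback when a deletion is not allowed).
        tokens[rng.randrange(len(tokens))] = rng.choice(ALPHABET)
    return "".join(tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    parent = "[C][C][O]"
    # Print a few mutated strings derived from the same parent.
    for _ in range(3):
        print(mutate(parent, rng))
```

Because each call changes exactly one token, repeated calls produce a local neighbourhood of strings around the parent, which is the kind of training-free exploration the abstract describes.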


2020 ◽  
Author(s):  
Martino Bertoni ◽  
Miquel Duran-Frigola ◽  
Pau Badia-i-Mompel ◽  
Eduardo Pauls ◽  
Modesto Orozco-Ruiz ◽  
...  

Abstract Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, ‘bioactivity descriptors’ are not available for most small molecules, which limits their applicability to a few thousand well characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our ‘signaturizers’ relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Martino Bertoni ◽  
Miquel Duran-Frigola ◽  
Pau Badia-i-Mompel ◽  
Eduardo Pauls ◽  
Modesto Orozco-Ruiz ◽  
...  

Abstract Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1170-D1178
Author(s):  
Tianbiao Yang ◽  
Zhaojun Li ◽  
Yingjia Chen ◽  
Dan Feng ◽  
Guangchao Wang ◽  
...  

Abstract One of the most prominent topics in drug discovery is efficient exploration of the vast drug-like chemical space to find synthesizable and novel chemical structures with desired biological properties. To address this challenge, we created the DrugSpaceX (https://drugspacex.simm.ac.cn/) database based on expert-defined transformations of approved drug molecules. The current version of DrugSpaceX contains >100 million transformed chemical products for virtual screening, with outstanding characteristics in terms of structural novelty, diversity and large three-dimensional chemical space coverage. To illustrate its practical application in drug discovery, we used a case study of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, to show DrugSpaceX performing a quick search of initial hit compounds. Additionally, for ligand identification and optimization purposes, DrugSpaceX also provides several subsets for download, including a 10% diversity subset, an extended drug-like subset, a drug-like subset, a lead-like subset, and a fragment-like subset. In addition to chemical properties and transformation instructions, DrugSpaceX can locate the position of transformation, which will enable medicinal chemists to easily integrate strategy planning and protection design.


2020 ◽  
Author(s):  
AkshatKumar Nigam ◽  
Robert Pollice ◽  
Mario Krenn ◽  
Gabriel dos Passos Gomes ◽  
Alan Aspuru-Guzik

Inverse design allows the design of molecules with desirable properties using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. We achieve comparable performance on typical benchmarks without any training. We demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. We anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wide adoption.


2009 ◽  
Vol 64 (6) ◽  
pp. 773-777 ◽  
Author(s):  
Alan R. Katritzky ◽  
Svetoslav Slavov ◽  
Maksim Radzvilovits ◽  
Iva Stoyanova-Slavova ◽  
Mati Karelson

The establishment of quantitative relationships between numerous molecular properties and chemical structures is now of great importance to society in understanding and improving environmental, medicinal and technological aspects of life. Quantitative structure-activity (property) relationships (QSA(P)R) relate physical, chemical, physico-chemical, technological and biological properties of compounds to their structure. A major factor driving the widespread use of QSA(P)R models is the rational estimation of properties of new compounds, without first synthesizing and testing them. Some of our recent findings in the field are briefly discussed below.


2019 ◽  
Author(s):  
Qi Yuan ◽  
Alejandro Santana-Bonilla ◽  
Martijn Zwijnenburg ◽  
Kim Jelfs

The chemical space for novel electronic donor-acceptor oligomers with targeted properties was explored using deep generative models and transfer learning. A General Recurrent Neural Network model was trained from the ChEMBL database to generate chemically valid SMILES strings. The parameters of the General Recurrent Neural Network were fine-tuned via transfer learning using the electronic donor-acceptor database from the Computational Material Repository to generate novel donor-acceptor oligomers. Six different transfer learning models were developed with different subsets of the donor-acceptor database as training sets. We concluded that electronic properties such as HOMO-LUMO gaps and dipole moments of the training sets can be learned using the SMILES representation with deep generative models, and that the chemical space of the training sets can be efficiently explored. This approach identified approximately 1700 new molecules that have promising electronic properties (HOMO-LUMO gap <2 eV and dipole moment <2 Debye), 6-times more than in the original database. Amongst the molecular transformations, the deep generative model has learned how to produce novel molecules by trading off between selected atomic substitutions (such as halogenation or methylation) and molecular features such as the spatial extension of the oligomer. The method can be extended as a plausible source of new chemical combinations to effectively explore the chemical space for targeted properties.
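The selection criterion quoted in the abstract above (HOMO-LUMO gap below 2 eV and dipole moment below 2 Debye) amounts to a simple threshold filter over generated candidates. The sketch below assumes the properties have already been computed elsewhere; the `Candidate` record, the `is_promising` helper, and the three example entries with their values are hypothetical placeholders, not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A generated molecule with precomputed properties (hypothetical record)."""
    name: str
    gap_ev: float        # HOMO-LUMO gap in eV
    dipole_debye: float  # dipole moment in Debye


def is_promising(candidate, max_gap_ev=2.0, max_dipole_debye=2.0):
    """Return True when both properties fall strictly below the thresholds."""
    return (candidate.gap_ev < max_gap_ev
            and candidate.dipole_debye < max_dipole_debye)


# Made-up placeholder values, chosen only to exercise the filter.
candidates = [
    Candidate("mol-A", gap_ev=1.6, dipole_debye=0.9),
    Candidate("mol-B", gap_ev=2.4, dipole_debye=0.5),
    Candidate("mol-C", gap_ev=1.8, dipole_debye=3.1),
]

promising = [c.name for c in candidates if is_promising(c)]
print(promising)
```

Keeping the thresholds as keyword arguments makes it easy to tighten or relax the criterion without touching the filtering logic.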


2017 ◽  
Vol 68 (2) ◽  
pp. 317-322
Author(s):  
Anca Mihaela Mocanu ◽  
Constantin Luca ◽  
Alina Costina Luca

The purpose of this research is the synthesis, characterization and thermal degradation study of new heterocyclic derivatives with potential biological properties. The derivatives were synthesized as new molecules with a pyrazolone structure that combine two pharmacophore entities, the amidosulfonyl-R1,R2 phenoxyacetyl group and 3,5-dimethyl pyrazole, which may have biological properties. The synthesis stages of the new products are presented, together with elemental analysis data, IR and 1H-NMR spectral measurements used to elucidate the chemical structures, and a thermostability study that identifies the temperature range suitable for their use and storage. The results indicate a good correlation of structure with thermal stability, as estimated from the initial degradation temperatures, and with the degradation mechanism, as determined by TG-FTIR analysis.

