Beyond Generative Models: Superfast Traversal, Optimization, Novelty, Exploration and Discovery (STONED) Algorithm for Molecules using SELFIES

Author(s):  
AkshatKumar Nigam ◽  
Robert Pollice ◽  
Mario Krenn ◽  
Gabriel dos Passos Gomes ◽  
Alán Aspuru-Guzik

Inverse design allows the design of molecules with desirable properties using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. We achieve comparable performance on typical benchmarks without any training. We demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. We anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wide adoption.
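The abstract attributes STONED's speed to plain string modifications of SELFIES. The following is a minimal sketch of that idea, not the authors' implementation: it applies random point mutations (replace, insert, delete) to the bracketed symbols of a SELFIES string. The symbol alphabet and the function names `tokenize`, `mutate`, and `neighbours` are assumptions made for illustration. Turning the mutated strings back into molecules would require a SELFIES decoder (for example `selfies.decoder` from the `selfies` package), which is left out so the sketch runs with the standard library alone.

```python
import random
import re

# Hypothetical mini-alphabet of SELFIES symbols, chosen for illustration only.
ALPHABET = ["[C]", "[N]", "[O]", "[F]", "[=C]", "[=O]", "[Branch1]", "[Ring1]"]


def tokenize(selfies_string):
    """Split a SELFIES string into its bracketed symbols."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def mutate(selfies_string, rng):
    """Apply one random point mutation: replace, insert, or delete a symbol."""
    tokens = tokenize(selfies_string)
    operation = rng.choice(["replace", "insert", "delete"])
    if operation == "delete" and len(tokens) > 1:
        del tokens[rng.randrange(len(tokens))]
    elif operation == "insert" or not tokens:
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(ALPHABET))
    else:
        # Replace; also the fallback when a deletion would empty the string.
        # The new symbol may equal the old one, leaving the string unchanged.
        tokens[rng.randrange(len(tokens))] = rng.choice(ALPHABET)
    return "".join(tokens)


def neighbours(selfies_string, count, seed=0):
    """Return the distinct strings produced by `count` independent mutations."""
    rng = random.Random(seed)
    return {mutate(selfies_string, rng) for _ in range(count)}


if __name__ == "__main__":
    for candidate in sorted(neighbours("[C][C][O]", 10)):
        print(candidate)
```

Because no model is trained, the cost of exploring the neighbourhood of a molecule in this sketch is only the cost of the string edits plus decoding and any property evaluation added on top.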

2021 ◽  
Author(s):  
AkshatKumar Nigam ◽  
Robert Pollice ◽  
Mario Krenn ◽  
Gabriel dos Passos Gomes ◽  
Alán Aspuru-Guzik

Interpolation and exploration within the chemical space for inverse design.
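One way to picture structure-based interpolation on strings is a path that turns a start string into a target string one symbol at a time. The sketch below is an assumption-laden illustration rather than the published procedure: the shorter token list is padded with a placeholder symbol (`PAD`, hypothetical here and stripped from every output), and the differing positions are overwritten in random order. The names `chemical_path` and `tokenize` are likewise illustrative.

```python
import random
import re

# Placeholder used only to equalise lengths; it never appears in the output.
PAD = "[nop]"


def tokenize(selfies_string):
    """Split a SELFIES string into its bracketed symbols."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def strip_padding(tokens):
    """Join tokens into a string, dropping the padding placeholder."""
    return "".join(token for token in tokens if token != PAD)


def chemical_path(start, target, seed=0):
    """Return intermediate strings that morph `start` into `target`."""
    rng = random.Random(seed)
    current, goal = tokenize(start), tokenize(target)
    length = max(len(current), len(goal))
    current += [PAD] * (length - len(current))
    goal += [PAD] * (length - len(goal))

    # Positions where the two padded sequences disagree, visited in random order.
    differing = [i for i in range(length) if current[i] != goal[i]]
    rng.shuffle(differing)

    path = [strip_padding(current)]
    for position in differing:
        current[position] = goal[position]
        path.append(strip_padding(current))
    return path


if __name__ == "__main__":
    for step in chemical_path("[C][C][O]", "[C][N][O][F]"):
        print(step)
```

Each step differs from the previous one in a single position, so the path has one more entry than the number of differing positions. Scoring each intermediate with a property function would give a property-based view of the same path.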


2020 ◽  
Author(s):  
Fergus Imrie ◽  
Anthony R. Bradley ◽  
Charlotte M. Deane

An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys rather than learning how to perform molecular recognition. This fundamental issue prevents generalisation and hinders virtual screening method development. We have developed a deep learning method (DeepCoy) that generates decoys to a user’s preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules’ physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.163 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with Autodock Vina, with virtual screening performance falling from an AUC ROC of 0.71 to 0.63. The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources.
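The AUC ROC figures quoted above measure how well a score separates actives from decoys. As a minimal sketch (not the evaluation code used in the paper), the area under the ROC curve can be computed as the fraction of active/decoy pairs in which the active receives the higher score, with ties counted as half. The function name `roc_auc` is an assumption, and the sketch assumes that a higher score means "more likely active", so docking energies where lower is better would need to be negated first.

```python
def roc_auc(active_scores, decoy_scores):
    """Area under the ROC curve via pairwise comparison of actives and decoys.

    Assumes higher scores indicate a more likely active. Ties count as half
    a win, which matches the trapezoidal ROC area.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for active in active_scores:
        for decoy in decoy_scores:
            if active > decoy:
                wins += 1.0
            elif active == decoy:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    print(roc_auc([0.9, 0.4], [0.5, 0.1]))
```

A value of 1.0 means every active outranks every decoy and 0.5 corresponds to random ranking, so a drop in AUC on a new decoy set indicates that the decoys have become harder to tell apart from the actives.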


Author(s):  
Parvathi R. ◽  
Pattabiraman V.

This chapter proposes a hybrid method for classifying objects based on a deep neural network and a similarity-based search algorithm. The objects are pre-processed with external conditions. After pre-processing and training different deep learning networks with the object dataset, the authors compare the results to find the best model to improve the accuracy of the results based on the features of object images extracted from the feature vector layer of a neural network. An RPForest (random projection forest) model is used to predict the approximate nearest images. ResNet50, InceptionV3, InceptionV4, and DenseNet169 models are trained with this dataset. A proposal for adaptive finetuning of the deep learning models, which determines the number of layers required for finetuning with the help of the RPForest model, is given, and this experiment is conducted using the Xception model.


2020 ◽  
Author(s):  
Bruno J. Neves ◽  
José T. Moreira-Filho ◽  
Arthur C. Silva ◽  
Joyce V. V. B. Borba ◽  
Melina Mottin ◽  
...  

In this manuscript, we describe the development of an automated framework for curating chemogenomics data and developing QSAR models for virtual screening using the open-source KNIME software. The workflow includes four modules: (i) dataset preparation and curation; (ii) chemical space analysis and structure-activity relationships (SAR) rules; (iii) modeling; and (iv) virtual screening (VS). As case studies, we applied these workflows to four datasets associated with different endpoints. The implemented protocol can efficiently curate chemical and biological data from public databases and generate robust QSAR models. We provide scientists with a simple and guided cheminformatics workbench that follows best practices widely accepted by the community and that scientists can adapt to solve their research problems. The workflows are freely available for download on GitHub.


2021 ◽  
Author(s):  
Michael A. Skinnider ◽  
R. Greg Stacey ◽  
David S. Wishart ◽  
Leonard J. Foster

Deep generative models are powerful tools for the exploration of chemical space, enabling the on-demand generation of molecules with desired physical, chemical, or biological properties. However, these models are typically thought to require training datasets comprising hundreds of thousands, or even millions, of molecules. This perception limits the application of deep generative models in regions of chemical space populated by only a small number of examples. Here, we systematically evaluate and optimize generative models of molecules for low-data settings. We carry out a series of systematic benchmarks, training more than 5,000 deep generative models and evaluating over 2.6 billion generated molecules. We find that robust models can be learned from far fewer examples than has been widely assumed. We further identify strategies that dramatically reduce the number of molecules required to learn a model of equivalent quality, and demonstrate the application of these principles by learning models of chemical structures found in bacterial, plant, and fungal metabolomes. The structure of our experiments also allows us to benchmark the metrics used to evaluate generative models themselves. We find that many of the most widely used metrics in the field fail to capture model quality, but identify a subset of well-behaved metrics that provide a sound basis for model development. Collectively, our work provides a foundation for directly learning generative models in sparsely populated regions of chemical space.


2020 ◽  
Vol 39 (4) ◽  
pp. 4935-4945
Author(s):  
Qiuyun Cheng ◽  
Yun Ke ◽  
Ahmed Abdelmouty

To address the limitation of using only word features in traditional deep learning sentiment classification, this paper combines topic features with deep learning models to build a topic-fused deep learning sentiment classification model. The model fuses topic features to obtain high-quality, high-level text features. Experiments show that in binary sentiment classification, the highest classification accuracy of the model exceeds 90%, which is higher than that of commonly used deep learning models. This paper focuses on combining deep neural networks with emerging text processing technologies, improves them in terms of both model architecture and training methods, and designs an efficient deep network sentiment analysis model. A CNN (Convolutional Neural Network) model based on polymorphism is proposed. The model constructs the CNN input matrix by combining the word vector information of the text, the emotion information of the words, and the position information of the words, and adjusts the importance of the different feature information during training by means of weight control. A multi-objective sample data set is used to verify the effectiveness of the proposed model in the sentiment analysis task of related objects in terms of classification effect and training performance.


2021 ◽  
Vol 2070 (1) ◽  
pp. 012125
Author(s):  
T Sesha Sai Aparna ◽  
T Anuradha

From the moment the fundamental cause of an illness is identified to the moment a medication reaches the marketplace, drug development takes an average of 10 years and almost $2.6 billion. We are effectively hunting for a needle in a haystack, which takes a lot of time, effort, and money. In a solution space of between 10^30 and 10^100 synthetically viable compounds, we are seeking the one molecule that can turn off a disease at the molecular level. The chemical solution space is simply too large to screen adequately for the desired molecule. Only a small percentage of the synthetically viable compounds are stored in pharmaceutical chemical repositories for wet lab research. Computational de novo drug design can be used to explore this vast chemical space and develop previously undesigned compounds. Computational drug design can cut the amount of time spent in the discovery phase in half, resulting in a shorter time to market and lower drug prices. Deep learning and artificial intelligence (AI) have opened up new perspectives in cheminformatics, especially in molecular generative models. Recurrent neural networks (RNNs) trained with molecules in the SMILES text format, in particular, are very good at exploring the chemical space. Two baseline models were created for generating molecules. The first includes an encoder that takes SMILES as input, followed by a deep generative LSTM model that acts as a hidden layer and whose output is the input to the decoder. The second works in the same way but includes a latent space, a compressed representation of the data that brings related data points closer together. The latent space is used to learn data properties and find simpler data representations for analysis, and the weights obtained from the previous model are reused to generate more efficient molecules. We then created a custom function to vary the temperature of the softmax activation function, which creates a threshold value for the valid molecules to generate. This model enables us to produce new molecules through successful exploration.
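The abstract does not show its custom temperature function, so the following is only a generic sketch of temperature-scaled softmax sampling as it is commonly used when generating SMILES character by character. The function names and the toy logits are assumptions. Dividing the logits by a temperature below 1 sharpens the distribution towards the most likely character, while a temperature above 1 flattens it and makes sampling more exploratory.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exponentials = [math.exp(value - largest) for value in scaled]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def sample_index(logits, temperature, rng):
    """Draw the index of the next character from the tempered distribution."""
    probabilities = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # invented scores for three candidate characters
    rng = random.Random(0)
    for temperature in (0.5, 1.0, 2.0):
        print(temperature, softmax_with_temperature(toy_logits, temperature))
    print(sample_index(toy_logits, 1.0, rng))
```

In a generator of this kind, lower temperatures typically yield more valid but less diverse molecules, which is one plausible reading of the "threshold" the abstract describes.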


2021 ◽  
Author(s):  
Jie Zhang ◽  
Rocío Mercado ◽  
Ola Engkvist ◽  
Hongming Chen

In recent years, deep molecular generative models have emerged as novel methods for de novo molecular design. Thanks to the rapid advance of deep learning techniques, deep learning architectures such as recurrent neural networks, generative autoencoders, and adversarial networks, to give a few examples, have been employed for constructing generative models. However, so far the metrics used to evaluate these deep generative models are not discriminative enough to separate the performance of various state-of-the-art generative models. This work presents a novel metric for evaluating deep molecular generative models; this new metric is based on the chemical space coverage of a reference database, and compares not only the molecular structures, but also the ring systems and functional groups, reproduced from a reference dataset of 1M structures. In this study, the performance of 7 different molecular generative models was compared by calculating their structure and substructure coverage of the GDB-13 database while using a 1M subset of GDB-13 for training. Our study shows that the performance of various generative models varies significantly using the benchmarking metrics introduced herein, such that generalization capability of the generative model can be clearly differentiated. Additionally, the coverage of ring systems and functional groups existing in GDB-13 was also compared between the models. Our study provides a useful new metric that can be used for evaluating and comparing generative models.
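A coverage metric of the kind described above can be sketched as the fraction of a reference set that a model reproduces. The snippet below is a simplified illustration and not the metric as defined in the paper: the function name `coverage` is an assumption, and it assumes every structure (or ring system, or functional group) has already been converted to a canonical string so that equal items compare equal.

```python
def coverage(generated, reference):
    """Fraction of distinct reference items that appear among the generated items.

    Assumes both collections hold canonical string identifiers, for example
    canonical SMILES for whole structures or for extracted ring systems.
    """
    reference_set = set(reference)
    if not reference_set:
        raise ValueError("reference set must not be empty")
    return len(set(generated) & reference_set) / len(reference_set)


if __name__ == "__main__":
    # Toy identifiers, invented for illustration only.
    reference = ["CCO", "CCN", "CCC", "CO"]
    generated = ["CCO", "CCN", "CCO", "c1ccccc1"]
    print(coverage(generated, reference))
```

Applying the same function separately to whole structures, ring systems, and functional groups gives one coverage value per level, which is how a single model can score well on one level and poorly on another.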

