Applications of Autoencoders along with Deep Learning Techniques to generate valid molecules

2021 ◽  
Vol 2070 (1) ◽  
pp. 012125
Author(s):  
T Sesha Sai Aparna ◽  
T Anuradha

Abstract: Developing a medication, from identifying the fundamental cause of an illness to reaching the marketplace, takes an average of 10 years and almost $2.6 billion. The search resembles hunting for a needle in a haystack, and it consumes a great deal of time, effort, and money. In a solution space of between 10^30 and 10^100 synthetically viable compounds, we are seeking the one molecule that can turn off a disease at the molecular level. This chemical space is too large to screen adequately, and pharmaceutical chemical repositories hold only a small percentage of the synthetically viable compounds for wet-lab research. Computational de novo drug design can be used to explore this vast chemical space and propose compounds that have not been designed before. Computational drug design can cut the time spent in the discovery phase in half, resulting in a shorter time to market and lower drug prices. Deep learning and artificial intelligence (AI) have opened up new perspectives in cheminformatics, especially in generative models for molecules. In particular, recurrent neural networks (RNNs) trained on molecules in the SMILES text format are very good at exploring the chemical space. Two baseline models were created for generating molecules. The first includes an encoder that takes SMILES as input, followed by a deep generative LSTM model that acts as a hidden layer; the output of these layers serves as input to the decoder. The second works in the same way but adds a latent space, a compressed representation of the data that brings related data points closer together. It is used to learn properties of the data and to find simpler representations for analysis, and it reuses the weights obtained from the first model to generate more efficient molecules. A custom function was then created to adjust the temperature of the softmax activation function, which sets a threshold for generating valid molecules. This model enables us to produce new molecules through successful exploration.
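The abstract does not give the custom temperature function itself, so the following is only a minimal sketch of the general technique, temperature-scaled softmax sampling. The names `softmax_with_temperature` and `sample_token`, the toy vocabulary, and the logits are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, vocabulary, temperature=1.0, rng=random):
    """Draw one token from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocabulary, weights=probs, k=1)[0]


# Hypothetical next-character scores from a SMILES decoder (illustrative only).
vocabulary = ["C", "N", "O", "(", ")", "="]
logits = [2.0, 1.0, 0.5, 0.2, 0.1, -1.0]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring characters, which tends to favour conservative strings, while higher temperatures spread probability and encourage exploration at the likely cost of more invalid outputs.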

2021 ◽  
Author(s):  
Xuhan Liu ◽  
Kai Ye ◽  
Herman W. T. van Vlijmen ◽  
Adriaan P. IJzerman ◽  
Gerard J. P. van Westen

Due to the large drug-like chemical space available to search for feasible drug-like molecules, rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified. With the rapid growth of the application of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work, we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives similar to other known methods and does not allow users to input any prior information (i.e. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. In this work, the Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules we proposed a novel positional encoding for each atom and bond based on an adjacency matrix to extend the architecture of the Transformer. Each molecule was generated by growing and connecting procedures for the fragments in the given scaffold that were unified into one model. Moreover, we trained this generator under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, our proposed method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results demonstrated the effectiveness of our method in that 100% of the generated molecules are valid and most of them had a high predicted affinity value towards A2AAR with given scaffolds.


2021 ◽  
Vol 61 (2) ◽  
pp. 621-630
Author(s):  
Sowmya Ramaswamy Krishnan ◽  
Navneet Bung ◽  
Gopalakrishnan Bulusu ◽  
Arijit Roy

2020 ◽  
Author(s):  
Josep Arús-Pous ◽  
Atanas Patronov ◽  
Esben Jannik Bjerrum ◽  
Christian Tyrchan ◽  
Jean-Louis Reymond ◽  
...  

Molecular generative models trained with small sets of molecules represented as SMILES strings are able to generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e. partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This is possible thanks to a new molecular set pre-processing algorithm that exhaustively cuts all combinations of acyclic bonds of every molecule, obtaining a large number of scaffold-decorations combinations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). Moreover, the resulting scaffold-decorations were filtered to only allow decorations that were fragment-like. This allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the already existent architectures for de-novo molecular generation.
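The exhaustive slicing step can be illustrated on a toy molecular graph. This is a sketch under stated assumptions, not the authors' implementation: atoms are plain integers, the acyclic bonds are supplied by hand rather than detected by a cheminformatics toolkit, and a slicing is accepted only when one fragment (the scaffold) touches every cut bond and each remaining fragment (a decoration) touches exactly one. The function names are hypothetical.

```python
from itertools import combinations


def connected_components(atoms, bonds):
    """Group atom indices into fragments linked by the given bonds."""
    neighbours = {a: set() for a in atoms}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, fragments = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, fragment = [start], set()
        while stack:
            atom = stack.pop()
            if atom in fragment:
                continue
            fragment.add(atom)
            stack.extend(neighbours[atom] - fragment)
        seen |= fragment
        fragments.append(frozenset(fragment))
    return fragments


def enumerate_slicings(atoms, bonds, acyclic_bonds, max_cuts=None):
    """Yield (scaffold, decorations) for every combination of cut acyclic bonds."""
    limit = max_cuts or len(acyclic_bonds)
    for n_cuts in range(1, limit + 1):
        for cut in combinations(acyclic_bonds, n_cuts):
            kept = [b for b in bonds if b not in cut]
            fragments = connected_components(atoms, kept)
            # Count how many cut-bond ends fall inside each fragment.
            touches = {f: sum((a in f) + (b in f) for a, b in cut) for f in fragments}
            for scaffold in fragments:
                others = [f for f in fragments if f != scaffold]
                if touches[scaffold] == n_cuts and all(touches[f] == 1 for f in others):
                    yield scaffold, others


# Toy graph: a six-membered ring (atoms 0-5) with two one-atom substituents (6, 7).
atoms = list(range(8))
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
acyclic_bonds = [(0, 6), (3, 7)]
bonds = ring_bonds + acyclic_bonds

slicings = list(enumerate_slicings(atoms, bonds, acyclic_bonds))
for scaffold, decorations in slicings:
    print(sorted(scaffold), [sorted(d) for d in decorations])
```

Because every molecule yields several scaffold and decoration pairs, the enumeration also multiplies the number of training examples, which is consistent with the data augmentation role described in the abstract.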


2021 ◽  
Vol 22 (18) ◽  
pp. 9983
Author(s):  
Jintae Kim ◽  
Sera Park ◽  
Dongbo Min ◽  
Wankyu Kim

Drug discovery based on artificial intelligence has been in the spotlight recently as it significantly reduces the time and cost required for developing novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous deep-learning-based methodologies are emerging at all steps of drug development processes. In particular, pharmaceutical chemists have faced significant issues with regard to selecting and designing potential drugs for a target of interest to enter preclinical testing. The two major challenges are prediction of interactions between drugs and druggable targets and generation of novel molecular structures suitable for a target of interest. Therefore, we reviewed recent deep-learning applications in drug–target interaction (DTI) prediction and de novo drug design. In addition, we introduce a comprehensive summary of a variety of drug and protein representations, DL models, and commonly used benchmark datasets or tools for model training and testing. Finally, we present the remaining challenges for the promising future of DL-based DTI prediction and de novo drug design.


2020 ◽  
Author(s):  
Francesca Grisoni ◽  
Berend Huisman ◽  
Alexander Button ◽  
Michael Moret ◽  
Kenneth Atz ◽  
...  

Automation of the molecular design-make-test-analyze cycle speeds up the identification of hit and lead compounds for drug discovery. Using deep learning for computational molecular design and a customized microfluidics platform for on-chip compound synthesis, liver X receptor (LXR) agonists were generated from scratch. The computational pipeline was tuned to explore the chemical space defined by known LXRα agonists, and to suggest structural analogs of known ligands and novel molecular cores. To further the design of lead-like molecules and ensure compatibility with automated on-chip synthesis, this chemical space was confined to the set of virtual products obtainable from 17 different one-step reactions. Overall, 25 de novo generated compounds were successfully synthesized in flow via formation of sulfonamide, amide bond, and ester bond. First-pass in vitro activity screening of the crude reaction products in hybrid Gal4 reporter gene assays revealed 17 (68%) hits, with up to 60-fold LXR activation. The batch re-synthesis, purification, and re-testing of 14 of these compounds confirmed that 12 of them were potent LXRα or LXRβ agonists. These results support the utilization of the proposed design-make-test-analyze framework as a blueprint for automated drug design with artificial intelligence and miniaturized bench-top synthesis.


2020 ◽  
Author(s):  
Thomas Blaschke ◽  
Josep Arús-Pous ◽  
Hongming Chen ◽  
Christian Margreitter ◽  
Christian Tyrchan ◽  
...  

With this application note we aim to offer the community a production-ready tool for de novo design. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space. By releasing the code we are aiming to facilitate the research on using generative methods on drug discovery problems and to promote the collaborative efforts in this area so that it can be used as an interaction point for future scientific collaborations.


2021 ◽  
Author(s):  
Jie Zhang ◽  
Rocío Mercado ◽  
Ola Engkvist ◽  
Hongming Chen

In recent years, deep molecular generative models have emerged as novel methods for de novo molecular design. Thanks to the rapid advance of deep learning techniques, deep learning architectures such as recurrent neural networks, generative autoencoders, and adversarial networks, to give a few examples, have been employed for constructing generative models. However, so far the metrics used to evaluate these deep generative models are not discriminative enough to separate the performance of various state-of-the-art generative models. This work presents a novel metric for evaluating deep molecular generative models; this new metric is based on the chemical space coverage of a reference database, and compares not only the molecular structures, but also the ring systems and functional groups, reproduced from a reference dataset of 1M structures. In this study, the performance of 7 different molecular generative models was compared by calculating their structure and substructure coverage of the GDB-13 database while using a 1M subset of GDB-13 for training. Our study shows that the performance of various generative models varies significantly using the benchmarking metrics introduced herein, such that generalization capability of the generative model can be clearly differentiated. Additionally, the coverage of ring systems and functional groups existing in GDB-13 was also compared between the models. Our study provides a useful new metric that can be used for evaluating and comparing generative models.
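A coverage-style metric of this kind can be sketched as a set computation. The abstract does not state the exact formula, so this assumes the simplest reading: the fraction of distinct reference items that also appear among the generated items, computed separately for whole structures and for ring systems. The `coverage` helper and the SMILES strings are hypothetical placeholders, and a real study would first canonicalise structures with a cheminformatics toolkit.

```python
def coverage(generated, reference):
    """Fraction of distinct reference items that the model reproduced."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference set is empty")
    return len(set(generated) & reference) / len(reference)


# Hypothetical canonical SMILES standing in for a reference database.
reference_structures = {"CCO", "CCN", "c1ccccc1", "CC(=O)O"}
generated_structures = ["CCO", "CCO", "c1ccccc1", "CCCl"]  # duplicates count once

# Hypothetical ring systems extracted from each set.
reference_rings = {"c1ccccc1", "C1CCCCC1"}
generated_rings = ["c1ccccc1"]

report = {
    "structures": coverage(generated_structures, reference_structures),
    "ring_systems": coverage(generated_rings, reference_rings),
}
print(report)
```

Reporting structures and substructures side by side makes it visible when a model reproduces common ring systems yet still misses much of the reference space at the whole-molecule level.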


2021 ◽  
Vol 7 (24) ◽  
pp. eabg3338
Author(s):  
Francesca Grisoni ◽  
Berend J. H. Huisman ◽  
Alexander L. Button ◽  
Michael Moret ◽  
Kenneth Atz ◽  
...  

Automating the molecular design-make-test-analyze cycle accelerates hit and lead finding for drug discovery. Using deep learning for molecular design and a microfluidics platform for on-chip chemical synthesis, liver X receptor (LXR) agonists were generated from scratch. The computational pipeline was tuned to explore the chemical space of known LXRα agonists and generate novel molecular candidates. To ensure compatibility with automated on-chip synthesis, the chemical space was confined to the virtual products obtainable from 17 one-step reactions. Twenty-five de novo designs were successfully synthesized in flow. In vitro screening of the crude reaction products revealed 17 (68%) hits, with up to 60-fold LXR activation. The batch resynthesis, purification, and retesting of 14 of these compounds confirmed that 12 of them were potent LXR agonists. These results support the suitability of the proposed design-make-test-analyze framework as a blueprint for automated drug design with artificial intelligence and miniaturized bench-top synthesis.

