Attention-based generative models for de novo molecular design

2021 ◽  
Author(s):  
Orion Dollar ◽  
Nisarg Joshi ◽  
David Beck ◽  
Jim Pfaendtner

Attention mechanisms have led to many breakthroughs in sequential data modeling but have yet to be incorporated into any generative algorithms for molecular design. Here we explore the impact of...

Author(s):  
Joshua Meyers ◽  
Benedek Fabian ◽  
Nathan Brown

2019 ◽  
Author(s):  
Simon Johansson ◽  
Oleksii Ptykhodko ◽  
Josep Arús-Pous ◽  
Ola Engkvist ◽  
Hongming Chen

In recent years, deep learning for de novo molecular generation has become a rapidly growing research area. Recurrent neural networks (RNNs) operating on the SMILES molecular representation are one of the most common approaches. A recent study showed that the differentiable neural computer (DNC) can considerably improve on the RNN for modeling sequential data. In the current study, the DNC has been implemented as an extension to REINVENT, an RNN-based model that has already been used successfully for de novo molecular design. The model was benchmarked on its capacity to learn the SMILES language on the GDB-13 and MOSES datasets. The DNC shows improvement in all test cases conducted, at the cost of significantly increased computational time and memory consumption.
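The benchmark above measures how well a SMILES model has learned its training language. As a rough illustration of the sample-level statistics such benchmarks commonly report, the sketch below computes the fraction of unique samples and the fraction of unique samples absent from the training set. It is not the GDB-13 or MOSES benchmark code. It assumes the inputs are already valid, canonical SMILES strings (canonicalization and validity checks would normally be done with a cheminformatics toolkit), and the function name `sample_metrics` is hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def sample_metrics(samples: list[str], training_set: Iterable[str]) -> dict[str, float]:
    """Fraction of unique samples, and fraction of unique samples not seen in training.

    Assumes every string is already a valid, canonical SMILES (hypothetical helper,
    not part of REINVENT or any benchmark suite).
    """
    if not samples:
        return {"unique": 0.0, "novel": 0.0}
    train = set(training_set)
    unique = set(samples)
    # Novelty is computed over the unique set so repeated samples are not double-counted.
    novel = unique - train
    return {
        "unique": len(unique) / len(samples),
        "novel": len(novel) / len(unique),
    }
```

For example, `sample_metrics(["CCO", "CCO", "c1ccccc1", "CCN"], ["CCO"])` gives a uniqueness of 0.75 and a novelty of 2/3.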


2021 ◽  
Author(s):  
Orion Dollar ◽  
Nisarg Joshi ◽  
David A. C. Beck ◽  
Jim Pfaendtner

We explore the impact of adding attention to generative VAE models for molecular design. Four model types are compared: a simple recurrent VAE (RNN), a recurrent VAE with an added attention layer (RNNAttn), a transformer VAE (TransVAE) and the previous state-of-the-art (MosesVAE). The models are assessed on how they organize the latent space (i.e. latent memory) and on their ability to generate samples that are valid and novel. Additionally, the Shannon information entropy is used to measure the complexity of the latent memory within an information bottleneck theoretical framework, and we define a novel metric to assess the extent to which models explore chemical phase space. All three models are trained on millions of molecules from either the ZINC or PubChem datasets. We find that both the RNNAttn and TransVAE models reconstruct input SMILES strings substantially more accurately than the MosesVAE or RNN models, particularly for larger molecules up to ~700 Da. The TransVAE learns a complex “molecular grammar” that includes detailed molecular substructures and high-level structural and atomic relationships. The RNNAttn models learn the most efficient compression of the input data while still maintaining good performance. The complexity of the compressed representation learned by each model type increases in the order MosesVAE < RNNAttn < RNN < TransVAE. We find that there is an unavoidable tradeoff between model exploration and validity that is a function of the complexity of the latent memory. However, novel sampling schemes can be used to optimize this tradeoff and allow us to utilize the information-dense representations learned by the transformer despite their complexity.
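The abstract uses Shannon entropy to quantify the complexity of the latent memory. One simple way to estimate this, assumed here for illustration rather than taken from the paper, is to histogram each latent dimension over a batch of encoded molecules and compute H = -Σ p log2 p per dimension. The bin count, the [-4, 4] range, and the function names are hypothetical choices.

```python
import math
from collections import Counter


def dimension_entropy(values, n_bins=10, lo=-4.0, hi=4.0):
    """Shannon entropy (in bits) of one latent dimension, estimated from a histogram."""
    width = (hi - lo) / n_bins
    counts = Counter()
    for v in values:
        clamped = min(max(v, lo), hi)  # outliers fall into the edge bins
        idx = min(int((clamped - lo) / width), n_bins - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def latent_entropies(latents, **kwargs):
    """Per-dimension entropy for a batch of latent vectors (rows = molecules)."""
    return [dimension_entropy(list(column), **kwargs) for column in zip(*latents)]
```

Each value is bounded above by log2(n_bins), about 3.32 bits for 10 bins; a dimension whose value is constant across molecules collapses to a single bin and scores 0.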


2021 ◽  
Author(s):  
Quentin Perron ◽  
Olivier Mirguet ◽  
Hamza Tajmouati ◽  
Adam Skiredj ◽  
Anne Rojas ◽  
...  

Multi-Parameter Optimization (MPO) is a major challenge in New Chemical Entity (NCE) drug discovery projects, and the inability to identify molecules meeting all the criteria of lead optimization (LO) is an important cause of NCE project failure. Several ligand- and structure-based de novo design methods have been published over the past decades, some of which have proved useful for multiobjective optimization. However, there is still a need for improvement, both to better address the chemical feasibility of generated compounds and to increase the explored chemical space while tackling the MPO challenge. Recently, promising results have been reported for deep learning generative models applied to de novo molecular design, but to our knowledge no report has yet been made of the value of this new technology for addressing MPO in an actual drug discovery project. Our objective in this study was to evaluate the potential of a ligand-based de novo design technology using deep learning generative models to accelerate the discovery of an optimized lead compound meeting all in vitro late-stage LO criteria.
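MPO requires combining several criteria into a single objective. A common pattern, shown here as an illustrative assumption and not as the method used in this study, maps each property onto a [0, 1] desirability and combines the desirabilities with a weighted geometric mean, so that a compound failing any single criterion scores zero. The thresholds and function names are hypothetical.

```python
import math


def linear_desirability(value: float, worst: float, best: float) -> float:
    """Map a property value to [0, 1]: 0 at `worst`, 1 at `best`, linear in between.

    Works in either direction: pass worst > best for properties where lower is better.
    """
    d = (value - worst) / (best - worst)
    return min(max(d, 0.0), 1.0)


def mpo_score(desirabilities: list[float], weights: list[float]) -> float:
    """Weighted geometric mean of desirabilities; any d <= 0 makes the whole score 0."""
    if any(d <= 0.0 for d in desirabilities):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(d) for d, w in zip(desirabilities, weights)) / total)
```

For example, a hypothetical potency criterion with worst = 6.0 and best = 8.0 gives `linear_desirability(7.5, 6.0, 8.0) == 0.75`. The geometric mean is chosen over an arithmetic mean because it does not let a strong property fully compensate for an unacceptable one.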


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Jeff Guo ◽  
Jon Paul Janet ◽  
Matthias R. Bauer ◽  
Eva Nittinger ◽  
Kathryn A. Giblin ◽  
...  

We recently released version 2.0 of the de novo design platform REINVENT. This improved and extended iteration supports far more features and scoring function components, which allows bespoke, tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle for generative models is producing active compounds, and predictive (QSAR) models have been applied to enrich for target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, the execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target, and the benchmarking and analysis workflow provides a streamlined way to identify productive docking configurations. We show that an informative docking configuration can guide the REINVENT agent to optimize towards better docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules that do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream.
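A generative agent that combines several scoring components typically needs each one on a common [0, 1] scale, whereas raw docking scores are unbounded and better when more negative. The sketch below shows one way to make that conversion with a reverse logistic transform. It is a generic assumption for illustration: the function and its parameters `low`, `high`, and `k` are hypothetical, and this is not DockStream's or REINVENT's actual transformation code.

```python
import math


def reverse_sigmoid(score: float, low: float, high: float, k: float = 0.5) -> float:
    """Map a docking score (more negative = better) to a reward in (0, 1).

    Scores near `low` map close to 1, scores near `high` map close to 0,
    and the midpoint of [low, high] maps to exactly 0.5. `k` sets the steepness
    relative to the width of the [low, high] window.
    """
    midpoint = (low + high) / 2
    steepness = 10.0 * k / (high - low)
    return 1.0 / (1.0 + math.exp(steepness * (score - midpoint)))
```

With a hypothetical window of `low=-12.0` and `high=-6.0`, a score of -9.0 maps to 0.5, and the reward rises monotonically as the docking score becomes more negative.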


2021 ◽  
Vol 8 (2) ◽  
pp. 53-62
Author(s):  
Mani Manavalan

In recent years, there has been growing interest in generative models for molecules in drug development. In the field of de novo molecular design, these models are used to create molecules with desired properties from scratch. They are sometimes used instead of virtual screening, which is limited by the size of the libraries that can be searched in practice: rather than screening existing libraries, generative models can build custom libraries from scratch. Because generative models can optimize molecules directly towards the desired profile, they can speed up this otherwise time-consuming process. The purpose of this work is to show how current shortcomings in evaluating generative models for molecules can be avoided. We cover both distribution-learning and goal-directed generation, with a focus on the latter. Data for three well-known targets were downloaded from ChEMBL: Janus kinase 2 (JAK2), epidermal growth factor receptor (EGFR), and dopamine receptor D2 (DRD2) (Bento et al. 2014). We preprocessed the data to obtain binary classification tasks. Before a scoring function is computed, the data is split into two halves, which we refer to as split 1 and split 2, each preserving the ratio of active to inactive compounds. Our goal is to train three bioactivity models with equal predictive performance: one to be used as the scoring function for chemical optimization and the other two as performance evaluation models. Our findings suggest that distribution-learning methods can attain near-perfect scores on many existing criteria even with the most basic, practically useless models. Benchmark studies show that likelihood-based models account for many of the best-performing methods, and we propose that test set likelihoods be included in future comparisons.
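The evaluation protocol relies on two disjoint halves of each ChEMBL dataset. A minimal sketch of such a split, assuming a simple stratified random split that keeps the active-to-inactive ratio (nearly) equal in both halves; the function name and seed handling are hypothetical, and the authors' exact procedure may differ.

```python
import random


def stratified_halves(labels: list[int], seed: int = 0) -> tuple[list[int], list[int]]:
    """Split sample indices into two halves with (nearly) the same active/inactive ratio.

    `labels` holds 1 for active and 0 for inactive; returns sorted index lists
    for split 1 and split 2.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    split1: list[int] = []
    split2: list[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2  # odd counts put the extra sample in split 2
        split1.extend(idx[:half])
        split2.extend(idx[half:])
    return sorted(split1), sorted(split2)
```

Splitting each class separately, rather than shuffling the whole dataset at once, is what guarantees that both halves see the same class balance.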

