Giving Attention to Generative VAE Models for De Novo Molecular Design

2021 ◽  
Author(s):  
Orion Dollar ◽  
Nisarg Joshi ◽  
David A. C. Beck ◽  
Jim Pfaendtner

We explore the impact of adding attention to generative VAE models for molecular design. Four model types are compared: a simple recurrent VAE (RNN), a recurrent VAE with an added attention layer (RNNAttn), a transformer VAE (TransVAE) and the previous state-of-the-art (MosesVAE). The models are assessed based on their effect on the organization of the latent space (i.e. latent memory) and their ability to generate samples that are valid and novel. Additionally, the Shannon information entropy is used to measure the complexity of the latent memory in an information bottleneck theoretical framework, and we define a novel metric to assess the extent to which models explore chemical phase space. All models are trained on millions of molecules from either the ZINC or PubChem datasets. We find that both the RNNAttn and TransVAE models perform substantially better than the MosesVAE or RNN models when tasked with accurately reconstructing input SMILES strings, particularly for larger molecules up to ~700 Da. The TransVAE learns a complex “molecular grammar” that includes detailed molecular substructures and high-level structural and atomic relationships. The RNNAttn models learn the most efficient compression of the input data while still maintaining good performance. The complexity of the compressed representation learned by each model type increases in the order MosesVAE < RNNAttn < RNN < TransVAE. We find that there is an unavoidable tradeoff between model exploration and validity that is a function of the complexity of the latent memory. However, novel sampling schemes may be used that optimize this tradeoff and allow us to utilize the information-dense representations learned by the transformer in spite of their complexity.
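The abstract's idea of scoring latent-memory complexity with Shannon entropy can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes latent codes are discretized per dimension into fixed-range histogram bins (the bin count and range are arbitrary choices here) and computes the empirical entropy of each dimension.

```python
import numpy as np

def latent_entropy(z, bins=20, value_range=(-4.0, 4.0)):
    """Estimate per-dimension Shannon entropy (in bits) of latent codes.

    z: (n_samples, latent_dim) array of latent vectors sampled from an
    encoder. Each dimension is discretized into `bins` histogram bins
    over a fixed range, and H = -sum p * log2(p) is computed from the
    empirical bin frequencies. A collapsed (near-constant) dimension
    concentrates its mass in few bins and so scores low entropy.
    """
    z = np.asarray(z, dtype=float)
    entropies = []
    for d in range(z.shape[1]):
        counts, _ = np.histogram(z[:, d], bins=bins, range=value_range)
        p = counts / counts.sum()
        p = p[p > 0]  # drop empty bins; 0 * log(0) is taken as 0
        entropies.append(-np.sum(p * np.log2(p)))
    return np.array(entropies)

# Example: an informative (dispersed) dimension carries more entropy
# than a nearly collapsed one.
rng = np.random.default_rng(0)
z = np.column_stack([
    rng.normal(0.0, 1.0, 10_000),   # informative dimension
    rng.normal(0.0, 0.01, 10_000),  # nearly collapsed dimension
])
h = latent_entropy(z)
```

Summing (or averaging) `h` over dimensions gives a single scalar complexity score per model, which is the kind of quantity one would compare across the MosesVAE < RNNAttn < RNN < TransVAE ordering reported above.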



2021 ◽  
Author(s):  
Orion Dollar ◽  
Nisarg Joshi ◽  
David Beck ◽  
Jim Pfaendtner

Attention mechanisms have led to many breakthroughs in sequential data modeling but have yet to be incorporated into any generative algorithms for molecular design. Here we explore the impact of...


Author(s):  
Joshua Meyers ◽  
Benedek Fabian ◽  
Nathan Brown

1994 ◽  
Vol 37 (23) ◽  
pp. 3994-4002 ◽  
Author(s):  
Bohdan Waszkowycz ◽  
David E. Clark ◽  
David Frenkel ◽  
Jin Li ◽  
Christopher W. Murray ◽  
...  

2019 ◽  
Author(s):  
Simon Johansson ◽  
Oleksii Ptykhodko ◽  
Josep Arús-Pous ◽  
Ola Engkvist ◽  
Hongming Chen

In recent years, deep learning for de novo molecular generation has become a rapidly growing research area. Recurrent neural networks (RNNs) using the SMILES molecular representation are among the most common approaches. A recent study showed that the differentiable neural computer (DNC) can considerably improve over the RNN for modeling sequential data. In the current study, the DNC has been implemented as an extension to REINVENT, an RNN-based model that has already been used successfully for de novo molecular design. The model was benchmarked on its capacity to learn the SMILES language on the GDB-13 and MOSES datasets. The DNC shows improvement on all test cases conducted, at the cost of significantly increased computational time and memory consumption.


1995 ◽  
Vol 9 (1) ◽  
pp. 13-32 ◽  
Author(s):  
David E. Clark ◽  
David Frenkel ◽  
Stephen A. Levy ◽  
Jin Li ◽  
Christopher W. Murray ◽  
...  

2019 ◽  
Vol 59 (3) ◽  
pp. 1182-1196 ◽  
Author(s):  
Boris Sattarov ◽  
Igor I. Baskin ◽  
Dragos Horvath ◽  
Gilles Marcou ◽  
Esben Jannik Bjerrum ◽  
...  

2021 ◽  
Author(s):  
Rui Yang ◽  
Wenzhe Wang ◽  
Meichen Dong ◽  
Kristen Roso ◽  
Paula Greer ◽  
...  

Myc plays a central role in tumorigenesis by orchestrating the expression of genes essential to numerous cellular processes [1-4]. While it is well established that Myc functions by binding to its target genes to regulate their transcription [5], the distribution of the transcriptional output across the human genome in Myc-amplified cancer cells, and the susceptibility of such transcriptional outputs to therapeutic interference, remain to be fully elucidated. Here, we analyze the distribution of transcriptional outputs in Myc-amplified medulloblastoma (MB) cells by profiling nascent total RNAs within a temporal context. This profiling reveals that a major portion of transcriptional action in these cells was directed at the genes fundamental to cellular infrastructure, including rRNAs and particularly those in the mitochondrial genome (mtDNA). Notably, even when Myc protein was depleted by as much as 80%, the impact on transcriptional outputs across the genome was limited, with notable reduction mostly only in genes involved in ribosomal biosynthesis, genes residing in mtDNA or encoding mitochondria-localized proteins, and those encoding histones. In contrast to the limited direct impact of Myc depletion, we found that the global transcriptional outputs were highly dependent on the activity of Inosine Monophosphate Dehydrogenases (IMPDHs), rate-limiting enzymes for de novo guanine nucleotide synthesis whose expression in tumor cells was positively correlated with Myc expression. Blockade of IMPDHs attenuated the global transcriptional outputs, with a particularly strong inhibitory effect on infrastructure genes, which was accompanied by the abrogation of MB cell proliferation in vitro and in vivo.
Together, our findings reveal a real-time action of Myc as a transcription factor in tumor cells, provide new insight into the pathogenic mechanism underlying Myc-driven tumorigenesis, and support IMPDHs as a therapeutic vulnerability in cancer cells empowered by a high level of Myc oncoprotein.

