Giving Attention to Generative VAE Models for De Novo Molecular Design

2021 ◽  
Author(s):  
Orion Dollar ◽  
Nisarg Joshi ◽  
David A. C. Beck ◽  
Jim Pfaendtner

We explore the impact of adding attention to generative VAE models for molecular design. Four model types are compared: a simple recurrent VAE (RNN), a recurrent VAE with an added attention layer (RNNAttn), a transformer VAE (TransVAE) and the previous state-of-the-art (MosesVAE). The models are assessed based on their effect on the organization of the latent space (i.e. latent memory) and their ability to generate samples that are valid and novel. Additionally, the Shannon information entropy is used to measure the complexity of the latent memory in an information bottleneck theoretical framework, and we define a novel metric to assess the extent to which models explore chemical phase space. All models are trained on millions of molecules from either the ZINC or PubChem datasets. We find that both the RNNAttn and TransVAE models perform substantially better than the MosesVAE or RNN models when tasked with accurately reconstructing input SMILES strings, particularly for larger molecules up to ~700 Da. The TransVAE learns a complex “molecular grammar” that includes detailed molecular substructures and high-level structural and atomic relationships. The RNNAttn models learn the most efficient compression of the input data while still maintaining good performance. The complexity of the compressed representation learned by each model type increases in the order MosesVAE < RNNAttn < RNN < TransVAE. We find that there is an unavoidable tradeoff between model exploration and validity that is a function of the complexity of the latent memory. However, novel sampling schemes may be used that optimize this tradeoff and allow us to utilize the information-dense representations learned by the transformer in spite of their complexity.
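The abstract's idea of scoring latent-memory complexity with Shannon entropy can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes latent codes are discretized per dimension into fixed-range histogram bins (the bin count and range are arbitrary choices here) and computes the empirical entropy of each dimension.

```python
import numpy as np

def latent_entropy(z, bins=20, value_range=(-4.0, 4.0)):
    """Estimate per-dimension Shannon entropy (in bits) of latent codes.

    z: (n_samples, latent_dim) array of latent vectors sampled from an
    encoder. Each dimension is discretized into `bins` histogram bins
    over a fixed range, and H = -sum p * log2(p) is computed from the
    empirical bin frequencies. A collapsed (near-constant) dimension
    concentrates its mass in few bins and so scores low entropy.
    """
    z = np.asarray(z, dtype=float)
    entropies = []
    for d in range(z.shape[1]):
        counts, _ = np.histogram(z[:, d], bins=bins, range=value_range)
        p = counts / counts.sum()
        p = p[p > 0]  # drop empty bins; 0 * log(0) is taken as 0
        entropies.append(-np.sum(p * np.log2(p)))
    return np.array(entropies)

# Example: an informative (dispersed) dimension carries more entropy
# than a nearly collapsed one.
rng = np.random.default_rng(0)
z = np.column_stack([
    rng.normal(0.0, 1.0, 10_000),   # informative dimension
    rng.normal(0.0, 0.01, 10_000),  # nearly collapsed dimension
])
h = latent_entropy(z)
```

Summing (or averaging) `h` over dimensions gives a single scalar complexity score per model, which is the kind of quantity one would compare across the MosesVAE < RNNAttn < RNN < TransVAE ordering reported above.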



2021 ◽  
Author(s):  
Orion Dollar ◽  
Nisarg Joshi ◽  
David Beck ◽  
Jim Pfaendtner

Attention mechanisms have led to many breakthroughs in sequential data modeling but have yet to be incorporated into any generative algorithms for molecular design. Here we explore the impact of...


Author(s):  
Joshua Meyers ◽  
Benedek Fabian ◽  
Nathan Brown

1994 ◽  
Vol 37 (23) ◽  
pp. 3994-4002 ◽  
Author(s):  
Bohdan Waszkowycz ◽  
David E. Clark ◽  
David Frenkel ◽  
Jin Li ◽  
Christopher W. Murray ◽  
...  

2019 ◽  
Author(s):  
Simon Johansson ◽  
Oleksii Ptykhodko ◽  
Josep Arús-Pous ◽  
Ola Engkvist ◽  
Hongming Chen

In recent years, deep learning for de novo molecular generation has become a rapidly growing research area. Recurrent neural networks (RNNs) using the SMILES molecular representation are among the most common approaches. A recent study showed that the differentiable neural computer (DNC) can considerably improve over the RNN for modeling sequential data. In the current study, the DNC has been implemented as an extension to REINVENT, an RNN-based model that has already been used successfully for de novo molecular design. The model was benchmarked on its capacity to learn the SMILES language on the GDB-13 and MOSES datasets. The DNC shows improvement on all test cases conducted, at the cost of significantly increased computational time and memory consumption.


1995 ◽  
Vol 9 (1) ◽  
pp. 13-32 ◽  
Author(s):  
David E. Clark ◽  
David Frenkel ◽  
Stephen A. Levy ◽  
Jin Li ◽  
Christopher W. Murray ◽  
...  

2019 ◽  
Vol 59 (3) ◽  
pp. 1182-1196 ◽  
Author(s):  
Boris Sattarov ◽  
Igor I. Baskin ◽  
Dragos Horvath ◽  
Gilles Marcou ◽  
Esben Jannik Bjerrum ◽  
...  

2021 ◽  
Author(s):  
Rui Yang ◽  
Wenzhe Wang ◽  
Meichen Dong ◽  
Kristen Roso ◽  
Paula Greer ◽  
...  

Myc plays a central role in tumorigenesis by orchestrating the expression of genes essential to numerous cellular processes [1-4]. While it is well established that Myc functions by binding to its target genes to regulate their transcription [5], the distribution of the transcriptional output across the human genome in Myc-amplified cancer cells, and the susceptibility of such transcriptional outputs to therapeutic interference, remain to be fully elucidated. Here, we analyze the distribution of transcriptional outputs in Myc-amplified medulloblastoma (MB) cells by profiling nascent total RNAs within a temporal context. This profiling reveals that a major portion of transcriptional action in these cells was directed at the genes fundamental to cellular infrastructure, including rRNAs and particularly those in the mitochondrial genome (mtDNA). Notably, even when Myc protein was depleted by as much as 80%, the impact on transcriptional outputs across the genome was limited, with notable reduction mostly only in genes involved in ribosomal biosynthesis, genes residing in mtDNA or encoding mitochondria-localized proteins, and those encoding histones. In contrast to the limited direct impact of Myc depletion, we found that the global transcriptional outputs were highly dependent on the activity of Inosine Monophosphate Dehydrogenases (IMPDHs), rate-limiting enzymes for de novo guanine nucleotide synthesis whose expression in tumor cells was positively correlated with Myc expression. Blockade of IMPDHs attenuated the global transcriptional outputs, with a particularly strong inhibitory effect on infrastructure genes, which was accompanied by the abrogation of MB cell proliferation in vitro and in vivo.
Together, our findings reveal a real-time action of Myc as a transcription factor in tumor cells, provide new insight into the pathogenic mechanism underlying Myc-driven tumorigenesis, and support IMPDHs as a therapeutic vulnerability in cancer cells empowered by a high level of Myc oncoprotein.

