Penalized Variational Autoencoder for Molecular Design

2019 ◽  
Author(s):  
Sadegh Mohammadi ◽  
Bing O'Dowd ◽  
Christian Paulitz-Erdmann ◽  
Linus Goerlitz

Variational autoencoders have emerged as one of the most common approaches for automating molecular generation. We seek to learn a cross-domain latent space that captures chemical and biological information simultaneously. To do so, we introduce the Penalized Variational Autoencoder, which operates directly on SMILES, a linear string representation of molecules, with a weight penalty term in the decoder to address the imbalance in the character distribution of SMILES strings. We find that this greatly improves upon previous variational autoencoder approaches in the quality of the latent space and in its ability to generalize to new chemistry. Next, we organize the latent space according to chemical and biological properties by jointly training the Penalized Variational Autoencoder with linear units. Extensive experiments on a range of tasks, including reconstruction, validity, and transferability, demonstrate that the proposed methods substantially outperform previous SMILES and graph-based methods and introduce a new way to generate molecules from a set of desired properties, without prior knowledge of a chemical structure.
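
The abstract does not give the form of the decoder's weight penalty, so the following is only a minimal sketch of one common way to counter character imbalance: a cross-entropy loss in which each character is weighted by its inverse frequency in the training corpus. The function names, the inverse-frequency formula, and the toy corpus are assumptions made for illustration, not the authors' method. Splitting strings into single characters is also a simplification (it breaks "Cl" and "Br" into two symbols each).

```python
import math
from collections import Counter


def inverse_frequency_weights(smiles_list):
    """Hypothetical weighting: total / (n_distinct_chars * char_count).

    Rare characters get weights above 1, frequent ones below 1, and the
    count-weighted average of the weights over the corpus is exactly 1.
    """
    counts = Counter(ch for s in smiles_list for ch in s)
    total = sum(counts.values())
    n_distinct = len(counts)
    return {ch: total / (n_distinct * c) for ch, c in counts.items()}


def weighted_cross_entropy(target, predicted_probs, weights):
    """Mean weighted negative log-likelihood of one target string.

    predicted_probs[i] maps each character to the decoder's probability
    for position i (an assumed, simplified decoder output format).
    """
    loss = 0.0
    for ch, probs in zip(target, predicted_probs):
        loss += -weights[ch] * math.log(probs[ch])
    return loss / len(target)


# Toy corpus (illustrative only): 'C' dominates, 'O' is rare.
corpus = ["CCO", "CCN", "CCCl", "c1ccccc1Br"]
weights = inverse_frequency_weights(corpus)

# A decoder that is equally unsure about 'C' and 'O' at every position.
uniform = [{"C": 0.5, "O": 0.5}] * 3
loss_weighted = weighted_cross_entropy("CCO", uniform, weights)
loss_plain = weighted_cross_entropy("CCO", uniform, {"C": 1.0, "O": 1.0})
```

With these weights, a mistake on a rare character such as `O` costs more than the same mistake on the ubiquitous `C`, which is the general effect a penalty for character imbalance aims for.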


2021 ◽  
Vol 503 (3) ◽  
pp. 3351-3370
Author(s):  
David J Bastien ◽  
Anna M M Scaife ◽  
Hongming Tang ◽  
Micah Bowles ◽  
Fiona Porter

We present a model for generating postage stamp images of synthetic Fanaroff–Riley Class I and Class II radio galaxies suitable for use in simulations of future radio surveys such as those being developed for the Square Kilometre Array. This model uses a fully connected neural network to implement structured variational inference through a variational autoencoder and decoder architecture. To optimize the dimensionality of the latent space for the autoencoder, we introduce the radio morphology inception score (RAMIS), a quantitative method for assessing the quality of generated images, and discuss in detail how data pre-processing choices can affect the value of this measure. We examine the 2D latent space of the VAEs and discuss how this can be used to control the generation of synthetic populations, whilst also cautioning how it may lead to biases when used for data augmentation.
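
The abstract does not define RAMIS beyond naming it an inception score for radio morphology. As a hedged illustration, the sketch below computes the standard inception score, exp(E_x[KL(p(y|x) || p(y))]), from a list of per-image class probabilities. Any RAMIS-specific details, such as the choice of classifier, are not given here, so the function name and input format are assumptions.

```python
import math


def inception_score(class_probs):
    """Standard inception score from per-image class probabilities.

    class_probs is a list of probability vectors p(y|x), one per
    generated image, as produced by some classifier (assumed given).
    """
    n = len(class_probs)
    k = len(class_probs[0])
    # Marginal class distribution p(y) over the generated set.
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    kl_sum = 0.0
    for p in class_probs:
        # KL(p(y|x) || p(y)); terms with zero probability contribute 0.
        kl_sum += sum(pj * math.log(pj / mj)
                      for pj, mj in zip(p, marginal) if pj > 0)
    return math.exp(kl_sum / n)


# Confident predictions spread evenly over two classes (e.g. FRI / FRII).
score_sharp = inception_score([[1.0, 0.0], [0.0, 1.0]])
# Completely uninformative predictions.
score_flat = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

The score is bounded above by the number of classes and falls to 1 when the classifier cannot tell generated images apart, which is why it can serve as a criterion for comparing latent-space dimensionalities.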


2020 ◽  
Author(s):  
Xiaoxiang Zhu ◽  
Mengshu Hou ◽  
Xiaoyang Zeng ◽  
Hao Zhu

Most supervised systems for the event detection (ED) task rely heavily on manual annotations and incur costly human effort when applied to new event types. To tackle this general problem, we turn our attention to few-shot learning (FSL). As a typical solution to FSL, frameworks based on cross-modal feature generation achieve promising performance on image classification, which inspires us to extend this approach to the ED task. In this work, we propose a model that extracts latent semantic features from event mentions, type structures and type names; these three modalities are then mapped into a shared low-dimensional latent space by a modality-specific aligned variational autoencoder enhanced by adversarial training. We evaluate the quality of our latent representations by training a CNN classifier to perform the ED task. Experiments conducted on the ACE2005 dataset show an improvement of 12.67% in F1-score when adversarial training is introduced to the VAE model, and our method is comparable with an existing transfer learning framework for ED.


2021 ◽  
Author(s):  
Paolo Tirotta ◽  
Stefano Lodi

Transfer learning through large pre-trained models has changed the landscape of current applications in natural language processing (NLP). Recently, Optimus, a variational autoencoder (VAE) that combines two pre-trained models, BERT and GPT-2, was released, and its combination with generative adversarial networks (GANs) has been shown to produce novel yet very human-looking text. The combination of Optimus and GANs avoids the troublesome application of GANs to the discrete domain of text and prevents the exposure bias of standard maximum likelihood methods. We combine the training of GANs in the latent space with the finetuning of the decoder of Optimus for single word generation. This approach lets us model both the high-level features of the sentences and the low-level word-by-word generation. We finetune using reinforcement learning (RL), exploiting the structure of GPT-2 and adding entropy-based intrinsically motivated rewards to balance quality and diversity. We benchmark the results of the VAE-GAN model and show the improvements brought by our RL finetuning on three widely used datasets for text generation, with results that greatly surpass the current state of the art in the quality of the generated texts.


2018 ◽  
Vol 15 (8) ◽  
pp. 1109-1123
Author(s):  
Jonas da Silva Santos ◽  
Joel Jones Junior ◽  
Flavia M. da Silva

Background: We present here the synthesis of 1,3-thiazolidin-4-one (1) and its functionalised analogues, such as the classical isosteres glitazone (1,3-thiazolidine-2,4-dione) (2), rhodanine (2-thioxo-1,3-thiazolidin-4-one) (3) and pseudothiohydantoin (2-imino-1,3-thiazolidin-4-one) (4), from the mid-nineteenth century to the present day (1865-2018). Objective: The review focuses on the differences in the representation of the molecular structures discussed here over time, since the first discussions about the structural theory by Kekulé, Couper and Butlerov. Moreover, advanced synthesis methodologies, including green chemistry, have been developed for obtaining these functional groups. We discuss their structure and stability and show their great biological potential. Conclusion: The 1,3-thiazolidin-4-one nucleus and functionalised analogues such as glitazones (1,3-thiazolidine-2,4-diones), rhodanines (2-thioxo-1,3-thiazolidin-4-ones) and pseudothiohydantoins (2-imino-1,3-thiazolidin-4-ones) have great pharmacological importance, and they are already found in commercial pharmaceuticals. Studies indicate a promising future in the area of medicinal chemistry, with potential activities against different diseases. The synthesis of these nuclei started in the mid-nineteenth century (1865), with the first discussions about the structural theory by Kekulé, Couper and Butlerov. The present study has demonstrated the differences in the representations of the molecular structures discussed here over time. Since then, various synthetic methodologies have been developed for obtaining these nuclei, and several studies on their structural and biological properties have been performed. Different studies with regard to the green synthesis of these compounds were also presented here. This is the result of the process of environmental awareness. Additionally, the planet Earth is already showing clear signs of depletion, which is currently decreasing the quality of life.


2019 ◽  
Vol 14 (2) ◽  
pp. 93-116 ◽  
Author(s):  
Shabnam Mohebbi ◽  
Mojtaba Nasiri Nezhad ◽  
Payam Zarrintaj ◽  
Seyed Hassan Jafari ◽  
Saman Seyed Gholizadeh ◽  
...  

Biomedical engineering seeks to enhance the quality of life by developing advanced materials and technologies. Chitosan-based biomaterials have attracted significant attention because their unique chemical structures play different roles in membranes, sponges and scaffolds, and because of their promising biological properties such as biocompatibility, biodegradability and non-toxicity. Therefore, chitosan derivatives have been widely used in a variety of applications, chiefly pharmaceuticals and biomedical engineering. This review attempts to give a comprehensive overview of emerging applications of chitosan in medicine, tissue engineering, drug delivery, gene therapy, cancer therapy, ophthalmology, dentistry, bio-imaging, bio-sensing and diagnosis. The use of stem cells (SCs) has added a further dimension to the use of chitosan, so that regenerative medicine and therapeutic methods have benefited from chitosan-based platforms. Many of the most recent discussions and stimulating ideas in this field are covered, which could hopefully serve as hints for further work in biomedical engineering.


Author(s):  
Yinan Zhang ◽  
Yong Liu ◽  
Peng Han ◽  
Chunyan Miao ◽  
Lizhen Cui ◽  
...  

Cross-domain recommendation methods usually transfer knowledge across different domains implicitly, by sharing model parameters or learning parameter mappings in the latent space. Differing from previous studies, this paper focuses on learning an explicit mapping between a user's behaviors (i.e. interaction itemsets) in different domains during the same temporal period. We propose a novel deep cross-domain recommendation model, called Cycle Generation Networks (CGN). Specifically, CGN employs two generators to construct the dual-direction personalized itemset mapping between a user's behaviors in two different domains over time. The generators are learned by optimizing the distance between the generated itemset and the real interacted itemset, as well as the cycle-consistent loss defined based on the dual-direction generation procedure. We have performed extensive experiments on real datasets to demonstrate the effectiveness of the proposed model in comparison with existing single-domain and cross-domain recommendation methods.
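
The two loss terms described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the CGN implementation: itemsets are represented as multi-hot vectors, the distance is squared Euclidean, the generators are arbitrary Python callables, and personalization and temporal information are ignored. All names (`cycle_style_loss`, `gen_ab`, `gen_ba`, `cycle_weight`) are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cycle_style_loss(x_a, x_b, gen_ab, gen_ba, cycle_weight=1.0):
    """Generation distance plus cycle-consistency distance.

    x_a, x_b: multi-hot itemset vectors of one user in domains A and B.
    gen_ab, gen_ba: generators mapping A -> B and B -> A (assumed callables).
    """
    fake_b = gen_ab(x_a)
    fake_a = gen_ba(x_b)
    # Generated itemsets should match the real interacted itemsets.
    generation = squared_distance(fake_b, x_b) + squared_distance(fake_a, x_a)
    # Mapping to the other domain and back should recover the input.
    cycle = (squared_distance(gen_ba(fake_b), x_a)
             + squared_distance(gen_ab(fake_a), x_b))
    return generation + cycle_weight * cycle


# Toy user whose domain-B itemset is the reverse of the domain-A one.
x_a = [1.0, 0.0, 0.0]
x_b = [0.0, 0.0, 1.0]


def reverse(v):
    return list(reversed(v))


def identity(v):
    return list(v)


loss_good = cycle_style_loss(x_a, x_b, reverse, reverse)
loss_bad = cycle_style_loss(x_a, x_b, identity, identity)
```

In the toy case, the identity generators are perfectly cycle-consistent yet still penalized by the generation term, which shows why the two terms are combined rather than used alone.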


2021 ◽  
Author(s):  
Vito Abbruzzese

The research project aims to enhance organic nutrient management in livestock farms using microbial and enzyme inoculations, with a particular focus on the phosphorus biogeochemical cycle. To do this, the first step is to characterise the chemical and biological properties of farm slurries as a baseline against which to evaluate possible amendments of the intrinsic properties of the slurry. Consequently, it is pivotal to consider properties such as plant nutrients, i.e., phosphorus, nitrogen and potassium, as well as the microbial community within the slurry. Likewise, attention needs to be paid to soil chemical and biological properties, e.g. pH, salinity and organic matter, as well as to the variety of organisms inhabiting the soil, in order to determine the impact of inoculation on phosphorus cycling and nutrient availability for plant use. Furthermore, it is important to know how soil and its productivity may be influenced by the addition of the inoculated slurry. Of particular interest are the soil properties that affect plant growth. Soil pH and, notably, nutrient availability and retention capacity are among the features on which the research will focus in order to assess soil quality and, as a result, the production of a grass crop in livestock farms. The characterisation of these properties will be performed using a variety of approaches, beginning with analysis at laboratory and mesocosm scales and progressing to fieldwork in order to evaluate the results directly in a farm system.


2021 ◽  
Author(s):  
Yuen Ler Chow ◽  
Shantanu Singh ◽  
Anne E Carpenter ◽  
Gregory P. Way

A variational autoencoder (VAE) is a machine learning algorithm useful for generating a compressed and interpretable latent space. These representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated using other types of VAEs such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated three VAE variants (vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds, thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.
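
The abstract names latent space arithmetic without spelling out the operation. A minimal sketch of one plausible form follows, assuming that the mean latent vectors of two single-target compounds are added and the mean latent vector of a control condition is subtracted, with the result then passed to the decoder to simulate a compound that hits both targets. The function names and the add-and-subtract formula are assumptions for illustration; the decoder step is omitted.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def latent_space_arithmetic(z_a, z_b, z_control):
    """Estimate a latent vector for the combined effect of A and B.

    z_a, z_b, z_control: lists of latent vectors (one per profile)
    encoded from compound A, compound B, and control samples.
    Assumed form: mean(z_a) + mean(z_b) - mean(z_control).
    """
    mu_a = mean_vector(z_a)
    mu_b = mean_vector(z_b)
    mu_c = mean_vector(z_control)
    return [a + b - c for a, b, c in zip(mu_a, mu_b, mu_c)]


# Toy 2D latent codes (illustrative values only).
z_a = [[1.0, 0.0], [3.0, 0.0]]
z_b = [[0.0, 1.0], [0.0, 3.0]]
z_control = [[0.5, 0.5]]
z_ab = latent_space_arithmetic(z_a, z_b, z_control)
```

Subtracting the control mean keeps the shared baseline from being counted twice when the two compound effects are added.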

