A Deep Molecular Generative Model Based on Multi-Resolution Graph Variational Autoencoders

2021 ◽  
Author(s):  
Zhenxiang Gao ◽  
Xinyu Wang ◽  
Blake Blumenfeld Gaines ◽  
Jinbo Bi ◽  
Minghu Song

Deep generative models have recently emerged as promising tools for de novo molecular structure generation. Although considerable advances have been achieved in recent years, the field of generative molecular design is still in its infancy. One potential solution may be to integrate domain knowledge of structural or medicinal chemistry into the data-driven machine learning process to address specific molecular generation goals. This manuscript proposes a new graph-based hierarchical variational autoencoder (VAE) model for molecular generation. Training molecules are first decomposed into small molecular fragments. Unlike other motif-based molecular graph generative models, we further group the decomposed fragments into interchangeable fragment clusters according to the local structural environment around the attachment points where bond breaking occurs. In this way, each chemical structure can be transformed into a three-layer graph in which individual atoms, decomposed fragments, and fragment clusters act as the graph nodes of the respective layers. We construct a hierarchical VAE model to learn these three-layer graph representations of chemical structures in a fine-to-coarse order. The decoder iteratively selects a fragment from a predicted fragment cluster vocabulary and attaches it to the preceding substructure. The newly introduced third graph layer allows us to incorporate specific chemical structural knowledge, e.g., interchangeable fragments sharing similar local chemical environments or bioisosteres derived from matched molecular pair analysis, into the molecular generation process. This increases the odds of assembling new chemical moieties absent from the original training set and enhances the structural diversity and novelty scores of generated structures. Our proposed approach demonstrates comparatively good performance in terms of model efficiency and other molecular evaluation metrics when compared with several other graph- and SMILES-based generative molecular models. We also analyze how the model's performance varies with different fragment sampling techniques and with the radius parameter that determines the local structural environment of interchangeable fragment clusters. We hope that our multi-level hierarchical VAE prototype will promote more sophisticated work on knowledge-augmented deep molecular generation in the future.
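The grouping of fragments into interchangeable clusters by their environment around the attachment point can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy fragment dictionary, the function names `environment_key` and `cluster_fragments`, and the choice to define the environment as the labelled atoms within `radius` bonds of the attachment atom on the fragment side. The abstract does not specify how the authors compute the environment, so this is not their implementation.

```python
from collections import defaultdict, deque

# Hypothetical toy fragments: atom labels, intra-fragment bonds, and the index
# of the attachment atom where the bond to the rest of the molecule was broken.
FRAGMENTS = {
    "methyl":   {"atoms": ["C"],      "bonds": [],       "attach": 0},
    "ethyl":    {"atoms": ["C", "C"], "bonds": [(0, 1)], "attach": 0},
    "hydroxyl": {"atoms": ["O"],      "bonds": [],       "attach": 0},
    "methoxy":  {"atoms": ["O", "C"], "bonds": [(0, 1)], "attach": 0},
}

def environment_key(frag, radius):
    """Return (distance, atom label) pairs within `radius` bonds of the attachment atom."""
    adj = defaultdict(set)
    for a, b in frag["bonds"]:
        adj[a].add(b)
        adj[b].add(a)
    # Breadth-first search outward from the attachment atom, stopping at `radius`.
    dist = {frag["attach"]: 0}
    queue = deque([frag["attach"]])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return tuple(sorted((d, frag["atoms"][i]) for i, d in dist.items()))

def cluster_fragments(fragments, radius):
    """Group fragment names that share the same environment key."""
    clusters = defaultdict(list)
    for name, frag in fragments.items():
        clusters[environment_key(frag, radius)].append(name)
    return sorted(sorted(members) for members in clusters.values())

print(cluster_fragments(FRAGMENTS, radius=0))
print(cluster_fragments(FRAGMENTS, radius=1))
```

With `radius=0` only the attachment atom is compared, so the carbon-attached and oxygen-attached fragments form two clusters. With `radius=1` the neighbouring atom is also compared and every toy fragment ends up in its own cluster, which shows in miniature why the radius parameter trades cluster size against chemical specificity.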


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Weidong Zhao ◽  
K. Julietraja ◽  
P. Venugopal ◽  
Xiujun Zhang

Theoretical chemists are fascinated by polycyclic aromatic hydrocarbons (PAHs) because of their unique electromagnetic and other significant properties, such as superaromaticity. The study of PAHs has been steadily increasing because of their wide-ranging applications in several fields, such as steel manufacturing, shale oil extraction, coal gasification, coke production, tar distillation, and the nanosciences. Topological indices (TIs) are numerical quantities that express chemical structures mathematically. They are useful and cost-effective tools for predicting the properties of chemical compounds theoretically. Entropic network measures are a type of TI with a broad array of applications, including the quantitative characterization of molecular structures and the investigation of specific chemical properties of molecular graphs. Irregularity indices are numerical parameters that quantify the irregularity of a molecular graph and are used to predict chemical properties such as boiling point, resistance, enthalpy of vaporization, entropy, melting point, and toxicity. This study aims to determine analytical expressions for the VDB entropy and irregularity-based indices in the rectangular Kekulene system.
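As a concrete instance of an irregularity index, the sketch below computes the Albertson irregularity, defined as the sum of |deg(u) - deg(v)| over all edges of a graph. The abstract does not list which indices the study derives, so choosing this one is an assumption, and the small edge lists are toy hydrogen-suppressed skeletons rather than the rectangular Kekulene system.

```python
def degrees(edges):
    """Count how many edges touch each vertex."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def albertson_irregularity(edges):
    """Sum of absolute degree differences across every edge."""
    deg = degrees(edges)
    return sum(abs(deg[u] - deg[v]) for u, v in edges)

# Toy skeletons (illustrative only).
ring6 = [(i, (i + 1) % 6) for i in range(6)]   # six-membered ring, every degree is 2
path4 = [(0, 1), (1, 2), (2, 3)]               # unbranched four-atom chain
star3 = [(0, 1), (0, 2), (0, 3)]               # one centre bonded to three leaves

print(albertson_irregularity(ring6))  # 0
print(albertson_irregularity(path4))  # 2
print(albertson_irregularity(star3))  # 6
```

A regular graph such as the ring scores zero, and the score grows as neighbouring atoms differ more in degree, which is the sense in which such an index quantifies irregularity.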


2020 ◽  
Author(s):  
Rocío Mercado ◽  
Tobias Rastemo ◽  
Edvard Lindelöf ◽  
Günter Klambauer ◽  
Ola Engkvist ◽  
...  

2010 ◽  
Vol 50 (7) ◽  
pp. 1257-1274 ◽  
Author(s):  
David White ◽  
Richard C. Wilson

2021 ◽  
Author(s):  
Michael A. Skinnider ◽  
R. Greg Stacey ◽  
David S. Wishart ◽  
Leonard J. Foster

Deep generative models are powerful tools for the exploration of chemical space, enabling the on-demand generation of molecules with desired physical, chemical, or biological properties. However, these models are typically thought to require training datasets comprising hundreds of thousands, or even millions, of molecules. This perception limits the application of deep generative models in regions of chemical space populated by only a small number of examples. Here, we systematically evaluate and optimize generative models of molecules for low-data settings. We carry out a series of systematic benchmarks, training more than 5,000 deep generative models and evaluating over 2.6 billion generated molecules. We find that robust models can be learned from far fewer examples than has been widely assumed. We further identify strategies that dramatically reduce the number of molecules required to learn a model of equivalent quality, and demonstrate the application of these principles by learning models of chemical structures found in bacterial, plant, and fungal metabolomes. The structure of our experiments also allows us to benchmark the metrics used to evaluate generative models themselves. We find that many of the most widely used metrics in the field fail to capture model quality, but identify a subset of well-behaved metrics that provide a sound basis for model development. Collectively, our work provides a foundation for directly learning generative models in sparsely populated regions of chemical space.


2021 ◽  
Author(s):  
Rocío Mercado ◽  
Esben Bjerrum ◽  
Ola Engkvist

Here we explore the impact of different graph traversal algorithms on molecular graph generation. We do this by training a graph-based deep molecular generative model to build structures using a node order determined by either a breadth-first or a depth-first search algorithm. We observe that a breadth-first traversal leads to better coverage of training data features than a depth-first traversal. We have quantified these differences using a variety of metrics on a dataset of natural products, including percent validity, molecular coverage, and molecular shape. We also observe that with either traversal it is possible to over-train the generative models, at which point the results obtained with the two graph traversal algorithms are identical.
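To make the difference between the two node orders concrete, the following sketch traverses a small branched graph both ways. The adjacency list is a hypothetical five-atom skeleton, and the function names are illustrative; the abstract does not describe the authors' own traversal code or tie-breaking rules.

```python
from collections import deque

def bfs_order(adj, start):
    """Visit nodes level by level, nearest to `start` first."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

def dfs_order(adj, start):
    """Follow one branch to its end before backtracking."""
    order, seen, stack = [], set(), [start]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        # Push neighbours in reverse so the first-listed neighbour is visited first.
        stack.extend(v for v in reversed(adj[u]) if v not in seen)
    return order

# Hypothetical skeleton: atom 0 carries two branches, 1-3 and 2-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}

print(bfs_order(adj, 0))  # [0, 1, 2, 3, 4]
print(dfs_order(adj, 0))  # [0, 1, 3, 2, 4]
```

The breadth-first order places both neighbours of atom 0 before any atom two bonds away, whereas the depth-first order finishes the 1-3 branch before starting the 2-4 branch. A generative model trained on one order therefore sees the same molecule built up through a different sequence of partial structures.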


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Robert Christie ◽  
Adrian Abel

This introductory chapter presents an overview of the general principles underlying the structural chemistry, manufacturing processes, and application technology of organic pigments. The coverage provides a fundamental theoretical and practical basis for the chapters that follow in this series that are devoted to specific chemical classes of industrially significant organic pigments of the azo, phthalocyanine, carbonyl, dioxazine, and metal complex classes. The initial sections cover the fundamental differences which mean that dyes and pigments are considered universally as two separate types of colorant, based on their solubility characteristics. They also provide discussions of the contrasting chemical, technological, and performance features of organic and inorganic pigments. An outline of the most important historical features in the development of the synthetic organic pigment industry is then presented, from its origins in the 19th century that followed soon after the development of the industrial synthetic dye industry, through its expansion in the 20th century, to its current position as a mature global industry. A section then follows that describes the functions that organic pigments are required to perform in their application, mainly their optical functions, which include not only colour properties such as hue, strength, and brightness, but also the contrasting requirements for transparency or opacity as demanded by specific applications. The pigments are also required to resist the conditions and agencies that they might encounter in applications, assessed as fastness properties, such as fastness to light, heat, solvents, and chemicals, amongst many others, to the extent that specific applications demand. 
The principles, in broad terms, of the ways in which chemical structures determine colour and performance of organic pigments are discussed, with focus not only on the influence of molecular structure, but also on the effect of the crystal structural arrangement and the particulate structure, including particle size and shape and its distribution, on application performance. This is important as these pigments are applied as a dispersion of finely divided crystalline solid particles that are insoluble and are ultimately trapped mechanically in their application medium, often a polymer. The manufacture of organic pigments is discussed in broad terms. The overall process may be considered in stages, initiated by the chemical synthetic sequence in which the pigment is formed, followed by a conditioning stage where the crude product thus obtained is modified to optimise its performance properties, and finally finishing where the product is processed into a form, or preparation, that is suitable for its intended applications. Finally, the technological principles underlying a broad range of the most important application areas for organic pigments, which are mainly in paints, inks, and plastics, are discussed.


2004 ◽  
Vol 72 (10) ◽  
pp. 5925-5930 ◽  
Author(s):  
Leann L. MacLean ◽  
Malcolm B. Perry ◽  
Evguenii Vinogradov

Serotyping of Actinobacillus pleuropneumoniae, the etiologic agent of porcine pleuropneumonia, is important for epidemiological studies and for the development of homologous vaccine cell preparations. The serology is based on the specific chemical structures of capsular polysaccharides (CPSs) and lipopolysaccharide (LPS) antigenic O-polysaccharide moieties (O-PSs), and knowledge of these structures is required for a molecular-level understanding of their serological specificities. The structures of A. pleuropneumoniae serotype 1 to 12 CPSs and O-PSs have been elucidated; however, the structures associated with three newly proposed serotypes (serotypes 13, 14, and 15) have not been reported. Herein we describe the structures of the antigenic O-PS and CPS of A. pleuropneumoniae serotype 13. The O-PS of the A. pleuropneumoniae serotype 13 LPS is a polymer of branched tetrasaccharide repeating units composed of L-rhamnose, 2-acetamido-2-deoxy-D-galactose, and D-galactose residues (1:1:2). By use of hydrolysis, methylation, and periodate oxidation chemical methods together with the application of one- and two-dimensional ¹H and ¹³C nuclear magnetic resonance spectroscopy and mass spectrometry, the structures of the O chain and CPS were determined. The CPS of A. pleuropneumoniae serotype 13 was characterized as a teichoic-acid type polymer. The LPS O antigen was identical to the O-PS produced by A. pleuropneumoniae serotype 7. The CPS has the unique structure of a 1,3-poly(glycerol phosphate) teichoic acid type I polymer and constitutes the macromolecule defining the A. pleuropneumoniae serotype 13 antigenic specificity.


2020 ◽  
Author(s):  
Andy Sode Anker ◽  
Emil T. S. Kjær ◽  
Erik B. Dam ◽  
Simon J. L. Billinge ◽  
Kirsten M. Ø. Jensen ◽  
...  

The development of new nanomaterials for energy technologies is dependent on understanding the intricate relation between material properties and atomic structure. It is, therefore, crucial to be able to routinely characterise the atomic structure in nanomaterials, and a promising method for this task is Pair Distribution Function (PDF) analysis. The PDF can be obtained through Fourier transformation of x-ray total scattering data, and represents a histogram of all interatomic distances in the sample. Going from the distance information in the PDF to a chemical structure is an unassigned distance geometry problem (uDGP), and solving this is often the bottleneck in nanostructure analysis. In this work, we propose to use a Conditional Variational Autoencoder (CVAE) to automatically solve the uDGP to obtain valid chemical structures from PDFs. We use a simple model system of hypothetical mono-metallic nanoparticles containing up to 100 atoms in the face centered cubic (FCC) structure as a proof of concept. The model is trained to predict the assigned distance matrix (aDM) from a simulated PDF of the structure as the conditional input. We introduce a novel representation of structures by projecting them inside a unit sphere and adding additional anchor points or satellites to help in the reconstruction of the chemical structure. The performance of the CVAE model is compared to a Deterministic Autoencoder (DAE) showing that both models are able to solve the uDGP reasonably well. We further show that the CVAE learns a structured and meaningful latent embedding space which can be used to predict new chemical structures.
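The description of the PDF as a histogram of all interatomic distances can be sketched as follows. This is a simplified illustration under stated assumptions: a measured PDF is obtained from scattering data and carries weighting and normalisation that are ignored here, the five coordinates are a hypothetical fragment of an FCC lattice with lattice constant 1, and `distance_histogram` is an illustrative name rather than anything from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def distance_histogram(coords, bin_width=0.1):
    """Count pairwise distances per bin; the key is the index of the nearest bin centre."""
    counts = Counter()
    for p, q in combinations(coords, 2):
        counts[round(math.dist(p, q) / bin_width)] += 1
    return counts

# Hypothetical FCC fragment (lattice constant 1): one corner, three face
# centres, and a second corner along the x axis.
atoms = [
    (0.0, 0.0, 0.0),
    (0.5, 0.5, 0.0),
    (0.5, 0.0, 0.5),
    (0.0, 0.5, 0.5),
    (1.0, 0.0, 0.0),
]

hist = distance_histogram(atoms)
for bin_index in sorted(hist):
    print(f"r = {bin_index * 0.1:.1f}: {hist[bin_index]} pairs")
```

The histogram records how often each distance occurs but not which pair of atoms produced it. Recovering atomic positions from such unlabelled distances is the unassigned distance geometry problem that the abstract refers to.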


2021 ◽  
Author(s):  
Jana Schor ◽  
Patrick Scheibe ◽  
Matthias Berndt ◽  
Wibke Busch ◽  
Chih Lai ◽  
...  

A plethora of chemical substances is present in our environment, and all living species, including humans, are exposed to various mixtures of them. Our society is accustomed to developing, producing, using, and dispersing a vast and diverse range of chemicals with the original intention of improving our standard of living. However, many chemicals pose risks, for example of developing severe diseases, if they occur at the wrong time in the wrong place. For the majority of chemicals these risks are not known. Chemical risk assessment and subsequent regulation of use require efficient and systematic strategies, which are not available so far. Experimental methods, even high-throughput ones, are still lab-based and therefore too slow to keep up with the pace of chemical innovation. Existing computational approaches, e.g., ML-based ones, are powerful on specific chemical classes or sub-problems, but not applicable on a large scale. Their main limitations are the lack of applicability to chemicals outside the training data and the limited availability of sufficient training data. Here, we present the ready-to-use, stand-alone program deepFPlearn, which predicts the association between chemical structures and effects on the gene/pathway level using deep learning. We show good performance values for our trained models, and demonstrate that our program can predict meaningful associations of chemicals and effects beyond the training range due to the application of a sophisticated feature compression approach using a deep autoencoder. Further, it can be applied to hundreds of thousands of chemicals in seconds. We provide deepFPlearn as an open-source, flexible tool that can be easily retrained and customized to different application settings at https://github.com/yigbt/deepFPlearn.

