L-MolGAN: An improved implicit generative model for large molecular graphs

2021 ◽  
Author(s):  
Yutaka Tsujimoto ◽  
Satoru Hiwa ◽  
Yushi Nakamura ◽  
Yohei Oe ◽  
Tomoyuki Hiroyasu

Deep generative models are used to generate arbitrary molecular structures with desired chemical properties. MolGAN is a well-known molecular generation model that uses generative adversarial networks (GANs) and reinforcement learning to generate molecular graphs in one shot. MolGAN can effectively generate small molecular graphs with nine or fewer heavy atoms. However, the graphs tend to become disconnected as the molecular size increases. This poses a challenge for drug discovery and material design, where large molecules may need to be included. This study develops an improved MolGAN for large molecule generation (L-MolGAN). In this model, the connectivity of molecular graphs is evaluated by a depth-first search during model training. When a disconnected molecular graph is generated, L-MolGAN assigns the graph a reward of zero. This procedure decreases the number of disconnected graphs and consequently increases the number of connected molecular graphs. The effectiveness of L-MolGAN is evaluated experimentally. The size and connectivity of the molecular graphs generated with data from the ZINC-250k molecular dataset are assessed using MolGAN as the baseline model. The model is then optimized for the quantitative estimate of drug-likeness (QED) to generate drug-like molecules. The experimental results indicate that the connectivity measure of generated molecular graphs improved by 1.96 compared with the baseline model at a larger maximum molecular size of 20 atoms. The molecules generated by L-MolGAN are evaluated in terms of multiple chemical properties that are important in drug design: QED, synthetic accessibility, and the log octanol–water partition coefficient. This result confirms that L-MolGAN can generate various drug-like molecules despite being optimized for a single property, i.e., QED. This method will contribute to the efficient discovery of new molecules larger than those generated with the existing method.
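The connectivity check described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `is_connected` and `connectivity_gated_reward` are hypothetical, the graph is assumed to be given as a square adjacency matrix that has already been restricted to real atoms (no padding nodes), and `property_score` stands in for any property reward such as QED.

```python
from typing import Callable, List

Adjacency = List[List[int]]


def is_connected(adjacency: Adjacency) -> bool:
    """Iterative depth-first search; a nonzero entry means a bond exists."""
    n = len(adjacency)
    if n == 0:
        return False  # assumption: an empty graph counts as disconnected
    visited = [False] * n
    visited[0] = True
    stack = [0]
    while stack:
        u = stack.pop()
        for v in range(n):
            if adjacency[u][v] and not visited[v]:
                visited[v] = True
                stack.append(v)
    # The graph is connected only if the search reached every atom.
    return all(visited)


def connectivity_gated_reward(
    adjacency: Adjacency,
    property_score: Callable[[Adjacency], float],
) -> float:
    """Return the property score for connected graphs and zero otherwise."""
    if not is_connected(adjacency):
        return 0.0
    return property_score(adjacency)
```

With a three-atom chain the property score is passed through unchanged, while a graph with one isolated atom receives a reward of zero, which is the behaviour the abstract describes for disconnected graphs.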


Author(s):  
S. Alyar ◽  
R. Khoeilar ◽  
A. Jahanbani

Graph theory has extensive applications in chemistry and in the study of molecular structures, and its use continues to grow rapidly. In a molecular graph, points (vertices) represent atoms and lines (edges) represent bonds between atoms. In this paper, we study the molecular graphs of porphyrin, propyl ether imine, zinc–porphyrin and poly dendrimers and analyze their topological properties. For this purpose, we compute topological indices, namely the Albertson index, the sigma index, the Nano-Zagreb index, and the first and second hyper [Formula: see text]-indices of porphyrin, propyl ether imine, zinc–porphyrin and poly dendrimers.
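Two of the indices named here have short, widely used definitions: the Albertson index sums |d(u) - d(v)| over all edges uv, and the sigma index sums (d(u) - d(v))^2 over all edges, where d is the vertex degree. The sketch below assumes those definitions and a hydrogen-suppressed graph given as an edge list; the function names are illustrative and the example graphs are small alkane skeletons, not the dendrimers studied in the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def vertex_degrees(edges: List[Edge]) -> Dict[int, int]:
    """Count how many edges touch each vertex."""
    degree: Dict[int, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def albertson_index(edges: List[Edge]) -> int:
    """Sum of absolute degree differences across every edge."""
    degree = vertex_degrees(edges)
    return sum(abs(degree[u] - degree[v]) for u, v in edges)


def sigma_index(edges: List[Edge]) -> int:
    """Sum of squared degree differences across every edge."""
    degree = vertex_degrees(edges)
    return sum((degree[u] - degree[v]) ** 2 for u, v in edges)


# Carbon skeletons: n-butane is a path, isobutane is a star.
n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (0, 2), (0, 3)]
print(albertson_index(n_butane), sigma_index(n_butane))    # 2 2
print(albertson_index(isobutane), sigma_index(isobutane))  # 6 12
```

Both indices are zero for regular graphs and grow with the degree imbalance across bonds, which is why the branched isobutane skeleton scores higher than the linear n-butane skeleton.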


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Wei Gao ◽  
Weifan Wang ◽  
Muhammad Kamran Jamil ◽  
Mohammad Reza Farahani

Earlier studies found that the structure-dependency of the total π-electron energy Eπ relies heavily on the sum of squares of the vertex degrees of the molecular graph. Hence, this sum provides a measure of the branching of the carbon-atom skeleton. In recent years, the sum of squares of the vertex degrees of the molecular graph has been defined as the forgotten topological index, which reflects the structure-dependency of the total π-electron energy Eπ and measures the physical-chemical properties of molecular structures. In this paper, in order to study the structure-dependency of the total π-electron energy Eπ, we present the forgotten topological index of some important molecular structures from a mathematical standpoint. The formulations obtained here use the edge set dividing approach, and the conclusions can be applied in physics, chemistry, materials, and pharmaceutical engineering.
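The edge set dividing approach can be sketched as follows, assuming the commonly used form of the forgotten index in which d(u)^2 + d(v)^2 is summed over the endpoints of every edge uv; this is equal to the sum of d(v)^3 over all vertices. Edges are grouped by the degree pair of their endpoints, and each group contributes its size times a^2 + b^2. The function names and the small example graphs are illustrative assumptions, not the structures treated in the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def vertex_degrees(edges: List[Edge]) -> Dict[int, int]:
    """Count how many edges touch each vertex."""
    degree: Dict[int, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def edge_partition(edges: List[Edge]) -> Counter:
    """Divide the edge set into classes keyed by the endpoint degree pair."""
    degree = vertex_degrees(edges)
    return Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)


def forgotten_index(edges: List[Edge]) -> int:
    """Edge form: each class (a, b) contributes count * (a^2 + b^2)."""
    return sum(n * (a * a + b * b) for (a, b), n in edge_partition(edges).items())


def forgotten_index_vertex_form(edges: List[Edge]) -> int:
    """Equivalent vertex form: sum of cubed degrees."""
    return sum(d ** 3 for d in vertex_degrees(edges).values())


isobutane = [(0, 1), (0, 2), (0, 3)]   # star skeleton
n_butane = [(0, 1), (1, 2), (2, 3)]    # path skeleton
print(forgotten_index(isobutane), forgotten_index(n_butane))  # 30 18
```

For a large, regular structure the partition has only a handful of classes, so a closed-form expression follows from counting the edges in each class as a function of the structure's size parameter.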


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Weidong Zhao ◽  
K. Julietraja ◽  
P. Venugopal ◽  
Xiujun Zhang

Theoretical chemists are fascinated by polycyclic aromatic hydrocarbons (PAHs) because of their unique electromagnetic and other significant properties, such as superaromaticity. The study of PAHs has been steadily increasing because of their wide-ranging applications in several fields, like steel manufacturing, shale oil extraction, coal gasification, production of coke, tar distillation, and nanosciences. Topological indices (TIs) are numerical quantities that give a mathematical expression for the chemical structures. They are useful and cost-effective tools for predicting the properties of chemical compounds theoretically. Entropic network measures are a type of TI with a broad array of applications, involving quantitative characterization of molecular structures and the investigation of some specific chemical properties of molecular graphs. Irregularity indices are numerical parameters that quantify the irregularity of a molecular graph and are used to predict some of the chemical properties, including boiling points, resistance, enthalpy of vaporization, entropy, melting points, and toxicity. This study aims to determine analytical expressions for the VDB entropy and irregularity-based indices in the rectangular Kekulene system.


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Youngchun Kwon ◽  
Jiho Yoo ◽  
Youn-Suk Choi ◽  
Won-Joon Son ◽  
Dongseon Lee ◽  
...  

With the advancements in deep learning, deep generative models combined with graph neural networks have been successfully employed for data-driven molecular graph generation. Early methods based on the non-autoregressive approach have been effective in generating molecular graphs quickly and efficiently but have suffered from low performance. In this paper, we present an improved learning method involving a graph variational autoencoder for efficient molecular graph generation in a non-autoregressive manner. We introduce three additional learning objectives and incorporate them into the training of the model: approximate graph matching, reinforcement learning, and auxiliary property prediction. We demonstrate the effectiveness of the proposed method by evaluating it for molecular graph generation tasks using QM9 and ZINC datasets. The model generates molecular graphs with high chemical validity and diversity compared with existing non-autoregressive methods. It can also conditionally generate molecular graphs satisfying various target conditions.


2020 ◽  
Author(s):  
Omar Mahmood ◽  
Elman Mansimov ◽  
Richard Bonneau ◽  
Kyunghyun Cho

De novo, in-silico design of molecules is a challenging problem with applications in drug discovery and material design. Here, we introduce a masked graph model which learns a distribution over graphs by capturing all possible conditional distributions over unobserved nodes and edges given observed ones. We train our masked graph model on existing molecular graphs and then sample novel molecular graphs from it by iteratively masking and replacing different parts of initialized graphs. We evaluate our approach on the QM9 and ChEMBL datasets using the distribution-learning benchmark from the GuacaMol framework. The benchmark contains five metrics: the validity, uniqueness, novelty, KL-divergence and Fréchet ChemNet Distance scores, the last two of which are measures of the similarity of the generated samples to the training, validation and test distributions. We find that KL-divergence and Fréchet ChemNet Distance scores are anti-correlated with novelty scores. By varying generation initialization and the fraction of the graph masked and replaced at each generation step, we can increase the Fréchet score at the cost of novelty. In this way, we show that our model offers transparent and tunable control of the trade-off between these metrics, a key point of control in design applications currently lacking in other approaches to molecular graph generation. Our model outperforms previously proposed graph-based approaches and is competitive with SMILES-based approaches. Finally, we observe that minimizing validation loss on the training task is a suitable proxy for improving generation quality, which shows the suitability of optimizing the training objective for improving generation.


2021 ◽  
Author(s):  
Omar Mahmood ◽  
Elman Mansimov ◽  
Richard Bonneau ◽  
Kyunghyun Cho

De novo, in-silico design of molecules is a challenging problem with applications in drug discovery and material design. Here, we introduce a masked graph model which learns a distribution over graphs by capturing all possible conditional distributions over unobserved nodes and edges given observed ones. We train our masked graph model on existing molecular graphs and then sample novel molecular graphs from it by iteratively masking and replacing different parts of initialized graphs. We evaluate our approach on the QM9 and ChEMBL datasets using the distribution-learning benchmark from the GuacaMol framework. The benchmark contains five metrics: the validity, uniqueness, novelty, KL-divergence and Fréchet ChemNet Distance scores, the last two of which are measures of the similarity of the generated samples to the training, validation and test distributions. We find that KL-divergence and Fréchet ChemNet Distance scores are anti-correlated with novelty scores. By varying generation initialization and the fraction of the graph masked and replaced at each generation step, we can increase the Fréchet score at the cost of novelty. In this way, we show that our model offers transparent and tunable control of the trade-off between these metrics, a point of control currently lacking in other approaches to molecular graph generation. We observe that our model outperforms previously proposed graph-based approaches and is competitive with SMILES-based approaches. Finally, we show that our model can generate molecules with desired values of specified properties while maintaining physicochemical similarity to molecules from the training distribution.
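The first three metrics named in this abstract can be illustrated with a minimal sketch. It assumes the usual reading of the terms (validity as the valid fraction of generated samples, uniqueness as the distinct fraction of valid samples, novelty as the fraction of distinct valid samples absent from the training set) and is not the GuacaMol implementation, which may differ in sample sizes and canonicalization. The function `distribution_metrics` and the `is_valid` predicate are hypothetical, and molecules are assumed to be canonical strings.

```python
from typing import Callable, Dict, Iterable, List


def distribution_metrics(
    generated: List[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness and novelty for a batch of samples."""
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        # Fraction of all generated samples that pass the validity check.
        "validity": len(valid) / len(generated) if generated else 0.0,
        # Fraction of valid samples that are distinct.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Fraction of distinct valid samples not seen in training.
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy data: the validity predicate here is a placeholder, not a chemistry check.
samples = ["CCO", "CCO", "C1CC", "CCN"]
scores = distribution_metrics(samples, {"CCO"}, lambda s: s != "C1CC")
print(scores["validity"], scores["novelty"])  # 0.75 0.5
```

In practice the validity predicate would be backed by a cheminformatics toolkit, and a model that copies the training set closely would score low on novelty while matching the training distribution well, which is the trade-off the abstract reports.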

