scholarly journals Compressed graph representation for scalable molecular graph generation

2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Youngchun Kwon ◽  
Dongseon Lee ◽  
Youn-Suk Choi ◽  
Kyoham Shin ◽  
Seokho Kang

Abstract Recently, deep learning has been successfully applied to molecular graph generation. Nevertheless, mitigating the computational complexity, which increases with the number of nodes in a graph, has been a major challenge. This has hindered the application of deep learning-based molecular graph generation to large molecules with many heavy atoms. In this study, we present a molecular graph compression method to alleviate the complexity while maintaining the capability of generating chemically valid and diverse molecular graphs. We designate six small substructural patterns that are prevalent between two atoms in real-world molecules. These relevant substructures in a molecular graph are then converted to edges by regarding them as additional edge features along with the bond types. This reduces the number of nodes significantly without any information loss. Consequently, a generative model can be constructed in a more efficient and scalable manner with large molecules on a compressed graph representation. We demonstrate the effectiveness of the proposed method for molecules with up to 88 heavy atoms using the GuacaMol benchmark.

2020 ◽  
Author(s):  
Artur Schweidtmann ◽  
Jan Rittig ◽  
Andrea König ◽  
Martin Grohe ◽  
Alexander Mitsos ◽  
...  

<div>Prediction of combustion-related properties of (oxygenated) hydrocarbons is an important and challenging task for which quantitative structure-property relationship (QSPR) models are frequently employed. Recently, a machine learning method, graph neural networks (GNNs), has shown promising results for the prediction of structure-property relationships. GNNs utilize a graph representation of molecules, where atoms correspond to nodes and bonds to edges containing information about the molecular structure. More specifically, GNNs learn physico-chemical properties as a function of the molecular graph in a supervised learning setup using a backpropagation algorithm. This end-to-end learning approach eliminates the need for selection of molecular descriptors or structural groups, as it learns optimal fingerprints through graph convolutions and maps the fingerprints to the physico-chemical properties by deep learning. We develop GNN models for predicting three fuel ignition quality indicators, i.e., the derived cetane number (DCN), the research octane number (RON), and the motor octane number (MON), of oxygenated and non-oxygenated hydrocarbons. In light of limited experimental data in the order of hundreds, we propose a combination of multi-task learning, transfer learning, and ensemble learning. The results show competitive performance of the proposed GNN approach compared to state-of-the-art QSPR models making it a promising field for future research. The prediction tool is available via a web front-end at www.avt.rwth-aachen.de/gnn.</div>


2021 ◽  
Author(s):  
Weijuan Cao ◽  
Trevor Robinson ◽  
Hua Yang ◽  
Flavien Boussuge ◽  
Andrew Colligan ◽  
...  

Author(s):  
Yujie Chen ◽  
Tengfei Ma ◽  
Xixi Yang ◽  
Jianmin Wang ◽  
Bosheng Song ◽  
...  

Abstract Motivation Adverse drug–drug interactions (DDIs) are crucial for drug research and mainly cause morbidity and mortality. Thus, the identification of potential DDIs is essential for doctors, patients and the society. Existing traditional machine learning models rely heavily on handcraft features and lack generalization. Recently, the deep learning approaches that can automatically learn drug features from the molecular graph or drug-related network have improved the ability of computational models to predict unknown DDIs. However, previous works utilized large labeled data and merely considered the structure or sequence information of drugs without considering the relations or topological information between drug and other biomedical objects (e.g. gene, disease and pathway), or considered knowledge graph (KG) without considering the information from the drug molecular structure. Results Accordingly, to effectively explore the joint effect of drug molecular structure and semantic information of drugs in knowledge graph for DDI prediction, we propose a multi-scale feature fusion deep learning model named MUFFIN. MUFFIN can jointly learn the drug representation based on both the drug-self structure information and the KG with rich bio-medical information. In MUFFIN, we designed a bi-level cross strategy that includes cross- and scalar-level components to fuse multi-modal features well. MUFFIN can alleviate the restriction of limited labeled data on deep learning models by crossing the features learned from large-scale KG and drug molecular graph. We evaluated our approach on three datasets and three different tasks including binary-class, multi-class and multi-label DDI prediction tasks. The results showed that MUFFIN outperformed other state-of-the-art baselines. Availability and implementation The source code and data are available at https://github.com/xzenglab/MUFFIN.


Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1062 ◽  
Author(s):  
Yuhang Dong ◽  
W. David Pan ◽  
Dongsheng Wu

Malaria is a severe public health problem worldwide, with some developing countries being most affected. Reliable remote diagnosis of malaria infection will benefit from efficient compression of high-resolution microscopic images. This paper addresses a lossless compression of malaria-infected red blood cell images using deep learning. Specifically, we investigate a practical approach where images are first classified before being compressed using stacked autoencoders. We provide probabilistic analysis on the impact of misclassification rates on compression performance in terms of the information-theoretic measure of entropy. We then use malaria infection image datasets to evaluate the relations between misclassification rates and actually obtainable compressed bit rates using Golomb–Rice codes. Simulation results show that the joint pattern classification/compression method provides more efficient compression than several mainstream lossless compression techniques, such as JPEG2000, JPEG-LS, CALIC, and WebP, by exploiting common features extracted by deep learning on large datasets. This study provides new insight into the interplay between classification accuracy and compression bitrates. The proposed compression method can find useful telemedicine applications where efficient storage and rapid transfer of large image datasets is desirable.


2002 ◽  
Vol 57 (1-2) ◽  
pp. 49-51
Author(s):  
Miranca Fischermann ◽  
Ivan Gutman ◽  
Arne Hoffmann ◽  
Dieter Rautenbach ◽  
Dušica Vidovića ◽  
...  

A variety of molecular-graph-based structure-descriptors were proposed, in particular the Wiener index W. the largest graph eigenvalue λ1, the connectivity index X, the graph energy E and the Hosoya index Z, capable of measuring the branching of the carbon-atom skeleton of organic compounds, and therefore suitable for describing several of their physico-chemical properties. We now determine the structure of the chemical trees (= the graph representation of acyclic saturated hydrocarbons) that are extremal with respect to W , λ1, E, and Z. whereas the analogous problem for X was solved earlier. Among chemical trees with 5. 6, 7, and 3k + 2 vertices, k = 2,3,..., one and the same tree has maximum λ1 and minimum W, E, Z. Among chemical trees with 3k and 3k +1 vertices, k = 3,4...., one tree has minimum 11 and maximum λ1 and another minimum E and Z .


2021 ◽  
Author(s):  
Xuhan Liu ◽  
Kai Ye ◽  
Herman W. T. van Vlijmen ◽  
Adriaan P. IJzerman ◽  
Gerard J. P. van Westen

Due to the large drug-like chemical space available to search for feasible drug-like molecules, rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified. With the rapid growth of the application of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work, we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives similar to other known methods and does not allow users to input any prior information (i.e. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. In this work, the Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules we proposed a novel positional encoding for each atom and bond based on an adjacency matrix to extend the architecture of the Transformer. Each molecule was generated by growing and connecting procedures for the fragments in the given scaffold that were unified into one model. Moreover, we trained this generator under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, our proposed method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results demonstrated the effectiveness of our method in that 100% of the generated molecules are valid and most of them had a high predicted affinity value towards A2AAR with given scaffolds.


2021 ◽  
Author(s):  
Yingheng Wang ◽  
Yaosen Min ◽  
Erzhuo Shao ◽  
Ji Wu

ABSTRACTLearning generalizable, transferable, and robust representations for molecule data has always been a challenge. The recent success of contrastive learning (CL) for self-supervised graph representation learning provides a novel perspective to learn molecule representations. The most prevailing graph CL framework is to maximize the agreement of representations in different augmented graph views. However, existing graph CL frameworks usually adopt stochastic augmentations or schemes according to pre-defined rules on the input graph to obtain different graph views in various scales (e.g. node, edge, and subgraph), which may destroy topological semantemes and domain prior in molecule data, leading to suboptimal performance. Therefore, designing parameterized, learnable, and explainable augmentation is quite necessary for molecular graph contrastive learning. A well-designed parameterized augmentation scheme can preserve chemically meaningful structural information and intrinsically essential attributes for molecule graphs, which helps to learn representations that are insensitive to perturbation on unimportant atoms and bonds. In this paper, we propose a novel Molecular Graph Contrastive Learning with Parameterized Explainable Augmentations, MolCLE for brevity, that self-adaptively incorporates chemically significative information from both topological and semantic aspects of molecular graphs. Specifically, we apply deep neural networks to parameterize the augmentation process for both the molecular graph topology and atom attributes, to highlight contributive molecular substructures and recognize underlying chemical semantemes. Comprehensive experiments on a variety of real-world datasets demonstrate that our proposed method consistently outperforms compared baselines, which verifies the effectiveness of the proposed framework. Detailedly, our self-supervised MolCLE model surpasses many supervised counterparts, and meanwhile only uses hundreds of thousands of parameters to achieve comparative results against the state-of-the-art baseline, which has tens of millions of parameters. We also provide detailed case studies to validate the explainability of augmented graph views.CCS CONCEPTS• Mathematics of computing → Graph algorithms; • Applied computing → Bioinformatics; • Computing methodologies → Neural networks; Unsupervised learning.


Sign in / Sign up

Export Citation Format

Share Document