Scale-Aware Graph-Based Machine Learning for Accurate Molecular Property Prediction

Author(s):  
Gyoung S. Na ◽  
Hyun Woo Kim ◽  
Hyunju Chang
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Dong Chen ◽  
Kaifu Gao ◽  
Duc Duy Nguyen ◽  
Xin Chen ◽  
Yi Jiang ◽  
...  

The ability to predict molecular properties is of great significance to drug discovery, human health, and environmental protection. Despite considerable efforts, quantitative prediction of various molecular properties remains a challenge. Although some machine learning models, such as the bidirectional encoder from transformer, can incorporate massive unlabeled molecular data into molecular representations via a self-supervised learning strategy, they neglect three-dimensional (3D) stereochemical information. Algebraic graphs, specifically element-specific multiscale weighted colored algebraic graphs, embed complementary 3D molecular information into graph invariants. We propose an algebraic graph-assisted bidirectional transformer (AGBT) framework that fuses representations generated by algebraic graphs and a bidirectional transformer with a variety of machine learning algorithms, including decision trees, multitask learning, and deep neural networks. We validate the proposed AGBT framework on eight molecular datasets, involving quantitative toxicity, physical chemistry, and physiology datasets. Extensive numerical experiments show that AGBT is a state-of-the-art framework for molecular property prediction.


2021 ◽  
Author(s):  
Dan Clarke ◽  
Martijn Blaauw ◽  
Jaydip Guha ◽  
Altay Sansal ◽  
Muhlis Unaldi ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Juncai Li ◽  
Xiaofei Jiang

Molecular property prediction is an essential task in drug discovery. Most computational approaches based on deep learning focus either on designing novel molecular representations or on combining representations with advanced models. However, researchers pay less attention to the potential benefits of massive unlabeled molecular data (e.g., ZINC). The task becomes increasingly challenging owing to the limited scale of labeled data. Motivated by recent advances in pretrained models for natural language processing, a drug molecule can, to some extent, be naturally viewed as a language. In this paper, we investigate how to adapt the pretrained BERT model to extract useful molecular substructure information for molecular property prediction. We present a novel end-to-end deep learning framework, named Mol-BERT, that combines an effective molecular representation with a pretrained BERT model tailored for molecular property prediction. Specifically, a large-scale prediction BERT model is pretrained to generate embeddings of molecular substructures, using four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). The pretrained BERT model can then be fine-tuned on various molecular property prediction tasks. To examine the performance of the proposed Mol-BERT, we conduct several experiments on four widely used molecular datasets. In comparison with traditional and state-of-the-art baselines, the results show that Mol-BERT outperforms the current sequence-based methods and achieves at least a 2% improvement in ROC-AUC score on the Tox21, SIDER, and ClinTox datasets.
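The abstract above reports its gains as ROC-AUC scores. As a reminder of what that metric measures, here is a minimal sketch that computes ROC-AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not taken from the paper or its code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    labels: sequence of 0/1 ground-truth labels (hypothetical toy data below).
    scores: sequence of predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 0.75
```

The pairwise form is quadratic in the number of examples, so it is only suitable for illustration; on a multi-task benchmark the score would typically be computed per task and then averaged.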


2021 ◽  
Author(s):  
Jieun Choi ◽  
Juyong Lee

In this work, we propose a novel drug-like molecular design workflow that combines efficient global molecular property optimization, protein-ligand molecular docking, and machine learning. Computational drug design algorithms aim to find novel molecules that satisfy various drug-like properties and bind strongly to a target protein. To accomplish this goal, various computational molecular generation methods have been developed, driven by recent advances in deep learning and the growth of biological data. However, most existing methods depend heavily on experimental activity data, which are not available for many targets. Thus, when the amount of available activity data is limited, protein-ligand docking calculations should be used. However, performing docking calculations on the fly during molecular generation requires considerable computational resources. To address this problem, we used machine-learning models that predict docking energy to accelerate the molecular generation process. We combined this ML-assisted docking score prediction model with the efficient global molecular property optimization approach, MolFinder. We call this design approach V-dock. Using the V-dock approach, we quickly generated many molecules with high docking scores for a target protein and with desirable drug-like and bespoke properties, such as similarity to a reference molecule.


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
M. Withnall ◽  
E. Lindelöf ◽  
O. Engkvist ◽  
H. Chen

Neural message passing for graphs is a promising and relatively recent approach for applying machine learning to networked data. As molecules can be described intrinsically as molecular graphs, it makes sense to apply these techniques to improve molecular property prediction in the field of cheminformatics. We introduce Attention and Edge Memory schemes to the existing message passing neural network framework, and benchmark our approaches against eight different physical–chemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our models consistently perform on par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.
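To make the idea of attention in message passing concrete, the sketch below performs one update step on a small undirected graph: each node scores its neighbours, normalises the scores with a softmax, and adds the weighted sum of neighbour features to its own. This is a deliberately simplified illustration, not the scheme from the paper: the dot-product scoring stands in for learned attention weights, edge features and edge memory are omitted, and the function name and toy graph are assumptions.

```python
import math


def attention_message_pass(node_feats, edges):
    """One attention-weighted message-passing step (simplified, no learned weights).

    node_feats: dict mapping node id -> list of floats (all the same length).
    edges: list of (u, v) pairs describing an undirected graph.
    Returns a new dict with updated node features.
    """
    # Build an adjacency list for the undirected graph.
    neighbors = {n: [] for n in node_feats}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for n, h in node_feats.items():
        nbrs = neighbors[n]
        if not nbrs:
            # Isolated nodes receive no messages and keep their features.
            updated[n] = list(h)
            continue

        # Assumption: attention score = dot product of node and neighbour features.
        scores = [sum(a * b for a, b in zip(h, node_feats[m])) for m in nbrs]

        # Numerically stable softmax over the neighbour scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Aggregate neighbour features with the attention weights.
        message = [
            sum(w * node_feats[m][i] for w, m in zip(weights, nbrs))
            for i in range(len(h))
        ]

        # Residual-style update: own features plus aggregated message.
        updated[n] = [hi + mi for hi, mi in zip(h, message)]
    return updated


# Toy graph: a central atom "C" bonded to "O" and "H", plus an isolated node "X".
toy_feats = {"C": [1.0, 0.0], "O": [1.0, 0.0], "H": [0.0, 1.0], "X": [0.5, 0.5]}
toy_out = attention_message_pass(toy_feats, [("C", "O"), ("C", "H")])
```

In the toy graph, "C" attends more strongly to "O" than to "H" because their feature vectors align, so its updated features shift further toward "O". A trained model would replace the dot product with learned parameters and repeat the step for several rounds before reading out a molecule-level prediction.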


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 127968-127968
Author(s):  
Shuang Wang ◽  
Zhen Li ◽  
Shugang Zhang ◽  
Mingjian Jiang ◽  
Xiaofeng Wang ◽  
...  
