GraphDTA: Predicting drug–target binding affinity with graph neural networks

Abstract: The development of new drugs is costly, time-consuming, and often accompanied by safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. To repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug–target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task. However, these models represent the drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug–target affinity. We show that graph neural networks not only predict drug–target affinity better than non-deep learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug–target binding affinity prediction, and that representing drugs as graphs can lead to further improvements.

Availability of data and materials: The proposed models are implemented in Python. Related data, pre-trained models, and source code are publicly available at https://github.com/thinng/GraphDTA. All scripts and data needed to reproduce the post-hoc statistical analysis are available from https://doi.org/10.5281/zenodo.3603523.
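To make the "drugs as graphs" idea concrete, here is a minimal sketch in plain Python. It is not the GraphDTA implementation: the toy molecule (ethanol, heavy atoms only), the element vocabulary, the feature choice (one-hot element plus degree), and the helper names `build_graph`, `message_pass`, and `readout` are all assumptions made for illustration.

```python
# Hypothetical toy example: ethanol (SMILES "CCO"), heavy atoms only.
ATOMS = ["C", "C", "O"]        # node labels, one per atom
BONDS = [(0, 1), (1, 2)]       # undirected edges between atom indices
ELEMENTS = ["C", "N", "O"]     # assumed vocabulary for one-hot features


def build_graph(atoms, bonds):
    """Return (node_features, adjacency_list) for a molecule."""
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    features = []
    for i, symbol in enumerate(atoms):
        one_hot = [1.0 if symbol == e else 0.0 for e in ELEMENTS]
        # Append the atom's degree as one extra feature.
        features.append(one_hot + [float(len(adjacency[i]))])
    return features, adjacency


def message_pass(features, adjacency):
    """One round of mean aggregation over each atom and its neighbours."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in adjacency[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def readout(features):
    """Sum-pool node vectors into one fixed-length molecule vector."""
    return [sum(col) for col in zip(*features)]


features, adjacency = build_graph(ATOMS, BONDS)
molecule_vector = readout(message_pass(features, adjacency))
print(molecule_vector)
```

Unlike a string encoding, the adjacency list states directly which atoms are bonded, so the aggregation step only mixes information between chemically connected atoms.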


10.1101/684662 ◽

2019 ◽

Cited By ~ 6

Author(s):

Thin Nguyen ◽

Hang Le ◽

Thomas P. Quinn ◽

Tri Nguyen ◽

Thuc Duy Le ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Binding Affinity ◽

Drug Target ◽

Interaction Strength ◽

Drug Repurposing ◽

Learning Models ◽

Link Type ◽

Target Binding ◽

Graph Neural Networks



Graph convolutional networks: a comprehensive review

Computational Social Networks ◽

10.1186/s40649-019-0069-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 27

Author(s):

Si Zhang ◽

Hanghang Tong ◽

Jiejun Xu ◽

Ross Maciejewski

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Network Models ◽

Representation Learning ◽

Superior Performance ◽

Learning Models ◽

Convolutional Network ◽

Comprehensive Review ◽

Convolutional Networks ◽

Graph Neural Networks

Abstract: Graphs naturally appear in numerous application domains, ranging from social analysis and bioinformatics to computer vision. The unique capability of graphs to capture the structural relations among data allows more insights to be harvested than when data are analyzed in isolation. However, it is often very challenging to solve learning problems on graphs, because (1) many types of data are not originally structured as graphs, such as images and text data, and (2) for graph-structured data, the underlying connectivity patterns are often complex and diverse. On the other hand, representation learning has achieved great success in many areas. A potential solution is therefore to learn the representation of graphs in a low-dimensional Euclidean space, such that the graph properties are preserved. Although tremendous efforts have been made to address the graph representation learning problem, many of them still suffer from their shallow learning mechanisms. Deep learning models on graphs (e.g., graph neural networks) have recently emerged in machine learning and other related areas, and have demonstrated superior performance on various problems. In this survey, despite the numerous types of graph neural networks, we conduct a comprehensive review specifically of the emerging field of graph convolutional networks, one of the most prominent graph deep learning models. First, we group the existing graph convolutional network models into two categories based on the types of convolutions and highlight some graph convolutional network models in detail. Then, we categorize different graph convolutional networks according to their application areas. Finally, we present several open challenges in this area and discuss potential directions for future research.
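As an illustration of what a single graph convolution computes, the sketch below implements the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) with plain Python lists. It is one common formulation, chosen here as an assumption; the abstract does not say which models the survey highlights, and the function names and toy inputs are hypothetical.

```python
import math


def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
            for row in left]


def gcn_layer(adjacency, features, weights):
    """Compute ReLU(D^-1/2 (A + I) D^-1/2 H W) for dense inputs."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalisation of the self-looped adjacency matrix.
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
             for j in range(n)] for i in range(n)]
    transformed = matmul(matmul(norm, features), weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Toy input: two connected nodes, one feature each, identity weight.
A = [[0.0, 1.0], [1.0, 0.0]]
H = [[1.0], [3.0]]
W = [[1.0]]
output = gcn_layer(A, H, W)
print(output)
```

With these inputs every entry of the normalised matrix is 0.5, so both nodes end up with the average of the two input features.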


Prediction of Drug-Target Binding Affinity by An Ensemble Learning System with Network Fusion Information

Current Bioinformatics ◽

10.2174/1574893616666210226114834 ◽

2021 ◽

Vol 16 ◽

Author(s):

Cheng Lin Zhang ◽

You Zhi Zhang ◽

Bing Wang ◽

Peng Chen

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Drug Target ◽

Integrated System ◽

Learning System ◽

New Drugs ◽

Subsequent Work ◽

Network Information ◽

Similarity Networks ◽

Target Binding

Background: Verifying interactions between drugs and targets is key to discovering new drugs. Many computational methods have been developed to predict drug-target interactions and have performed successfully, but challenges remain in the field. Objective: We aim to develop a machine learning method to predict drug-target affinity, which determines the strength of the binding relationship between drug and target. Method: This paper proposes an integrated machine learning system for drug-target binding affinity prediction based on network fusion. First, multiple similarity networks representing drugs or targets are calculated. Second, the multiple networks representing drugs (targets) are fused separately. Finally, the feature information of drugs and targets is spliced together and used for model construction and training. By integrating multiple similarity networks, the model fully exploits the complementarity of network information, and the most complete feature information can be obtained once redundancy is removed. Results: Experimental results showed that our model obtained good results for DTI binding affinity. Conclusion: Predicting drug-target affinity remains challenging. This paper proposes an integrated system that fuses network information to address the issue; the proposed method performs well and can provide a data basis for subsequent work. Website: https://www.dlearningapp.com/web/inmpba.htm


DeepFusionDTA: drug-target binding affinity prediction with information fusion and hybrid deep-learning ensemble model

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3103966 ◽

2021 ◽

pp. 1-1

Author(s):

Yuqian Pu ◽

Jiawei Li ◽

Jijun Tang ◽

Fei Guo

Keyword(s):

Deep Learning ◽

Binding Affinity ◽

Information Fusion ◽

Drug Target ◽

Ensemble Model ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Target Binding


TranDTA: Prediction Of Drug Target Binding Affinity Using Transformer Representations

10.1101/2021.09.30.462610 ◽

2021 ◽

Author(s):

Mahsa Saadat ◽

Armin Behjati ◽

Fatemeh Zare-Mirakabad ◽

Sajjad Gharaghani

Keyword(s):

Drug Discovery ◽

Binding Affinity ◽

Drug Target ◽

Feature Vector ◽

Structural Information ◽

Binary Classification ◽

Drug Repurposing ◽

Target Pair ◽

Target Binding ◽

State Of Art

Drug discovery is generally difficult and expensive, and its success rate is low. One of the essential steps in the early stages of drug discovery and drug repurposing is identifying drug-target interactions. Although several existing methods use binary classification to predict whether an interaction between a drug and its target exists, it is more informative and more challenging to predict the strength of the binding between a drug and its target. Binding affinity indicates the strength of drug-target pair interactions. In this regard, several computational methods have been developed to predict drug-target binding affinity. With the advent of deep learning methods, the accuracy of binding affinity prediction is improving. However, the input representation of these models strongly affects the results. The early models use only the sequences of the molecules, whereas later models focus on their structure. Although the recent models predict binding affinity more accurately than the first ones, they need more data and resources for training. In this study, we present a method that uses a pre-trained transformer to represent the protein as model input. Although a pre-trained transformer extracts a feature vector from the protein sequence, it can learn structural information in its layers and heads. The feature vector extracted by the transformer therefore includes both the sequence and the structural properties of the protein. Therefore, our method can also be run without limitations on resources (memory, CPU and GPU). The results show that our model achieves performance competitive with the state-of-the-art models. Data and the trained model are available at http://bioinformatics.aut.ac.ir/TranDTA/ .


Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method, which we call “Levenshtein augmentation”, that considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. When used to train the baseline models, Levenshtein augmentation increased performance over both non-augmented data and data augmented by conventional SMILES randomization. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
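The sketch below illustrates the string-similarity idea with the standard Levenshtein (edit) distance. It is a simplification under stated assumptions: the abstract describes local sub-sequence similarity, whereas this example compares whole strings, and the helper `closest_variant`, the ethanol variants, and the acetaldehyde product are hypothetical illustrations rather than the authors' procedure or data.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_variant(variants, product):
    """Pick the reactant SMILES variant most similar to the product string."""
    return min(variants, key=lambda smiles: levenshtein(smiles, product))


# Three equivalent SMILES strings for ethanol, paired with acetaldehyde.
reactant_variants = ["OCC", "C(O)C", "CCO"]
product = "CC=O"
best = closest_variant(reactant_variants, product)
print(best, levenshtein(best, product))
```

Choosing the variant that is textually closest to the product gives the model a training pair in which the unchanged part of the molecule is written the same way on both sides.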


Improving the Accuracy of Protein-Ligand Binding Affinity Prediction by Deep Learning Models: Benchmark and Model

10.26434/chemrxiv.9866912 ◽

2019 ◽

Author(s):

Mohammad Rezaei ◽

Yanjun Li ◽

Xiaolin Li ◽

Chenglong Li

Keyword(s):

Deep Learning ◽

Drug Design ◽

Binding Affinity ◽

Benchmark Dataset ◽

Rational Drug Design ◽

Learning Models ◽

Structure Based Drug Design ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Rational Drug

Introduction: The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods decreases the discrepancy between experimental and computational results. Objectives: The primary objectives were to find the most relevant features affecting binding affinity prediction, to minimize the use of manual feature engineering, and to improve the reliability of binding affinity prediction using efficient deep learning models with tuned hyperparameters. Methods: The binding site of each target protein was represented as a grid box around its bound ligand. Both binary and distance-dependent occupancies were examined to describe how an atom affects its neighboring voxels in this grid. A combination of different features, including ANOLEA, ligand elements, and Arpeggio atom types, was used to represent the input. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained and tested on the PDBbind v2016 dataset. Additionally, an extended benchmark dataset was compiled to train and evaluate the models. Results: The best DeepAtom model showed improved accuracy in binding affinity prediction on the PDBbind core subset (Pearson’s R=0.83) and performs better than the recent state-of-the-art models in this field. In addition, when the DeepAtom model was trained on our proposed benchmark dataset, it yielded a higher correlation than the baseline, which confirms the value of our model. Conclusions: The promising results for the predicted binding affinities are expected to pave the way for embedding deep learning models in virtual screening and rational drug design.
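The difference between binary and distance-dependent occupancy can be sketched as follows. This is an illustration only: the abstract does not give the functional form, so the Gaussian decay, the `sigma` parameter, the grid layout, and the function name `occupancy_grid` are assumptions rather than the DeepAtom definition.

```python
import itertools
import math


def occupancy_grid(atoms, origin, n, spacing, mode="binary", sigma=1.0):
    """Return {(i, j, k): value} for an n x n x n grid anchored at origin."""
    grid = {}
    for index in itertools.product(range(n), repeat=3):
        centre = [origin[d] + (index[d] + 0.5) * spacing for d in range(3)]
        if mode == "binary":
            # 1.0 when any atom lies inside this voxel, otherwise 0.0.
            inside = any(
                all(abs(atom[d] - centre[d]) <= spacing / 2 for d in range(3))
                for atom in atoms
            )
            grid[index] = 1.0 if inside else 0.0
        else:
            # Assumed Gaussian decay with distance to the nearest atom.
            nearest = min(
                math.sqrt(sum((atom[d] - centre[d]) ** 2 for d in range(3)))
                for atom in atoms
            )
            grid[index] = math.exp(-nearest ** 2 / (2 * sigma ** 2))
    return grid


# One hypothetical atom at the centre of the first voxel of a 2 x 2 x 2 grid.
atoms = [(0.5, 0.5, 0.5)]
binary = occupancy_grid(atoms, (0.0, 0.0, 0.0), 2, 1.0, mode="binary")
soft = occupancy_grid(atoms, (0.0, 0.0, 0.0), 2, 1.0, mode="distance")
print(binary[(0, 0, 0)], soft[(1, 0, 0)], soft[(1, 1, 1)])
```

In the binary grid only the voxel containing the atom is non-zero, while the distance-dependent grid lets the atom contribute a smaller value to neighboring voxels.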


Deep Malaria Parasite Detection in Thin Blood Smear Microscopic Images

Applied Sciences ◽

10.3390/app11052284 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2284

Author(s):

Asma Maqsood ◽

Muhammad Shahid Farid ◽

Muhammad Hassan Khan ◽

Marcin Grzegorzek

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Red Blood Cells ◽

Convolutional Neural Networks ◽

Blood Cells ◽

Superior Performance ◽

Learning Models ◽

Infected Female ◽

The World ◽

Augmentation Techniques

Malaria is a disease caused by a microscopic parasite that is transmitted to humans through the bites of infected female mosquitoes. Malaria is a fatal disease that is endemic in many regions of the world. Quick diagnosis of this disease is very valuable for patients, as traditional detection methods require tedious work. Recently, some automated methods have been proposed that exploit hand-crafted feature extraction techniques; however, their accuracies are not reliable. Deep learning approaches modernize the world with their superior performance. Convolutional neural networks (CNNs) scale well to image classification tasks and extract features through the hidden layers of the model without any handcrafting. The detection of malaria-infected red blood cells from segmented microscopic blood images using convolutional neural networks can assist in quick diagnosis, which will be useful for regions with fewer healthcare experts. The contributions of this paper are two-fold. First, we evaluate the performance of different existing deep learning models for efficient malaria detection. Second, we propose a customized CNN model that outperforms all observed deep learning models. It exploits bilateral filtering and image augmentation techniques to highlight features of red blood cells before training the model. Owing to the image augmentation techniques, the customized CNN model generalizes well and avoids over-fitting. All experimental evaluations are performed on the benchmark NIH Malaria Dataset, and the results reveal that the proposed algorithm is 96.82% accurate in detecting malaria from microscopic blood smears.


GraphBind: protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues

Nucleic Acids Research ◽

10.1093/nar/gkab044 ◽

2021 ◽

Author(s):

Ying Xia ◽

Chun-Qiu Xia ◽

Xiaoyong Pan ◽

Hong-Bin Shen

Keyword(s):

Neural Networks ◽

Nucleic Acid ◽

Biological Activities ◽

New Drugs ◽

Superior Performance ◽

Nucleic Acid Binding ◽

Hierarchical Graph ◽

Binding Residue ◽

Binding Residues ◽

Graph Neural Networks

Abstract: Knowledge of the interactions between proteins and nucleic acids is the basis for understanding various biological activities and designing new drugs. Accurately identifying nucleic-acid-binding residues remains a challenging task. In this paper, we propose an accurate predictor, GraphBind, for identifying nucleic-acid-binding residues on proteins based on an end-to-end graph neural network. Considering that binding sites often exhibit highly conserved patterns in local tertiary structures, we first construct graphs based on the structural contexts of target residues and their spatial neighborhood. Then, hierarchical graph neural networks (HGNNs) are used to embed the latent local patterns of structural and bio-physicochemical characteristics for binding residue recognition. We comprehensively evaluate GraphBind on DNA/RNA benchmark datasets. The results demonstrate the superior performance of GraphBind over state-of-the-art methods. Moreover, GraphBind is extended to the prediction of other ligand-binding residues to verify its generalization capability. The GraphBind web server is freely available at http://www.csbio.sjtu.edu.cn/bioinf/GraphBind/.
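Constructing a graph from a target residue and its spatial neighborhood can be sketched as below. The residue names, coordinates, both distance cutoffs, and the function `local_graph` are hypothetical; the abstract does not state how GraphBind defines residue positions or neighborhood sizes.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def local_graph(coords, target, radius, edge_cutoff):
    """Build a graph from the residues that lie near a target residue.

    coords maps residue id -> (x, y, z). Nodes are the residues within
    radius of the target; edges join node pairs within edge_cutoff.
    """
    nodes = [res for res, xyz in coords.items()
             if distance(xyz, coords[target]) <= radius]
    edges = [(u, v)
             for i, u in enumerate(nodes)
             for v in nodes[i + 1:]
             if distance(coords[u], coords[v]) <= edge_cutoff]
    return nodes, edges


# Hypothetical residue positions along one axis (arbitrary units).
coords = {
    "A1": (0.0, 0.0, 0.0),
    "A2": (3.0, 0.0, 0.0),
    "A3": (6.0, 0.0, 0.0),
    "A4": (20.0, 0.0, 0.0),
}
nodes, edges = local_graph(coords, target="A2", radius=5.0, edge_cutoff=4.0)
print(nodes, edges)
```

Each target residue gets its own small graph, so a distant residue such as the hypothetical `A4` never enters the neighborhood of `A2`.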


Explainable Graph Neural Networks for Organic Cages

10.26434/chemrxiv-2021-zhcb1 ◽

2021 ◽

Author(s):

Qi Yuan ◽

Filip Szczypiński ◽

Kim Jelfs

Keyword(s):

Neural Networks ◽

Material Science ◽

Predictive Power ◽

Internal Cavity ◽

Learning Models ◽

Molecular Fragments ◽

Molecular Separations ◽

Graph Neural Networks ◽

Structural Insights ◽

Machine Learning Models

The development of accurate and explicable machine learning models to predict the properties of topologically complex systems is a challenge in material science. Porous organic cages, a class of polycyclic molecular materials, have potential applications in molecular separations, catalysis and encapsulation. For most applications of porous organic cages, having a permanent internal cavity in the absence of solvent, a property termed “shape persistency”, is critical. Here, we report the development of Graph Neural Networks (GNNs) to predict the shape persistence of organic cages. Graph neural networks are a class of neural networks where the data, in our case that of organic cages, are represented by graphs. The performance of the GNN models was measured against a previously reported computational database of organic cages formed through a range of [4+6] reactions with a variety of reaction chemistries. The reported GNNs have improved prediction accuracy and transferability compared to random forest predictions. Apart from the improvement in predictive power, we explored the explicability of the GNNs by computing the integrated gradient of the GNN input. The contribution of monomers and molecular fragments to the shape persistence of the organic cages could be quantitatively evaluated with the integrated gradient. With the added explicability of the GNNs, it is possible not only to accurately predict the properties of organic materials, but also to interpret the predictions of the deep learning models and provide structural insights for the discovery of future materials.
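Integrated gradients attribute a prediction F(x) to each input feature by accumulating gradients along a straight path from a baseline x' to the input x. The sketch below approximates that path integral with a midpoint Riemann sum. It rests on assumptions: the toy function `model` stands in for a trained GNN, and finite differences replace the automatic differentiation a deep learning framework would provide.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Approximate IG_i = (x_i - x'_i) * integral of dF/dx_i along the path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def model(v):
    """Hypothetical stand-in for a trained predictor."""
    return v[0] ** 2 + 3.0 * v[1]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(model, x, baseline)
print(attributions, model(x) - model(baseline))
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the prediction at the input and at the baseline.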
