scholarly journals GraphDTA: Predicting drug–target binding affinity with graph neural networks

2019 ◽  
Author(s):  
Thin Nguyen ◽  
Hang Le ◽  
Thomas P. Quinn ◽  
Tri Nguyen ◽  
Thuc Duy Le ◽  
...  

AbstractThe development of new drugs is costly, time consuming, and often accompanied with safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. In order to repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug--target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task. However, these models represent the drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug--target affinity. We show that graph neural networks not only predict drug--target affinity better than non-deep learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug--target binding affinity prediction, and that representing drugs as graphs can lead to further improvements.Availability of data and materialsThe proposed models are implemented in Python. Related data, pre-trained models, and source code are publicly available at https://github.com/thinng/GraphDTA. All scripts and data needed to reproduce the post-hoc statistical analysis are available from https://doi.org/10.5281/[email protected]

Author(s):  
Thin Nguyen ◽  
Hang Le ◽  
Thomas P Quinn ◽  
Tri Nguyen ◽  
Thuc Duy Le ◽  
...  

Abstract   The development of new drugs is costly, time consuming, and often accompanied with safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. In order to repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug–target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task. However, these models represent the drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug–target affinity. We show that graph neural networks not only predict drug–target affinity better than non-deep learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug–target binding affinity prediction, and that representing drugs as graphs can lead to further improvements. Availability of data and materials The proposed models are implemented in Python. Related data, pre-trained models, and source code are publicly available at https://github.com/thinng/GraphDTA. All scripts and data needed to reproduce the post-hoc statistical analysis are available from https://doi.org/10.5281/zenodo.3603523.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Si Zhang ◽  
Hanghang Tong ◽  
Jiejun Xu ◽  
Ross Maciejewski

Abstract Graphs naturally appear in numerous application domains, ranging from social analysis, bioinformatics to computer vision. The unique capability of graphs enables capturing the structural relations among data, and thus allows to harvest more insights compared to analyzing data in isolation. However, it is often very challenging to solve the learning problems on graphs, because (1) many types of data are not originally structured as graphs, such as images and text data, and (2) for graph-structured data, the underlying connectivity patterns are often complex and diverse. On the other hand, the representation learning has achieved great successes in many areas. Thereby, a potential solution is to learn the representation of graphs in a low-dimensional Euclidean space, such that the graph properties can be preserved. Although tremendous efforts have been made to address the graph representation learning problem, many of them still suffer from their shallow learning mechanisms. Deep learning models on graphs (e.g., graph neural networks) have recently emerged in machine learning and other related areas, and demonstrated the superior performance in various problems. In this survey, despite numerous types of graph neural networks, we conduct a comprehensive review specifically on the emerging field of graph convolutional networks, which is one of the most prominent graph deep learning models. First, we group the existing graph convolutional network models into two categories based on the types of convolutions and highlight some graph convolutional network models in details. Then, we categorize different graph convolutional networks according to the areas of their applications. Finally, we present several open challenges in this area and discuss potential directions for future research.


2021 ◽  
Author(s):  
Mahsa Saadat ◽  
Armin Behjati ◽  
Fatemeh Zare-Mirakabad ◽  
Sajjad Gharaghani

Drug discovery is generally difficult, expensive and the success rate is low. One of the essential steps in the early stages of drug discovery and drug repurposing is identifying drug target interactions. Although several methods developed use binary classification to predict if the interaction between a drug and its target exists or not, it is more informative and challenging to predict the strength of the binding between a drug and its target. Binding affinity indicates the strength of drug-target pair interactions. In this regard, several computational methods have been developed to predict the drug-target binding affinity. With the advent of deep learning methods, the accuracy of binding affinity prediction is improving. However, the input representation of these models is very effective in the result. The early models only use the sequence of molecules and the latter models focus on the structure of them. Although the recent models predict binding affinity more accurate than the first ones, they need more data and resources for training. In this study, we present a method that uses a pre-trained transformer to represent the protein as model input. Although pretrained transformer extracts a feature vector of the protein sequence, they can learn structural information in layers and heads. So, the extracted feature vector by transformer includes the sequence and structural properties of protein. Therefore, our method can also be run without limitations on resources (memory, CPU and GPU). The results show that our model achieves a competitive performance with the state-of-art models. Data and trained model is available at http://bioinformatics.aut.ac.ir/TranDTA/ .


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

<p>SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation” which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state of the art models - transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated an increase performance over non-augmented, and conventionally SMILES randomization augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as <i>attentional gain </i>– an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.</p>


2019 ◽  
Author(s):  
Mohammad Rezaei ◽  
Yanjun Li ◽  
Xiaolin Li ◽  
Chenglong Li

<b>Introduction:</b> The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods decreases the discrepancy between experimental and computational results.<br><b>Objectives:</b> The primary objectives were to find the most relevant features affecting binding affinity prediction, least use of manual feature engineering, and improving the reliability of binding affinity prediction using efficient deep learning models by tuning the model hyperparameters.<br><b>Methods:</b> The binding site of target proteins was represented as a grid box around their bound ligand. Both binary and distance-dependent occupancies were examined for how an atom affects its neighbor voxels in this grid. A combination of different features including ANOLEA, ligand elements, and Arpeggio atom types were used to represent the input. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained and tested on the PDBbind v2016 dataset. Additionally an extended benchmark dataset was compiled to train and evaluate the models.<br><b>Results: </b>The best DeepAtom model showed an improved accuracy in the binding affinity prediction on PDBbind core subset (Pearson’s R=0.83) and is better than the recent state-of-the-art models in this field. In addition when the DeepAtom model was trained on our proposed benchmark dataset, it yields higher correlation compared to the baseline which confirms the value of our model.<br><b>Conclusions:</b> The promising results for the predicted binding affinities is expected to pave the way for embedding deep learning models in virtual screening and rational drug design fields.


2021 ◽  
Vol 11 (5) ◽  
pp. 2284
Author(s):  
Asma Maqsood ◽  
Muhammad Shahid Farid ◽  
Muhammad Hassan Khan ◽  
Marcin Grzegorzek

Malaria is a disease activated by a type of microscopic parasite transmitted from infected female mosquito bites to humans. Malaria is a fatal disease that is endemic in many regions of the world. Quick diagnosis of this disease will be very valuable for patients, as traditional methods require tedious work for its detection. Recently, some automated methods have been proposed that exploit hand-crafted feature extraction techniques however, their accuracies are not reliable. Deep learning approaches modernize the world with their superior performance. Convolutional Neural Networks (CNN) are vastly scalable for image classification tasks that extract features through hidden layers of the model without any handcrafting. The detection of malaria-infected red blood cells from segmented microscopic blood images using convolutional neural networks can assist in quick diagnosis, and this will be useful for regions with fewer healthcare experts. The contributions of this paper are two-fold. First, we evaluate the performance of different existing deep learning models for efficient malaria detection. Second, we propose a customized CNN model that outperforms all observed deep learning models. It exploits the bilateral filtering and image augmentation techniques for highlighting features of red blood cells before training the model. Due to image augmentation techniques, the customized CNN model is generalized and avoids over-fitting. All experimental evaluations are performed on the benchmark NIH Malaria Dataset, and the results reveal that the proposed algorithm is 96.82% accurate in detecting malaria from the microscopic blood smears.


2021 ◽  
Author(s):  
Qi Yuan ◽  
Filip Szczypiński ◽  
Kim Jelfs

The development of accurate and explicable machine learning models to predict the properties of topologically complex systems is a challenge in material science. Porous organic cages, a class of polycyclic molecular materials, have potential application in molecular separations, catalysis and encapsulation. For most applications of porous organic cages, having a permanent internal cavity in the absence of solvent, a property termed “shape persistency” is critical. Here, we report the development of Graph Neural Networks (GNNs) to predict the shape persistence of organic cages. Graph neural networks are a class of neural networks where the data, in our case that of organic cages, are represented by graphs. The performance of the GNN models was measured against a previously reported computational database of organic cages formed through a range of [4+6] reactions with a variety of reaction chemistries. The reported GNNs have an improved prediction accuracy and transferability compared to random forest predictions. Apart from the improvement in predictive power, we explored the explicability of the GNNs by computing the integrated gradient of the GNN input. The contribution of monomers and molecular fragments to the shape persistence of the organic cages could be quantitatively evaluated with integrated gradient. With the added explicability of the GNNs, it is possible not only to accurately predict the property of organic materials, but also to interpret the predictions of the deep learning models and provide structural insights to the discovery of future materials.


2022 ◽  
Vol 6 (1) ◽  
Author(s):  
Marco Rossi ◽  
Sofia Vallecorsa

AbstractIn this work, we investigate different machine learning-based strategies for denoising raw simulation data from the ProtoDUNE experiment. The ProtoDUNE detector is hosted by CERN and it aims to test and calibrate the technologies for DUNE, a forthcoming experiment in neutrino physics. The reconstruction workchain consists of converting digital detector signals into physical high-level quantities. We address the first step in reconstruction, namely raw data denoising, leveraging deep learning algorithms. We design two architectures based on graph neural networks, aiming to enhance the receptive field of basic convolutional neural networks. We benchmark this approach against traditional algorithms implemented by the DUNE collaboration. We test the capabilities of graph neural network hardware accelerator setups to speed up training and inference processes.


2021 ◽  
Author(s):  
Victor Fung ◽  
Jiaxin Zhang ◽  
Eric Juarez ◽  
Bobby Sumpter

Graph neural networks (GNNs) have received intense interest as a rapidly expanding class of machine learning models remarkably well-suited for materials applications. To date, a number of successful GNNs have been proposed and demonstrated for systems ranging from crystal stability to electronic property prediction and to surface chemistry and heterogeneous catalysis. However, a consistent benchmark of these models remains lacking, hindering the development and consistent evaluation of new models in the materials field. Here, we present a workflow and testing platform, MatDeepLearn, for quickly and reproducibly assessing and comparing GNNs and other machine learning models. We use this platform to optimize and evaluate a selection of top performing GNNs on several representative datasets in computational materials chemistry. From our investigations we note the importance of hyperparameter selection and find roughly similar performances for the top models once optimized. We identify several strengths in GNNs over conventional models in cases with compositionally diverse datasets and in its overall flexibility with respect to inputs, due to learned rather than defined representations. Meanwhile several weaknesses of GNNs are also observed including high data requirements, and suggestions for further improvement for applications in materials chemistry are proposed.


Sign in / Sign up

Export Citation Format

Share Document