DeepDTAF: a deep learning method to predict protein–ligand binding affinity

Author(s):  
Kaili Wang ◽  
Renyi Zhou ◽  
Yaohang Li ◽  
Min Li

Abstract Biomolecular recognition between ligand and protein plays an essential role in drug discovery and development. However, determining protein–ligand binding affinity experimentally is extremely time- and resource-consuming. Many computational methods have been proposed to predict binding affinity, but most of them require protein 3D structures, which are often unavailable. Therefore, new methods that take full advantage of sequence-level features are greatly needed to predict protein–ligand binding affinity and accelerate the drug discovery process. We developed a novel deep learning approach, named DeepDTAF, to predict protein–ligand binding affinity. DeepDTAF was constructed by integrating local and global contextual features. More specifically, the protein-binding pocket, which has special properties for directly binding the ligand, was first used as the local input feature for protein–ligand binding affinity prediction. Furthermore, dilated convolution was used to capture multiscale long-range interactions. We compared DeepDTAF with recent state-of-the-art methods and analyzed the effectiveness of different parts of our model; the significant accuracy improvement showed that DeepDTAF is a reliable tool for affinity prediction. The source code and data are available at https://github.com/KailiWang1/DeepDTAF.
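
To illustrate why dilated convolution can capture long-range interactions along a sequence, the following is a minimal pure-Python sketch. It is not the DeepDTAF implementation; the function names `dilated_conv1d` and `receptive_field`, the kernel size and the dilation rates are all assumptions chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D dilated convolution in cross-correlation form.

    Each output mixes kernel taps that are `dilation` positions apart,
    so a small kernel can cover a wide stretch of the sequence.
    """
    span = (len(kernel) - 1) * dilation + 1  # positions covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated layers."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Hypothetical toy input: a kernel of size 3 with dilation 2 spans 5 positions.
toy_output = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], dilation=2)
# Three stacked layers with dilations 1, 2 and 4 (assumed values).
toy_field = receptive_field(3, [1, 2, 4])
```

With a kernel of size 3, stacking the assumed dilation rates 1, 2 and 4 widens the receptive field much faster than stacking undilated layers, which is the property the abstract refers to as multiscale long-range context.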

2018 ◽  
Vol 34 (21) ◽  
pp. 3666-3674 ◽  
Author(s):  
Marta M Stepniewska-Dziubinska ◽  
Piotr Zielenkiewicz ◽  
Pawel Siedlecki

2021 ◽  
Author(s):  
Bomin Wei ◽  
Xiang Gong

Abstract The substantial cost of new drug research and development has consistently posed a huge burden and tremendous challenge for both pharmaceutical companies and patients. To lower expenditure and the development failure rate, repurposing existing approved drugs and identifying novel interactions between drug molecules and target proteins using computational methods have gained growing attention. Here, we propose DeepPLA, a novel deep learning-based model that combines a ResNet-based 1D CNN and a biLSTM to establish an end-to-end network for protein-ligand binding affinity prediction. We first apply pre-trained embedding methods to encode the raw drug molecular SMILES strings and target protein sequences into dense vector representations. The dense vector representations separately go through ResNet-based 1D CNN modules to derive features. The extracted feature vectors are concatenated and, after an average pooling operation, fed into the biLSTM network, followed by an MLP module that predicts the binding affinity. We used the BindingDB dataset for training and evaluating our DeepPLA model. The results show that DeepPLA performs well on protein-ligand binding affinity prediction, with R, RMSE, MAE, R2 and MSE of 0.89, 0.68, 0.50, 0.79 and 0.46 on the training set, and 0.84, 0.80, 0.60, 0.71 and 0.64 on the independent testing set, respectively. These results suggest that DeepPLA predicts accurately and generalizes well, demonstrating that DeepPLA can be a potential upgrade for pinpointing new drug-target interactions and finding better destinations for proven drugs.
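
The five scores reported above can be computed from paired true and predicted affinities as in the sketch below. It assumes that R is Pearson's correlation coefficient and that R2 is the coefficient of determination; the abstract does not state which definition of R2 was used, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return Pearson R, RMSE, MAE, R2 and MSE for two equal-length lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]

    sq_err = sum(e * e for e in errors)
    mse = sq_err / n
    rmse = math.sqrt(mse)              # RMSE is the square root of MSE
    mae = sum(abs(e) for e in errors) / n

    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))

    r2 = 1.0 - sq_err / ss_tot         # coefficient of determination (assumed)
    pearson = cov / math.sqrt(ss_tot * var_p)
    return {"R": pearson, "RMSE": rmse, "MAE": mae, "R2": r2, "MSE": mse}


# Hypothetical toy values, not data from the paper.
toy_scores = regression_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Because RMSE is the square root of MSE, the reported pairs are mutually consistent: 0.68 squared is about 0.46 and 0.80 squared is 0.64.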


Author(s):  
A S Rifaioglu ◽  
R Cetin Atalay ◽  
D Cansen Kahraman ◽  
T Doğan ◽  
M Martin ◽  
...  

Abstract Motivation Identification of interactions between bioactive small molecules and target proteins is crucial for novel drug discovery, drug repurposing and uncovering off-target effects. Due to the tremendous size of the chemical space, experimental bioactivity screening efforts require the aid of computational approaches. Although deep learning models have been successful in predicting bioactive compounds, effective and comprehensive featurization of proteins, to be given as input to deep neural networks, remains a challenge. Results Here, we present a novel protein featurization approach to be used in deep learning-based compound–target protein binding affinity prediction. In the proposed method, multiple types of protein features such as sequence, structural, evolutionary and physicochemical properties are incorporated within multiple 2D vectors, which are then fed to state-of-the-art pairwise input hybrid deep neural networks to predict the real-valued compound–target protein interactions. The method adopts the proteochemometric approach, where both the compound and target protein features are used at the input level to model their interaction. The whole system is called MDeePred, and it is a new method for computational drug discovery and repositioning. We evaluated MDeePred on well-known benchmark datasets and compared its performance with the state-of-the-art methods. We also performed in vitro comparative analysis of MDeePred predictions with selected kinase inhibitors’ action on cancer cells. MDeePred is a scalable method with sufficiently high predictive performance. The featurization approach proposed here can also be utilized for other protein-related predictive tasks. Availability and implementation The source code, datasets, additional information and user instructions of MDeePred are available at https://github.com/cansyl/MDeePred. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Mohammad Rezaei ◽  
Yanjun Li ◽  
Xiaolin Li ◽  
Chenglong Li

Introduction: The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods decreases the discrepancy between experimental and computational results.
Objectives: The primary objectives were to find the most relevant features affecting binding affinity prediction, to minimize manual feature engineering, and to improve the reliability of binding affinity prediction using efficient deep learning models by tuning the model hyperparameters.
Methods: The binding site of each target protein was represented as a grid box around its bound ligand. Both binary and distance-dependent occupancies were examined for how an atom affects its neighboring voxels in this grid. A combination of different features including ANOLEA, ligand elements, and Arpeggio atom types was used to represent the input. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained and tested on the PDBbind v2016 dataset. Additionally, an extended benchmark dataset was compiled to train and evaluate the models.
Results: The best DeepAtom model showed improved accuracy in binding affinity prediction on the PDBbind core subset (Pearson’s R=0.83) and outperformed recent state-of-the-art models in this field. In addition, when the DeepAtom model was trained on our proposed benchmark dataset, it yielded a higher correlation than the baseline, which confirms the value of our model.
Conclusions: The promising results for the predicted binding affinities are expected to pave the way for embedding deep learning models in virtual screening and rational drug design.
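
The difference between binary and distance-dependent occupancy can be sketched as follows. This is an illustrative sketch only, not the DeepAtom code: the function name `voxelize`, the box size, the resolution, the radius and especially the Gaussian-like decay used for the distance-dependent mode are assumptions, since the abstract does not give the functional form.

```python
import math


def voxelize(atoms, box_size=4, resolution=1.0, mode="binary", radius=1.5):
    """Fill a cubic grid with per-voxel occupancy from atom coordinates.

    atoms: list of (x, y, z) positions measured from the box corner.
    mode:  "binary" marks voxels whose center lies within `radius` of an atom;
           "distance" uses an assumed smooth decay exp(-(d / radius)^2).
    """
    if mode not in ("binary", "distance"):
        raise ValueError("mode must be 'binary' or 'distance'")
    grid = [[[0.0] * box_size for _ in range(box_size)]
            for _ in range(box_size)]
    for i in range(box_size):
        for j in range(box_size):
            for k in range(box_size):
                center = ((i + 0.5) * resolution,
                          (j + 0.5) * resolution,
                          (k + 0.5) * resolution)
                for atom in atoms:
                    d = math.dist(center, atom)
                    if mode == "binary":
                        value = 1.0 if d <= radius else 0.0
                    else:
                        value = math.exp(-((d / radius) ** 2))
                    # Keep the strongest contribution from any atom.
                    grid[i][j][k] = max(grid[i][j][k], value)
    return grid


# Hypothetical single atom at the center of the first voxel of a 2x2x2 box.
toy_atom = [(0.5, 0.5, 0.5)]
binary_grid = voxelize(toy_atom, box_size=2, mode="binary", radius=1.0)
smooth_grid = voxelize(toy_atom, box_size=2, mode="distance", radius=1.0)
```

In the binary grid a voxel is either on or off, whereas in the distance-dependent grid an atom contributes a graded value to neighboring voxels, so the network receives information about how far each voxel is from the nearest atom.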


2021 ◽  
Vol 15 ◽  
pp. 117793222110303
Author(s):  
Asad Ahmed ◽  
Bhavika Mam ◽  
Ramanathan Sowdhamini

Protein-ligand binding prediction has extensive biological significance. Binding affinity helps in understanding the degree of protein-ligand interactions and is a useful measure in drug design. Protein-ligand docking using virtual screening and molecular dynamics simulations is required to predict the binding affinity of a ligand to its cognate receptor. Performing such analyses to cover the entire chemical space of small molecules requires intense computational power. Recent developments using deep learning have enabled us to make sense of massive amounts of complex data sets, where the ability of the model to “learn” intrinsic patterns in a complex plane of data is the strength of the approach. Here, we have incorporated convolutional neural networks to find spatial relationships among data to help us predict the binding affinity of proteins in whole superfamilies toward a diverse set of ligands without the need for a docked pose or complex as user input. The models were trained and validated using a stringent methodology for feature extraction. Our model performs better than some widely used existing methods and is suitable for predictions on high-resolution protein crystal structures (⩽2.5 Å) and nonpeptide ligands as individual inputs. Our approach to network construction and training on a protein-ligand data set prepared in-house has yielded significant insights. We have also tested DEELIG on a few COVID-19 main protease-inhibitor complexes relevant to the current public health scenario. DEELIG-based predictions can be incorporated in existing databases including RCSB PDB, PDBMoad, and PDBbind to fill in missing binding affinity data for protein-ligand complexes.


2020 ◽  
Author(s):  
Paul Francoeur ◽  
Tomohide Masuda ◽  
David R. Koes

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard dataset of sufficient size to compare performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more, lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of 5 densely connected convolutional networks, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. In order to facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
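
The idea behind clustered cross-validation is that all members of a cluster of similar targets stay in the same fold, so a model is never tested on a close relative of something it was trained on. The sketch below shows one simple way to build such folds; it assumes cluster labels are already given and is not the procedure used to produce the CrossDocked2020 splits. The function name `clustered_folds` and the toy labels are hypothetical.

```python
def clustered_folds(cluster_of, n_folds=3):
    """Assign whole clusters to folds so that no cluster spans two folds.

    cluster_of: dict mapping each item (e.g. a complex ID) to its cluster label.
    Returns a list of folds, each a list of items.
    """
    clusters = {}
    for item, cluster in cluster_of.items():
        clusters.setdefault(cluster, []).append(item)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest clusters first, each into the
    # fold that currently holds the fewest items.
    for cluster in sorted(clusters, key=lambda c: (-len(clusters[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(clusters[cluster])
    return folds


# Hypothetical items and cluster labels.
toy_clusters = {"a1": "A", "a2": "A", "a3": "A",
                "b1": "B", "b2": "B", "c1": "C"}
toy_folds = clustered_folds(toy_clusters, n_folds=2)
```

A random split of the same items would typically scatter each cluster across folds, which is one way evaluation can become overly optimistic about generalization to new targets.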


2021 ◽  
Author(s):  
Tai-Sung Lee ◽  
Hsu-Chun Tsai ◽  
Abir Ganguly ◽  
Timothy J Giese ◽  
Darrin M. York

Recent concurrent advances in methodology development, computer hardware and simulation software have transformed our ability to make practical, quantitative predictions of relative ligand binding affinities to guide rational drug design. In the past, these calculations have been hampered by the lack of affordable software with highly efficient implementations of state-of-the-art methods on specialized hardware such as graphical processing units, combined with the paucity of available workflows to streamline throughput for real-world industry applications. Herein we discuss recent methodology development, GPU-accelerated implementation, and workflow creation for alchemical free energy simulation methods in the AMBER Drug Discovery Boost (AMBER-DD Boost) package available as a patch to AMBER20. Among the methodological advances are 1) new methods for the treatment of softcore potentials that overcome long-standing end-point catastrophe and softcore imbalance problems and enable single-step alchemical transformations between ligands, 2) new adaptive enhanced sampling methods in the "alchemical" (or "λ") dimension to accelerate convergence and obtain high-precision ligand binding affinity predictions, 3) robust network-wide analysis methods that include cycle closure and reference constraints and restraints, and 4) practical workflows that enable streamlined calculations on large datasets. Benchmark calculations on various systems demonstrate that these tools deliver an outstanding combination of accuracy and performance, resulting in reliable high-throughput binding affinity predictions at affordable cost.
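
Cycle closure rests on the fact that relative binding free energies summed around any closed loop of ligands should be zero, so the residual of that sum is a measure of inconsistency. The sketch below only computes this residual for one cycle; it is not the network-wide analysis in AMBER-DD Boost. The function name `cycle_closure_error`, the ligand labels and the numeric values are hypothetical.

```python
def cycle_closure_error(edges, cycle):
    """Sum relative free energies around a closed cycle of ligands.

    edges: dict mapping (a, b) to the computed free energy change for a -> b.
    cycle: ordered list of ligand names; the last connects back to the first.
    An edge traversed against its stored direction contributes with flipped sign.
    """
    total = 0.0
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        if (a, b) in edges:
            total += edges[(a, b)]
        elif (b, a) in edges:
            total -= edges[(b, a)]
        else:
            raise KeyError(f"no edge between {a} and {b}")
    return total


# Hypothetical relative free energies (units assumed to be kcal/mol).
toy_edges = {("A", "B"): 1.2, ("B", "C"): -0.5, ("A", "C"): 0.9}
toy_residual = cycle_closure_error(toy_edges, ["A", "B", "C"])
```

A residual far from zero signals that at least one edge in the cycle is poorly converged; a constrained analysis would then adjust the edge estimates so that every cycle closes.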

