scholarly journals AK-Score: Accurate Protein-Ligand Binding Affinity Prediction Using an Ensemble of 3D-Convolutional Neural Networks

2020 ◽  
Vol 21 (22) ◽  
pp. 8424
Author(s):  
Yongbeom Kwon ◽  
Woong-Hee Shin ◽  
Junsu Ko ◽  
Juyong Lee

Accurate prediction of the binding affinity of a protein-ligand complex is essential for efficient and successful rational drug design. Therefore, many binding affinity prediction methods have been developed. In recent years, since deep learning technology has become powerful, it is also implemented to predict affinity. In this work, a new neural network model that predicts the binding affinity of a protein-ligand complex structure is developed. Our model predicts the binding affinity of a complex using the ensemble of multiple independently trained networks that consist of multiple channels of 3-D convolutional neural network layers. Our model was trained using the 3772 protein-ligand complexes from the refined set of the PDBbind-2016 database and tested using the core set of 285 complexes. The benchmark results show that the Pearson correlation coefficient between the predicted binding affinities by our model and the experimental data is 0.827, which is higher than the state-of-the-art binding affinity prediction scoring functions. Additionally, our method ranks the relative binding affinities of possible multiple binders of a protein quite accurately, comparable to the other scoring functions. Last, we measured which structural information is critical for predicting binding affinity and found that the complementarity between the protein and ligand is most important.

2020 ◽  
Author(s):  
Yongbeom Kwon ◽  
Woong-Hee Shin ◽  
Junsu Ko ◽  
Juyong Lee

Accurate prediction of the binding affinity of a protein-ligand complex is essential for efficient and successful rational drug design. In this work, a new neural network model that predicts the binding affinity of a protein-ligand complex structure is developed. Our new model predicts the binding affinity of a complex using the ensemble of multiple independently trained networks that consist of multiple channels of 3D convolutional neural network layers. Our model was trained using the 3740 protein-ligand complexes from the refined set of the PDBbind database and tested using the 270 complexes from the core set. The benchmark results show that the correlation coefficient between the predicted binding affinities by our model and the experimental data is higher than 0.72, which is comparable with the state-of-the-art binding affinity prediction methods. In addition, our method also ranks the relative binding affinities of possible multiple binders of a protein quite accurately. Last, we measured which structural information is critical for predicting binding affinity.


2020 ◽  
Author(s):  
Yongbeom Kwon ◽  
Woong-Hee Shin ◽  
Junsu Ko ◽  
Juyong Lee

Accurate prediction of the binding affinity of a protein-ligand complex is essential for efficient and successful rational drug design. In this work, a new neural network model that predicts the binding affinity of a protein-ligand complex structure is developed. Our new model predicts the binding affinity of a complex using the ensemble of multiple independently trained networks that consist of multiple channels of 3D convolutional neural network layers. Our model was trained using the 3740 protein-ligand complexes from the refined set of the PDBbind database and tested using the 270 complexes from the core set. The benchmark results show that the correlation coefficient between the predicted binding affinities by our model and the experimental data is higher than 0.72, which is comparable with the state-of-the-art binding affinity prediction methods. In addition, our method also ranks the relative binding affinities of possible multiple binders of a protein quite accurately. Last, we measured which structural information is critical for predicting binding affinity.


Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett M Morris

Abstract Motivation Machine learning scoring functions for protein–ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein–ligand complex, with limited information about the chemical or topological properties of the ligand itself. Results We demonstrate that the performance of machine learning scoring functions are consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest (RF) combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.836, 0.780 and 0.821 on the PDBbind 2007, 2013 and 2016 core sets, respectively, compared to 0.790, 0.746 and 0.814 when using the features of RF-Score v3 alone. Excluding proteins and/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore a RF using only ligand-based features is predictive at a level similar to classical scoring functions and it appears to be predicting the mean binding affinity of a ligand for its protein targets. Availability and implementation Data and code to reproduce all the results are freely available at http://opig.stats.ox.ac.uk/resources. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett Morris

Machine learning scoring functions for protein-ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein-ligand complex, with limited information about the chemical or topological properties of the ligand itself. We demonstrate that the performance of machine learning scoring functions are consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.831, 0.785, and 0.821 on the PDBbind 2007, 2013, and 2016 core sets respectively, compared to 0.790, 0.737, and 0.797 when using the features of RF-Score v3 alone. Excluding proteins and/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore a Random Forest using only ligand-based features is predictive at a level similar to classical scoring functions and it appears to be predicting the mean binding affinity of a ligand for its protein targets.<br>


Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett Morris

Machine learning scoring functions for protein-ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein-ligand complex, with limited information about the chemical or topological properties of the ligand itself. We demonstrate that the performance of machine learning scoring functions are consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.831, 0.785, and 0.821 on the PDBbind 2007, 2013, and 2016 core sets respectively, compared to 0.790, 0.737, and 0.797 when using the features of RF-Score v3 alone. Excluding proteins and/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore a Random Forest using only ligand-based features is predictive at a level similar to classical scoring functions and it appears to be predicting the mean binding affinity of a ligand for its protein targets.<br>


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Sangmin Seo ◽  
Jonghwan Choi ◽  
Sanghyun Park ◽  
Jaegyoon Ahn

Abstract Background Accurate prediction of protein–ligand binding affinity is important for lowering the overall cost of drug discovery in structure-based drug design. For accurate predictions, many classical scoring functions and machine learning-based methods have been developed. However, these techniques tend to have limitations, mainly resulting from a lack of sufficient energy terms to describe the complex interactions between proteins and ligands. Recent deep-learning techniques can potentially solve this problem. However, the search for more efficient and appropriate deep-learning architectures and methods to represent protein–ligand complex is ongoing. Results In this study, we proposed a deep-neural network model to improve the prediction accuracy of protein–ligand complex binding affinity. The proposed model has two important features, descriptor embeddings with information on the local structures of a protein–ligand complex and an attention mechanism to highlight important descriptors for binding affinity prediction. The proposed model performed better than existing binding affinity prediction models on most benchmark datasets. Conclusions We confirmed that an attention mechanism can capture the binding sites in a protein–ligand complex to improve prediction performance. Our code is available at https://github.com/Blue1993/BAPA.


2021 ◽  
Author(s):  
Sangmin Seo ◽  
Jonghwan Choi ◽  
Sanghyun Park ◽  
Jaegyoon Ahn

AbstractAccurate prediction of protein-ligand binding affinity is important in that it can lower the overall cost of drug discovery in structure-based drug design. For more accurate prediction, many classical scoring functions and machine learning-based methods have been developed. However, these techniques tend to have limitations, mainly resulting from a lack of sufficient interactions energy terms to describe complex interactions between proteins and ligands. Recent deep-learning techniques show strong potential to solve this problem, but the search for more efficient and appropriate deep-learning architectures and methods to represent protein-ligand complexes continues. In this study, we proposed a deep-neural network for more accurate prediction of protein-ligand complex binding affinity. The proposed model has two important features, descriptor embeddings that contains embedded information about the local structures of a protein-ligand complex and an attention mechanism for highlighting important descriptors to binding affinity prediction. The proposed model showed better performance on most benchmark datasets than existing binding affinity prediction models. Moreover, we confirmed that an attention mechanism was able to capture binding sites in a protein-ligand complex and that it contributed to improvement in predictive performance. Our code is available at https://github.com/Blue1993/BAPA.Author summaryThe initial step in drug discovery is to identify drug candidates for a target protein using a scoring function. Existing scoring functions, however, lack the ability to accurately predict the binding affinity of protein-ligand complexes. In this study, we proposed a deep learning-based approach to extract patterns from the local structures of protein-ligand complexes and to highlight the important local structures via an attention mechanism. The proposed model showed good performance for various benchmark datasets compared to existing models.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jooyong Shim ◽  
Zhen-Yu Hong ◽  
Insuk Sohn ◽  
Changha Hwang

AbstractIdentifying novel drug–target interactions (DTIs) plays an important role in drug discovery. Most of the computational methods developed for predicting DTIs use binary classification, whose goal is to determine whether or not a drug–target (DT) pair interacts. However, it is more meaningful but also more challenging to predict the binding affinity that describes the strength of the interaction between a DT pair. If the binding affinity is not sufficiently large, such drug may not be useful. Therefore, the methods for predicting DT binding affinities are very valuable. The increase in novel public affinity data available in the DT-related databases enables advanced deep learning techniques to be used to predict binding affinities. In this paper, we propose a similarity-based model that applies 2-dimensional (2D) convolutional neural network (CNN) to the outer products between column vectors of two similarity matrices for the drugs and targets to predict DT binding affinities. To our best knowledge, this is the first application of 2D CNN in similarity-based DT binding affinity prediction. The validation results on multiple public datasets show that the proposed model is an effective approach for DT binding affinity prediction and can be quite helpful in drug development process.


Author(s):  
Norberto Sánchez-Cruz ◽  
José L Medina-Franco ◽  
Jordi Mestres ◽  
Xavier Barril

Abstract Motivation Machine-learning scoring functions (SFs) have been found to outperform standard SFs for binding affinity prediction of protein–ligand complexes. A plethora of reports focus on the implementation of increasingly complex algorithms, while the chemical description of the system has not been fully exploited. Results Herein, we introduce Extended Connectivity Interaction Features (ECIF) to describe protein–ligand complexes and build machine-learning SFs with improved predictions of binding affinity. ECIF are a set of protein−ligand atom-type pair counts that take into account each atom’s connectivity to describe it and thus define the pair types. ECIF were used to build different machine-learning models to predict protein–ligand affinities (pKd/pKi). The models were evaluated in terms of ‘scoring power’ on the Comparative Assessment of Scoring Functions 2016. The best models built on ECIF achieved Pearson correlation coefficients of 0.857 when used on its own, and 0.866 when used in combination with ligand descriptors, demonstrating ECIF descriptive power. Availability and implementation Data and code to reproduce all the results are freely available at https://github.com/DIFACQUIM/ECIF. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Xujun Zhang ◽  
Chao Shen ◽  
Zhe Wang ◽  
Gaoqi Weng ◽  
Qing Ye ◽  
...  

Abstract Virtual screening (VS) based on molecular docking has emerged as one of the mainstream technologies of drug discovery due to its low cost and high efficiency. However, the scoring functions (SFs) implemented in most docking programs are not always accurate enough and how to improve their prediction accuracy is still a big challenge. Here, we propose an integrated platform called ASFP, a web server for the development of customized SFs for structure-based VS. There are three main modules in ASFP: 1) the descriptor generation module that can generate up to 3437 descriptors for the modelling of protein-ligand interactions; 2) the AI-based SF construction module that can establish target-specific SFs based on the pre-generated descriptors through three machine learning (ML) techniques; 3) the online prediction module that provides some well-constructed target-specific SFs for VS and a generic SF for binding affinity prediction. Our methodology has been validated on several benchmark datasets. The target-specific SFs can achieve an average ROC AUC of 0.841 towards 32 targets and the generic SF can achieve the Pearson correlation coefficient of 0.81 on the PDBbind version 2016 core set. To sum up, the ASFP server is a powerful tool for structure-based VS and binding affinity prediction. Availability and Implementation: ASFP web server is freely available at http://cadd.zju.edu.cn/asfp/.


Sign in / Sign up

Export Citation Format

Share Document