OnionNet-2: A Convolutional Neural Network Model for Predicting Protein-Ligand Binding Affinity Based on Residue-Atom Contacting Shells

2021 ◽  
Vol 9 ◽  
Author(s):  
Zechen Wang ◽  
Liangzhen Zheng ◽  
Yang Liu ◽  
Yuanyuan Qu ◽  
Yong-Qiang Li ◽  
...  

One key task in virtual screening is to accurately predict the binding affinity (ΔG) of protein-ligand complexes. Recently, deep learning (DL) has significantly increased the prediction accuracy of scoring functions, owing to its exceptional ability to extract useful features from raw data. Nevertheless, further work is needed to increase prediction accuracy and reduce computational cost. In this study, we propose a simple scoring function, OnionNet-2, based on a convolutional neural network to predict ΔG. Protein-ligand interactions are characterized by the number of contacts between protein residues and ligand atoms in multiple distance shells. Compared with published models, OnionNet-2 achieves the best performance on two widely used benchmarks, CASF-2016 and CASF-2013. The model was further validated on non-experimental decoy structures generated by a docking program and on the CSAR NRC-HiQ data set (a high-quality data set provided by CSAR), performing well in both cases. Our study thus provides a simple but efficient scoring function for predicting protein-ligand binding free energy.
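
The shell-based contact featurization described above can be sketched as follows. This is a minimal illustration, not the OnionNet-2 code: it assumes that a residue-atom distance is the minimum distance from a ligand atom to any atom of the residue, and it uses equal-width shells with hypothetical `n_shells` and `width` values. The actual shell boundaries and the residue and element vocabularies are those defined in the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def residue_atom_distance(residue_atoms: Sequence[Point], ligand_xyz: Point) -> float:
    # Assumption: distance = minimum over all atoms of the residue.
    return min(math.dist(a, ligand_xyz) for a in residue_atoms)


def shell_contacts(residues, ligand_atoms, n_shells: int = 3, width: float = 1.5) -> Counter:
    """Count (residue type, ligand element, shell index) contacts.

    residues: list of (residue_name, [xyz, ...]); ligand_atoms: list of (element, xyz).
    Shell k covers distances [k * width, (k + 1) * width); pairs farther than
    n_shells * width are ignored. n_shells and width are hypothetical values.
    """
    counts: Counter = Counter()
    for res_name, res_atoms in residues:
        for element, xyz in ligand_atoms:
            k = int(residue_atom_distance(res_atoms, xyz) // width)
            if k < n_shells:
                counts[(res_name, element, k)] += 1
    return counts


def to_vector(counts: Counter, residue_types, elements, n_shells: int = 3) -> List[int]:
    """Flatten counts into a fixed-order feature vector (residue, element, shell)."""
    return [counts[(r, e, k)] for r in residue_types for e in elements for k in range(n_shells)]
```

In a model of this kind, the per-complex count vector (or its residue-element by shell matrix form) would be the input to the convolutional network.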

2021 ◽  
Vol 22 (8) ◽  
pp. 4023
Author(s):  
Huimin Shen ◽  
Youzhi Zhang ◽  
Chunhou Zheng ◽  
Bing Wang ◽  
Peng Chen

Accurate prediction of the binding affinity between a protein and a ligand is a very important step in drug discovery. Although many methods based on different assumptions and rules exist, the prediction performance for protein–ligand binding affinity is still unsatisfactory. This paper proposes a new cascade graph-based convolutional neural network architecture designed for non-Euclidean, irregular data. We represent the molecule as a graph and use a simple linear transformation to address the sparsity of the one-hot encoding of the original data. The first stage uses an ARMA graph convolutional neural network to learn the characteristics of the atomic space in the protein–ligand complex. In the second stage, a variant of the MPNN graph convolutional neural network is introduced that incorporates chemical bond information and the features of interacting atoms. Finally, the output passes through a global add pooling layer and a fully connected layer, which outputs a single scalar as the predicted binding affinity. Experiments on the PDBbind v2016 data set show that our method outperforms most current methods. It is also comparable to the state-of-the-art method on this data set while being simpler and more intuitive.
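
The pipeline described above (one-hot atom types, a linear transformation to a dense representation, graph message passing, global add pooling, and a fully connected output) can be sketched in plain Python. This is a simplified stand-in, not the authors' architecture: a single hypothetical sum-aggregation layer replaces both the ARMA convolution and the MPNN variant, bond information is omitted, and all weights are made-up numbers rather than trained parameters.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def one_hot(atom_type: str, vocab: Sequence[str]) -> Vector:
    return [1.0 if v == atom_type else 0.0 for v in vocab]


def linear(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    # Dense projection. For a one-hot x this selects one column of W,
    # turning a sparse encoding into a dense embedding.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def message_pass(h: List[Vector], edges: Sequence[Tuple[int, int]]) -> List[Vector]:
    # Hypothetical sum-aggregation step: each node adds its neighbours'
    # features (edges are undirected), followed by a ReLU.
    out = [list(hi) for hi in h]
    for i, j in edges:
        out[i] = [a + b for a, b in zip(out[i], h[j])]
        out[j] = [a + b for a, b in zip(out[j], h[i])]
    return [[max(0.0, a) for a in row] for row in out]


def global_add_pool(h: List[Vector]) -> Vector:
    # Sum node features into one graph-level vector.
    return [sum(col) for col in zip(*h)]


def predict_affinity(atom_types, edges, vocab, W_embed, w_out, b_out) -> float:
    h = [linear(W_embed, one_hot(t, vocab)) for t in atom_types]
    h = message_pass(h, edges)
    g = global_add_pool(h)
    # Fully connected output layer producing a single scalar.
    return sum(w * x for w, x in zip(w_out, g)) + b_out
```

Because global add pooling sums over nodes, the prediction does not depend on the order in which atoms are listed.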


2020 ◽  
Author(s):  
E. Prabhu Raman ◽  
Thomas J. Paul ◽  
Ryan L. Hayes ◽  
Charles L. Brooks III

Accurate predictions of changes in protein-ligand binding affinity in response to chemical modifications are useful in small-molecule lead optimization. Relative free energy perturbation (FEP) is one of the most widely used approaches for this goal, but it involves significant computational cost, which limits its application to small sets of compounds. Lambda dynamics, which is also rigorously based on the principles of statistical mechanics, provides a more efficient alternative. In this paper, we describe the development of a workflow to set up, execute, and analyze Multi-Site Lambda Dynamics (MSLD) calculations run on GPUs with CHARMm, implemented in BIOVIA Discovery Studio and Pipeline Pilot. The workflow provides a framework for setting up simulation systems for exploratory screening of modifications to a lead compound, enabling the calculation of relative binding affinities for combinatorial libraries. To validate the workflow, a diverse dataset of congeneric ligands for seven proteins with experimental binding affinity data is examined. We develop a protocol that automatically and iteratively fits biasing potentials to flatten the free energy landscape of any MSLD system; this enhances sampling and allows efficient estimation of free energy differences. The protocol is first validated on a large number of ligand subsets that model diverse substituents, where it performs accurately and reliably. The scalability of the workflow is also tested by screening more than a hundred ligands modeled in a single system, which also yields accurate predictions. With a cumulative sampling time of 150 ns or less, the method gives average unsigned errors below 1 kcal/mol in most cases for both small and large combinatorial libraries. For the multi-site systems examined, the method is estimated to be more than an order of magnitude more efficient than contemporary FEP applications.
These results demonstrate that the presented MSLD workflow can efficiently screen combinatorial libraries and explore chemical space around a lead compound, making it useful for lead optimization.
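
The bias-flattening idea can be illustrated with a deliberately simplified model in which each substituent at a site is treated as a discrete state whose population is counted during sampling. This is not the paper's protocol: real MSLD uses continuous λ variables and richer biasing potentials, and the convention used here (a bias b_i added to the energy of state i, so that P_i ∝ exp(-(G_i + b_i)/kT)) is an assumption. Under that convention, raising the bias on over-sampled states flattens the populations, unbiased relative free energies follow from G_i = -kT ln P_i - b_i, and the relative binding free energy comes from the thermodynamic cycle ΔΔG = ΔG_bound - ΔG_free.

```python
import math
from typing import List, Sequence

KT = 0.593  # k_B * T in kcal/mol at about 298 K


def state_free_energies(counts: Sequence[int], biases: Sequence[float], kT: float = KT) -> List[float]:
    """Unbiased free energy of each state relative to state 0.

    Assumed convention: P_i is proportional to exp(-(G_i + b_i) / kT),
    so G_i = -kT ln P_i - b_i up to a constant. Every count must be > 0.
    """
    total = sum(counts)
    g = [-kT * math.log(c / total) - b for c, b in zip(counts, biases)]
    return [gi - g[0] for gi in g]


def update_biases(counts: Sequence[int], biases: Sequence[float], kT: float = KT) -> List[float]:
    """One flattening iteration: raise the bias on over-sampled states and
    lower it on under-sampled ones. With converged counts, G_i + b_i becomes
    the same for every state, so the next round samples all states equally."""
    total = sum(counts)
    shifts = [kT * math.log(c / total) for c in counts]
    mean = sum(shifts) / len(shifts)
    return [b + s - mean for b, s in zip(biases, shifts)]


def relative_binding(counts_bound, biases_bound, counts_free, biases_free, kT: float = KT) -> List[float]:
    """Thermodynamic cycle: ddG_i = dG_i(bound) - dG_i(free), relative to state 0."""
    g_bound = state_free_energies(counts_bound, biases_bound, kT)
    g_free = state_free_energies(counts_free, biases_free, kT)
    return [gb - gf for gb, gf in zip(g_bound, g_free)]
```

For example, with zero biases, a state sampled three times as often as the reference in the complex but equally often in solvent gets ΔΔG = -kT ln 3, about -0.65 kcal/mol.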


2021 ◽  
Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett Morris

Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes.

We explore how the use of docked, rather than crystallographic, poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining structure-based and ligand-based features, and show that its ability to predict binding affinity from docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function fails to generalise to a new data set, demonstrating the need for improved scoring functions and additional validation benchmarks.

Code and data to reproduce our results are available from https://github.com/oxpig/learning-from-docked-poses.
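
A hybrid scoring function of the kind described concatenates structure-based and ligand-based descriptors into one feature vector before fitting a regressor, and CASF-style scoring power is reported as the Pearson correlation between predicted and experimental affinities. The sketch below shows only those two pieces; the function names are hypothetical, and the regressor and the specific features used in the study are not reproduced here.

```python
import math
from typing import List, Sequence


def hybrid_features(structure_feats: Sequence[float], ligand_feats: Sequence[float]) -> List[float]:
    # Hypothetical helper: structure-based (pose-dependent) descriptors
    # followed by ligand-only (pose-independent) descriptors.
    return list(structure_feats) + list(ligand_feats)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and experimental affinities."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Any regressor can then be trained on the concatenated vectors; computing `pearson_r` for models trained and tested on crystal versus docked poses mirrors the comparison the abstract describes.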


2020 ◽  
Author(s):  
Conor Parks ◽
Zied Gaieb ◽  
Rommie Amaro

Protein-ligand binding affinity is a key pharmacodynamic endpoint in drug discovery. Relying solely on experimental design-make-test cycles is costly and time-consuming, which creates an opportunity for computational methods to assist. Herein, we compare random forest and feed-forward neural network proteochemometric models on their ability to predict pIC50 measurements for held-out generic Bemis-Murcko scaffolds. In addition, we assess the ability of conformal prediction to provide calibrated prediction intervals in both a retrospective and a semi-prospective test, using the recently released Grand Challenge 4 data set as an external test set. Overall, both the random forest and the deep neural network proteochemometric models perform well retrospectively but degrade in the semi-prospective setting. However, the prediction intervals from the conformal predictor are well calibrated both retrospectively and semi-prospectively, showing that they can be used to guide hit discovery and lead optimization campaigns.
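
Conformal prediction intervals of the kind assessed here can be produced with the standard split (inductive) conformal recipe: compute absolute residuals on a held-out calibration set, take the ⌈(n+1)(1-α)⌉-th smallest as the interval half-width, and apply it around each new prediction. Under exchangeability this gives coverage of at least 1-α. The sketch assumes that plain, non-normalized variant; the abstract does not say which nonconformity measure the study used, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conformal_halfwidth(cal_true: Sequence[float], cal_pred: Sequence[float], alpha: float) -> float:
    """Split-conformal half-width from absolute calibration residuals."""
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def prediction_interval(pred: float, halfwidth: float) -> Tuple[float, float]:
    return (pred - halfwidth, pred + halfwidth)


def empirical_coverage(true: Sequence[float], intervals: List[Tuple[float, float]]) -> float:
    """Fraction of observed values that fall inside their intervals."""
    hits = sum(lo <= t <= hi for t, (lo, hi) in zip(true, intervals))
    return hits / len(true)
```

With pIC50 values, the half-width is in pIC50 units; checking `empirical_coverage` on a later, external set is the kind of calibration test the abstract reports.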


2020 ◽  
Vol 222 (1) ◽  
pp. 247-259 ◽  
Author(s):  
Davood Moghadas

SUMMARY Conventional geophysical inversion techniques suffer from several limitations, including computational cost, nonlinearity, non-uniqueness and the dimensionality of the inverse problem. Successful inversion of geophysical data has been a major challenge for decades. Here, a novel approach based on deep learning (DL) inversion via a convolutional neural network (CNN) is proposed to instantaneously estimate subsurface electrical conductivity (σ) layering from electromagnetic induction (EMI) data. To this end, a fully convolutional network was trained on a large synthetic data set generated with a 1-D EMI forward model. The accuracy of the proposed approach was examined using several synthetic scenarios. Moreover, the trained network was used to obtain subsurface electromagnetic conductivity images (EMCIs) from EMI data measured along two transects in the Chicken Creek catchment (Brandenburg, Germany). Dipole–dipole electrical resistivity tomography data were also measured to obtain reference subsurface σ distributions down to a depth of 6 m. The estimated models were compared with their counterparts obtained from a spatially constrained deterministic algorithm used as a standard code. Theoretical simulations demonstrated good performance of the algorithm even in the presence of noise in the data. Moreover, applying the DL inversion to subsurface imaging in the Chicken Creek catchment demonstrated the accuracy and robustness of the proposed approach for EMI inversion. This approach returns the subsurface σ distribution directly from EMI data in a single step, without any iterations. The proposed strategy considerably simplifies EMI inversion and allows rapid and accurate estimation of subsurface EMCIs from multiconfiguration EMI data.
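
The training set described above pairs synthetic σ profiles with the multiconfiguration EMI data they would produce. The sketch below generates such pairs, but it swaps in the low-induction-number (LIN) cumulative-response approximation of McNeill (1980) as a stand-in for the study's 1-D EMI forward model; layer count, thickness, σ range and coil spacings are hypothetical, and the CNN itself is not shown. In this approximation, each layer contributes its σ weighted by the drop in cumulative response across it, with depth normalized by coil spacing.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def response_vertical_dipole(z: float) -> float:
    # LIN cumulative response for horizontal coplanar coils (vertical dipoles).
    return 1.0 / math.sqrt(4.0 * z * z + 1.0)


def response_horizontal_dipole(z: float) -> float:
    # LIN cumulative response for vertical coplanar coils (horizontal dipoles).
    return math.sqrt(4.0 * z * z + 1.0) - 2.0 * z


def apparent_conductivity(sigmas: Sequence[float], thicknesses: Sequence[float],
                          spacing: float, response: Callable[[float], float]) -> float:
    """sigmas in S/m; thicknesses in m (len(sigmas) - 1 values, last layer is a half-space)."""
    sigma_a, top = 0.0, 0.0
    for i, sigma in enumerate(sigmas):
        if i < len(thicknesses):
            bottom = top + thicknesses[i]
            weight = response(top / spacing) - response(bottom / spacing)
            top = bottom
        else:
            weight = response(top / spacing)  # half-space: R(top) - R(infinity) = R(top)
        sigma_a += sigma * weight
    return sigma_a


def synthetic_sample(rng: random.Random, spacings: Sequence[float],
                     n_layers: int = 3, thickness: float = 0.5) -> Tuple[List[float], List[float]]:
    """One hypothetical training pair: (multiconfiguration data, true sigma profile)."""
    sigmas = [rng.uniform(0.001, 0.1) for _ in range(n_layers)]
    thicknesses = [thickness] * (n_layers - 1)
    data = [apparent_conductivity(sigmas, thicknesses, s, r)
            for s in spacings
            for r in (response_vertical_dipole, response_horizontal_dipole)]
    return data, sigmas
```

Because the layer weights are non-negative and sum to one, each apparent conductivity lies between the smallest and largest layer σ; a network trained on many such pairs learns the inverse mapping from data back to the profile.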


2019 ◽  
Vol 9 (5) ◽  
pp. 115 ◽  
Author(s):  
Ömer Türk ◽  
Mehmet Siraç Özerdem

Research on electroencephalogram (EEG) signals is progressing very rapidly, and new methods developed in this field now support brain-computer interfaces (BCIs) and disease diagnosis with varying degrees of success. Using these signals effectively, especially for disease detection, is very important in terms of both time and cost. Currently, EEG studies generally use deep learning networks, which have recently achieved great success, alongside conventional methods. The main reason is that, in conventional methods, improving classification accuracy requires considerable manual effort: feature extraction is the most important step in EEG processing, and it is time-consuming and requires investigating many feature-extraction methods. Therefore, methods are needed that do not require human effort in this area and can learn the features themselves. In this study, two-dimensional (2D) time-frequency scalograms were obtained by applying the continuous wavelet transform to EEG records from five different classes. A convolutional neural network was used to learn the properties of these scalogram images, and its classification performance was compared with previous studies in the literature. For this comparison, the University of Bonn data set was used. The data set consists of five sets of EEG records from healthy subjects and epilepsy patients, labeled A, B, C, D, and E. In binary classification, the A-E and B-E pairs were classified with 99.50% accuracy and the A-D and B-D pairs with 100% accuracy; the three-class A-D-E problem reached 99.00%; the four-class A-C-D-E and B-C-D-E problems reached 90.50% and 91.50%, respectively; and the five-class A-B-C-D-E problem reached an accuracy of 93.60%.
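
The scalogram step can be sketched with a direct-convolution continuous wavelet transform. This sketch assumes a complex Morlet mother wavelet with ω0 = 6, truncated at |t| ≤ 4, and a hypothetical set of scales; the abstract does not specify the wavelet or scales used. Each row of the result holds |W(a, b)|² at one scale a across time b, so the stacked rows form the 2D image a CNN would take as input. For this wavelet, scale a (in samples) corresponds approximately to frequency f ≈ ω0·fs / (2πa).

```python
import cmath
import math
from typing import List, Sequence


def morlet(t: float, w0: float = 6.0) -> complex:
    # Complex Morlet wavelet (without the small admissibility correction).
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(x: Sequence[float], scales: Sequence[float], w0: float = 6.0) -> List[List[float]]:
    """Return |W(a, b)|^2 for each scale a (rows) and time index b (columns)."""
    n = len(x)
    rows = []
    for a in scales:
        half = int(math.ceil(4 * a))  # wavelet truncated at |t| <= 4
        row = []
        for b in range(n):
            acc = 0j
            for k in range(max(0, b - half), min(n, b + half + 1)):
                acc += x[k] * morlet((k - b) / a, w0).conjugate()
            row.append(abs(acc / math.sqrt(a)) ** 2)
        rows.append(row)
    return rows


def scale_to_frequency(a: float, fs: float, w0: float = 6.0) -> float:
    """Approximate centre frequency (Hz) of a Morlet scale given in samples."""
    return w0 * fs / (2 * math.pi * a)
```

For example, for an 8 Hz cosine sampled at 128 Hz, the scale with the largest energy at the centre of the record among [4, 8, 16, 32] is 16, which corresponds to about 7.6 Hz.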

