An Analysis of Proteochemometric and Conformal Prediction Machine Learning Protein-Ligand Binding Affinity Models

10.26434/chemrxiv.11750544 ◽

2020 ◽

Author(s):

Conor Parks ◽

Zied Gaieb ◽

Rommie Amaro

Keyword(s):

Neural Network ◽

Random Forest ◽

Ligand Binding ◽

Binding Affinity ◽

Prediction Intervals ◽

Grand Challenge ◽

Feed Forward Neural Network ◽

Data Set ◽

Conformal Prediction ◽

External Test

Protein-ligand binding affinity is a key pharmacodynamic endpoint in drug discovery. Sole reliance on experimental design, make, and test cycles is costly and time-consuming, providing an opportunity for computational methods to assist. Herein, we present results comparing random forest and feed-forward neural network proteochemometric models for their ability to predict pIC50 measurements for held-out generic Bemis-Murcko scaffolds. In addition, we assess the ability of conformal prediction to provide calibrated prediction intervals in both a retrospective and a semi-prospective test, using the recently released Grand Challenge 4 data set as an external test set. Overall, random forest and deep neural network proteochemometric models show good retrospective performance but suffer in the semi-prospective setting. However, the conformal prediction intervals prove to be well calibrated both retrospectively and semi-prospectively, showing that they can be used to guide hit discovery and lead optimization campaigns.
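As an illustration of how calibrated prediction intervals can be obtained, the following is a minimal sketch of split (inductive) conformal prediction for regression. It is not taken from the paper: the choice of variant, the function names `conformal_quantile` and `prediction_interval`, and the calibration values are all assumptions made for illustration.

```python
import math


def conformal_quantile(abs_residuals, alpha):
    """Return the interval half-width for a target miscoverage rate alpha.

    abs_residuals: absolute errors |y - y_hat| on a held-out calibration set.
    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)).
    """
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to support this confidence level.
        return math.inf
    return sorted(abs_residuals)[rank - 1]


def prediction_interval(point_prediction, half_width):
    """Symmetric interval around a point prediction."""
    return (point_prediction - half_width, point_prediction + half_width)


# Hypothetical calibration data (pIC50-like values), for illustration only.
cal_true = [6.1, 7.3, 5.0, 8.2, 6.8, 7.0, 5.5, 6.4, 7.9]
cal_pred = [6.0, 7.0, 5.6, 8.0, 6.1, 7.2, 5.4, 6.9, 7.5]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

half_width = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(6.5, half_width)
print(f"80% interval: [{low:.2f}, {high:.2f}]")
```

Under the usual exchangeability assumption, intervals built this way cover the true value with probability at least 1 - alpha, which is the sense in which such intervals are described as calibrated.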

OnionNet-2: A Convolutional Neural Network Model for Predicting Protein-Ligand Binding Affinity Based on Residue-Atom Contacting Shells

Frontiers in Chemistry ◽

10.3389/fchem.2021.753002 ◽

2021 ◽

Vol 9 ◽

Author(s):

Zechen Wang ◽

Liangzhen Zheng ◽

Yang Liu ◽

Yuanyuan Qu ◽

Yong-Qiang Li ◽

...

Keyword(s):

Neural Network ◽

Ligand Binding ◽

Convolutional Neural Network ◽

Binding Affinity ◽

Binding Free Energy ◽

Computational Cost ◽

Scoring Function ◽

Quality Data ◽

Great Success ◽

Data Set

One key task in virtual screening is to accurately predict the binding affinity (ΔG) of protein-ligand complexes. Recently, deep learning (DL) has significantly increased the prediction accuracy of scoring functions, owing to its ability to extract useful features from raw data. Nevertheless, more effort is still needed in many respects to increase prediction accuracy and decrease computational cost. In this study, we propose a simple scoring function (called OnionNet-2) based on a convolutional neural network to predict ΔG. The protein-ligand interactions are characterized by the number of contacts between protein residues and ligand atoms in multiple distance shells. Compared to published models, OnionNet-2 achieves the best performance on two widely used benchmarks, CASF-2016 and CASF-2013. The OnionNet-2 model was further verified on non-experimental decoy structures from a docking program and on the CSAR NRC-HiQ data set (a high-quality data set provided by CSAR), where it also performed well. Thus, our study provides a simple but efficient scoring function for predicting protein-ligand binding free energy.
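The idea of counting residue-atom contacts in distance shells can be sketched as follows. This is a simplified illustration, not the OnionNet-2 implementation: it assumes one representative coordinate per residue, non-cumulative shells of equal width, and hypothetical names (`shell_index`, `contact_features`) and parameter values.

```python
import math
from collections import Counter


def shell_index(distance, shell_width, n_shells):
    """Map a distance to a shell index, or None if beyond the last shell."""
    idx = int(distance // shell_width)
    return idx if idx < n_shells else None


def contact_features(residues, ligand_atoms, shell_width=1.5, n_shells=4):
    """Count contacts keyed by (residue type, ligand element, shell).

    residues: list of (residue_type, (x, y, z)) tuples.
    ligand_atoms: list of (element, (x, y, z)) tuples.
    """
    counts = Counter()
    for res_type, res_xyz in residues:
        for element, atom_xyz in ligand_atoms:
            d = math.dist(res_xyz, atom_xyz)
            shell = shell_index(d, shell_width, n_shells)
            if shell is not None:
                counts[(res_type, element, shell)] += 1
    return counts


# Hypothetical coordinates in angstroms, for illustration only.
residues = [("ALA", (0.0, 0.0, 0.0)), ("GLY", (10.0, 0.0, 0.0))]
ligand_atoms = [("C", (1.0, 0.0, 0.0)), ("N", (4.0, 0.0, 0.0))]
features = contact_features(residues, ligand_atoms)
```

The resulting counts could be laid out as a fixed-size array (residue types by element types by shells) and passed to a convolutional network as input.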

A Cascade Graph Convolutional Network for Predicting Protein–Ligand Binding Affinity

International Journal of Molecular Sciences ◽

10.3390/ijms22084023 ◽

2021 ◽

Vol 22 (8) ◽

pp. 4023

Author(s):

Huimin Shen ◽

Youzhi Zhang ◽

Chunhou Zheng ◽

Bing Wang ◽

Peng Chen

Keyword(s):

Neural Network ◽

Ligand Binding ◽

Convolutional Neural Network ◽

Binding Affinity ◽

Network Architecture ◽

Original Data ◽

Ligand Complex ◽

Convolutional Network ◽

Data Set ◽

Irregular Data

Accurate prediction of binding affinity between protein and ligand is a very important step in drug discovery. Although many methods based on different assumptions and rules exist, the prediction performance for protein–ligand binding affinity is not yet satisfactory. This paper proposes a new cascade graph-based convolutional neural network architecture for handling non-Euclidean irregular data. We represent the molecule as a graph, and use a simple linear transformation to deal with the sparsity problem of the one-hot encoding of the original data. The first stage adopts an ARMA graph convolutional neural network to learn the characteristics of atomic space in the protein–ligand complex. In the second stage, a variant of the MPNN graph convolutional neural network is introduced with chemical bond information and interactive atomic features. Finally, the architecture passes through a global add pool and a fully connected layer, and outputs a single value as the predicted binding affinity. Experiments on the PDBbind v2016 data set showed that our method is better than most current methods. Our method is also comparable to the state-of-the-art method on the data set, and is more intuitive and simple.
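The final readout step described above (global add pooling followed by a fully connected layer that yields one value) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' architecture: the function names, the two-dimensional node features, and the weights are hypothetical.

```python
def global_add_pool(node_features):
    """Sum node feature vectors into one graph-level vector."""
    return [sum(column) for column in zip(*node_features)]


def fully_connected(vector, weights, bias):
    """Single-output linear layer: dot(weights, vector) + bias."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


# Hypothetical per-atom features after the graph convolution stages.
node_features = [[1, 2], [3, 4], [5, 6]]
graph_vector = global_add_pool(node_features)

# Hypothetical learned parameters of the output layer.
predicted_affinity = fully_connected(graph_vector, weights=[0.5, 0.25], bias=1.0)
```

Because the pooling is a sum over nodes, the graph-level vector does not depend on the order in which atoms are listed, which suits irregular graph data.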

Random Forest (RF) and Artificial Neural Network (ANN) Algorithms for LULC Mapping

Engineering and Technology Journal ◽

10.30684/etj.v38i4a.399 ◽

2020 ◽

Vol 38 (4A) ◽

pp. 510-514

Author(s):

Tay H. Shihab ◽

Amjed N. Al-Hameedawi ◽

Ammar M. Hamza

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Artificial Neural Network ◽

Random Forest ◽

Satellite Image ◽

Landsat 8 ◽

Optical Remote Sensing ◽

Data Set ◽

Artificial Neural ◽

Artificial Neural Network Ann

In this paper, to exploit complementary potential in land use/land cover (LULC) mapping, spatial data were acquired from Landsat 8 OLI sensor images taken in 2019. The images were rectified, enhanced, and then classified using the random forest (RF) and artificial neural network (ANN) methods. Optical remote sensing images were used to obtain information on the status of LULC classification and to extract details. The classification of both satellite image types is used to extract features and to analyse the LULC of the study area. The classification results showed that the artificial neural network method outperforms the random forest method. The image processing required to use optical remote sensing data in LULC mapping included geometric correction and image enhancement. With the ANN method, the overall accuracy was 0.91 and the kappa coefficient was 0.89 for the training data set. For the test data set, the overall accuracy and the kappa coefficient were 0.89 and 0.87, respectively.
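For readers unfamiliar with the two reported metrics, the sketch below shows how overall accuracy and Cohen's kappa are commonly computed from a confusion matrix. The matrix values are hypothetical and are not the paper's data; the function names are chosen for illustration.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    p_observed = overall_accuracy(confusion)
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical two-class confusion matrix (rows: reference, columns: predicted).
confusion = [[40, 10], [10, 40]]
accuracy = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

Kappa is never higher than the overall accuracy when chance agreement is positive, which is consistent with the pattern of the reported values.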

Blinded prediction of protein–ligand binding affinity using Amber thermodynamic integration for the 2018 D3R grand challenge 4

Journal of Computer-Aided Molecular Design ◽

10.1007/s10822-019-00223-x ◽

2019 ◽

Vol 33 (12) ◽

pp. 1021-1029 ◽

Cited By ~ 5

Author(s):

Junjie Zou ◽

Chuan Tian ◽

Carlos Simmerling

Keyword(s):

Ligand Binding ◽

Binding Affinity ◽

Thermodynamic Integration ◽

Grand Challenge

DEELIG: A Deep Learning Approach to Predict Protein-Ligand Binding Affinity

Bioinformatics and Biology Insights ◽

10.1177/11779322211030364 ◽

2021 ◽

Vol 15 ◽

pp. 117793222110303

Author(s):

Asad Ahmed ◽

Bhavika Mam ◽

Ramanathan Sowdhamini

Keyword(s):

Deep Learning ◽

Ligand Binding ◽

Binding Affinity ◽

Chemical Space ◽

Biological Significance ◽

Protein Crystal ◽

Ligand Docking ◽

Complex Data ◽

Binding Prediction ◽

Data Set

Protein-ligand binding prediction has extensive biological significance. Binding affinity helps in understanding the degree of protein-ligand interactions and is a useful measure in drug design. Protein-ligand docking using virtual screening and molecular dynamics simulations are required to predict the binding affinity of a ligand to its cognate receptor. Performing such analyses to cover the entire chemical space of small molecules requires intense computational power. Recent developments in deep learning have enabled us to make sense of massive, complex data sets, where the strength of the approach is the ability of the model to “learn” intrinsic patterns in the data. Here, we have incorporated convolutional neural networks to find spatial relationships among data, helping us predict the binding affinity of proteins in whole superfamilies toward a diverse set of ligands without the need for a docked pose or complex as user input. The models were trained and validated using a stringent methodology for feature extraction. Our model performs better than some widely used existing methods and is suitable for predictions on a high-resolution protein crystal structure (≤2.5 Å) and a nonpeptide ligand as individual inputs. Our approach to network construction and training on a protein-ligand data set prepared in-house has yielded significant insights. We have also tested DEELIG on a few COVID-19 main protease-inhibitor complexes relevant to the current public health scenario. DEELIG-based predictions can be incorporated into existing databases, including RCSB PDB, PDBMoad, and PDBbind, to fill in missing binding affinity data for protein-ligand complexes.

A Proposed Heuristic Optimization Algorithm for Detecting Network Attacks

The Academic Research Community Publication ◽

10.21625/archive.v2i4.397 ◽

2019 ◽

Vol 2 (4) ◽

pp. 530

Author(s):

Amr Hassan Yassin ◽

Hany Hamdy Hussien

Keyword(s):

Neural Network ◽

Heuristic Algorithms ◽

Feed Forward Neural Network ◽

Data Set ◽

Network Attacks ◽

Security Issues ◽

Proposed Model ◽

Network Intrusions ◽

Attack Patterns ◽

Selection Of

Due to the exponential growth of E-Business and of computing capabilities offered over the web on a pay-per-use basis, the risk factors regarding security issues also increase rapidly. As usage increases, it becomes very difficult to identify malicious attacks, since the attack patterns change. Therefore, host machines in the network must continually be monitored for intrusions, since they are the final endpoint of any network. The purpose of this work is to introduce a generalized neural network model that has the ability to detect network intrusions. Two recent heuristic algorithms inspired by the behavior of natural phenomena, namely the particle swarm optimization (PSO) and gravitational search (GSA) algorithms, are introduced. These algorithms are combined to train a feed forward neural network (FNN), using their strengths to reduce the problems of getting stuck in local minima and of slow convergence. Dimension reduction uses information obtained from the NSL-KDD Cup 99 data set to select some features for discovering the type of attack. Detection of network attacks and the performance of the proposed model are evaluated under different patterns of network data.

Learning from Docked Ligands: Ligand-Based Features Rescue Structure-Based Scoring Functions When Trained On Docked Poses

10.26434/chemrxiv.13637756 ◽

2021 ◽

Author(s):

Fergus Boyles ◽

Charlotte M Deane ◽

Garrett Morris

Keyword(s):

Machine Learning ◽

Ligand Binding ◽

Crystal Structures ◽

Binding Affinity ◽

Scoring Function ◽

Scoring Functions ◽

Data Set ◽

Core Sets ◽

Strong Performance

Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked, rather than crystallographic, poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function fails to generalise to a new data set, demonstrating the need for improved scoring functions and additional validation benchmarks. Code and data to reproduce our results are available from https://github.com/oxpig/learning-from-docked-poses.
