Ollivier Persistent Ricci Curvature-Based Machine Learning for the Protein–Ligand Binding Affinity Prediction

Molecular descriptors are essential not only to quantitative structure-activity relationship (QSAR) models but also to machine learning–based material, chemical, and biological data analysis. Here, we propose persistent spectral–based machine learning (PerSpect ML) models for drug design. Unlike all previous spectral models, a filtration process is introduced to generate a sequence of spectral models at different scales. PerSpect attributes are defined as functions of the spectral variables over the filtration value. Molecular descriptors obtained from PerSpect attributes are combined with machine learning models for protein-ligand binding affinity prediction. Our results for the three most commonly used databases, PDBbind-2007, PDBbind-2013, and PDBbind-2016, are, as far as we know, better than those of all existing models. The proposed PerSpect theory provides a powerful feature engineering framework, and PerSpect ML models show great potential to significantly improve the performance of learning models in molecular data analysis.
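The filtration idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a plain distance-cutoff graph on point coordinates, uses the ordinary graph Laplacian, and the function names `graph_laplacian` and `spectral_attributes` as well as the chosen summary statistics are hypothetical.

```python
import numpy as np

def graph_laplacian(points, cutoff):
    """Laplacian L = D - A of the graph joining points within `cutoff`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return np.diag(adj.sum(axis=1)) - adj

def spectral_attributes(points, cutoffs, tol=1e-8):
    """One row per filtration value: zero-eigenvalue count, then the
    min, mean and max of the nonzero eigenvalues (assumed statistics)."""
    rows = []
    for r in cutoffs:
        eig = np.linalg.eigvalsh(graph_laplacian(points, r))
        nonzero = eig[eig > tol]
        rows.append([
            float(np.sum(eig <= tol)),  # equals the number of connected components
            nonzero.min() if nonzero.size else 0.0,
            nonzero.mean() if nonzero.size else 0.0,
            nonzero.max() if nonzero.size else 0.0,
        ])
    return np.array(rows)

# Toy coordinates (hypothetical, not taken from any PDBbind entry).
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
attrs = spectral_attributes(points, cutoffs=[0.5, 1.5, 2.5])
print(attrs)
```

For these three collinear points the first column falls from 3 to 2 to 1 as the cutoff grows, because the points merge into a single connected component. Flattening `attrs` would give one possible descriptor vector for a downstream regressor.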

A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2014.2351824 ◽

2015 ◽

Vol 12 (2) ◽

pp. 335-347 ◽

Cited By ~ 22

Author(s):

Hossam M. Ashtawy ◽

Nihar R. Mahapatra

Keyword(s):

Machine Learning ◽

Ligand Binding ◽

Binding Affinity ◽

Comparative Assessment ◽

Scoring Functions ◽

Binding Affinity Prediction ◽

Affinity Prediction

Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbab127 ◽

2021 ◽

Author(s):

Xiang Liu ◽

Huitao Feng ◽

Jie Wu ◽

Kelin Xia

Keyword(s):

Machine Learning ◽

Ligand Binding ◽

Binding Affinity ◽

Molecular Descriptors ◽

Biological Data ◽

Learning Models ◽

Filtration Process ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Machine Learning Models

Abstract: Molecular descriptors are essential not only to quantitative structure activity/property relationship (QSAR/QSPR) models, but also to machine learning based chemical and biological data analysis. In this paper, we propose persistent spectral hypergraph (PSH) based molecular descriptors or fingerprints for the first time. Our PSH-based molecular descriptors are used to characterize molecular structures and interactions, and are further combined with machine learning models, in particular gradient boosting tree (GBT), for protein-ligand binding affinity prediction. Unlike traditional molecular descriptors, which are usually based on molecular graph models, a hypergraph-based topological representation is proposed for protein–ligand interaction characterization. Moreover, a filtration process is introduced to generate a series of nested hypergraphs at different scales. For each of these hypergraphs, its eigen spectrum information can be obtained from the corresponding (Hodge) Laplacian matrix. PSH studies the persistence and variation of the eigen spectrum of the nested hypergraphs during the filtration process. Molecular descriptors or fingerprints can be generated from persistent attributes, which are statistical or combinatorial functions of PSH, and combined with machine learning models, in particular GBT. We test our PSH-GBT model on the three most commonly used datasets, PDBbind-2007, PDBbind-2013 and PDBbind-2016. Our results for all these databases are, as far as we know, better than those of all existing machine learning models with traditional molecular descriptors.
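To make the role of the (Hodge) Laplacian concrete, here is a minimal sketch on an ordinary simplicial complex (a single triangle) rather than on the paper's hypergraph construction, which the abstract does not spell out. The helper names `hodge_l1` and `zero_multiplicity` are hypothetical; the formula used, L1 = B1^T B1 + B2 B2^T, is the standard 1-dimensional Hodge Laplacian.

```python
import numpy as np

# Boundary matrix B1 (vertices x edges) for a triangle on vertices 0, 1, 2,
# with edges ordered (0,1), (0,2), (1,2) and each oriented low -> high.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)

# Boundary matrix B2 (edges x triangles) for the single 2-simplex (0,1,2):
# its boundary is (1,2) - (0,2) + (0,1).
B2 = np.array([[1], [-1], [1]], dtype=float)

def hodge_l1(b1, b2=None):
    """1-dimensional Hodge Laplacian; the B2 term is dropped if no triangles exist."""
    lap = b1.T @ b1
    if b2 is not None:
        lap = lap + b2 @ b2.T
    return lap

def zero_multiplicity(lap, tol=1e-8):
    """Number of (numerically) zero eigenvalues of a symmetric matrix."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(lap)) <= tol))

betti1_hollow = zero_multiplicity(hodge_l1(B1))      # triangle boundary only
betti1_filled = zero_multiplicity(hodge_l1(B1, B2))  # triangle filled in
print(betti1_hollow, betti1_filled)
```

The hollow triangle has one zero eigenvalue (one loop) and the filled triangle has none, so the multiplicity of zero tracks topology while the nonzero eigenvalues carry additional geometric information. Tracking both along a filtration is the kind of "persistence and variation" the abstract refers to.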

A Comparative Assessment of Ranking Accuracies of Conventional and Machine-Learning-Based Scoring Functions for Protein-Ligand Binding Affinity Prediction

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2012.36 ◽

2012 ◽

Vol 9 (5) ◽

pp. 1301-1313 ◽

Cited By ~ 27

Author(s):

Hossam M. Ashtawy ◽

Nihar R. Mahapatra

Keyword(s):

Machine Learning ◽

Ligand Binding ◽

Binding Affinity ◽

Comparative Assessment ◽

Scoring Functions ◽

Binding Affinity Prediction ◽

Affinity Prediction

Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening

Wiley Interdisciplinary Reviews Computational Molecular Science ◽

10.1002/wcms.1225 ◽

2015 ◽

Vol 5 (6) ◽

pp. 405-424 ◽

Cited By ~ 101

Author(s):

Qurrat Ul Ain ◽

Antoniya Aleksandrova ◽

Florian D. Roessler ◽

Pedro J. Ballester

Keyword(s):

Machine Learning ◽

Virtual Screening ◽

Binding Affinity ◽

Scoring Functions ◽

Binding Affinity Prediction ◽

Affinity Prediction

3D Convolutional Neural Networks and a CrossDocked Dataset for Structure-Based Drug Design

10.26434/chemrxiv.11833323.v2 ◽

2020 ◽

Author(s):

Paul Francoeur ◽

Tomohide Masuda ◽

David R. Koes

Keyword(s):

Machine Learning ◽

Ligand Binding ◽

Binding Affinity ◽

Mean Squared Error ◽

Comprehensive Evaluation ◽

Training Data ◽

Learning Approaches ◽

Neural Network Models ◽

Structure Based Drug Design ◽

Affinity Prediction

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and no standard dataset of sufficient size exists for comparing performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more, lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of 5 densely connected convolutional networks, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. To facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
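The two regression metrics quoted in this abstract, root mean squared error and Pearson R, can be computed as in the sketch below. The function names and the toy affinity values are illustrative assumptions and are unrelated to the CrossDocked2020 results.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted affinities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between the two value lists."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical affinity values, for illustration only.
measured = [4.2, 6.1, 7.5, 5.0, 8.3]
predicted = [4.8, 5.9, 6.9, 5.6, 7.7]
print(rmse(measured, predicted), pearson_r(measured, predicted))
```

RMSE penalizes absolute error in the units of the label, while Pearson R measures only linear agreement, which is why a model can correlate well with measured affinities and still have a large RMSE.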

Development and evaluation of a deep learning model for protein–ligand binding affinity prediction

Bioinformatics ◽

10.1093/bioinformatics/bty374 ◽

2018 ◽

Vol 34 (21) ◽

pp. 3666-3674 ◽

Cited By ~ 62

Author(s):

Marta M Stepniewska-Dziubinska ◽

Piotr Zielenkiewicz ◽

Pawel Siedlecki

Keyword(s):

Deep Learning ◽

Ligand Binding ◽

Binding Affinity ◽

Learning Model ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Deep Learning Model

ISLAND: in-silico proteins binding affinity prediction using sequence information

BioData Mining ◽

10.1186/s13040-020-00231-w ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Wajid Arshad Abbasi ◽

Adiba Yaseen ◽

Fahad Ul Hassan ◽

Saiqa Andleeb ◽

Fayyaz Ul Amir Afsar Minhas

Keyword(s):

Machine Learning ◽

Protein Binding ◽

Binding Affinity ◽

State Of The Art ◽

Protein Complexes ◽

Protein Structures ◽

Sequence Information ◽

Binding Affinity Prediction ◽

Generalization Performance ◽

Affinity Prediction

Abstract. Background: Determining binding affinity in protein-protein interactions is important in the discovery and design of novel therapeutics and in mutagenesis studies. Determining the binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation, which can be replaced with computational methods. Most computational prediction techniques require protein structures, which limits their applicability to protein complexes with known structures. In this work, we explore sequence-based protein binding affinity prediction using machine learning. Method: We have used protein sequence information instead of protein structures, along with machine learning techniques, to predict protein binding affinity. Results: We find that the true generalization performance of even the state-of-the-art sequence-only predictor is far from satisfactory and that the development of machine learning methods for binding affinity prediction with improved generalization performance is still an open problem. We have also proposed a novel sequence-based protein binding affinity predictor called ISLAND, which gives better accuracy than existing methods on the same validation set as well as on an external independent test dataset. A cloud-based webserver implementation of ISLAND and its Python code are available at https://sites.google.com/view/wajidarshad/software. Conclusion: This paper highlights the fact that the true generalization performance of even the state-of-the-art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem.
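As a rough illustration of using sequence information instead of structures, the sketch below turns a pair of protein sequences into a fixed-length feature vector. It assumes a simple amino-acid composition encoding, which is a common baseline and not necessarily what ISLAND uses; the names `composition` and `pair_features` and the example sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Fraction of each standard amino acid in `seq` (20 values)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters are ignored
    return [c / total if total else 0.0 for c in counts]

def pair_features(seq_a, seq_b):
    """Concatenate both compositions into one 40-value vector for a complex."""
    return composition(seq_a) + composition(seq_b)

# Made-up sequences, for illustration only.
features = pair_features("ACDKKL", "MMWYA")
print(len(features), features[:3])
```

A vector like this could be passed to any standard regressor. Because composition discards residue order, richer encodings such as k-mer counts are a natural next step.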
