Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening

Qurrat Ul Ain; Antoniya Aleksandrova; Florian D. Roessler; Pedro J. Ballester

doi:10.1002/wcms.1225

The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction

Biomolecules; doi:10.3390/biom8010012; 2018; Vol 8 (1); pp. 12; Cited By ~ 24

Author(s): Hongjian Li; Jiangjun Peng; Yee Leung; Kwong-Sak Leung; Man-Hon Wong; ...

Keyword(s): Machine Learning; Protein Structure; Binding Affinity; Sequence Similarity; Scoring Functions; Binding Affinity Prediction; Affinity Prediction; The Impact

DeepFrag: A Deep Convolutional Neural Network for Fragment-based Lead Optimization

Chemical Science; doi:10.1039/d1sc00163a; 2021

Author(s): Harrison Green; David Ryan Koes; Jacob D Durrant

Keyword(s): Neural Network; Machine Learning; Drug Discovery; Virtual Screening; Convolutional Neural Network; Binding Affinity; Lead Optimization; Binding Affinity Prediction; Affinity Prediction; Computer Aided

Machine learning has been increasingly applied to the field of computer-aided drug discovery in recent years, leading to notable advances in binding-affinity prediction, virtual screening, and QSAR. Surprisingly, it is...

A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction

IEEE/ACM Transactions on Computational Biology and Bioinformatics; doi:10.1109/tcbb.2014.2351824; 2015; Vol 12 (2); pp. 335-347; Cited By ~ 22

Author(s): Hossam M. Ashtawy; Nihar R. Mahapatra

Keyword(s): Machine Learning; Ligand Binding; Binding Affinity; Comparative Assessment; Scoring Functions; Binding Affinity Prediction; Affinity Prediction

Learning from the ligand: using ligand-based features to improve binding affinity prediction

Bioinformatics; doi:10.1093/bioinformatics/btz665; 2019; Cited By ~ 7

Author(s): Fergus Boyles; Charlotte M Deane; Garrett M Morris

Keyword(s): Machine Learning; Binding Affinity; Pearson Correlation; Scoring Function; Supplementary Information; Scoring Functions; Limited Information; Ligand Complex; Binding Affinity Prediction; Affinity Prediction

Abstract

Motivation: Machine learning scoring functions for protein–ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein–ligand complex, with limited information about the chemical or topological properties of the ligand itself.

Results: We demonstrate that the performance of machine learning scoring functions is consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest (RF) combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.836, 0.780 and 0.821 on the PDBbind 2007, 2013 and 2016 core sets, respectively, compared to 0.790, 0.746 and 0.814 when using the features of RF-Score v3 alone. Excluding proteins and/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore, an RF using only ligand-based features is predictive at a level similar to classical scoring functions, and it appears to be predicting the mean binding affinity of a ligand for its protein targets.

Availability and implementation: Data and code to reproduce all the results are freely available at http://opig.stats.ox.ac.uk/resources.

Supplementary information: Supplementary data are available at Bioinformatics online.
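The abstract above reports performance as Pearson correlation coefficients and describes combining complex-derived features with ligand descriptors. The following is a minimal sketch of those two ideas, not the authors' code: `pearson_r` and `combine_features` are hypothetical helper names, and the affinity values are made-up toy numbers used only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalised; the factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def combine_features(complex_features, ligand_descriptors):
    """Concatenate, row by row, interaction features with ligand-only descriptors."""
    return [list(c) + list(l) for c, l in zip(complex_features, ligand_descriptors)]


# Toy data (assumed values, not taken from PDBbind or the paper).
measured = [4.2, 6.1, 7.8, 5.0, 9.3]
predicted = [4.9, 5.8, 7.1, 5.6, 8.4]
r = pearson_r(measured, predicted)
print(f"Pearson r on toy data: {r:.3f}")
```

In practice the concatenated rows would be passed to a regression model such as a Random Forest; the sketch stops at feature assembly and scoring so that it depends only on the standard library.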

Extended connectivity interaction features: improving binding affinity prediction through chemical description

Bioinformatics; doi:10.1093/bioinformatics/btaa982; 2020; Cited By ~ 1

Author(s): Norberto Sánchez-Cruz; José L Medina-Franco; Jordi Mestres; Xavier Barril

Keyword(s): Machine Learning; Binding Affinity; Pearson Correlation; Correlation Coefficients; Supplementary Information; Scoring Functions; Binding Affinity Prediction; Affinity Prediction; Extended Connectivity; Chemical Description

Abstract

Motivation: Machine-learning scoring functions (SFs) have been found to outperform standard SFs for binding affinity prediction of protein–ligand complexes. A plethora of reports focus on the implementation of increasingly complex algorithms, while the chemical description of the system has not been fully exploited.

Results: Herein, we introduce Extended Connectivity Interaction Features (ECIF) to describe protein–ligand complexes and build machine-learning SFs with improved predictions of binding affinity. ECIF are a set of protein–ligand atom-type pair counts in which each atom is described by its connectivity, and these descriptions define the pair types. ECIF were used to build different machine-learning models to predict protein–ligand affinities (pKd/pKi). The models were evaluated in terms of 'scoring power' on the Comparative Assessment of Scoring Functions 2016. The best models built on ECIF achieved Pearson correlation coefficients of 0.857 when ECIF were used on their own, and 0.866 when they were used in combination with ligand descriptors, demonstrating the descriptive power of ECIF.

Availability and implementation: Data and code to reproduce all the results are freely available at https://github.com/DIFACQUIM/ECIF.

Supplementary information: Supplementary data are available at Bioinformatics online.
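The idea of protein–ligand atom-type pair counts can be illustrated with a small sketch. This is not the ECIF implementation from the repository above: the atom-type labels (`N_amide`, `C_aromatic`, and so on), the function names, the coordinates, and the 6.0 distance cutoff are all assumptions chosen for illustration, and the real connectivity-based type definition is not reproduced here.

```python
import math
from collections import Counter


def pair_counts(protein_atoms, ligand_atoms, cutoff=6.0):
    """Count (protein type, ligand type) pairs no farther apart than `cutoff`.

    Each atom is a tuple (atom_type, (x, y, z)); the cutoff value and the
    type labels are illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            if math.dist(p_xyz, l_xyz) <= cutoff:
                counts[(p_type, l_type)] += 1
    return counts


def to_vector(counts, pair_types):
    """Turn the counts into a fixed-order feature vector (0 for unseen pairs)."""
    return [counts.get(pair, 0) for pair in pair_types]


# Toy complex with hypothetical atom types and coordinates.
protein = [
    ("N_amide", (0.0, 0.0, 0.0)),
    ("O_carbonyl", (3.0, 0.0, 0.0)),
    ("C_aliphatic", (20.0, 0.0, 0.0)),  # too far away to contribute
]
ligand = [
    ("C_aromatic", (1.0, 0.0, 0.0)),
    ("O_hydroxyl", (4.0, 0.0, 0.0)),
]

counts = pair_counts(protein, ligand)
features = to_vector(
    counts,
    [("N_amide", "C_aromatic"), ("C_aliphatic", "C_aromatic")],
)
print(counts)
print(features)
```

A fixed list of pair types keeps the feature vector the same length for every complex, which is what a downstream regression model needs.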

A Comparative Assessment of Ranking Accuracies of Conventional and Machine-Learning-Based Scoring Functions for Protein-Ligand Binding Affinity Prediction

IEEE/ACM Transactions on Computational Biology and Bioinformatics; doi:10.1109/tcbb.2012.36; 2012; Vol 9 (5); pp. 1301-1313; Cited By ~ 27

Author(s): Hossam M. Ashtawy; Nihar R. Mahapatra

Keyword(s): Machine Learning; Ligand Binding; Binding Affinity; Comparative Assessment; Scoring Functions; Binding Affinity Prediction; Affinity Prediction

Learning from the Ligand: Using Ligand-Based Features to Improve Binding Affinity Prediction

doi:10.26434/chemrxiv.8174525.v1; 2019; Cited By ~ 1

Author(s): Fergus Boyles; Charlotte M Deane; Garrett Morris

Keyword(s): Machine Learning; Random Forest; Binding Affinity; Pearson Correlation; Scoring Function; Scoring Functions; Limited Information; Ligand Complex; Binding Affinity Prediction; Affinity Prediction

Machine learning scoring functions for protein-ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein-ligand complex, with limited information about the chemical or topological properties of the ligand itself. We demonstrate that the performance of machine learning scoring functions is consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.831, 0.785, and 0.821 on the PDBbind 2007, 2013, and 2016 core sets, respectively, compared to 0.790, 0.737, and 0.797 when using the features of RF-Score v3 alone. Excluding proteins and/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore, a Random Forest using only ligand-based features is predictive at a level similar to classical scoring functions, and it appears to be predicting the mean binding affinity of a ligand for its protein targets.
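The closing observation, that a ligand-only model appears to predict the mean binding affinity of a ligand across its protein targets, refers to a quantity that is easy to compute directly. The sketch below shows that quantity on made-up records; it is an illustration of the concept rather than the authors' analysis, and the function name, identifiers, and affinity values are all hypothetical.

```python
from collections import defaultdict


def mean_affinity_per_ligand(records):
    """Average the affinity of each ligand over all of its protein targets.

    `records` is an iterable of (ligand_id, target_id, affinity) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # ligand_id -> [sum, count]
    for ligand_id, _target_id, affinity in records:
        totals[ligand_id][0] += affinity
        totals[ligand_id][1] += 1
    return {lig: total / n for lig, (total, n) in totals.items()}


# Toy records with assumed values.
records = [
    ("lig_A", "target_1", 6.0),
    ("lig_A", "target_2", 8.0),
    ("lig_B", "target_1", 5.0),
]
baseline = mean_affinity_per_ligand(records)
print(baseline)
```

A model that sees only the ligand cannot distinguish `target_1` from `target_2` for `lig_A`, so the per-ligand mean is the natural value for it to converge on.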

Learning from the Ligand: Using Ligand-Based Features to Improve Binding Affinity Prediction

doi:10.26434/chemrxiv.8174525; 2019; Cited By ~ 1

Author(s): Fergus Boyles; Charlotte M Deane; Garrett Morris

Keyword(s): Machine Learning; Random Forest; Binding Affinity; Pearson Correlation; Scoring Function; Scoring Functions; Limited Information; Ligand Complex; Binding Affinity Prediction; Affinity Prediction

Ollivier Persistent Ricci Curvature-Based Machine Learning for the Protein–Ligand Binding Affinity Prediction

Journal of Chemical Information and Modeling; doi:10.1021/acs.jcim.0c01415; 2021

Author(s): JunJie Wee; Kelin Xia

Keyword(s): Machine Learning; Ligand Binding; Binding Affinity; Ricci Curvature; Binding Affinity Prediction; Affinity Prediction

Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction

International Journal for Numerical Methods in Biomedical Engineering; doi:10.1002/cnm.2914; 2017; Vol 34 (2); pp. e2914; Cited By ~ 43

Author(s): Zixuan Cang; Guo-Wei Wei

Keyword(s): Machine Learning; Ligand Binding; Binding Affinity; Persistent Homology; Binding Affinity Prediction; Affinity Prediction
