PremPLI: a machine learning model for predicting the effects of missense mutations on protein-ligand interactions

Abstract: Resistance to small-molecule drugs is the main cause of the failure of therapeutic drugs in clinical practice. Missense mutations that alter the binding of ligands to proteins are one of the critical mechanisms that result in genetic disease and drug resistance. Computational methods have made considerable progress in predicting binding affinity changes and identifying resistance mutations, but their prediction accuracy and speed remain unsatisfactory and need further improvement. To address these issues, we introduce PremPLI, a structure-based machine learning method for quantitatively estimating the effects of single mutations on ligand binding affinity. A comprehensive comparison of the predictive performance of PremPLI with other available methods on two benchmark datasets confirms that our approach performs robustly and achieves similar or even higher predictive accuracy than approaches relying on first-principle statistical mechanics and mixed physics- and knowledge-based potentials, while requiring far fewer computational resources. PremPLI can be used for guiding the design of ligand-binding proteins, identifying and understanding disease driver mutations, and finding potential resistance mutations for different drugs. PremPLI is freely available at https://lilab.jysw.suda.edu.cn/research/PremPLI/ and supports large-scale mutational scanning.
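One common way to express a change in ligand binding affinity is as a difference in binding free energy derived from dissociation constants. The sketch below illustrates that conversion only; it is not PremPLI's method. The function names, the sign convention (positive means weaker binding), the default temperature, and the example Kd values are all assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Return the binding free energy dG = R * T * ln(Kd) in kcal/mol."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg_upon_mutation(kd_wild_type: float, kd_mutant: float,
                      temperature_k: float = 298.15) -> float:
    """Return ddG = dG(mutant) - dG(wild type) in kcal/mol.

    Assumed sign convention: a positive value means the mutation
    weakens ligand binding.
    """
    return (binding_free_energy(kd_mutant, temperature_k)
            - binding_free_energy(kd_wild_type, temperature_k))


# Hypothetical values: a mutation that weakens binding from 10 nM to 1 uM.
example = ddg_upon_mutation(kd_wild_type=1e-8, kd_mutant=1e-6)
print(f"ddG = {example:.2f} kcal/mol")
```

Under this convention a 100-fold loss of affinity at room temperature corresponds to a change of a few kcal/mol, which is the scale on which such predictors are usually evaluated.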

PremPLI: Predicting the Effects of Missense Mutations on Protein-Ligand Interactions

10.21203/rs.3.rs-417047/v1 ◽ 2021

Author(s): Tingting Sun ◽ Yuting Chen ◽ Yuhao Wen ◽ Zefeng Zhu ◽ Minghui Li

Keyword(s): Ligand Binding ◽ Binding Affinity ◽ Large Scale ◽ Predictive Accuracy ◽ Predictive Performance ◽ Driver Mutations ◽ Missense Mutations ◽ Resistance Mutations ◽ Protein Ligand Interactions ◽ Ligand Interactions

Abstract: Protein-ligand interactions trigger a multitude of signal transduction processes, and resistance to small-molecule drugs is the main cause of the failure of therapeutic drugs in clinical practice. Missense mutations that alter the binding of ligands to proteins are one of the critical mechanisms that result in genetic disease and drug resistance. Computational methods have made considerable progress in predicting binding affinity changes and identifying resistance mutations, but they remain unsatisfactory and need further improvement in both accuracy and speed. To address these issues, we introduced PremPLI, a structure-based machine learning method for quantitatively estimating the effects of single mutations on ligand binding affinity. A comprehensive comparison of the predictive performance of PremPLI with other available methods on two benchmark datasets confirms that our approach performs robustly and achieves similar or even higher predictive accuracy than approaches relying on first-principle statistical mechanics and mixed physics- and knowledge-based potentials, while requiring far fewer computational resources. PremPLI can be used for guiding the design of ligand-binding proteins, identifying and understanding disease driver mutations, and finding potential resistance mutations for different drugs. PremPLI is freely available at https://lilab.jysw.suda.edu.cn/research/PremPLI/ and supports large-scale mutational scanning.

PremPS: Predicting the impact of missense mutations on protein stability

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1008543 ◽ 2020 ◽ Vol 16 (12) ◽ pp. e1008543

Author(s): Yuting Chen ◽ Haoyu Lu ◽ Ning Zhang ◽ Zefeng Zhu ◽ Shuqin Wang ◽ ...

Keyword(s): Protein Stability ◽ Protein Design ◽ Large Scale ◽ Molecular Mechanisms ◽ Predictive Performance ◽ Computational Method ◽ Single Mutation ◽ Missense Mutations ◽ Benchmark Datasets ◽ The Impact

Computational methods that predict protein stability changes induced by missense mutations have made considerable progress over the past decades. Most of the available methods, however, have very limited accuracy in predicting stabilizing mutations because existing experimental sets are dominated by mutations that reduce protein stability. Moreover, few approaches perform consistently well across different test cases. To address these issues, we developed a new computational method, PremPS, to more accurately evaluate the effects of missense mutations on protein stability. PremPS uses only ten evolutionary- and structure-based features and is parameterized on a balanced dataset with an equal number of stabilizing and destabilizing mutations. A comprehensive comparison of the predictive performance of PremPS with other available methods on nine benchmark datasets confirms that our approach consistently outperforms other methods and shows considerable improvement in estimating the impacts of stabilizing mutations. A protein may have multiple structures available, and if another structure of the same protein is used, the stability change predicted by a structure-based method might differ. Thus, we further estimated the impact of using different structures on prediction accuracy, and demonstrated that our method performs well across different types of structures, except for low-resolution structures and models built from templates with low sequence identity. PremPS can be used for finding functionally important variants, revealing the molecular mechanisms of functional influences, and protein design. PremPS is freely available at https://lilab.jysw.suda.edu.cn/research/PremPS/. It supports large-scale mutational scanning, taking about four minutes to perform calculations for a single mutation in a protein of ~300 residues and ~0.4 seconds for each additional mutation.
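The abstract above stresses training on a set with equal numbers of stabilizing and destabilizing mutations. As a generic illustration of class balancing, and not necessarily how the PremPS training set was constructed, the sketch below downsamples the larger class at random. The record layout, the `effect` key, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_downsampling(records, label_key="effect", seed=0):
    """Return a shuffled subset with the same number of records per label.

    Every class is randomly downsampled to the size of the smallest class.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[label_key]].append(record)
    if not groups:
        return []

    target_size = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible

    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target_size))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy set dominated by destabilizing mutations (8 versus 3).
toy = ([{"mutation": f"D{i}", "effect": "destabilizing"} for i in range(8)]
       + [{"mutation": f"S{i}", "effect": "stabilizing"} for i in range(3)])
balanced = balance_by_downsampling(toy)
print(len(balanced))
```

Downsampling discards data from the majority class; it is shown here only because it is the simplest way to obtain equal class counts.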

Computational Prediction of Binding Affinity for CDK2-ligand Complexes. A Protein Target for Cancer Drug Discovery

Current Medicinal Chemistry ◽ 10.2174/0929867328666210806105810 ◽ 2021 ◽ Vol 28

Author(s): Martina Veit-Acosta ◽ Walter Filgueira de Azevedo Junior

Keyword(s): Machine Learning ◽ Ligand Binding ◽ Binding Affinity ◽ Physical Modeling ◽ Predictive Performance ◽ Supervised Machine Learning ◽ Machine Learning Techniques ◽ Scoring Functions ◽ Protein Target ◽ Learning Techniques

Background: CDK2 participates in the control of eukaryotic cell-cycle progression. Due to the great interest in CDK2 for drug development and the relative ease of crystallizing this enzyme, there are over 400 structural studies focused on this protein target. This structural data is the basis for the development of computational models to estimate CDK2-ligand binding affinity. Objective: This work focuses on recent developments in the application of supervised machine learning modeling to develop scoring functions that predict the binding affinity of CDK2. Method: We employed the structures available at the Protein Data Bank and the ligand information accessed from BindingDB, Binding MOAD, and PDBbind to evaluate the predictive performance of machine learning techniques combined with physical modeling used to calculate binding affinity. We compared this hybrid methodology with classical scoring functions available in docking programs. Results: Our comparative analysis of previously published models indicated that a model created using a combination of a mass-spring system and cross-validated Elastic Net to predict the binding affinity of CDK2-inhibitor complexes outperformed classical scoring functions available in AutoDock4 and AutoDock Vina. Conclusion: All studies reviewed here suggest that targeted machine learning models are superior to classical scoring functions for calculating binding affinities. Specifically for CDK2, we see that the combination of physical modeling with supervised machine learning techniques exhibits improved predictive performance in calculating protein-ligand binding affinity. These results find theoretical support in the application of the concept of scoring function space.

Machine Learning-Based Scoring Functions. Development and Applications with SAnDReS.

Current Medicinal Chemistry ◽ 10.2174/0929867327666200515101820 ◽ 2020 ◽ Vol 27

Author(s): Gabriela Bitencourt-Ferreira ◽ Camila Rizzotto ◽ Walter Filgueira de Azevedo Junior

Keyword(s): Machine Learning ◽ Binding Affinity ◽ Drug Targets ◽ Computational Models ◽ Factor Xa ◽ Coagulation Factor ◽ Predictive Performance ◽ Machine Learning Techniques ◽ Scoring Functions ◽ Molegro Virtual Docker

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease produced targeted scoring functions that predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽ 10.2174/2213275912666191102162959 ◽ 2020 ◽ Vol 28 (2) ◽ pp. 253-265 ◽ Cited By ~ 3

Author(s): Gabriela Bitencourt-Ferreira ◽ Amauri Duarte da Silva ◽ Walter Filgueira de Azevedo

Keyword(s): Machine Learning ◽ Binding Affinity ◽ Predictive Performance ◽ Supervised Machine Learning ◽ Machine Learning Techniques ◽ Scoring Functions ◽ Cyclin Dependent Kinase ◽ Learning Models ◽ Learning Techniques ◽ Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.

SMPLIP-Score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors

Journal of Cheminformatics ◽ 10.1186/s13321-021-00507-1 ◽ 2021 ◽ Vol 13 (1)

Author(s): Surendra Kumar ◽ Mi-hyun Kim

Keyword(s): Ligand Binding ◽ Binding Affinity ◽ Scoring Functions ◽ Binding Affinities ◽ Ligand Interaction ◽ Fingerprint Pattern ◽ Comparable Performance ◽ Direct Interpretation ◽ Benchmark Datasets ◽ Complex Features

Abstract: In drug discovery, rapid and accurate prediction of protein–ligand binding affinities is a pivotal task for lead optimization with acceptable on-target potency as well as pharmacological efficacy. Furthermore, researchers hope for a high correlation between docking score and pose with key interactive residues, although scoring functions as free energy surrogates of protein–ligand complexes have failed to provide collinearity. Recently, various machine learning or deep learning methods have been proposed to overcome the drawbacks of scoring functions. Although these methods are highly accurate, their featurization process is complex, and the meaning of the embedded features cannot be interpreted directly without an additional feature analysis. Here, we propose SMPLIP-Score (Substructural Molecular and Protein–Ligand Interaction Pattern Score), a directly interpretable predictor of absolute binding affinity. Our simple featurization embeds the interaction fingerprint pattern on the ligand-binding site environment and molecular fragments of ligands into an input vectorized matrix for learning layers (random forest or deep neural network). Despite using less complex features than other state-of-the-art models, SMPLIP-Score achieved comparable performance, with a Pearson’s correlation coefficient up to 0.80 and a root mean square error up to 1.18 in pK units on several benchmark datasets (PDBbind v.2015, Astex Diverse Set, CSAR NRC HiQ, FEP, PDBbind NMR, and CASF-2016). For this model, generality, predictive power, ranking power, and robustness were examined using direct interpretation of feature matrices for specific targets.
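The two metrics quoted above, Pearson's correlation coefficient and root mean square error, can be computed from paired experimental and predicted affinities as in the minimal sketch below. The function names and the pK values are hypothetical and are not taken from the SMPLIP-Score benchmarks.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root mean square error between two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical experimental and predicted affinities in pK units.
experimental = [5.0, 6.2, 7.1, 8.4]
predicted = [5.5, 6.0, 7.6, 8.0]
print(f"r = {pearson_r(experimental, predicted):.2f}, "
      f"RMSE = {rmse(experimental, predicted):.2f}")
```

The two metrics are complementary: correlation measures how well the ranking and trend are reproduced, while RMSE measures the absolute size of the errors in pK units.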

Ollivier Persistent Ricci Curvature-Based Machine Learning for the Protein–Ligand Binding Affinity Prediction

Journal of Chemical Information and Modeling ◽ 10.1021/acs.jcim.0c01415 ◽ 2021

Author(s): JunJie Wee ◽ Kelin Xia

Keyword(s): Machine Learning ◽ Ligand Binding ◽ Binding Affinity ◽ Ricci Curvature ◽ Binding Affinity Prediction ◽ Affinity Prediction

A proof-of-concept study applying machine learning methods to putative risk factors for eating disorders: results from the multi-centre European project on healthy eating

Psychological Medicine ◽ 10.1017/s003329172100489x ◽ 2021 ◽ pp. 1-10

Author(s): I. Krug ◽ J. Linardon ◽ C. Greenwood ◽ G. Youssef ◽ J. Treasure ◽ ...

Keyword(s): Machine Learning ◽ Risk Factors ◽ Logistic Regression ◽ Predictive Accuracy ◽ Area Under The Curve ◽ Prediction Rule ◽ Predictive Performance ◽ Individual Risk ◽ European Project ◽ Wide Range

Abstract: Background: Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO), and prediction rule ensembles (PREs)] to conventional logistic regression (LR) models to enhance prediction of ED onset and differential ED diagnoses from a range of putative risk factors. Method: Data were part of a European Project and comprised 1402 participants, 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which assesses retrospectively a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used. Results: All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN). Conclusions: Even though the conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.
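The area under the curve reported above equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pair counting; the function name, labels, and scores are hypothetical, and this is not the study's own analysis code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cases, 0 for controls.
    scores: predicted risk, where higher means more likely to be a case.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    case_scores = [s for lab, s in zip(labels, scores) if lab == 1]
    control_scores = [s for lab, s in zip(labels, scores) if lab == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for two controls and two cases.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

Pair counting is quadratic in the sample size, which is fine for a study of this scale; rank-based formulas give the same value more efficiently on large datasets.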

Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction

International Journal for Numerical Methods in Biomedical Engineering ◽ 10.1002/cnm.2914 ◽ 2017 ◽ Vol 34 (2) ◽ pp. e2914 ◽ Cited By ~ 43

Author(s): Zixuan Cang ◽ Guo-Wei Wei

Keyword(s): Machine Learning ◽ Ligand Binding ◽ Binding Affinity ◽ Persistent Homology ◽ Binding Affinity Prediction ◽ Affinity Prediction

Investigating structure function relationships in the NOTCH family through large-scale somatic DNA sequencing studies

10.1101/2020.03.31.018325 ◽ 2020

Author(s): Michael W J Hall ◽ David Shorthouse ◽ Philip H Jones ◽ Benjamin A Hall

Keyword(s): Dna Sequencing ◽ Structure Function ◽ Calcium Binding ◽ Large Scale ◽ Driver Mutations ◽ Missense Mutations ◽ Mutant Selection ◽ Ligand Interaction ◽ Binding Interface

Abstract: The recent development of highly sensitive DNA sequencing techniques has detected large numbers of missense mutations in genes, including NOTCH1 and NOTCH2, in ageing normal tissues. Driver mutations persist and propagate in the tissue through a selective advantage over both wild-type cells and alternative mutations. This process of selection can be considered a large-scale, in vivo screen for mutations that increase clone fitness. It follows that the specific missense mutations observed in individual genes may offer insights into structure-function relationships. Here we show that the positively selected missense mutations in NOTCH1 and NOTCH2 in human oesophageal epithelium cause inactivation predominantly through protein misfolding. Once these mutations are excluded, we further find statistically significant evidence for selection at the ligand binding interface and calcium binding sites. Here, we observe stronger evidence of selection at the ligand interface on EGF12 than on EGF11, suggesting that in this tissue EGF12 may play a more important role in ligand interaction. Finally, we show how a mutation hotspot in the NOTCH1 transmembrane helix arises through the intersection of both a high mutation rate and residue conservation. Together these insights offer a route to understanding the mechanism of protein function through in vivo mutant selection.
