Machine Learning in Quantitative Protein–peptide Affinity Prediction: Implications for Therapeutic Peptide Design

Background: Protein–peptide recognition plays an essential role in the orchestration and regulation of cell signaling networks. It is estimated to be responsible for up to 40% of biological interaction events in the human interactome and has recently been recognized as a new and attractive druggable target for drug development and disease intervention. Methods: We present a systematic review of the application of machine learning techniques to the quantitative modeling and prediction of protein–peptide binding affinity, focusing in particular on its implications for therapeutic peptide design. We also briefly introduce the physical quantities used to characterize protein–peptide affinity and attempt to extend the content of generalized machine learning methods. Results: Existing issues and future perspectives on the statistical modeling and regression prediction of protein–peptide binding affinity are discussed. Conclusion: There is still a long way to go before the establishment of general, reliable and efficient machine learning-based protein–peptide affinity predictors.
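The abstract does not say which physical quantities the review covers. Two that are commonly used to express binding affinity are the dissociation constant (Kd) and the standard binding free energy, related by ΔG° = RT ln(Kd / 1 M). The sketch below converts between them, assuming a 1 M standard state and a default temperature of 298.15 K. The function names are illustrative and are not taken from the paper.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant in J/(mol*K)


def kd_to_delta_g(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from a Kd given in mol/L.

    Assumes a 1 M standard state, so the argument of the logarithm is
    simply the numerical value of Kd.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_J_PER_MOL_K * temperature_k * math.log(kd_molar) / 1000.0


def kd_to_pkd(kd_molar: float) -> float:
    """Negative decimal logarithm of Kd, a common regression target."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


if __name__ == "__main__":
    for kd in (1e-6, 1e-9):  # micromolar and nanomolar binders
        print(f"Kd={kd:.0e} M  pKd={kd_to_pkd(kd):.1f}  "
              f"dG={kd_to_delta_g(kd):.2f} kJ/mol")
```

Tighter binders have a smaller Kd, a larger pKd, and a more negative ΔG°, which is why regression models are often trained on pKd or ΔG° rather than on raw Kd values.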

Machine Learning-Based Scoring Functions. Development and Applications with SAnDReS.

Current Medicinal Chemistry ◽

10.2174/0929867327666200515101820 ◽

2020 ◽

Vol 27 ◽

Author(s):

Gabriela Bitencourt-Ferreira ◽

Camila Rizzotto ◽

Walter Filgueira de Azevedo Junior

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Drug Targets ◽

Computational Models ◽

Factor Xa ◽

Coagulation Factor ◽

Predictive Performance ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Molegro Virtual Docker

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease created targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors of this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
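The abstract judges scoring functions by how well their predictions correlate with experimental data, without stating which statistic was used. One common choice for this kind of comparison is the Spearman rank correlation, sketched below in plain Python. The docking scores and pKi values in the usage example are made-up numbers for illustration only, not data from the paper.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical values: experimental pKi and scores from some model.
    experimental_pki = [5.2, 6.1, 6.8, 7.4, 8.0]
    predicted_score = [5.0, 6.5, 6.3, 7.9, 7.7]
    print(f"Spearman rho = {spearman(experimental_pki, predicted_score):.3f}")
```

Because it depends only on rank order, this statistic rewards a scoring function for ordering ligands correctly even when its raw scores are on a different scale from the experimental affinities.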

Sequence Permutation of Positive Charges in a Model Peptide Antibiotic Produces Differing Enthalpic and Entropic Contributions to the Lipid-Peptide Binding Affinity

Biophysical Journal ◽

10.1016/j.bpj.2015.11.2281 ◽

2016 ◽

Vol 110 (3) ◽

pp. 422a

Author(s):

Brianna Haight ◽

Ellen R. Arndt ◽

Adrienne P. Loh

Keyword(s):

Binding Affinity ◽

Peptide Binding ◽

Peptide Antibiotic ◽

Model Peptide ◽

Peptide Binding Affinity

Predictive Analytics for Biomineralization Peptide Binding Affinity

BioNanoScience ◽

10.1007/s12668-018-0578-4 ◽

2018 ◽

Vol 9 (1) ◽

pp. 74-78 ◽

Cited By ~ 2

Author(s):

Jose Isagani B. Janairo

Keyword(s):

Binding Affinity ◽

Predictive Analytics ◽

Peptide Binding ◽

Peptide Binding Affinity

Effects of Mutations on Replicative Fitness and Major Histocompatibility Complex Class I Binding Affinity Are Among the Determinants Underlying Cytotoxic-T-Lymphocyte Escape of HIV-1 Gag Epitopes

mBio ◽

10.1128/mbio.01050-17 ◽

2017 ◽

Vol 8 (6) ◽

Cited By ~ 7

Author(s):

Yushen Du ◽

Tian-Hao Zhang ◽

Lei Dai ◽

Xiaojuan Zheng ◽

Aleksandr M. Gorin ◽

...

Keyword(s):

Binding Affinity ◽

High Throughput ◽

Peptide Binding ◽

Virus Evolution ◽

Mhc I ◽

Hla Alleles ◽

Ctl Escape ◽

Peptide Binding Affinity ◽

Hiv 1

ABSTRACT Certain “protective” major histocompatibility complex class I (MHC-I) alleles, such as B*57 and B*27, are associated with long-term control of HIV-1 in vivo mediated by the CD8+ cytotoxic-T-lymphocyte (CTL) response. However, the mechanism of such superior protection is not fully understood. Here we combined high-throughput fitness profiling of mutations in HIV-1 Gag, in silico prediction of MHC-peptide binding affinity, and analysis of intraperson virus evolution to systematically compare differences with respect to CTL escape mutations between epitopes targeted by protective MHC-I alleles and those targeted by nonprotective MHC-I alleles. We observed that the effects of mutations on both viral replication and MHC-I binding affinity are among the determinants of CTL escape. Mutations in Gag epitopes presented by protective MHC-I alleles are associated with significantly higher fitness cost and lower reductions in binding affinity with respect to MHC-I. A linear regression model accounting for the effect of mutations on both viral replicative capacity and MHC-I binding can explain the protective efficacy of MHC-I alleles. Finally, we found a consistent pattern in the evolution of Gag epitopes in long-term nonprogressors versus progressors. Overall, our results suggest that certain protective MHC-I alleles allow superior control of HIV-1 by targeting epitopes where mutations typically incur high fitness costs and small reductions in MHC-I binding affinity. IMPORTANCE Understanding the mechanism of viral control achieved in long-term nonprogressors with protective HLA alleles provides insights for developing functional cure of HIV infection. Through the characterization of CTL escape mutations in infected persons, previous researchers hypothesized that protective alleles target epitopes where escape mutations significantly reduce viral replicative capacity. However, these studies were usually limited to a few mutations observed in vivo. 
Here we utilized our recently developed high-throughput fitness profiling method to quantitatively measure the fitness of mutations across the entirety of HIV-1 Gag. The data enabled us to integrate the results with in silico prediction of MHC-peptide binding affinity and analysis of intraperson virus evolution to systematically determine the differences in CTL escape mutations between epitopes targeted by protective HLA alleles and those targeted by nonprotective HLA alleles. We observed that the effects of Gag epitope mutations on HIV replicative fitness and MHC-I binding affinity are among the major determinants of CTL escape.
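The abstract mentions a linear regression model that accounts for the effect of mutations on both replicative capacity and MHC-I binding, but gives no details of its form. As a generic illustration of a two-predictor linear model, the sketch below fits an ordinary least squares regression with an intercept by solving the normal equations. The predictor meanings, the response variable, and all numbers are hypothetical, and this is not the authors' model.

```python
def fit_linear(rows, targets):
    """Ordinary least squares with an intercept.

    rows: sequence of predictor tuples, e.g. (fitness_cost, binding_change)
    targets: sequence of response values
    Returns [intercept, coef_1, coef_2, ...].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coefficients, row):
    """Evaluate the fitted linear model on one predictor tuple."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


if __name__ == "__main__":
    # Hypothetical per-mutation data: (fitness cost, loss of binding affinity)
    predictors = [(0.1, 0.8), (0.4, 0.5), (0.7, 0.3), (0.9, 0.1), (0.5, 0.6)]
    response = [0.9, 0.6, 0.3, 0.1, 0.5]  # hypothetical escape score
    beta = fit_linear(predictors, response)
    print("intercept and coefficients:", [round(b, 3) for b in beta])
```

The sign and size of each fitted coefficient indicate how strongly the corresponding predictor is associated with the response once the other predictor is held fixed.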

DeepMHCII: A Novel Binding Core-Aware Deep Interaction Model for Accurate MHC II-peptide Binding Affinity Prediction

10.1101/2021.12.27.474242 ◽

2021 ◽

Author(s):

Ronghui You ◽

Wei Qu ◽

Hiroshi Mamitsuka ◽

Shanfeng Zhu

Keyword(s):

Deep Learning ◽

Binding Affinity ◽

Mhc Class Ii ◽

Large Scale ◽

Peptide Binding ◽

Class Ii ◽

Biological Knowledge ◽

Binding Interaction ◽

Binding Core ◽

Peptide Binding Affinity

Computationally predicting MHC-peptide binding affinity is an important problem in immunological bioinformatics. Recent cutting-edge deep learning-based methods for this problem are unable to achieve satisfactory performance for MHC class II molecules. This is because such methods generate the input by simply concatenating the two given sequences, (the estimated binding core of) a peptide and (the pseudo sequence of) an MHC class II molecule, ignoring the biological knowledge behind the interactions of the two molecules. We thus propose a binding core-aware deep learning-based model, DeepMHCII, with a binding interaction convolution layer (BICL), which allows integrating all potential binding cores (in a given peptide) and the MHC pseudo (binding) sequence by modeling the interaction with multiple convolutional kernels. Extensive empirical experiments with four large-scale datasets demonstrate that DeepMHCII significantly outperformed four state-of-the-art methods under numerous settings, such as five-fold cross-validation, leave one molecule out, validation with independent testing sets, and binding core prediction. All these results, together with visualization of the predicted binding cores, indicate the effectiveness and importance of properly modeling biological facts in deep learning for high performance and knowledge discovery. DeepMHCII is publicly available at https://weilab.sjtu.edu.cn/DeepMHCII/.
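The idea of considering all potential binding cores in a peptide can be illustrated without a neural network. The sketch below enumerates every window of a fixed length and keeps the highest-scoring one. It assumes the conventional 9-residue core for MHC class II, and `toy_score` is a hypothetical stand-in for a learned scorer. This is not the DeepMHCII BICL, which models the interaction with the MHC pseudo sequence through convolutional kernels.

```python
CORE_LENGTH = 9  # conventional binding-core length for MHC class II


def candidate_cores(peptide, core_length=CORE_LENGTH):
    """All (start_index, core) windows of the given length in the peptide."""
    return [(start, peptide[start:start + core_length])
            for start in range(len(peptide) - core_length + 1)]


def best_core(peptide, score_fn, core_length=CORE_LENGTH):
    """Return the (start_index, core) pair with the highest score.

    Taking the maximum over windows mirrors the common assumption that
    the strongest core determines binding.
    """
    cores = candidate_cores(peptide, core_length)
    if not cores:
        raise ValueError("peptide is shorter than the core length")
    return max(cores, key=lambda item: score_fn(item[1]))


def toy_score(core):
    """Hypothetical scorer: hydrophobic residues at P1, P4, P6 and P9."""
    hydrophobic = set("AVILMFWY")
    return sum(core[i] in hydrophobic for i in (0, 3, 5, 8))


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQR"  # arbitrary 15-mer for illustration
    for start, core in candidate_cores(peptide):
        print(start, core, toy_score(core))
    print("best:", best_core(peptide, toy_score))
```

A 15-residue peptide yields seven candidate cores. A learned model replaces `toy_score` with a function of both the core and the MHC sequence.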

Improved methods for predicting peptide binding affinity to MHC class II molecules

Immunology ◽

10.1111/imm.12889 ◽

2018 ◽

Vol 154 (3) ◽

pp. 394-406 ◽

Cited By ~ 202

Author(s):

Kamilla Kjaergaard Jensen ◽

Massimo Andreatta ◽

Paolo Marcatili ◽

Søren Buus ◽

Jason A. Greenbaum ◽

...

Keyword(s):

Binding Affinity ◽

Mhc Class Ii ◽

Peptide Binding ◽

Class Ii ◽

Improved Methods ◽

Peptide Binding Affinity ◽

Mhc Class Ii Molecules

Stability and peptide binding affinity of an SH3 domain from the Caenorhabditis elegans signaling protein Sem-5

Protein Science ◽

10.1002/pro.5560030812 ◽

1994 ◽

Vol 3 (8) ◽

pp. 1261-1266 ◽

Cited By ~ 39

Author(s):

Wendell A. Lim ◽

Robert O. Fox ◽

Frederic M. Richards

Keyword(s):

Binding Affinity ◽

Peptide Binding ◽

Sh3 Domain ◽

Peptide Binding Affinity

Supervised machine learning techniques to predict binding affinity. A study for cyclin-dependent kinase 2

Biochemical and Biophysical Research Communications ◽

10.1016/j.bbrc.2017.10.035 ◽

2017 ◽

Vol 494 (1-2) ◽

pp. 305-310 ◽

Cited By ~ 33

Author(s):

Maurício Boff de Ávila ◽

Mariana Morrone Xavier ◽

Val Oliveira Pintro ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cyclin Dependent Kinase ◽

Learning Techniques

HLA class I binding prediction via convolutional neural networks

10.1101/099358 ◽

2017 ◽

Cited By ~ 2

Author(s):

Yeeleng S. Vang ◽

Xiaohui Xie

Keyword(s):

Machine Learning ◽

Network Architecture ◽

Peptide Binding ◽

Hla Class I ◽

Class I ◽

Machine Learning Techniques ◽

Binding Potential ◽

Distributed Representation ◽

Binding Prediction ◽

Peptide Binding Prediction

Abstract: Many biological processes are governed by protein-ligand interactions. One such example is the recognition of self and non-self cells by the immune system. This immune response process is regulated by the major histocompatibility complex (MHC) protein, which is encoded by the human leukocyte antigen (HLA) complex. Understanding the binding potential between MHC and peptides can lead to the design of more potent, peptide-based vaccines and immunotherapies for infectious autoimmune diseases. We apply machine learning techniques from the natural language processing (NLP) domain to address the task of MHC-peptide binding prediction. More specifically, we introduce a new distributed representation of amino acids, named HLA-Vec, that can be used for a variety of downstream proteomic machine learning tasks. We then propose a deep convolutional neural network architecture, named HLA-CNN, for the task of HLA class I-peptide binding prediction. Experimental results show that combining the new distributed representation with our HLA-CNN architecture achieves state-of-the-art results in the majority of the latest two Immune Epitope Database (IEDB) weekly automated benchmark datasets. We further apply our model to predict binding on the human genome and identify 15 genes with potential for self binding. Code is available at https://github.com/uci-cbcl/HLA-bind.
