SAAMBE-SEQ: a sequence-based method for predicting mutation effect on protein–protein binding affinity

Author(s):  
Gen Li ◽  
Swagata Pahari ◽  
Adithya Krishna Murthy ◽  
Siqi Liang ◽  
Robert Fragoza ◽  
...  

Abstract. Motivation: The vast majority of human genetic disorders are associated with mutations that affect protein–protein interactions by altering wild-type binding affinity. It is therefore extremely important to assess the effect of mutations on protein–protein binding free energy to assist the development of therapeutic solutions. Currently, the most popular approaches rely on structural information to deliver predictions, which precludes their application to genome-scale investigations. Indeed, with the progress of genomic sequencing, researchers frequently need to assess the effect of mutations for which no structure is available. Results: Here, we report SAAMBE-SEQ, a Gradient Boosting Decision Tree machine learning algorithm that is completely sequence-based and requires no structural information. SAAMBE-SEQ uses 80 features representing evolutionary information, sequence-based features and the change of physical properties upon mutation at the mutation site. The approach achieves a Pearson correlation coefficient (PCC) of 0.83 in 5-fold cross-validation in a benchmarking test against experimentally determined binding free energy changes (ΔΔG). Further, a blind test set (no-STRUC) was compiled from experimental ΔΔG values upon mutation for protein complexes with no available structure and used to benchmark SAAMBE-SEQ, resulting in PCC in the range of 0.37–0.46. The accuracy of SAAMBE-SEQ is found to be better than or comparable to that of the most advanced structure-based methods. SAAMBE-SEQ is very fast, is available as a webserver and as stand-alone code, and uses only sequence information; it is thus applicable to genome-scale investigations of the effect of mutations on protein–protein interactions. Availability and implementation: SAAMBE-SEQ is available at http://compbio.clemson.edu/saambe_webserver/indexSEQ.php#started.
Supplementary information: Supplementary data are available at Bioinformatics online.
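
The Pearson correlation coefficient quoted in this abstract compares predicted and experimental ΔΔG values. A minimal sketch of that metric in plain Python is given below; the function name `pearson_r` and the sample values are hypothetical illustrations and are not taken from SAAMBE-SEQ or its datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ΔΔG values in kcal/mol, for illustration only.
experimental_ddg = [0.5, 1.2, -0.3, 2.1, 0.0]
predicted_ddg = [0.4, 1.0, 0.1, 1.8, 0.3]
print(round(pearson_r(experimental_ddg, predicted_ddg), 3))
```

A PCC close to 1 means the predictions rank and scale with the measurements; it does not by itself say how large the absolute errors are.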

Author(s):  
Sherlyn Jemimah ◽  
Masakazu Sekijima ◽  
M Michael Gromiha

Abstract. Motivation: Protein–protein interactions are essential for the cell and mediate various functions. However, mutations can disrupt these interactions and may cause diseases. Currently available computational methods require the structure of the complex as input for predicting the change in binding affinity. Further, they do not include functional class information for the protein–protein complex. To address this, we have developed a method, ProAffiMuSeq, which predicts the change in binding free energy using sequence-based features and functional class. Results: Our method shows an average correlation between predicted and experimentally determined ΔΔG of 0.73 and a mean absolute error (MAE) of 0.86 kcal/mol in 10-fold cross-validation, and a correlation of 0.75 with an MAE of 0.94 kcal/mol on the test dataset. ProAffiMuSeq was also tested on an external validation set and showed results comparable to structure-based methods. Our method can be used for large-scale analysis of disease-causing mutations in protein–protein complexes without structural information. Availability and implementation: Users can access the method at https://web.iitm.ac.in/bioinfo2/proaffimuseq/. Supplementary information: Supplementary data are available at Bioinformatics online.
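
The two evaluation ingredients named in this abstract, mean absolute error and k-fold cross-validation, can be sketched as follows. This is an illustrative outline only: the helper names `mean_absolute_error` and `kfold_indices` are hypothetical, the folds are contiguous and unshuffled for simplicity, and nothing here reproduces how ProAffiMuSeq itself partitions its data.

```python
def mean_absolute_error(experimental, predicted):
    """Average of |experimental - predicted| over paired values."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)


def kfold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds, no shuffling."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds get one extra sample so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample appears in exactly one test fold.
for train_idx, test_idx in kfold_indices(n_samples=10, k=5):
    print(test_idx)
```

In a real evaluation a model would be fitted on each training fold and scored on the held-out fold, and the per-fold scores would then be averaged.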


2019 ◽  
Author(s):  
Michael Heyne ◽  
Niv Papo ◽  
Julia Shifman

Abstract. Quantifying the effects of various mutations on binding free energy is crucial for understanding the evolution of protein-protein interactions and would greatly facilitate protein engineering studies. Yet, measuring changes in binding free energy (ΔΔGbind) remains a tedious task that requires expression of each mutant, its purification, and affinity measurements. We developed a new approach that allows us to quantify ΔΔGbind for thousands of protein mutants in one experiment. Our protocol combines protein randomization, Yeast Surface Display technology, Next Generation Sequencing, and a few experimental ΔΔGbind data points on purified proteins to generate ΔΔGbind values for the remaining numerous mutants of the same protein complex. Using this methodology, we comprehensively map the single-mutant binding landscape of one of the highest-affinity interactions, that between BPTI and Bovine Trypsin. We show that ΔΔGbind for this interaction could be quantified with high accuracy over the range of 12 kcal/mol displayed by various BPTI single mutants.


Molecules ◽  
2021 ◽  
Vol 26 (18) ◽  
pp. 5696
Author(s):  
Wei Lim Chong ◽  
Koollawat Chupradit ◽  
Sek Peng Chin ◽  
Mai Mai Khoo ◽  
Sook Mei Khor ◽  
...  

Protein-protein interactions play an essential role in almost all cellular processes and biological functions. Coupling molecular dynamics (MD) simulations with a nanoparticle tracking analysis (NTA) assay offers a simple, rapid, and direct approach to monitoring the protein-protein binding process and predicting the binding affinity. As a case study, we investigated two designed ankyrin repeat proteins (DARPins), AnkGAG1D4 and its single-point mutant AnkGAG1D4-Y56A, which target the HIV-1 capsid protein (CA). As previously reported, AnkGAG1D4 binds CA and inhibits it; however, it loses its inhibitory strength when the tyrosine at residue 56 of AnkGAG1D4, the most important residue, is replaced by alanine (AnkGAG1D4-Y56A). With NTA, the binding of the DARPins to CA was measured by monitoring the increase in the hydrodynamic radius of AnkGAG1D4-conjugated gold nanoparticles (AnkGAG1D4-GNP) and AnkGAG1D4-Y56A-GNP upon interaction with CA in buffer solution. The size of AnkGAG1D4-GNP increased when it interacted with CA, whereas that of AnkGAG1D4-Y56A-GNP did not. In addition, the much higher (less favorable) binding free energy (∆GB) obtained from MD for AnkGAG1D4-Y56A (−31 kcal/mol) compared with AnkGAG1D4 (−60 kcal/mol) further suggested that the mutant's affinity for CA was greatly reduced. The possible mechanism of protein-protein binding was explored in detail by decomposing the binding free energy to identify crucial residues and by hydrogen bond analysis.
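
For orientation, the change in binding free energy implied by the two MD values quoted in this abstract can be written out explicitly. The subscript notation below is an assumption chosen for readability; the abstract itself reports only the two ∆GB values.

```latex
\begin{aligned}
\Delta\Delta G_B &= \Delta G_B(\text{AnkGAG1D4-Y56A}) - \Delta G_B(\text{AnkGAG1D4}) \\
                 &= (-31) - (-60) \\
                 &= +29\ \text{kcal/mol}
\end{aligned}
```

A positive value under this sign convention corresponds to weaker binding of the mutant, consistent with the loss of binding seen by NTA.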


2020 ◽  
Vol 101 (9) ◽  
pp. 921-924 ◽  
Author(s):  
Jingfang Wang ◽  
Xintian Xu ◽  
Xinbo Zhou ◽  
Ping Chen ◽  
Huiying Liang ◽  
...  

We constructed complex models of SARS-CoV-2 spike protein binding to pangolin or human ACE2, the receptor for virus transmission, and estimated the binding free energy changes using molecular dynamics simulation. SARS-CoV-2 can bind to both pangolin and human ACE2, but has a significantly lower binding affinity for pangolin ACE2 due to the increased binding free energy (9.5 kcal mol−1). Human ACE2 is among the most polymorphic genes, and we identified 317 missense single-nucleotide variations (SNVs) for it in the dbSNP database. Three SNVs, E329G (rs143936283), M82I (rs267606406) and K26R (rs4646116), showed a significant reduction in binding free energy, which indicates higher binding affinity than wild-type ACE2 and greater susceptibility to SARS-CoV-2 infection for carriers. Three other SNVs, D355N (rs961360700), E37K (rs146676783) and I21T (rs1244687367), showed a significant increase in binding free energy, which indicates lower binding affinity and reduced susceptibility to SARS-CoV-2 infection.
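
The link this abstract draws between a change in binding free energy and a change in affinity follows from the standard thermodynamic relation ΔΔG = RT ln(Kd,variant / Kd,reference). The sketch below applies that relation; the function name `affinity_fold_change` and the default temperature of 298.15 K are assumptions made for illustration, and the abstract does not state a simulation temperature or report dissociation constants.

```python
import math

# Molar gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 1.987204e-3


def affinity_fold_change(ddg_kcal_per_mol, temperature_k=298.15):
    """Return Kd_variant / Kd_reference implied by ΔΔG = RT * ln(ratio).

    A ratio above 1 means weaker binding (larger dissociation constant);
    a ratio below 1 means tighter binding.
    """
    return math.exp(ddg_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


# Positive ΔΔG weakens binding, negative ΔΔG strengthens it.
for ddg in (-1.0, 0.0, 1.0):
    print(ddg, affinity_fold_change(ddg))
```

At this temperature roughly 1.36 kcal/mol corresponds to a tenfold change in Kd. Free energies estimated from simulation can differ substantially in magnitude from experimental values, so such a conversion is best read as qualitative.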


2021 ◽  
Vol 9 ◽  
Author(s):  
Elisa Martino ◽  
Sara Chiarugi ◽  
Francesco Margheriti ◽  
Gianpiero Garau

Because of the key role of protein–protein interactions (PPI) in diseases, the modulation of protein–protein complexes is of considerable clinical significance. The successful design of binding compounds that modulate PPI requires detailed knowledge of the protein–protein system involved at the molecular level, and investigation of the structural motifs that drive the association of the proteins at the recognition interface. These elements represent hot spots of the protein binding free energy and define the complex lifetime and possible modulation strategies. Here, we review the advanced technologies used to map the PPI involved in human diseases, to investigate the structure-function features of protein complexes, and to discover effective ligands that modulate the PPI for therapeutic intervention.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i735-i744
Author(s):  
Fuhao Zhang ◽  
Wenbo Shi ◽  
Jian Zhang ◽  
Min Zeng ◽  
Min Li ◽  
...  

Abstract. Motivation: Knowledge of protein-binding residues (PBRs) improves our understanding of protein–protein interactions, contributes to the prediction of protein functions and facilitates protein–protein docking calculations. While many sequence-based predictors of PBRs have been published, they offer modest levels of predictive performance and most of them cross-predict residues that interact with other partners. One unexplored option to improve the predictive quality is to design consensus predictors that combine results produced by multiple methods. Results: We empirically investigate the predictive performance of a representative set of nine predictors of PBRs. We report substantial differences in predictive quality when these methods are used to predict individual proteins, which contrast with the dataset-level benchmarks that are currently used to assess and compare these methods. Our analysis provides new insights into the cross-prediction concern, dissects complementarity between predictors and demonstrates that the predictive performance of the top methods depends on unique characteristics of the input protein sequence. Using these insights, we developed PROBselect, a first-of-its-kind consensus predictor of PBRs. Our design is based on dynamic predictor selection at the protein level, where the selection relies on regression-based models that accurately estimate the predictive performance of selected predictors directly from the sequence. Empirical assessment using a low-similarity test dataset shows that PROBselect provides significantly improved predictive quality when compared with the current predictors and conventional consensuses that combine residue-level predictions. Moreover, PROBselect informs the users about the expected predictive quality for the prediction generated from a given input protein. Availability and implementation: PROBselect is available at http://bioinformatics.csu.edu.cn/PROBselect/home/index.
Supplementary information: Supplementary data are available at Bioinformatics online.
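
The idea of dynamic predictor selection at the protein level can be illustrated with a toy sketch: for each input sequence, estimate how well each candidate predictor is expected to perform, then run only the one with the highest estimate. Everything below is hypothetical, including the predictor names, the quality-estimating functions, and the `consensus_predict` helper; it is not PROBselect's actual model, feature set, or interface.

```python
def select_predictor(estimated_quality):
    """Return the name of the predictor with the highest estimated quality."""
    if not estimated_quality:
        raise ValueError("no predictors to choose from")
    return max(estimated_quality, key=estimated_quality.get)


def consensus_predict(sequence, predictors, quality_models):
    """Run the predictor expected to perform best on this particular sequence.

    predictors:     name -> function(sequence) returning per-residue scores
    quality_models: name -> function(sequence) returning an estimated quality
    Returns (chosen_name, per_residue_scores, expected_quality).
    """
    estimated = {name: model(sequence) for name, model in quality_models.items()}
    chosen = select_predictor(estimated)
    return chosen, predictors[chosen](sequence), estimated[chosen]


# Toy stand-ins: real predictors and regression models would replace these.
toy_predictors = {
    "predictor_a": lambda seq: [0.1] * len(seq),
    "predictor_b": lambda seq: [0.9] * len(seq),
}
toy_quality_models = {
    "predictor_a": lambda seq: 0.60 if len(seq) < 5 else 0.40,
    "predictor_b": lambda seq: 0.50,
}

name, scores, expected = consensus_predict("MKTAYIAK", toy_predictors, toy_quality_models)
print(name, expected)
```

Returning the estimated quality alongside the prediction mirrors the abstract's point that users are told what predictive quality to expect for a given input protein.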


2020 ◽  
Author(s):  
E. Prabhu Raman ◽  
Thomas J. Paul ◽  
Ryan L. Hayes ◽  
Charles L. Brooks III

Accurate predictions of changes to protein-ligand binding affinity in response to chemical modifications are of utility in small molecule lead optimization. Relative free energy perturbation (FEP) approaches are among the most widely used for this goal, but they involve significant computational cost, which limits their application to small sets of compounds. Lambda dynamics, also rigorously based on the principles of statistical mechanics, provides a more efficient alternative. In this paper, we describe the development of a workflow to set up, execute, and analyze Multi-Site Lambda Dynamics (MSLD) calculations run on GPUs with CHARMm implemented in BIOVIA Discovery Studio and Pipeline Pilot. The workflow establishes a framework for setting up simulation systems for exploratory screening of modifications to a lead compound, enabling the calculation of relative binding affinities of combinatorial libraries. To validate the workflow, a diverse dataset of congeneric ligands for seven proteins with experimental binding affinity data is examined. A protocol is developed that automatically and iteratively fits biasing potentials to flatten the free energy landscape of any MSLD system, which enhances sampling and allows efficient estimation of free energy differences. The protocol is first validated on a large number of ligand subsets that model diverse substituents, where it shows accurate and reliable performance. The scalability of the workflow is also tested by screening more than a hundred ligands modeled in a single system, which also resulted in accurate predictions. With a cumulative sampling time of 150 ns or less, the method results in average unsigned errors of under 1 kcal/mol in most cases for both small and large combinatorial libraries. For the multi-site systems examined, the method is estimated to be more than an order of magnitude more efficient than contemporary FEP applications. The results thus demonstrate the utility of the presented MSLD workflow for efficiently screening combinatorial libraries and exploring chemical space around a lead compound in lead optimization.


2020 ◽  
Vol 27 (37) ◽  
pp. 6306-6355 ◽  
Author(s):  
Marian Vincenzi ◽  
Flavia Anna Mercurio ◽  
Marilisa Leone

Background: Many pathways in healthy cells and/or linked to disease onset and progression depend on large assemblies, including multi-protein complexes. Protein-protein interactions may occur through a vast array of modules known as protein interaction domains (PIDs). Objective: This review concerns PIDs that recognize post-translationally modified peptide sequences and intends to provide the scientific community with state-of-the-art knowledge on their 3D structures, binding topologies and potential applications in the drug discovery field. Method: Several databases, such as Pfam (Protein family), SMART (Simple Modular Architecture Research Tool) and the PDB (Protein Data Bank), were searched for different domain families and for structural information on protein complexes in which particular PIDs are involved. Recent literature on PIDs and related drug discovery campaigns was retrieved through PubMed and analyzed. Results and Conclusion: PIDs are rather versatile in their binding preferences. Many of them specifically recognize only particular amino acid stretches with post-translational modifications; a few others are able to interact with several post-translationally modified sequences or with unmodified ones. Many PIDs can be linked to different diseases, including cancer. The tremendous amount of available structural data has led to the structure-based design of several molecules targeting protein-protein interactions mediated by PIDs, including peptides, peptidomimetics and small compounds. More studies are needed to fully rule out, among different families, PIDs that can be considered reliable therapeutic targets; however, attacking PIDs rather than the catalytic domains of a particular protein may represent a route to selective inhibitors.

