UEP: an open-source and fast classifier for predicting the impact of mutations in protein–protein complexes

Author(s):  
Pep Amengual-Rigo ◽  
Juan Fernández-Recio ◽  
Victor Guallar

Abstract Motivation Single protein residue mutations may reshape the binding affinity of protein–protein interactions. Therefore, predicting their effects is of great interest in biotechnology and biomedicine. Unfortunately, the availability of experimental data on binding affinity changes upon mutation is limited, which hampers the development of new and more precise algorithms. Here, we propose UEP, a classifier for predicting beneficial and detrimental mutations in protein–protein complexes trained on interactome data. Results Despite the simplicity of the UEP algorithm, which is based on a three-body contact potential derived from interactome data, we report results competitive with the gold standard methods in this field, with the advantage of being faster in terms of computational time. Moreover, we propose a consensus selection procedure that combines three predictors that showed higher classification accuracy in our benchmark: UEP, pyDock and EvoEF1/FoldX. Overall, we demonstrate that the analysis of interactome data allows predicting the impact of protein–protein mutations using UEP, a fast and reliable open-source code. Availability and implementation The UEP algorithm can be found at: https://github.com/pepamengual/UEP. Supplementary information Supplementary data are available at Bioinformatics online.
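
The consensus selection idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three predictors are combined, so the unanimity rule, the sign convention (negative score means improved binding), the function name `consensus_label` and the example scores below are all assumptions made for illustration, not the authors' procedure.

```python
from typing import Dict


def consensus_label(predictions: Dict[str, float], threshold: float = 0.0) -> str:
    """Classify one mutation from several predictor scores.

    Assumption: each score is a ddG-like value where a value below
    `threshold` means the mutation is predicted to improve binding.
    Assumption: a label is only returned when every predictor agrees.
    """
    if not predictions:
        raise ValueError("at least one predictor score is required")
    # Turn each score into a per-predictor vote.
    votes = [
        "beneficial" if score < threshold else "detrimental"
        for score in predictions.values()
    ]
    # Keep the label only when the vote is unanimous.
    if all(vote == votes[0] for vote in votes):
        return votes[0]
    return "no consensus"


# Made-up scores for two hypothetical mutations.
agreeing = {"UEP": -0.4, "pyDock": -1.1, "FoldX": -0.2}
disagreeing = {"UEP": -0.4, "pyDock": 0.9, "FoldX": -0.2}
print(consensus_label(agreeing))
print(consensus_label(disagreeing))
```

A unanimity rule trades coverage for confidence: fewer mutations receive a label, but each label is backed by all predictors. A majority vote would be a looser alternative.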

2019 ◽  
Vol 36 (7) ◽  
pp. 2284-2285 ◽  
Author(s):  
Miguel Romero-Durana ◽  
Brian Jiménez-García ◽  
Juan Fernández-Recio

Abstract Motivation Protein–protein interactions are key to understanding biological processes at the molecular level. As a complement to experimental characterization of protein interactions, computational docking methods have become useful tools for the structural and energetic modeling of protein–protein complexes. A key aspect of such algorithms is the use of scoring functions to evaluate the generated docking poses and try to identify the best models. When the scoring functions are based on energetic considerations, they can help not only to provide a reliable structural model for the complex, but also to describe energetic aspects of the interaction. This is the case of the scoring function used in pyDock, a combination of electrostatics, desolvation and van der Waals energy terms. Its correlation with experimental binding affinity values of protein–protein complexes was explored in the past, but the per-residue decomposition of the docking energy was never systematically analyzed. Results Here, we present pyDockEneRes (pyDock Energy per-Residue), a web server that provides pyDock docking energy partitioned at the residue level, giving a much more detailed description of the docking energy landscape. Additionally, pyDockEneRes computes the contribution to the docking energy of the side-chain atoms. This fast approach can be applied to characterize a complex structure in order to identify energetically relevant residues (hot-spots) and estimate binding affinity changes upon mutation to alanine. Availability and implementation The server does not require registration by the user and is freely accessible for academics at https://life.bsc.es/pid/pydockeneres. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 118 (6) ◽  
pp. e2014345118
Author(s):  
Diana Ascencio ◽  
Guillaume Diss ◽  
Isabelle Gagnon-Arsenault ◽  
Alexandre K. Dubé ◽  
Alexander DeLuna ◽  
...  

Gene duplication is ubiquitous and a major driver of phenotypic diversity across the tree of life, but its immediate consequences are not fully understood. Deleterious effects would decrease the probability of retention of duplicates and prevent their contribution to long-term evolution. One possible detrimental effect of duplication is the perturbation of the stoichiometry of protein complexes. Here, we measured the fitness effects of the duplication of 899 essential genes in the budding yeast using high-resolution competition assays. At least 10% of genes caused a fitness disadvantage when duplicated. Intriguingly, the duplication of most protein complex subunits had small to nondetectable effects on fitness, with few exceptions. We selected four complexes with subunits that had an impact on fitness when duplicated and measured the impact of individual gene duplications on their protein–protein interactions. We found that very few duplications affect both fitness and interactions. Furthermore, large complexes such as the 26S proteasome are protected from gene duplication by attenuation of protein abundance. Regulatory mechanisms that maintain the stoichiometric balance of protein complexes may protect from the immediate effects of gene duplication. Our results show that a better understanding of protein regulation and assembly in complexes is required for the refinement of current models of gene duplication.


2017 ◽  
Vol 61 (5) ◽  
pp. 505-516 ◽  
Author(s):  
Scott J. Hughes ◽  
Alessio Ciulli

Molecular glues and bivalent inducers of protein degradation (also known as PROTACs) represent a fascinating new modality in pharmacotherapeutics: the potential to knockdown previously thought ‘undruggable’ targets at sub-stoichiometric concentrations in ways not possible using conventional inhibitors. Mounting evidence suggests these chemical agents, in concert with their target proteins, can be modelled as three-body binding equilibria that can exhibit significant cooperativity as a result of specific ligand-induced molecular recognition. Despite this, many existing drug design and optimization regimens still fixate on binary target engagement, in part due to limited structural data on ternary complexes. Recent crystal structures of protein complexes mediated by degrader molecules, including the first PROTAC ternary complex, underscore the importance of protein–protein interactions and intramolecular contacts to the mode of action of this class of compounds. These discoveries have opened the door to a new paradigm for structure-guided drug design: borrowing surface area and molecular recognition from nature to elicit cellular signalling.


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Anna Vangone ◽  
Alexandre MJJ Bonvin

Almost all critical functions in cells rely on specific protein–protein interactions. Understanding these is therefore crucial in the investigation of biological systems. Despite all past efforts, we still lack a thorough understanding of the energetics of association of proteins. Here, we introduce a new and simple approach to predict binding affinity based on functional and structural features of the biological system, namely the network of interfacial contacts. We assess its performance against a protein–protein binding affinity benchmark and show that both experimental methods used for affinity measurements and conformational changes have a strong impact on prediction accuracy. Using a subset of complexes with reliable experimental binding affinities and combining our contacts and contact-types-based model with recent observations on the role of the non-interacting surface in protein–protein interactions, we reach a high prediction accuracy for such a diverse dataset outperforming all other tested methods.
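
As an illustration of what "the network of interfacial contacts" can mean in practice, the following minimal sketch counts residue pairs across two chains that have at least one pair of atoms within a distance cutoff. The 5.5 Å cutoff, the data layout, the function name `count_interface_contacts` and the toy coordinates are assumptions for this example; the abstract does not specify them, and this is not the authors' model.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def count_interface_contacts(
    chain_a: Dict[str, List[Coord]],
    chain_b: Dict[str, List[Coord]],
    cutoff: float = 5.5,  # assumed cutoff in angstroms
) -> int:
    """Count residue pairs (one per chain) with any atom pair within `cutoff`."""
    contacts = 0
    for atoms_a in chain_a.values():
        for atoms_b in chain_b.values():
            # A residue pair is in contact if any two of their atoms are close.
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts += 1
    return contacts


# Toy coordinates (one atom per residue), invented for the example.
chain_a = {"A:ARG10": [(0.0, 0.0, 0.0)], "A:GLU11": [(20.0, 0.0, 0.0)]}
chain_b = {"B:ASP5": [(3.0, 0.0, 0.0)], "B:LYS6": [(22.0, 0.0, 0.0)]}
print(count_interface_contacts(chain_a, chain_b))  # 2
```

A contact-based affinity model would then use such counts, possibly split by residue type, as input features to a regression; that step is omitted here.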


2020 ◽  
Vol 36 (8) ◽  
pp. 2458-2465 ◽  
Author(s):  
Isak Johansson-Åkhe ◽  
Claudio Mirabello ◽  
Björn Wallner

Abstract Motivation Interactions between proteins and peptides or peptide-like intrinsically disordered regions are involved in many important biological processes, such as gene expression and cell life-cycle regulation. Experimentally determining the structure of such interactions is time-consuming and difficult because of the inherent flexibility of the peptide ligand. Although several prediction methods exist, most are limited in performance or availability. Results InterPep2 is a freely available method for predicting the structure of peptide–protein interactions. Improved performance is obtained by using templates from both peptide–protein and regular protein–protein interactions, and by a random forest trained to predict the DockQ-score for a given template using sequence and structural features. When tested on 252 bound peptide–protein complexes from structures deposited after the complexes used in the construction of the training and template sets of InterPep2, InterPep2-Refined correctly positioned 67 peptides within 4.0 Å LRMSD among top10, similar to another state-of-the-art template-based method which positioned 54 peptides correctly. However, InterPep2 displays a superior ability to evaluate the quality of its own predictions. On a previously established set of 27 non-redundant unbound-to-bound peptide–protein complexes, InterPep2 performs on par with leading methods. The extended InterPep2-Refined protocol managed to correctly model 15 of these complexes within 4.0 Å LRMSD among top10, without using templates from homologs. In addition, combining the template-based predictions from InterPep2 with ab initio predictions from PIPER-FlexPepDock resulted in 22% more near-native predictions compared to the best single method (22 versus 18). Availability and implementation The program is available from: http://wallnerlab.org/InterPep2. Supplementary information Supplementary data are available at Bioinformatics online.
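
The 22% figure follows directly from the two counts quoted in the abstract (22 versus 18), as this short check shows. The helper name `relative_gain` is introduced here for illustration only.

```python
def relative_gain(combined: int, best_single: int) -> float:
    """Relative increase of `combined` over `best_single`."""
    return (combined - best_single) / best_single


# Counts taken from the abstract: 22 near-native predictions for the
# combined approach versus 18 for the best single method.
gain = relative_gain(22, 18)
print(f"{gain:.1%}")  # 22.2%
```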


2020 ◽  
Author(s):  
Diana Ascencio ◽  
Guillaume Diss ◽  
Isabelle Gagnon-Arsenault ◽  
Alexandre K Dubé ◽  
Alexander DeLuna ◽  
...  

Abstract Gene duplication is ubiquitous and a major driver of phenotypic diversity across the tree of life, but its immediate consequences are not fully understood. Deleterious effects would decrease the probability of retention of duplicates and prevent their contribution to long term evolution. One possible detrimental effect of duplication is the perturbation of the stoichiometry of protein complexes. Here, we measured the fitness effects of the duplication of 899 essential genes in the budding yeast using high-resolution competition assays. At least ten percent of genes caused a fitness disadvantage when duplicated. Intriguingly, the duplication of most protein complex subunits had small to non-detectable effects on fitness, with few exceptions. We selected four complexes with subunits that had an impact on fitness when duplicated and measured the impact of individual gene duplications on their protein-protein interactions. We found that very few duplications affect both fitness and interactions. Furthermore, large complexes such as the 26S proteasome are protected from gene duplication by attenuation of protein abundance. Regulatory mechanisms that maintain the stoichiometric balance of protein complexes may protect from the immediate effects of gene duplication. Our results show that a better understanding of protein regulation and assembly in complexes is required for the refinement of current models of gene duplication.


2017 ◽  
Author(s):  
Oriol Pich i Rosello ◽  
Anna V. Vlasova ◽  
Polina A. Shichkova ◽  
Yuri Markov ◽  
Peter K. Vlasov ◽  
...  

Human genetic variability is thought to account for a substantial fraction of individual biochemical characteristics – in a biomedical sense, of individual drug response. However, only a handful of human genetic variants have been linked to medication outcomes. Here, we combine data on drug-protein interactions and human genome sequences to assess the impact of human variation on their binding affinity. Using data from the complexes of FDA-approved drugs and drug-like compounds, we predict SNPs substantially affecting the protein-ligand binding affinities. We estimate that an average individual carries ~6 SNPs affecting ~5 different FDA-approved drugs from among all of the approved compounds. SNPs affecting drug-protein binding affinity have low frequency in the population, indicating that the genetic component for many ADEs may be highly personalized, with each individual carrying a unique set of relevant SNPs. The reduction of ADEs, therefore, may primarily rely on the application of computational genome analysis in the clinic rather than the experimental study of common SNPs.


2019 ◽  
Vol 36 (8) ◽  
pp. 2429-2437 ◽  
Author(s):  
Xiaoqiang Huang ◽  
Wei Zheng ◽  
Robin Pearce ◽  
Yang Zhang

Abstract Motivation Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. Results We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities. Availability and implementation Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe. 
Supplementary information Supplementary data are available at Bioinformatics online.
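
The two evaluation metrics quoted in this abstract, the Pearson correlation coefficient and the root-mean-square error between estimated and experimental ΔΔGbind, can be computed as in the minimal sketch below. The ΔΔG values are invented toy numbers, not data from the study, and the function names are chosen for this example.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy ddG values in kcal/mol, invented for the example.
experimental = [0.5, 1.8, -0.3, 2.4, 0.0]
estimated = [0.9, 1.2, 0.1, 1.9, -0.4]
print(f"PCC  = {pearson(estimated, experimental):.2f}")
print(f"RMSE = {rmse(estimated, experimental):.2f} kcal/mol")
```

The two numbers are complementary: the correlation measures whether mutations are ranked correctly, while the RMSE measures how far the predicted magnitudes are from the measured ones.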


Author(s):  
Gen Li ◽  
Swagata Pahari ◽  
Adithya Krishna Murthy ◽  
Siqi Liang ◽  
Robert Fragoza ◽  
...  

Abstract Motivation The vast majority of human genetic disorders are associated with mutations that affect protein–protein interactions by altering wild-type binding affinity. Therefore, it is extremely important to assess the effect of mutations on protein–protein binding free energy to assist the development of therapeutic solutions. Currently, the most popular approaches use structural information to deliver the predictions, which precludes their application to genome-scale investigations. Indeed, with the progress of genomic sequencing, researchers frequently need to assess the effect of mutations for which there is no structure available. Results Here, we report a Gradient Boosting Decision Tree machine learning algorithm, SAAMBE-SEQ, which is completely sequence-based and does not require structural information at all. SAAMBE-SEQ utilizes 80 features representing evolutionary information, sequence-based features and change of physical properties upon mutation at the mutation site. The approach is shown to achieve a Pearson correlation coefficient (PCC) of 0.83 in 5-fold cross-validation in a benchmarking test against experimentally determined binding free energy change (ΔΔG). Further, a blind test (no-STRUC) is compiled by collecting experimental ΔΔG upon mutation for protein complexes for which structure is not available, and is used to benchmark SAAMBE-SEQ, resulting in a PCC in the range of 0.37–0.46. The accuracy of the SAAMBE-SEQ method is found to be either better than or comparable to the most advanced structure-based methods. SAAMBE-SEQ is very fast, available as a webserver and stand-alone code, and utilizes only sequence information; thus it is applicable for genome-scale investigations to study the effect of mutations on protein–protein interactions. Availability and implementation SAAMBE-SEQ is available at http://compbio.clemson.edu/saambe_webserver/indexSEQ.php#started. 
Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (14) ◽  
pp. 4208-4210
Author(s):  
Damiano Cianferoni ◽  
Leandro G Radusky ◽  
Sarah A Head ◽  
Luis Serrano ◽  
Javier Delgado

Abstract Summary Accurate 3D modelling of protein–protein interactions (PPI) is essential to compensate for the absence of experimentally determined complex structures. Here, we present a new set of commands within the ModelX toolsuite capable of generating atomic-level protein complexes suitable for interface design. Among these commands, the new tool ProteinFishing proposes known and/or putative alternative 3D PPI for a given protein complex. The algorithm exploits backbone compatibility of protein fragments to generate mutually exclusive protein interfaces that are quickly evaluated with a knowledge-based statistical force field. Using interleukin-10-R2 co-crystalized with interferon-lambda-3, and a database of X-ray structures containing interleukin-10, this algorithm was able to generate interleukin-10-R2/interleukin-10 structural models in agreement with experimental data. Availability and implementation ProteinFishing is a portable command-line tool included in the ModelX toolsuite, written in C++, that makes use of an SQL (tested for MySQL and MariaDB) relational database delivered with a template SQL dump called FishXDB. FishXDB contains the empty tables of ModelX fragments and the data used by the embedded statistical force field. ProteinFishing is compiled for Linux-64bit, MacOS-64bit and Windows-32bit operating systems. This software is distributed under a proprietary license as an executable with its corresponding database dumps. It can be downloaded publicly at http://modelx.crg.es/. Licenses are freely available for academic users after registration on the website and are available under commercial license for for-profit organizations or companies. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

