Conservation of co-evolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone

2016 ◽  
Author(s):  
Juan Rodriguez-Rivas ◽  
Simone Marsili ◽  
David Juan ◽  
Alfonso Valencia

Abstract: Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue co-evolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that co-evolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a novel domain-centred protocol to study the interplay between residue co-evolution and structural conservation of protein-protein interfaces. We show that sequence-based co-evolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence, where standard homology modelling of protein complexes is unreliable, our approach provides accurate sequence-based information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic co-evolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this novel approach.

Significance statement: Interacting proteins tend to co-evolve through interdependent changes at the interaction interface. This phenomenon leads to patterns of coordinated mutations that can be exploited to systematically predict contacts between interacting proteins in prokaryotes. We explore the hypothesis that co-evolving contacts at protein interfaces are preferentially conserved through long evolutionary periods. We demonstrate that co-evolving residues in prokaryotes identify inter-protein contacts that are particularly well conserved in the corresponding structure of their eukaryotic homologues. Therefore, these contacts have likely been important to maintain protein-protein interactions during evolution. We show that this property can be used to reliably predict interacting residues between eukaryotic proteins with homologues in prokaryotes even if they are very distantly related in sequence.


2016 ◽  
Vol 113 (52) ◽  
pp. 15018-15023 ◽  
Author(s):  
Juan Rodriguez-Rivas ◽  
Simone Marsili ◽  
David Juan ◽  
Alfonso Valencia

Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
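The idea of projecting contacts from a prokaryotic complex onto its eukaryotic homologue can be illustrated with a minimal sketch. It assumes that a pairwise alignment between each prokaryotic chain and its eukaryotic counterpart is already available, and that predicted inter-protein contacts are given as 0-based residue index pairs. The function names `column_map` and `project_contacts` and the toy sequences are hypothetical and are not taken from the authors' protocol.

```python
from typing import Dict, List, Tuple


def column_map(aligned_src: str, aligned_dst: str, gap: str = "-") -> Dict[int, int]:
    """Map 0-based residue indices of the source sequence to indices of the
    destination sequence, for alignment columns where neither has a gap."""
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    i = j = 0  # running residue indices in source and destination
    for a, b in zip(aligned_src, aligned_dst):
        if a != gap and b != gap:
            mapping[i] = j
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return mapping


def project_contacts(
    contacts: List[Tuple[int, int]],
    map_a: Dict[int, int],
    map_b: Dict[int, int],
) -> List[Tuple[int, int]]:
    """Translate (residue in chain A, residue in chain B) contacts into the
    numbering of the homologous chains; contacts with an unaligned residue
    on either side are dropped."""
    projected = []
    for i, j in contacts:
        if i in map_a and j in map_b:
            projected.append((map_a[i], map_b[j]))
    return projected


# Toy alignments (hypothetical): prokaryotic chain vs eukaryotic homologue.
map_a = column_map("MK-LV", "MRALV")
map_b = column_map("GHTS", "G-TS")
prokaryotic_contacts = [(2, 3), (1, 1), (3, 0)]
eukaryotic_contacts = project_contacts(prokaryotic_contacts, map_a, map_b)
print(eukaryotic_contacts)
```

In this toy case the contact `(1, 1)` is dropped because residue 1 of the second prokaryotic chain has no aligned partner in the eukaryotic sequence.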



2021 ◽  
Author(s):  
Patrick Bryant ◽  
Gabriele Pozzati ◽  
Arne Elofsson

Predicting the structure of single-chain proteins is now close to being a solved problem due to the recent achievement of AlphaFold2 (AF2). However, predicting the structure of interacting protein chains is still a challenge. Here, we utilise AF2 to optimise a protocol for predicting the structure of heterodimeric protein complexes using only sequence information. We find that using the default AF2 protocol, 32% of the models in the Dockground test set can be modelled accurately. By tuning the input alignment and identifying the best model, we adjusted the performance to 43%. Our protocol uses MSAs generated by AF2 and MSAs paired on the organism level generated with HHblits. In a more extensive, more realistic, independent test set, the accuracy is 59%. In comparison, the alternative fold-and-dock method RoseTTAFold is only successful in 10% of the cases on this set and traditional docking methods 22%. However, for the traditional method, the performance would be lower if the bound form of both monomers was not known. The success is higher for bacterial protein pairs, pairs with large interaction areas consisting of helices or sheets, and many homologous sequences. We can distinguish acceptable (DockQ>0.23) from incorrect models with an AUC of 0.84 on the test set by analysing the predicted interfaces. At an error rate of 1%, 13% are acceptable (at a 10% error rate, 40% of the models are acceptable). All scripts and tools to run our protocol are freely available at: https://gitlab.com/ElofssonLab/FoldDock.
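Pairing two MSAs at the organism level can be sketched as follows. The abstract only states that the MSAs are paired per organism; keeping the first (best-ranked) hit per organism and concatenating the two aligned sequences is an assumption made here for illustration, and the names `first_hit_per_organism` and `pair_by_organism` are hypothetical rather than part of FoldDock.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str]  # (organism identifier, aligned sequence)


def first_hit_per_organism(hits: List[Hit]) -> Dict[str, str]:
    """Keep one sequence per organism. Hits are assumed to be sorted
    best-first, so the first occurrence of each organism is retained."""
    best: Dict[str, str] = {}
    for organism, sequence in hits:
        best.setdefault(organism, sequence)
    return best


def pair_by_organism(hits_a: List[Hit], hits_b: List[Hit]) -> List[Tuple[str, str]]:
    """Build a paired MSA: for every organism found in both alignments,
    concatenate the retained row of chain A with that of chain B."""
    best_a = first_hit_per_organism(hits_a)
    best_b = first_hit_per_organism(hits_b)
    return [(org, best_a[org] + best_b[org]) for org in best_a if org in best_b]


# Toy input (hypothetical organism labels and sequences).
hits_a = [("ecoli", "MKV"), ("bsub", "MRV"), ("ecoli", "MKI")]
hits_b = [("bsub", "GH"), ("human", "GY"), ("ecoli", "GF")]
paired = pair_by_organism(hits_a, hits_b)
print(paired)
```

Organisms present in only one of the two alignments (here `human`) contribute no row to the paired alignment, which is one reason paired MSAs tend to be shallower than single-chain ones.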





2021 ◽  
Author(s):  
Patrick Bryant ◽  
Gabriele Pozzati ◽  
Arne Elofsson

Abstract: Predicting the structure of interacting protein chains is fundamental for understanding the function of proteins. Here, we examine the use of AlphaFold2 (AF2) for predicting the structure of heterodimeric protein complexes. We find that using the default AF2 protocol, 44% of the models in a test set can be predicted accurately. However, by optimising the multiple sequence alignment, we can increase the accuracy to 59%. In comparison, the alternative fold-and-dock method RoseTTAFold is only successful in 10% of the cases on this set, template-based docking in 35% and traditional docking methods in 22%. We can distinguish acceptable (DockQ>0.23) from incorrect models with an AUC of 0.85 on the test set by analysing the predicted interfaces. The success is higher for bacterial protein pairs, pairs with large interaction areas consisting of helices or sheets, and many homologous sequences. Further, we test whether interacting proteins can be distinguished from non-interacting ones and find that, by analysing the predicted interfaces, we can separate truly interacting from non-interacting proteins with an AUC of 0.82 in the ROC curve, compared to 0.76 with a recently published method. In addition, when using a more realistic negative set that includes mammalian proteins, the identification rate is maintained (AUC=0.83), so that 27% of interactions can be identified at a 1% FPR. All scripts and tools to run our protocol are freely available at: https://gitlab.com/ElofssonLab/FoldDock.
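The two evaluation quantities quoted above, the area under the ROC curve and the fraction of true positives recovered at a fixed false-positive rate, can be computed as in the sketch below. It assumes each model or protein pair has a single interface-derived score (higher meaning more confident) and a binary label, for example whether DockQ exceeds 0.23. The functions `roc_auc` and `tpr_at_fpr` and the toy data are illustrative and are not the authors' evaluation scripts.

```python
from typing import List, Sequence


def _split(scores: Sequence[float], labels: Sequence[int]):
    """Separate scores of positive and negative examples."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Largest true-positive rate over thresholds whose false-positive rate
    does not exceed max_fpr; an example is called positive if score >= threshold."""
    pos, neg = _split(scores, labels)
    best = 0.0
    for threshold in sorted(set(scores)):
        fpr = sum(n >= threshold for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= threshold for p in pos) / len(pos)
            best = max(best, tpr)
    return best


# Toy data (hypothetical): label 1 = acceptable model or true interaction.
scores: List[float] = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels: List[int] = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
tpr_strict = tpr_at_fpr(scores, labels, max_fpr=0.0)
print(round(auc, 3), round(tpr_strict, 3))
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a sketch but would normally be replaced by a rank-based computation on large sets.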



2020 ◽  
Author(s):  
Yumeng Yan ◽  
Sheng-You Huang

Abstract: Protein-protein interactions play a fundamental role in all cellular processes. Therefore, determining the structure of protein-protein complexes is crucial to understand their molecular mechanisms and develop drugs targeting the protein-protein interactions. Recently, deep learning has led to a breakthrough in intra-protein contact prediction, achieving an unusually high accuracy in recent CASP structure prediction challenges. However, due to the limited number of known homologous protein-protein interactions and the challenge of generating joint multiple sequence alignments (MSAs) of two interacting proteins, the advances in inter-protein contact prediction remain limited. Here, we propose a deep learning model, named DeepHomo, to predict inter-protein residue-residue contacts across homo-oligomeric protein interfaces by integrating evolutionary coupling, sequence conservation, distance map, docking pattern, and physicochemical information of monomers. DeepHomo was extensively tested on both experimentally determined structures and realistic CASP-CAPRI targets. It was shown that DeepHomo achieved a high accuracy of >60% for the top predicted contact and outperformed state-of-the-art direct-coupling analysis (DCA) and machine learning (ML)-based approaches. Integrating predicted contacts into protein docking with blindly predicted monomer structures also significantly improved the docking accuracy. The present study demonstrated the success of DeepHomo in inter-protein contact prediction. It is anticipated that DeepHomo will have far-reaching implications for inter-protein contact and structure prediction for protein-protein interactions.
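The "accuracy for the top predicted contact" reported above is commonly measured as the precision of the k highest-scoring residue pairs. The sketch below assumes predictions are given as a mapping from residue pairs to contact probabilities and that true interface contacts are known as a set of pairs. The function `top_k_precision` and the toy values are hypothetical and do not reproduce the DeepHomo evaluation code.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (residue index in chain 1, residue index in chain 2)


def top_k_precision(predicted: Dict[Pair, float], true_contacts: Set[Pair], k: int) -> float:
    """Fraction of the k highest-probability predicted pairs that are
    true inter-protein contacts."""
    if k <= 0 or not predicted:
        raise ValueError("k must be positive and predictions must be non-empty")
    ranked = sorted(predicted, key=predicted.get, reverse=True)[:k]
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy prediction (hypothetical probabilities and contacts).
predicted = {(0, 5): 0.95, (1, 4): 0.80, (2, 2): 0.40, (3, 1): 0.10}
true_contacts = {(0, 5), (2, 2)}
p1 = top_k_precision(predicted, true_contacts, k=1)
p2 = top_k_precision(predicted, true_contacts, k=2)
print(p1, p2)
```

Averaging the k=1 value over a benchmark set gives the kind of "top predicted contact" figure quoted in the abstract, under the assumption that this is how the metric is defined there.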



2021 ◽  
Author(s):  
Mu Gao ◽  
Davi Nakajima An ◽  
Jerry M Parks ◽  
Jeffrey Skolnick

Accurate descriptions of protein-protein interactions are essential for understanding biological systems. Very recently, AlphaFold2 has been shown to be remarkably accurate for predicting the atomic structures of individual proteins. Here, we demonstrate that the same neural network models developed for AlphaFold2 can be adapted to predict the structures of multimeric protein complexes without retraining. In contrast to common approaches that require paired multiple sequence alignments, our method, AF2Complex, works without using such paired alignments. It achieves higher accuracy than complex strategies that combine AlphaFold2 and protein-protein docking. New metrics are then introduced for predicting direct protein-protein interactions between arbitrary protein pairs. The approach is successfully validated on some challenging CASP14 multimeric targets, a small but appropriate benchmark set, and the E. coli proteome. Lastly, using the cytochrome c biogenesis system as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system.



2003 ◽  
Vol 4 (4) ◽  
pp. 424-427 ◽  
Author(s):  
Alfonso Valencia

Multiple sequence alignments have much to offer to the understanding of protein structure, evolution and function. We are developing approaches to use this information in predicting protein-binding specificity, intra-protein and protein-protein interactions, and in reconstructing protein interaction networks.



Toxins ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 290
Author(s):  
Caterina Peggion ◽  
Fiorella Tonello

Snake venom phospholipases A2 (PLA2s) have sequences and structures very similar to those of mammalian group I and II secretory PLA2s, but they possess many toxic properties, ranging from the inhibition of coagulation to the blockage of nerve transmission and the induction of muscle necrosis. The biological properties of these proteins are due not only to their enzymatic activity, but also to protein–protein interactions which are still unidentified. Here, we compare sequence alignments of snake venom and mammalian PLA2s, grouped according to their structure and biological activity, looking for differences that can explain their different behavior. This bioinformatics analysis revealed three distinct regions, two central and one C-terminal, having amino acid compositions that distinguish the different categories of PLA2s. In these regions, we identified short linear motifs (SLiMs), peptide modules involved in protein–protein interactions, conserved in mammalian and not in snake venom PLA2s, or vice versa. The different SLiM content of snake venom PLA2s with respect to mammalian PLA2s may result in the formation of protein membrane complexes having a toxic activity, or in the formation of complexes whose activity cannot be blocked due to the lack of switches in the toxic PLA2s, such as the motif recognized by the prolyl isomerase Pin1.



2019 ◽  
Vol 20 (S16) ◽  
Author(s):  
Da Zhang ◽  
Mansur Kabuka

Abstract

Background: Protein-protein interactions (PPIs) are constantly involved in dynamic pathological and biological processes. It is therefore crucial to understand PPIs thoroughly in order to elucidate disease occurrence, achieve optimal drug-target therapeutic effects, and describe protein complex structures. However, compared to the protein sequences obtainable from various species and organisms, the number of known protein-protein interactions is relatively limited. To address this problem, many research efforts have been devoted to facilitating the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely solely on protein sequence data are more widespread than methods that require extensive biological domain knowledge.

Results: In this paper, we propose a multi-modal deep representation learning structure that incorporates protein physicochemical features with graph topological features from PPI networks. Specifically, our method not only takes the protein sequence information into account but also learns topological representations for each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study PPI prediction. Following that, we use supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods.

Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining PPI networks.


