A Reproducibility Analysis-based Statistical Framework for Residue-Residue Evolutionary Coupling Detection

2021 ◽  
Author(s):  
Yunda Si ◽  
Chengfei Yan

Abstract Direct coupling analysis (DCA) has been widely used to predict residue-residue contacts to assist protein/RNA structure and interaction prediction. However, effectively selecting residue pairs for contact prediction according to the result of DCA is a non-trivial task, since the number of highly predictive residue pairs and the coupling scores obtained from DCA are highly dependent on the number and the length of the homologous sequences forming the multiple sequence alignment, the detailed settings of the DCA algorithm, the functional characteristics of the macromolecule, etc. In this study, we present a general statistical framework for selecting predictive residue pairs through significant evolutionary coupling detection, referred to as IDR-DCA, which is based on reproducibility analysis of the coupling scores from replicated DCA. IDR-DCA was applied to select residue pairs for contact prediction for 150 proteins, 30 protein-protein interactions and 36 RNAs, in which we applied three widely used DCA software packages to perform the DCA. We show that with the application of IDR-DCA, the predictive residue pairs can be effectively selected through a universal threshold that is independent of the DCA software.

2018 ◽  
Author(s):  
Adam J. Hockenberry ◽  
Claus O. Wilke

Patterns of amino acid covariation in large protein sequence alignments can inform the prediction of de novo protein structures, binding interfaces, and mutational effects. While algorithms that detect these so-called evolutionary couplings between residues have proven useful for practical applications, less is known about how and why these methods perform so well, and what insights into biological processes can be gained from their application. Evolutionary coupling algorithms are commonly benchmarked by comparison to true structural contacts derived from solved protein structures. However, the methods used to determine true structural contacts are not standardized and different definitions of structural contacts may have important consequences for interpreting the results from evolutionary coupling analyses and understanding their overall utility. Here, we show that evolutionary coupling analyses are significantly more likely to identify structural contacts between side-chain atoms than between backbone atoms. We use both simulations and empirical analyses to highlight that purely backbone-based definitions of true residue–residue contacts (i.e., based on the distance between Cα atoms) may underestimate the accuracy of evolutionary coupling algorithms by as much as 40% and that a commonly used reference point (Cβ atoms) underestimates the accuracy by 10–15%. These findings show that co-evolutionary outcomes differ according to which atoms participate in residue–residue interactions and suggest that accounting for different interaction types may lead to further improvements to contact-prediction methods.
Significance Statement: Evolutionary couplings between residues within a protein can provide valuable information about protein structures, protein-protein interactions, and the mutability of individual residues. However, the mechanistic factors that determine whether two residues will co-evolve remain unknown. We show that structural proximity by itself is not sufficient for co-evolution to occur between residues. Rather, evolutionary couplings between residues are specifically governed by interactions between side-chain atoms. By contrast, intramolecular contacts between atoms in the protein backbone display only a weak signature of evolutionary coupling. These findings highlight that different types of stabilizing contacts exist within protein structures and that these types have a differential impact on the evolution of protein structures that should be considered in co-evolutionary applications.
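
To make the contrast between contact definitions concrete, here is a minimal sketch that labels a residue pair under three definitions: Cα distance, Cβ distance, and minimum side-chain atom distance. The residue layout (a dict with `CA`, `CB` and `side_chain` keys) and the single 8 Å cutoff are assumptions made for illustration; the abstract does not state the cutoffs or data structures the authors used.

```python
import math

# Hypothetical cutoff in angstroms; the abstract does not specify one.
CUTOFF = 8.0


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def contact_by_definition(res_i, res_j, cutoff=CUTOFF):
    """Return a dict mapping each contact definition to True/False.

    Each residue is a dict with keys 'CA' (xyz tuple), 'CB' (xyz tuple) and
    'side_chain' (list of xyz tuples). This layout is illustrative only.
    """
    return {
        "CA": math.dist(res_i["CA"], res_j["CA"]) <= cutoff,
        "CB": math.dist(res_i["CB"], res_j["CB"]) <= cutoff,
        "side_chain": min_distance(res_i["side_chain"], res_j["side_chain"]) <= cutoff,
    }


# Two residues whose backbones are 10 A apart but whose side chains point at each other.
res_i = {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0),
         "side_chain": [(1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]}
res_j = {"CA": (10.0, 0.0, 0.0), "CB": (8.5, 0.0, 0.0),
         "side_chain": [(8.5, 0.0, 0.0), (7.0, 0.0, 0.0)]}
labels = contact_by_definition(res_i, res_j)
```

In this toy pair the Cα definition misses a contact that the side-chain definition reports, which is the kind of disagreement that changes how accurate a coupling method appears when benchmarked.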


2020 ◽  
Author(s):  
Yumeng Yan ◽  
Sheng-You Huang

Abstract Protein-protein interactions play a fundamental role in all cellular processes. Therefore, determining the structure of protein-protein complexes is crucial for understanding their molecular mechanisms and developing drugs that target protein-protein interactions. Recently, deep learning has led to a breakthrough in intraprotein contact prediction, achieving an unusually high accuracy in recent CASP structure prediction challenges. However, due to the limited number of known homologous protein-protein interactions and the challenge of generating joint multiple sequence alignments (MSA) of two interacting proteins, the advances in inter-protein contact prediction remain limited. Here, we have proposed a deep learning model to predict inter-protein residue-residue contacts across homo-oligomeric protein interfaces, named DeepHomo, by integrating evolutionary coupling, sequence conservation, distance map, docking pattern, and physicochemical information of monomers. DeepHomo was extensively tested on both experimentally determined structures and realistic CASP-CAPRI targets. It was shown that DeepHomo achieved a high accuracy of >60% for the top predicted contact and outperformed state-of-the-art direct-coupling analysis (DCA) and machine learning (ML)-based approaches. Integrating predicted contacts into protein docking with blindly predicted monomer structures also significantly improved the docking accuracy. The present study demonstrated the success of DeepHomo in inter-protein contact prediction. It is anticipated that DeepHomo will have far-reaching implications for inter-protein contact and structure prediction for protein-protein interactions.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Javier A. Iserte ◽  
Tamas Lazar ◽  
Silvio C. E. Tosatto ◽  
Peter Tompa ◽  
Cristina Marino-Buslje

Abstract Intrinsically disordered proteins/regions (IDPs/IDRs) are crucial components of the cell; they are highly abundant and participate ubiquitously in a wide range of biological functions, such as regulatory processes and cell signaling. Many of their important functions rely on protein interactions, by which they trigger or modulate different pathways. Sequence covariation, a powerful tool for protein contact prediction, has been applied successfully to predict protein structure and to identify protein–protein interactions, mostly of globular proteins. IDPs/IDRs also mediate a plethora of protein–protein interactions, highlighting the importance of addressing sequence covariation-based inter-protein contact prediction of this class of proteins. Despite their importance, a systematic approach to analyze the covariation phenomena of intrinsically disordered proteins and their complexes is still missing. Here we carry out a comprehensive critical assessment of coevolution-based contact prediction in IDP/IDR complexes and detail the challenges and possible limitations that emerge from their analysis. We found that the coevolutionary signal is faint in most of the complexes of disordered proteins but positively correlates with the interface size and binding affinity between partners. In addition, we discuss the state-of-the-art methodology through biological interpretation of the results, formulate evaluation guidelines and suggest future directions of development for the field.


BMC Cancer ◽  
2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Konstantinos Karakostis ◽  
Robin Fåhraeus

Abstract Structured RNA regulatory motifs exist from the prebiotic stages of the RNA world to the more complex eukaryotic systems. In cases where a functional RNA structure is within the coding sequence, a selective pressure drives a parallel co-evolution of the RNA structure and the encoded peptide domain. The p53-MDM2 axis, describing the interactions between the p53 tumor suppressor and the MDM2 E3 ubiquitin ligase, serves as a particularly useful model revealing how secondary RNA structures have co-evolved along with corresponding interacting protein motifs, thus having an impact on protein–RNA and protein–protein interactions; and how such structures developed signal-dependent regulation in mammalian systems. The p53(BOX-I) RNA sequence binds the C-terminus of MDM2 and controls p53 synthesis while the encoded peptide domain binds MDM2 and controls p53 degradation. The BOX-I peptide domain is also located within the p53 transcription activation domain. The folding of the p53 mRNA structure has evolved from temperature-regulated in pre-vertebrates to an ATM kinase signal-dependent pathway in mammalian cells. The protein–protein interaction evolved in vertebrates and became regulated by the same signaling pathway. At the same time as the protein–RNA and protein–protein interactions evolved, the p53 trans-activation domain progressed to become integrated into a range of cellular pathways. We discuss how a single synonymous mutation in BOX-I, p53(L22L), observed in a chronic lymphocytic leukaemia patient, prevents the activation of p53 following DNA damage. The concepts analysed and discussed in this review may serve as a conceptual mechanistic paradigm of the co-evolution and function of molecules having roles in cellular regulation, or the aetiology of genetic diseases and how synonymous mutations can affect the encoded protein.


2007 ◽  
Vol 81 (11) ◽  
pp. 5807-5818 ◽  
Author(s):  
Dustin T. Petrik ◽  
Kimberly P. Schmitt ◽  
Mark F. Stinski

ABSTRACT The functions of the human cytomegalovirus (HCMV) IE86 protein are paradoxical, as it can both activate and repress viral gene expression through interaction with the promoter region. Although the mechanism for these functions is not clearly defined, it appears that a combination of direct DNA binding and protein-protein interactions is involved. Multiple sequence alignment of several HCMV IE86 homologs reveals that the amino acids 534LPIYE538 are conserved between all primate and nonprimate CMVs. In the context of a bacterial artificial chromosome (BAC), mutation of both P535 and Y537 to alanines (P535A/Y537A) results in a nonviable BAC. The defective HCMV BAC does not undergo DNA replication, although the P535A/Y537A mutant IE86 protein appears to be stably expressed. The P535A/Y537A mutant IE86 protein is able to negatively autoregulate transcription from the major immediate-early (MIE) promoter and was recruited to the MIE promoter in a chromatin immunoprecipitation (ChIP) assay. However, the P535A/Y537A mutant IE86 protein was unable to transactivate early viral genes and was not recruited to the early viral UL4 and UL112 promoters in a ChIP assay. From these data, we conclude that the transactivation and repressive functions of the HCMV IE86 protein can be separated and must occur through independent mechanisms.


2016 ◽  
Vol 113 (52) ◽  
pp. 15018-15023 ◽  
Author(s):  
Juan Rodriguez-Rivas ◽  
Simone Marsili ◽  
David Juan ◽  
Alfonso Valencia

Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.


2014 ◽  
Vol 07 (01) ◽  
pp. 28-37
Author(s):  
Tu Kien T. Le ◽  
Osamu Hirose ◽  
Vu Anh Tran ◽  
Thammakorn Saethang ◽  
Lan Anh T. Nguyen ◽  
...  

2019 ◽  
Author(s):  
Guillaume Marmier ◽  
Martin Weigt ◽  
Anne-Florence Bitbol

Abstract Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among the paralogs of ubiquitous prokaryotic protein families, starting from sequence data alone. Since DCA allows the three-dimensional structure of protein complexes to be inferred, its success in predicting protein-protein interactions could be mainly based on contacting residues coevolving to remain physicochemically complementary. However, interacting proteins often possess similar evolutionary histories, which also gives rise to correlations among their sequences. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involves phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that only share evolutionary history. It performs as well as methods explicitly based on sequence similarity, and even slightly better with large and accurate training sets. We further demonstrate the ability of these various methods to correctly predict pairings among actual paralogous proteins with genome proximity but no known direct physical interaction, which illustrates the importance of phylogenetic correlations in real data. However, for actually interacting and strongly coevolving proteins, DCA and mutual information outperform sequence similarity.
Author summary: Many biologically important protein-protein interactions are conserved over evolutionary time scales. This leads to two different signals that can be used to computationally predict interactions between protein families and to identify specific interaction partners. First, the shared evolutionary history leads to highly similar phylogenetic relationships between interacting proteins of the two families. Second, the need to keep the interaction surfaces of partner proteins biophysically compatible causes a correlated amino-acid usage of interface residues. Employing simulated data, we show that the shared history alone can be used to detect partner proteins. Similar accuracies are achieved by algorithms comparing phylogenetic relationships and by coevolutionary methods based on Direct Coupling Analysis, which are a priori designed to detect the second type of signal. Using real sequence data, we show that in cases with shared evolutionary history but without known physical interactions, both methods work with similar accuracy, while for physically interacting systems, methods based on correlated amino-acid usage outperform purely phylogenetic ones.
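
The abstract names mutual information as one of the covariation measures it compares. As a rough illustration of what "correlated amino-acid usage" means numerically, the sketch below computes the plain empirical mutual information between two alignment columns. It omits pseudocounts, sequence reweighting and any phylogenetic correction, so it should be read as a simplified assumption-laden example and not as the authors' pipeline; the function name and the toy columns are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Empirical mutual information (in nats) between two equal-length alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    freq_a = Counter(col_a)              # residue counts in column A
    freq_b = Counter(col_b)              # residue counts in column B
    freq_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = count * n / (freq_a[a] * freq_b[b])
        mi += p_ab * math.log(ratio)
    return mi


# Toy columns: one residue per sequence in the alignment.
covarying = column_mutual_information("AAKK", "DDEE")    # residues change together
independent = column_mutual_information("AAKK", "DEDE")  # no association
```

Perfectly covarying columns with two equally frequent states give ln 2, while columns with no association give zero. Shared phylogeny can inflate this quantity without any physical contact, which is the confound the study examines.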


2019 ◽  
Vol 47 (W1) ◽  
pp. W331-W337 ◽  
Author(s):  
Ankit A Roy ◽  
Abhilesh S Dhawanjewar ◽  
Parichit Sharma ◽  
Gulzar Singh ◽  
M S Madhusudhan

Abstract Our web server, PIZSA (http://cospi.iiserpune.ac.in/pizsa), assesses the likelihood of protein–protein interactions by assigning a Z Score computed from interface residue contacts. Our score takes into account the optimal number of atoms that mediate the interaction between pairs of residues and whether these contacts emanate from the main chain or side chain. We tested the score on 174 native interactions for which 100 decoys each were constructed using ZDOCK. The native structure scored better than any of the decoys in 146 cases and was able to rank within the 95th percentile in 162 cases. This easily outperforms a competing method, CIPS. We also benchmarked our scoring scheme on 15 targets from the CAPRI dataset and found that our method had results comparable to those of CIPS. Further, our method is able to analyse higher order protein complexes without the need to explicitly identify chains as receptors or ligands. The PIZSA server is easy to use and could be used to score any input three-dimensional structure and provide a residue pair-wise breakdown of the results. Attractively, our server offers a platform for users to upload their own potentials and could serve as an ideal testing ground for this class of scoring schemes.


2021 ◽  
Author(s):  
Ziwei Xie ◽  
Jinbo Xu

Motivation: Inter-protein (interfacial) contact prediction is very useful for in silico structural characterization of protein-protein interactions. Although deep learning has been applied to this problem, its accuracy is not as good as intra-protein contact prediction. Results: We propose a new deep learning method GLINTER (Graph Learning of INTER-protein contacts) for interfacial contact prediction of dimers, leveraging a rotational invariant representation of protein tertiary structures and a pretrained language model of multiple sequence alignments (MSAs). Tested on the 13th and 14th CASP-CAPRI datasets, the average top L/10 precision achieved by GLINTER is 54.35% on the homodimers and 51.56% on all the dimers, much higher than 30.43% obtained by the latest deep learning method DeepHomo on the homodimers and 14.69% obtained by BIPSPI on all the dimers. Our experiments show that GLINTER-predicted contacts help improve selection of docking decoys.
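
The "top L/10 precision" figures quoted above can be read as follows: rank all candidate residue pairs by predicted score, keep the best L/10 of them, and report the fraction that are true contacts. The sketch below shows that calculation under stated assumptions. How L is defined for a dimer is not given in the abstract, so `length` is simply a value supplied by the caller, and the function name and toy data are hypothetical.

```python
def top_fraction_precision(scored_pairs, true_contacts, length, divisor=10):
    """Precision of the top (length // divisor) predicted residue pairs.

    scored_pairs:  iterable of ((i, j), score), higher score = more confident
    true_contacts: set of (i, j) pairs observed in the reference structure
    length:        the sequence length L (its definition is left to the caller)
    """
    k = max(1, length // divisor)  # always evaluate at least one prediction
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    top = [pair for pair, _ in ranked[:k]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


# Toy example: four scored inter-chain pairs, two of which are real contacts.
predictions = [((0, 5), 0.9), ((1, 7), 0.8), ((2, 3), 0.4), ((4, 9), 0.1)]
reference = {(0, 5), (2, 3)}
precision = top_fraction_precision(predictions, reference, length=20)
```

With `length=20` the two highest-scoring pairs are evaluated and one of them is a true contact, so the precision is 0.5.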

