Inferring interaction partners from protein sequences using mutual information

2018 ◽  
Author(s):  
Anne-Florence Bitbol

Abstract: Specific protein-protein interactions are crucial in most cellular processes. They enable multiprotein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are specific interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. This stands in contrast with structure prediction of proteins and of multiprotein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.

Author summary: Specific protein-protein interactions are at the heart of most intra-cellular processes. Mapping these interactions is thus crucial to a systems-level understanding of cells, and has broad applications to areas such as drug targeting. Systematic experimental identification of protein interaction partners is still challenging. However, a large and rapidly growing amount of sequence data is now available. Recently, algorithms have been proposed to identify which proteins interact from their sequences alone, thanks to the co-variation of the sequences of interacting proteins. These algorithms build upon inference methods that have been used with success to predict the three-dimensional structures of proteins and multi-protein complexes, and their focus is on the amino-acid residues that are in direct contact. Here, we propose a simpler method to identify which proteins interact among the paralogous proteins of two families, starting from their sequences alone. Our method relies on an approximate maximization of mutual information between the sequences of the two families, without specifically emphasizing the contacting residue pairs. We demonstrate that this method slightly outperforms the earlier one. This result highlights that partner prediction does not only rely on the identities and interactions of directly contacting amino-acids.
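As a rough illustration of the quantity being maximized, the sketch below computes the mutual information between columns of two paired alignments and sums it over all inter-family column pairs. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names `column_mi` and `pairing_mi` and the toy sequences are hypothetical, and no pseudocounts, sequence reweighting, or iterative pairing step is included.

```python
from collections import Counter
from math import log


def column_mi(col_a, col_b):
    """Mutual information (in nats) between two equally long residue columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))  # counts of residue pairs (a, b)
    count_a = Counter(col_a)            # marginal counts in family A
    count_b = Counter(col_b)            # marginal counts in family B
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log(c * n / (count_a[a] * count_b[b]))
    return mi


def pairing_mi(seqs_a, seqs_b):
    """Sum of column-pair MI for a candidate pairing.

    seqs_a[i] is assumed to be paired with seqs_b[i]; all sequences in a
    family are assumed to be aligned to the same length.
    """
    cols_a = list(zip(*seqs_a))
    cols_b = list(zip(*seqs_b))
    return sum(column_mi(ca, cb) for ca in cols_a for cb in cols_b)


if __name__ == "__main__":
    # Hypothetical toy alignments: family A and two candidate pairings of B.
    family_a = ["AK", "AK", "GE", "GE"]
    correct_b = ["LD", "LD", "VR", "VR"]
    shuffled_b = ["LD", "VR", "LD", "VR"]
    print(pairing_mi(family_a, correct_b), pairing_mi(family_a, shuffled_b))
```

In this toy case the covarying pairing scores higher than the shuffled one, which is the kind of signal a partner-matching procedure could try to maximize over candidate pairings.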

2016 ◽  
Author(s):  
Anne-Florence Bitbol ◽  
Robert S. Dwyer ◽  
Lucy J. Colwell ◽  
Ned S. Wingreen

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a priori knowledge of interaction partners, yielding a striking 0.93 true positive fraction on our complete dataset, and we uncover the origin of this surprising success. Finally, we discuss how our method could be used to predict novel protein-protein interactions.


2018 ◽  
Vol 25 (1) ◽  
pp. 5-21 ◽  
Author(s):  
Ylenia Cau ◽  
Daniela Valensin ◽  
Mattia Mori ◽  
Sara Draghi ◽  
Maurizio Botta

14-3-3 is a class of proteins able to interact with a multitude of targets by establishing protein-protein interactions (PPIs). They are usually found in all eukaryotes with a conserved secondary structure and high sequence homology among species. 14-3-3 proteins are involved in many physiological and pathological cellular processes either by triggering or interfering with the activity of specific protein partners. In recent years, the scientific community has collected much evidence on the role played by seven human 14-3-3 isoforms in cancer or neurodegenerative diseases. Indeed, these proteins regulate the molecular mechanisms associated with these diseases by interacting with (i) oncogenic and (ii) pro-apoptotic proteins and (iii) with proteins involved in Parkinson's and Alzheimer's diseases. The discovery of small molecule modulators of 14-3-3 PPIs could facilitate complete understanding of the physiological role of these proteins, and might offer valuable therapeutic approaches for these critical pathological states.


2018 ◽  
Vol 46 (6) ◽  
pp. 1593-1603 ◽  
Author(s):  
Chenkang Zheng ◽  
Patricia C. Dos Santos

Iron–sulfur (Fe–S) clusters are ubiquitous cofactors present in all domains of life. The chemistries catalyzed by these inorganic cofactors are diverse and their associated enzymes are involved in many cellular processes. Despite the wide range of structures reported for Fe–S clusters inserted into proteins, the biological synthesis of all Fe–S clusters starts with the assembly of simple units of 2Fe–2S and 4Fe–4S clusters. Several systems have been associated with the formation of Fe–S clusters in bacteria with varying phylogenetic origins and number of biosynthetic and regulatory components. All systems, however, construct Fe–S clusters through a similar biosynthetic scheme involving three main steps: (1) sulfur activation by a cysteine desulfurase, (2) cluster assembly by a scaffold protein, and (3) guided delivery of Fe–S units to either final acceptors or biosynthetic enzymes involved in the formation of complex metalloclusters. Another unifying feature of the biological formation of Fe–S clusters in bacteria is that these systems are tightly regulated by a network of protein interactions. Thus, the formation of transient protein complexes among biosynthetic components allows for the direct transfer of reactive sulfur and Fe–S intermediates, preventing oxygen damage and reactions with non-physiological targets. Recent studies revealed the importance of reciprocal signature sequence motifs that enable specific protein–protein interactions and consequently guide the transactions between physiological donors and acceptors. Such findings provide insights into strategies used by bacteria to regulate the flow of reactive intermediates and provide protein barcodes to uncover yet-unidentified cellular components involved in Fe–S metabolism.


2019 ◽  
Author(s):  
Georgy Derevyanko ◽  
Guillaume Lamoureux

Abstract: Protein-protein interactions are determined by a number of hard-to-capture features related to shape complementarity, electrostatics, and hydrophobicity. These features may be intrinsic to the protein or induced by the presence of a partner. A conventional approach to protein-protein docking consists in engineering a small number of spatial features for each protein, and in minimizing the sum of their correlations with respect to the spatial arrangement of the two proteins. To generalize this approach, we introduce a deep neural network architecture that transforms the raw atomic densities of each protein into complex three-dimensional representations. Each point in the volume containing the protein is described by 48 learned features, which are correlated and combined with the features of a second protein to produce a score dependent on the relative position and orientation of the two proteins. The architecture is based on multiple layers of SE(3)-equivariant convolutional neural networks, which provide built-in rotational and translational invariance of the score with respect to the structure of the complex. The model is trained end-to-end on a set of decoy conformations generated from 851 nonredundant protein-protein complexes and is tested on data from the Protein-Protein Docking Benchmark Version 4.0.


Inorganics ◽  
2019 ◽  
Vol 7 (7) ◽  
pp. 85 ◽  
Author(s):  
Yap Shing Nim ◽  
Kam-Bo Wong

Maturation of urease involves post-translational insertion of nickel ions to form an active site with a carbamylated lysine ligand and is assisted by urease accessory proteins UreD, UreE, UreF and UreG. Here, we review our current understanding of how these urease accessory proteins facilitate urease maturation. The urease maturation pathway involves the transfer of Ni2+ from UreE → UreG → UreF/UreD → urease. To avoid the release of the toxic metal to the cytoplasm, Ni2+ is transferred from one urease accessory protein to another through specific protein–protein interactions. One central theme depicts the role of guanosine triphosphate (GTP) binding/hydrolysis in regulating the binding/release of nickel ions and the formation of the protein complexes. The urease and [NiFe]-hydrogenase maturation pathways cross-talk with each other as UreE receives Ni2+ from hydrogenase maturation factor HypA. Finally, the druggability of the urease maturation pathway is reviewed.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 782 ◽  
Author(s):  
Virja Mehta ◽  
Laura Trinkle-Mulcahy

Protein-protein interactions (PPIs) underlie most, if not all, cellular functions. The comprehensive mapping of these complex networks of stable and transient associations thus remains a key goal, both for systems biology-based initiatives (where it can be combined with other ‘omics’ data to gain a better understanding of functional pathways and networks) and for focused biological studies. Despite the significant challenges of such an undertaking, major strides have been made over the past few years. They include improvements in the computational prediction of PPIs and the literature curation of low-throughput studies of specific protein complexes, but also an increase in the deposition of high-quality data from non-biased high-throughput experimental PPI mapping strategies into publicly available databases.


2019 ◽  
Author(s):  
Guillaume Marmier ◽  
Martin Weigt ◽  
Anne-Florence Bitbol

Abstract: Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among the paralogs of ubiquitous prokaryotic protein families, starting from sequence data alone. Since DCA allows inference of the three-dimensional structure of protein complexes, its success in predicting protein-protein interactions could be mainly based on contacting residues coevolving to remain physicochemically complementary. However, interacting proteins often possess similar evolutionary histories, which also gives rise to correlations among their sequences. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involves phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that only share evolutionary history. It performs as well as methods explicitly based on sequence similarity, and even slightly better with large and accurate training sets. We further demonstrate the ability of these various methods to correctly predict pairings among actual paralogous proteins with genome proximity but no known direct physical interaction, which illustrates the importance of phylogenetic correlations in real data. However, for actually interacting and strongly coevolving proteins, DCA and mutual information outperform sequence similarity.

Author summary: Many biologically important protein-protein interactions are conserved over evolutionary time scales. This leads to two different signals that can be used to computationally predict interactions between protein families and to identify specific interaction partners. First, the shared evolutionary history leads to highly similar phylogenetic relationships between interacting proteins of the two families. Second, the need to keep the interaction surfaces of partner proteins biophysically compatible causes a correlated amino-acid usage of interface residues. Employing simulated data, we show that the shared history alone can be used to detect partner proteins. Similar accuracies are achieved by algorithms comparing phylogenetic relationships and by coevolutionary methods based on Direct Coupling Analysis, which are a priori designed to detect the second type of signal. Using real sequence data, we show that in cases with shared evolutionary history but without known physical interactions, both methods work with similar accuracy, while for physically interacting systems, methods based on correlated amino-acid usage outperform purely phylogenetic ones.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Andrea Mair ◽  
Shou-Ling Xu ◽  
Tess C Branon ◽  
Alice Y Ting ◽  
Dominique C Bergmann

Defining specific protein interactions and spatially or temporally restricted local proteomes improves our understanding of all cellular processes, but obtaining such data is challenging, especially for rare proteins, cell types, or events. Proximity labeling enables discovery of protein neighborhoods defining functional complexes and/or organellar protein compositions. Recent technological improvements, namely two highly active biotin ligase variants (TurboID and miniTurbo), allowed us to address two challenging questions in plants: (1) what are the in vivo partners of a low-abundance key developmental transcription factor and (2) what is the nuclear proteome of a rare cell type? Proteins identified with FAMA-TurboID include known interactors of this stomatal transcription factor and novel proteins that could facilitate its activator and repressor functions. Directing TurboID to stomatal nuclei enabled purification of cell type- and subcellular compartment-specific proteins. Broad tests of TurboID and miniTurbo in Arabidopsis and Nicotiana benthamiana and versatile vectors enable customization by plant researchers.


2019 ◽  
Vol 47 (W1) ◽  
pp. W331-W337 ◽  
Author(s):  
Ankit A Roy ◽  
Abhilesh S Dhawanjewar ◽  
Parichit Sharma ◽  
Gulzar Singh ◽  
M S Madhusudhan

Abstract: Our web server, PIZSA (http://cospi.iiserpune.ac.in/pizsa), assesses the likelihood of protein–protein interactions by assigning a Z Score computed from interface residue contacts. Our score takes into account the optimal number of atoms that mediate the interaction between pairs of residues and whether these contacts emanate from the main chain or side chain. We tested the score on 174 native interactions for which 100 decoys each were constructed using ZDOCK. The native structure scored better than any of the decoys in 146 cases and was able to rank within the 95th percentile in 162 cases. This easily outperforms a competing method, CIPS. We also benchmarked our scoring scheme on 15 targets from the CAPRI dataset and found that our method had results comparable to those of CIPS. Further, our method is able to analyse higher order protein complexes without the need to explicitly identify chains as receptors or ligands. The PIZSA server is easy to use and could be used to score any input three-dimensional structure and provide a residue pair-wise breakdown of the results. Attractively, our server offers a platform for users to upload their own potentials and could serve as an ideal testing ground for this class of scoring schemes.
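The native-versus-decoy comparison described above can be illustrated with a generic Z score and rank calculation. This is a minimal sketch, not the PIZSA scoring function: the names `z_score` and `fraction_outscored`, the convention that a higher score is better, and the example numbers are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_score(value, background):
    """Standard score of `value` relative to a background sample."""
    sigma = pstdev(background)  # population standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero spread")
    return (value - mean(background)) / sigma


def fraction_outscored(native, decoys):
    """Fraction of decoys with a strictly lower score than the native.

    Assumes that higher scores mean more native-like interfaces.
    """
    return sum(d < native for d in decoys) / len(decoys)


if __name__ == "__main__":
    # Hypothetical scores for one native complex and five decoys.
    decoy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
    native_score = 4.5
    print(z_score(native_score, decoy_scores))
    print(fraction_outscored(native_score, decoy_scores))
```

A native structure that outscores every decoy would give a fraction of 1.0, and one within the 95th percentile would give at least 0.95 under this convention.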


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Anna Vangone ◽  
Alexandre MJJ Bonvin

Almost all critical functions in cells rely on specific protein–protein interactions. Understanding these is therefore crucial in the investigation of biological systems. Despite all past efforts, we still lack a thorough understanding of the energetics of association of proteins. Here, we introduce a new and simple approach to predict binding affinity based on functional and structural features of the biological system, namely the network of interfacial contacts. We assess its performance against a protein–protein binding affinity benchmark and show that both experimental methods used for affinity measurements and conformational changes have a strong impact on prediction accuracy. Using a subset of complexes with reliable experimental binding affinities and combining our contacts and contact-types-based model with recent observations on the role of the non-interacting surface in protein–protein interactions, we reach a high prediction accuracy for such a diverse dataset outperforming all other tested methods.
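To make the notion of a network of interfacial contacts concrete, the sketch below counts residue pairs across two chains that have at least one pair of atoms within a distance cutoff. It is a minimal sketch under stated assumptions: the function name `count_interfacial_contacts`, the input layout (a dict from residue id to atom coordinates), and the 5.5 Å default cutoff are illustrative choices, not details taken from the abstract, and the contact types and regression model of the actual method are not reproduced.

```python
from math import dist


def count_interfacial_contacts(chain_a, chain_b, cutoff=5.5):
    """Count residue-residue contacts between two chains.

    chain_a, chain_b: dict mapping a residue id to a list of (x, y, z)
    atom coordinates in angstroms. Two residues are in contact when any
    pair of their atoms lies within `cutoff` (an assumed default).
    """
    contacts = 0
    for atoms_a in chain_a.values():
        for atoms_b in chain_b.values():
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts += 1
    return contacts


if __name__ == "__main__":
    # Hypothetical coordinates: two residues per chain.
    chain_a = {"A1": [(0.0, 0.0, 0.0)], "A2": [(20.0, 0.0, 0.0)]}
    chain_b = {
        "B1": [(3.0, 0.0, 0.0)],
        "B2": [(0.0, 4.0, 0.0), (50.0, 0.0, 0.0)],
    }
    print(count_interfacial_contacts(chain_a, chain_b))
```

The brute-force double loop is quadratic in the number of residues, which is acceptable for a single interface; a spatial index would be a natural substitute for large-scale use.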

