Statistical physics of interacting proteins: impact of dataset size and quality assessed in synthetic sequences

2019 ◽  
Author(s):  
Carlos A. Gandarilla-Pérez ◽  
Pierre Mergny ◽  
Martin Weigt ◽  
Anne-Florence Bitbol

Identifying protein-protein interactions is crucial for a systems-level understanding of the cell. Recently, algorithms based on inverse statistical physics, e.g. Direct Coupling Analysis (DCA), have made it possible to use evolutionarily related sequences to address two conceptually related inference tasks: finding pairs of interacting proteins, and identifying pairs of residues which form contacts between interacting proteins. Here we address two underlying questions: How are the performances of both inference tasks related? How does performance depend on dataset size and quality? To this end, we formalize both tasks using Ising models defined over stochastic block models, with individual blocks representing single proteins, and inter-block couplings representing protein-protein interactions; controlled synthetic sequence data are generated by Monte Carlo simulations. We show that DCA is able to address both inference tasks accurately when sufficiently large training sets of known interaction partners are available, and that an iterative pairing algorithm (IPA) makes predictions possible even without a training set. Noise in the training data deteriorates performance. In both tasks we find a quadratic scaling relating dataset quality and size that is consistent with noise adding in square-root fashion and signal adding linearly when increasing the dataset. This implies that it is generally good to incorporate more data even if its quality is imperfect, thereby shedding light on the empirically observed performance of DCA applied to natural protein sequences.
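As a rough illustration of the data-generation setup described in this abstract, the sketch below draws a coupling matrix with two blocks (two "proteins") and samples spin sequences with single-spin-flip Metropolis Monte Carlo. The function names, coupling strengths, edge probabilities and sweep counts are all hypothetical placeholders; the abstract does not state the parameters or the exact sampler the authors used.

```python
import numpy as np

def block_couplings(n_per_block, j_intra, j_inter, p_intra, p_inter, rng):
    """Symmetric coupling matrix over a two-block stochastic block model.

    Each block stands for one protein; inter-block couplings stand for
    protein-protein contacts. All parameter values are illustrative assumptions.
    """
    n = 2 * n_per_block
    block = np.repeat([0, 1], n_per_block)
    same = block[:, None] == block[None, :]
    prob = np.where(same, p_intra, p_inter)        # edge probability per pair
    strength = np.where(same, j_intra, j_inter)    # coupling value per pair
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # draw each pair once (i < j)
    J = np.where(upper, strength, 0.0)
    return J + J.T                                  # symmetric, zero diagonal

def metropolis_sample(J, n_sweeps, rng, beta=1.0):
    """One spin configuration drawn after n_sweeps Metropolis sweeps.

    Energy convention: E(s) = -sum_{i<j} J_ij s_i s_j.
    """
    n = J.shape[0]
    s = rng.choice([-1, 1], size=n)
    for _ in range(n_sweeps * n):
        i = rng.integers(n)
        delta_e = 2.0 * s[i] * (J[i] @ s)           # energy change if s_i flips
        if delta_e <= 0 or rng.random() < np.exp(-beta * delta_e):
            s[i] = -s[i]
    return s

rng = np.random.default_rng(0)
J = block_couplings(n_per_block=10, j_intra=1.0, j_inter=1.0,
                    p_intra=0.1, p_inter=0.02, rng=rng)
# Independent chains give independent synthetic "sequences" of the protein pair.
dataset = np.array([metropolis_sample(J, n_sweeps=50, rng=rng) for _ in range(20)])
```

Each row of `dataset` concatenates the two blocks, so splitting it at column `n_per_block` yields one synthetic sequence per "protein" with a known pairing.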

2019 ◽  
Author(s):  
Guillaume Marmier ◽  
Martin Weigt ◽  
Anne-Florence Bitbol

Abstract: Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among the paralogs of ubiquitous prokaryotic protein families, starting from sequence data alone. Since DCA can be used to infer the three-dimensional structure of protein complexes, its success in predicting protein-protein interactions could be mainly based on contacting residues coevolving to remain physicochemically complementary. However, interacting proteins often possess similar evolutionary histories, which also gives rise to correlations among their sequences. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involves phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that only share evolutionary history. It performs as well as methods explicitly based on sequence similarity, and even slightly better with large and accurate training sets. We further demonstrate the ability of these various methods to correctly predict pairings among actual paralogous proteins with genome proximity but no known direct physical interaction, which illustrates the importance of phylogenetic correlations in real data. However, for actually interacting and strongly coevolving proteins, DCA and mutual information outperform sequence similarity.

Author summary: Many biologically important protein-protein interactions are conserved over evolutionary time scales. This leads to two different signals that can be used to computationally predict interactions between protein families and to identify specific interaction partners. First, the shared evolutionary history leads to highly similar phylogenetic relationships between interacting proteins of the two families. Second, the need to keep the interaction surfaces of partner proteins biophysically compatible causes a correlated amino-acid usage of interface residues. Employing simulated data, we show that the shared history alone can be used to detect partner proteins. Similar accuracies are achieved by algorithms comparing phylogenetic relationships and by coevolutionary methods based on Direct Coupling Analysis, which are a priori designed to detect the second type of signal. Using real sequence data, we show that in cases with shared evolutionary history but without known physical interactions, both methods work with similar accuracy, while for physically interacting systems, methods based on correlated amino-acid usage outperform purely phylogenetic ones.
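To make the idea of synthetic data with "only phylogeny and no interactions or contacts" more concrete, here is a minimal sketch that evolves binary sequences along a binary branching tree with random, independent mutations. The function name, the binary alphabet, and the fixed number of mutations per branch are assumptions made for illustration; the abstract does not specify the authors' generative procedure.

```python
import numpy as np

def evolve_on_binary_tree(length, n_generations, n_mutations, rng):
    """Return the 2**n_generations leaf sequences of a binary tree.

    Starting from one random ancestor, every sequence is duplicated at each
    generation and each copy receives n_mutations independent site flips.
    No couplings between sites are involved, so all correlations among the
    leaves come from shared ancestry. (Illustrative assumption, not the
    authors' exact protocol.)
    """
    seqs = rng.choice([-1, 1], size=(1, length))
    for _ in range(n_generations):
        seqs = np.repeat(seqs, 2, axis=0)   # rows 2k and 2k+1 are siblings
        for row in seqs:                    # rows are views, edited in place
            sites = rng.choice(length, size=n_mutations, replace=False)
            row[sites] *= -1
    return seqs

rng = np.random.default_rng(0)
leaves = evolve_on_binary_tree(length=200, n_generations=5, n_mutations=5, rng=rng)
# Splitting each leaf into two halves gives two "proteins" that share a
# history but have no designed interaction between them.
protein_a, protein_b = leaves[:, :100], leaves[:, 100:]
```

Because the two halves of each leaf descend through the same tree, any partner-matching signal between `protein_a` and `protein_b` in such data is purely phylogenetic.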


2016 ◽  
Author(s):  
Anne-Florence Bitbol ◽  
Robert S. Dwyer ◽  
Lucy J. Colwell ◽  
Ned S. Wingreen

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a priori knowledge of interaction partners, yielding a striking 0.93 true positive fraction on our complete dataset, and we uncover the origin of this surprising success. Finally, we discuss how our method could be used to predict novel protein-protein interactions.


2018 ◽  
Author(s):  
Anne-Florence Bitbol

Abstract: Specific protein-protein interactions are crucial in most cellular processes. They enable multiprotein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are specific interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. This stands in contrast with structure prediction of proteins and of multiprotein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.

Author summary: Specific protein-protein interactions are at the heart of most intra-cellular processes. Mapping these interactions is thus crucial to a systems-level understanding of cells, and has broad applications to areas such as drug targeting. Systematic experimental identification of protein interaction partners is still challenging. However, a large and rapidly growing amount of sequence data is now available. Recently, algorithms have been proposed to identify which proteins interact from their sequences alone, thanks to the co-variation of the sequences of interacting proteins. These algorithms build upon inference methods that have been used with success to predict the three-dimensional structures of proteins and multi-protein complexes, and their focus is on the amino-acid residues that are in direct contact. Here, we propose a simpler method to identify which proteins interact among the paralogous proteins of two families, starting from their sequences alone. Our method relies on an approximate maximization of mutual information between the sequences of the two families, without specifically emphasizing the contacting residue pairs. We demonstrate that this method slightly outperforms the earlier one. This result highlights that partner prediction does not only rely on the identities and interactions of directly contacting amino-acids.
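The quantity at the core of this approach can be illustrated with the plain empirical mutual information between one alignment column from each family, computed over a set of paired sequences. This is only a sketch of the basic estimator: the function name is hypothetical, no pseudocounts or sequence reweighting are applied, and it is not the authors' approximate maximization procedure over pairings.

```python
from collections import Counter
import math

def column_mutual_information(col_a, col_b):
    """Empirical mutual information (in nats) between two alignment columns.

    col_a[k] and col_b[k] are the residues observed at one site of family A
    and one site of family B in the k-th paired sequence.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # (c/n) * log( (c/n) / ((count_a/n) * (count_b/n)) )
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi

perfectly_covarying = column_mutual_information("AACC", "DDEE")   # equals log(2)
statistically_independent = column_mutual_information("AACC", "DEDE")  # equals 0
```

Summing such column-pair terms over all inter-family site pairs gives a score for a candidate pairing of the two alignments; a correct pairing is expected to yield higher mutual information than a scrambled one.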


2016 ◽  
Vol 113 (43) ◽  
pp. 12186-12191 ◽  
Author(s):  
Thomas Gueudré ◽  
Carlo Baldassi ◽  
Marco Zamparo ◽  
Martin Weigt ◽  
Andrea Pagnani

Understanding protein−protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein−protein interactions are urgently needed. Such methods should connect multiple scales from evolutionary conserved interactions between families of homologous proteins, over the identification of specifically interacting proteins in the case of multiple paralogs inside a species, down to the prediction of residues being in physical contact across interaction interfaces. Statistical inference methods detecting residue−residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; they require, however, large joint alignments of homologous protein pairs known to interact. The generation of such alignments is a complex computational task on its own; application of coevolutionary modeling has, in turn, been restricted to proteins without paralogs, or to bacterial systems with the corresponding coding genes being colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales, and simultaneously to match interacting paralogs, to identify interprotein residue−residue contacts and to discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond cases treatable so far.


2021 ◽  
Author(s):  
Andonis Gerardos ◽  
Nicola Dietler ◽  
Anne-Florence Bitbol

Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural dataset, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.


Author(s):  
Oruganty Krishnadev ◽  
Shveta Bisht ◽  
Narayanaswamy Srinivasan

The genomes of many human pathogens have been sequenced, but the protein-protein interactions between pathogen and human host are still poorly understood. The authors apply a simple homology-based method to predict protein-protein interactions between the human host and two mycobacterial organisms, viz. M. tuberculosis and M. leprae. They focus on secreted proteins of the pathogens and on cellular membrane proteins, restricting the search to biologically significant and feasible interactions. Predicted interactions include five mycobacterial proteins of as yet unknown function, thus suggesting a role for these proteins in pathogenesis. The authors predict interaction partners for secreted mycobacterial antigens such as MPT70, serine proteases and other proteins interacting with human proteins, such as toll-like receptors, ras signalling proteins and immune maintenance proteins, that are implicated in pathogenesis. These results suggest that the list of predicted interactions is suitable for further analysis and forms a useful step in the understanding of pathogenesis of these mycobacterial organisms.


2020 ◽  
Vol 19 (7) ◽  
pp. 1070-1075 ◽  
Author(s):  
Katrina Meyer ◽  
Matthias Selbach

Protein-protein interactions are often mediated by short linear motifs (SLiMs) that are located in intrinsically disordered regions (IDRs) of proteins. Interactions mediated by SLiMs are notoriously difficult to study, and many functionally relevant interactions likely remain to be uncovered. Recently, pull-downs with synthetic peptides in combination with quantitative mass spectrometry emerged as a powerful screening approach to study protein-protein interactions mediated by SLiMs. Specifically, arrays of synthetic peptides immobilized on cellulose membranes provide a scalable means to identify the interaction partners of many peptides in parallel. In this minireview we briefly highlight the relevance of SLiMs for protein-protein interactions, outline existing screening technologies, discuss unique advantages of peptide-based interaction screens and provide practical suggestions for setting up such peptide-based screens.


2020 ◽  
Vol 16 ◽  
pp. 2505-2522
Author(s):  
Peter Bayer ◽  
Anja Matena ◽  
Christine Beuck

As one of the few analytical methods that offer atomic resolution, NMR spectroscopy is a valuable tool to study the interaction of proteins with their interaction partners, both biomolecules and synthetic ligands. In recent years, the focus in chemistry has kept expanding from targeting small binding pockets in proteins to recognizing patches on protein surfaces, mostly via supramolecular chemistry, with the goal of modulating protein–protein interactions. Here we present NMR methods that have been applied to characterize these molecular interactions and discuss the challenges of this endeavor.


Cells ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 2008 ◽  
Author(s):  
Nicole Wesch ◽  
Vladimir Kirkin ◽  
Vladimir V. Rogov

Autophagy is a common name for a number of catabolic processes that maintain cellular homeostasis by removing damaged and dysfunctional intracellular components. Impairment or imbalance of autophagy can lead to various diseases, such as neurodegeneration, infectious diseases, and cancer. A central axis of autophagy is formed along the interactions of autophagy modifiers (Atg8-family proteins) with a variety of their cellular partners. Besides autophagy, Atg8-proteins participate in many other pathways, among which membrane trafficking and neuronal signaling are the best known. Although autophagy modifiers are well studied, as these small globular proteins show similarity to ubiquitin on a structural level, the mechanisms of their interactions are still not completely understood. A thorough analysis and classification of all known mechanisms of Atg8-protein interactions could shed light on their functioning and connect the pathways involving Atg8-proteins. In this review, we present our views of the key features of the Atg8-proteins and describe the basic principles of their recognition and binding by interaction partners. We discuss affinity and selectivity of their interactions as well as provide perspectives for discovery of new Atg8-interacting proteins and therapeutic approaches to tackle major human diseases.

