Toward Inferring Potts Models for Phylogenetically Correlated Sequence Data

Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1090 ◽  
Author(s):  
Edwin Rodriguez Horta ◽  
Pierre Barrat-Charlaix ◽  
Martin Weigt

Global coevolutionary models of protein families have become increasingly popular due to their capacity to predict residue–residue contacts from sequence information, to predict fitness effects of amino acid substitutions, and to infer protein–protein interactions. The central idea in these models is to construct a probability distribution, a Potts model, that reproduces the single-site and pairwise frequencies of amino acids found in natural sequences of the protein family. This approach treats sequences from the family as independent samples, completely ignoring the phylogenetic relations between them. This simplification is known to potentially bias the estimates of the model parameters, decreasing their biological relevance. Current workarounds for this problem, such as sequence reweighting, are poorly understood and lack a principled basis. Here, we propose an inference scheme that takes the phylogeny of a protein family into account in order to correct biases in the estimated amino acid frequencies. Using artificial data, we show that a Potts model inferred from these corrected frequencies performs better in predicting contacts and the fitness effects of mutations. We also present first, only partially successful tests on real protein data.
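The sequence reweighting that the abstract names as a current workaround can be made concrete. The sketch below is not the authors' phylogeny-based correction; it assumes the common heuristic in which each sequence is down-weighted by the number of sequences (itself included) sharing at least a fractional identity `threshold` with it (0.8 is an assumed, typical value), and then computes the reweighted single-site and pairwise frequencies that a Potts model would be fitted to reproduce. Function names and the toy alignment are illustrative only.

```python
from itertools import combinations


def sequence_weights(msa, threshold=0.8):
    """Weight each aligned sequence by 1 / (number of sequences, itself
    included, whose fractional identity with it is >= threshold)."""
    length = len(msa[0])
    weights = []
    for a in msa:
        neighbours = sum(
            1 for b in msa
            if sum(x == y for x, y in zip(a, b)) / length >= threshold
        )
        weights.append(1.0 / neighbours)
    return weights


def weighted_frequencies(msa, weights):
    """Reweighted single-site f_i(a) and pairwise f_ij(a, b) frequencies,
    normalised by the effective number of sequences (sum of weights).
    No pseudocount is added in this sketch."""
    length = len(msa[0])
    m_eff = sum(weights)
    fi, fij = {}, {}
    for seq, w in zip(msa, weights):
        for i in range(length):
            key = (i, seq[i])
            fi[key] = fi.get(key, 0.0) + w / m_eff
        for i, j in combinations(range(length), 2):
            key = (i, j, seq[i], seq[j])
            fij[key] = fij.get(key, 0.0) + w / m_eff
    return fi, fij


if __name__ == "__main__":
    toy_msa = ["ACDE", "ACDE", "ACDF", "WYKL"]  # made-up alignment
    w = sequence_weights(toy_msa)
    print(w)  # [0.5, 0.5, 1.0, 1.0]: the two identical sequences share weight
    fi, fij = weighted_frequencies(toy_msa, w)
    print(fi[(0, "A")])
```

The two identical sequences together count only as much as one independent sequence, which is exactly the kind of ad hoc correction for shared ancestry that the paper aims to replace with an explicitly phylogenetic treatment.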

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Kanchan Jha ◽  
Sriparna Saha

Abstract Proteins are the primary building blocks of living organisms. They interact with other proteins and are thereby involved in various biological processes. Studying protein–protein interactions (PPIs) helps in understanding protein function and the causes and progression of diseases, and in designing new drugs. However, there is a vast gap between the number of available protein sequences and the number of identified protein–protein interactions. To bridge this gap, researchers have proposed several computational methods to reveal interactions between proteins. These methods rely solely on sequence-based information. With advances in technology, other types of protein information, such as 3D structures, have become available. Deep learning techniques have now been adopted successfully in various domains, including bioinformatics. The current work therefore uses different modalities, namely 3D structures and sequence-based information of proteins, together with deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first obtain several representations of each protein from its 3D coordinates and three amino acid attributes: hydropathy index, isoelectric point, and charge. Amino acids are the building blocks of proteins. A pre-trained ResNet50 model, a convolutional neural network architecture, is used to extract features from these representations. Autocovariance and conjoint triad, two widely used sequence-based protein encodings, provide a second modality, and a stacked autoencoder is used to obtain a compact form of this sequence-based information. Finally, the features obtained from the different modalities are concatenated for each protein pair and fed into a classifier to predict the label of the pair.
We have experimented on the human and Saccharomyces cerevisiae PPI datasets and compared our results with state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experiments on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
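Of the two sequence encodings mentioned, conjoint triad is compact enough to sketch. The version below is an illustrative assumption, not the authors' code: it uses the seven-class amino acid grouping commonly used for conjoint triad (classes defined by dipole and side-chain volume), counts consecutive class triads into a 343-dimensional vector, and normalizes by the total number of triads, which simplifies the normalization used in some published variants. The hypothetical `pair_features` helper shows how two proteins' vectors could be concatenated into one pair representation.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids (by dipole and
# side-chain volume), as commonly used for conjoint triad encoding.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: k for k, group in enumerate(CLASSES) for aa in group}
# All 7**3 = 343 ordered class triads, each mapped to a vector index.
TRIAD_INDEX = {t: n for n, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(seq):
    """343-dimensional vector of consecutive class-triad frequencies.
    Residues outside the 20 standard amino acids are skipped."""
    classes = [AA_TO_CLASS[aa] for aa in seq if aa in AA_TO_CLASS]
    counts = [0] * len(TRIAD_INDEX)
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def pair_features(seq_a, seq_b):
    """Hypothetical helper: concatenate two proteins' encodings so a
    classifier sees one 686-dimensional vector per protein pair."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


if __name__ == "__main__":
    v = conjoint_triad("AGVC")  # classes 0, 0, 0, 6 -> triads (0,0,0), (0,0,6)
    print(len(v), v[0], v[6])
```

Grouping residues into classes keeps the vector at 343 dimensions instead of 20**3 = 8000, at the cost of treating residues within a class as interchangeable.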


Molecules ◽  
2020 ◽  
Vol 25 (8) ◽  
pp. 1841 ◽  
Author(s):  
Da Xu ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Wei Chen ◽  
Rui Gao

Identification of protein-protein interactions (PPIs) plays an essential role in understanding protein functions and cellular biological activities. However, traditional experimental methods are time-consuming and laborious. Developing new, reliable computational approaches for identifying PPIs therefore has great practical significance. In this paper, we propose PPI-GE, a novel method for predicting PPIs based on graph energy. For feature extraction, we designed two new methods: the physicochemical graph energy, based on the ionization equilibrium constant and the isoelectric point, and the contact graph energy, based on the contact information of amino acids. The dipeptide composition method was used to capture amino acid order information. After multi-information fusion, principal component analysis (PCA) was applied to eliminate noise, and a robust weighted sparse representation-based classification (WSRC) classifier was used to classify the samples. The prediction accuracies in five-fold cross-validation on the human, Helicobacter pylori (H. pylori), and yeast data sets were 99.49%, 97.15%, and 99.56%, respectively. In addition, comparative experiments on five independent data sets and two significant PPI networks also demonstrate that PPI-GE outperforms the compared methods.
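Two ingredients named above have standard definitions that can be shown directly. The energy of a graph is the sum of the absolute values of the eigenvalues of its adjacency matrix, E(G) = Σ_i |λ_i|, and dipeptide composition is the 400-dimensional frequency vector of adjacent residue pairs. The sketch below implements only these generic definitions. How PPI-GE builds its physicochemical and contact graphs from ionization constants, isoelectric points, or contact information is not reproduced, so the adjacency matrix is an assumed input.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def graph_energy(adjacency):
    """Energy of an undirected graph: sum of |eigenvalues| of its symmetric
    adjacency matrix (the adjacency matrix itself is an assumed input)."""
    a = np.asarray(adjacency, dtype=float)
    return float(np.abs(np.linalg.eigvalsh(a)).sum())


def dipeptide_composition(seq):
    """400-dimensional frequency vector of adjacent residue pairs;
    pairs containing non-standard residues are ignored."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # eigenvalues 2, -1, -1
    print(round(graph_energy(triangle), 6))  # 4.0
    print(len(dipeptide_composition("ACA")))  # 400
```

`eigvalsh` is used because adjacency matrices of undirected graphs are symmetric, which guarantees real eigenvalues.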


2006 ◽  
Vol 398 (1) ◽  
pp. 63-71 ◽  
Author(s):  
Prim de Bie ◽  
Bart van de Sluis ◽  
Ezra Burstein ◽  
Karen J. Duran ◽  
Ruud Berger ◽  
...  

COMMD [copper metabolism gene MURR1 (mouse U2af1-rs1 region 1) domain] proteins constitute a recently identified family of NF-κB (nuclear factor κB)-inhibiting proteins, characterized by the presence of the COMM domain. In the present paper, we report a detailed investigation of the role of this protein family, and specifically of the COMM domain, in NF-κB signalling through the characterization of protein–protein interactions involving COMMD proteins. The small, ubiquitously expressed COMMD6 consists primarily of the COMM domain. COMMD1 and COMMD6 were therefore analysed further as prototype members of the COMMD protein family. Using specific antisera, we describe an interaction between endogenous COMMD1 and COMMD6. This interaction was verified by independent techniques, appeared to be direct and could be detected throughout the whole cell, including the nucleus. Both proteins inhibit TNF (tumour necrosis factor)-induced NF-κB activation in a non-synergistic manner. Mutation of the amino acid residues Trp24 and Pro41 in the COMM domain of COMMD6 completely abolished the inhibitory effect of COMMD6 on TNF-induced NF-κB activation, but this was not accompanied by loss of interaction with COMMD1, COMMD6 or the NF-κB subunit RelA. In contrast with COMMD1, COMMD6 does not bind to IκBα (inhibitory κBα), indicating that the two proteins inhibit NF-κB in overlapping but not identical ways. Taken together, these data support the significance of COMMD protein–protein interactions and provide new mechanistic insight into the function of this protein family in NF-κB signalling.


2021 ◽  
Author(s):  
Babu Sudhamalla ◽  
Anirban Roy ◽  
Soumen Barman ◽  
Jyotirmayee Padhan

The site-specific installation of light-activatable crosslinker unnatural amino acids offers a powerful approach to trap transient protein-protein interactions both in vitro and in vivo. Herein, we engineer a bromodomain to...


2019 ◽  
Vol 20 (S16) ◽  
Author(s):  
Da Zhang ◽  
Mansur Kabuka

Abstract Background: Protein-protein interactions (PPIs) are constantly involved in dynamic pathological and biological processes. A thorough understanding of PPIs is therefore crucial for elucidating disease occurrence, achieving optimal drug-target therapeutic effects, and describing protein complex structures. However, compared with the protein sequences available from various species and organisms, the number of known protein-protein interactions is relatively limited. To address this gap, many research efforts have focused on facilitating the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely solely on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results: In this paper, we propose a multi-modal deep representation learning structure that combines protein physicochemical features with graph topological features from the PPI networks. Specifically, our method takes into account not only the protein sequence information but also the topological representation of each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study PPI prediction. We then use supervised deep neural networks to identify PPIs and classify protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework outperforms other computational methods. Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining PPI networks.
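The metapath-plus-CBOW step can be illustrated generically. In the sketch below, a metapath such as `["P", "F", "P"]` constrains the node type visited at each step of a random walk, and each walk is then turned into (context, target) training pairs of the kind a CBOW model learns from. The node types, the toy graph, and the function names are hypothetical: the abstract does not say which node types or metapaths the authors used, and training the CBOW embeddings themselves (for example with an off-the-shelf word2vec implementation) is not shown.

```python
import random


def metapath_walk(graph, node_type, start, metapath, walk_length, rng):
    """Random walk on a typed graph whose node types follow a symmetric
    metapath such as ["P", "F", "P"], repeated until walk_length nodes."""
    assert node_type[start] == metapath[0]
    period = len(metapath) - 1
    walk = [start]
    while len(walk) < walk_length:
        wanted = metapath[len(walk) % period]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
    return walk


def cbow_pairs(walk, window):
    """(context, target) pairs of the kind a CBOW model is trained on:
    predict each node from up to `window` neighbours on either side."""
    pairs = []
    for i, target in enumerate(walk):
        context = walk[max(0, i - window):i] + walk[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy graph: proteins (type "P") linked via a node of type "F".
    graph = {"p1": ["f1"], "f1": ["p1", "p2"], "p2": ["f1"]}
    node_type = {"p1": "P", "p2": "P", "f1": "F"}
    walk = metapath_walk(graph, node_type, "p1", ["P", "F", "P"], 5, random.Random(0))
    print(walk)
    print(cbow_pairs(walk, window=1))
```

Passing an explicit `random.Random` instance keeps walks reproducible, which helps when comparing embeddings across runs.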


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yang Li ◽  
Zheng Wang ◽  
Li-Ping Li ◽  
Zhu-Hong You ◽  
Wen-Zhun Huang ◽  
...  

Abstract Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Recognizing PPIs is therefore very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction, and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries the risk of a high false-positive rate. To find a more efficient solution, the biology community is turning to computational methods that can discover protein interactions quickly and at scale. In this paper, we propose a computational method for predicting PPIs from protein sequence information that combines orthogonal locality preserving projections (OLPP) with rotation forest (RoF) models. Specifically, each protein sequence is first converted into a position-specific scoring matrix (PSSM) containing evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). We then characterize each protein as a fixed-length feature vector by applying OLPP to its PSSM. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with prediction accuracies of 90.07% and 96.09% on the yeast and human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the solution of key problems in proteomics.
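A practical detail behind the "fixed-length feature vector" is that a PSSM has one row per residue, so proteins of different lengths yield matrices of different sizes. The sketch below shows one assumed way to collapse an L × 20 PSSM into a fixed 400-dimensional vector, by averaging the PSSM rows belonging to each residue type, in the spirit of commonly used PSSM composition features. It is chosen for illustration only and is not the OLPP projection described in the abstract, which would then reduce such vectors further. The toy PSSM values are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_composition(seq, pssm):
    """Collapse an L x 20 PSSM into a fixed 400-dim vector: for each residue
    type r, average the PSSM rows at positions where seq has residue r.
    Residue types absent from seq contribute a row of zeros."""
    assert len(seq) == len(pssm)
    sums = {aa: [0.0] * 20 for aa in AMINO_ACIDS}
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for aa, row in zip(seq, pssm):
        if aa in sums:  # skip non-standard residues
            counts[aa] += 1
            sums[aa] = [s + x for s, x in zip(sums[aa], row)]
    features = []
    for aa in AMINO_ACIDS:
        n = counts[aa]
        features.extend(s / n if n else 0.0 for s in sums[aa])
    return features


if __name__ == "__main__":
    toy_pssm = [[1.0] * 20, [5.0] * 20, [3.0] * 20]  # made-up 3 x 20 PSSM
    f = pssm_composition("ACA", toy_pssm)
    print(len(f), f[0], f[20])  # 400 2.0 5.0
```

Whatever the protein length, the output has 400 entries, so vectors for two proteins can be concatenated or projected in a uniform way.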


2004 ◽  
Vol 24 (12) ◽  
pp. 5521-5533 ◽  
Author(s):  
David A. Mangus ◽  
Matthew C. Evans ◽  
Nathan S. Agrin ◽  
Mandy Smith ◽  
Preetam Gongidi ◽  
...  

ABSTRACT PAN, a yeast poly(A) nuclease, plays an important nuclear role in the posttranscriptional maturation of mRNA poly(A) tails. The activity of this enzyme is dependent on its Pan2p and Pan3p subunits, as well as the presence of poly(A)-binding protein (Pab1p). We have identified and characterized the associated network of factors controlling the maturation of mRNA poly(A) tails in yeast and defined its relevant protein-protein interactions. Pan3p, a positive regulator of PAN activity, interacts with Pab1p, thus providing substrate specificity for this nuclease. Pab1p also regulates poly(A) tail trimming by interacting with Pbp1p, a factor that appears to negatively regulate PAN. Pan3p and Pbp1p both interact with themselves and with the C terminus of Pab1p. However, the domains required for Pan3p and Pbp1p binding on Pab1p are distinct. Single amino acid changes that disrupt Pan3p interaction with Pab1p have been identified and define a binding pocket in helices 2 and 3 of Pab1p's carboxy terminus. The importance of these amino acids for Pab1p-Pan3p interaction, and poly(A) tail regulation, is underscored by experiments demonstrating that strains harboring substitutions in these residues accumulate mRNAs with long poly(A) tails in vivo.


2010 ◽  
Vol 84 (13) ◽  
pp. 6846-6860 ◽  
Author(s):  
Nadi T. Wickramasekera ◽  
Paula Traktman

ABSTRACT Poxvirus virions, whose outer membrane surrounds two lateral bodies and a core, contain at least 70 different proteins. The F18 phosphoprotein is one of the most abundant core components and is essential for the assembly of mature virions. We report here the results of a structure/function analysis in which the roles of conserved cysteine residues, clusters of charged amino acids, and clusters of hydrophobic/aromatic amino acids have been assessed. Taking advantage of a recombinant virus in which F18 expression is IPTG (isopropyl-β-D-thiogalactopyranoside) dependent, we developed a transient complementation assay to evaluate the ability of mutant alleles of F18 to support virion morphogenesis and/or to restore the production of infectious virus. We have also examined protein-protein interactions, comparing the ability of mutant and WT F18 proteins to interact with WT F18 and with the viral A30 protein, another essential core component. We show that F18 associates with an A30-containing multiprotein complex in vivo in a manner that depends upon clusters of hydrophobic/aromatic residues in the N′ terminus of the F18 protein, but that F18 is not required for the assembly of this complex. Finally, we confirmed that two PSSP motifs within F18 are the sites of phosphorylation by cellular proline-directed kinases in vitro and in vivo. Mutation of both of these phosphorylation sites has no apparent impact on virion morphogenesis but leads to the assembly of virions with significantly reduced infectivity.


2016 ◽  
Vol 113 (52) ◽  
pp. 15018-15023 ◽  
Author(s):  
Juan Rodriguez-Rivas ◽  
Simone Marsili ◽  
David Juan ◽  
Alfonso Valencia

Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far, these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high sequence divergence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides accurate, sequence-based information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
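The coevolutionary signal between two interacting families can be illustrated with a deliberately simple score. The sketch below computes mutual information (MI) between every column of family A and every column of family B in a paired alignment, assuming row k of each alignment comes from the same complex, and then applies an average product correction (APC) computed over the inter-protein block. This is a stand-in chosen for brevity, not necessarily the coevolution method used in the paper, and it omits the sequence reweighting and pseudocounts that practical methods apply.

```python
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Mutual information (natural log) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def interface_scores(msa_a, msa_b):
    """APC-corrected MI for every (column i of A, column j of B) pair.
    Row k of msa_a and row k of msa_b are assumed to be partners."""
    cols_a, cols_b = list(zip(*msa_a)), list(zip(*msa_b))
    mi = [[column_mi(a, b) for b in cols_b] for a in cols_a]
    row_mean = [sum(row) / len(cols_b) for row in mi]
    col_mean = [sum(mi[i][j] for i in range(len(cols_a))) / len(cols_a)
                for j in range(len(cols_b))]
    overall = sum(row_mean) / len(cols_a)
    scores = {}
    for i in range(len(cols_a)):
        for j in range(len(cols_b)):
            apc = row_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
            scores[(i, j)] = mi[i][j] - apc
    return scores


if __name__ == "__main__":
    # Made-up paired alignment: A column 0 tracks B column 0, A1 tracks B1.
    scores = interface_scores(["AC", "AD", "GC", "GD"], ["KE", "KF", "RE", "RF"])
    print(sorted(scores, key=scores.get, reverse=True)[:2])
```

APC subtracts the part of each score explained by how variable its two columns are overall, so the highest remaining scores are better candidates for direct inter-protein contacts.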


2016 ◽  
Author(s):  
Anne-Florence Bitbol ◽  
Robert S. Dwyer ◽  
Lucy J. Colwell ◽  
Ned S. Wingreen

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a priori knowledge of interaction partners, yielding a striking 0.93 true positive fraction on our complete dataset, and we uncover the origin of this surprising success. Finally, we discuss how our method could be used to predict novel protein-protein interactions.
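The core decision in predicting interaction partners, choosing which member of one family pairs with which member of the other within a species, can be sketched as an assignment problem. Below, each candidate pair is scored by summing hypothetical inter-protein coupling values, and the one-to-one pairing with the highest total score is found by exhaustive search. This is a simplified illustration under stated assumptions: the coupling table is invented input rather than an inferred maximum entropy model, exhaustive search only suits very small species (an assignment solver such as the Hungarian algorithm scales better), and the iterative retraining the abstract describes is not shown.

```python
from itertools import permutations


def pairing_score(seq_a, seq_b, couplings):
    """Sum of inter-protein couplings: couplings[(i, j)][(x, y)] is a
    hypothetical score for residue x at column i of A facing residue y at
    column j of B; missing entries count as 0."""
    return sum(
        table.get((seq_a[i], seq_b[j]), 0.0)
        for (i, j), table in couplings.items()
    )


def best_pairing(seqs_a, seqs_b, couplings):
    """One-to-one pairing within a species maximising the total score,
    found by exhaustive search over permutations (small species only)."""
    n = len(seqs_a)
    assert n == len(seqs_b)
    score = [[pairing_score(a, b, couplings) for b in seqs_b] for a in seqs_a]
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(score[k][perm[k]] for k in range(n)),
    )
    return list(enumerate(best))


if __name__ == "__main__":
    # Made-up couplings favouring A-K and G-R at column pair (0, 0).
    couplings = {(0, 0): {("A", "K"): 1.0, ("G", "R"): 1.0}}
    print(best_pairing(["A", "G"], ["R", "K"], couplings))  # [(0, 1), (1, 0)]
```

Each returned tuple `(k, m)` pairs the k-th sequence of the first family with the m-th sequence of the second family in the same species.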

