Predicting protein subcellular location using learned distributed representations from a protein-protein network

2019 ◽  
Author(s):  
Xiaoyong Pan ◽  
Lei Chen ◽  
Min Liu ◽  
Tao Huang ◽  
Yu-Dong Cai

Abstract Functions of proteins are in general related to their subcellular locations. To identify the functions of a protein, we first need to know where the protein is located. Interacting proteins tend to reside in the same subcellular location. Thus, it is imperative to take protein-protein interactions into account for computational identification of protein subcellular locations. In this study, we present a deep learning-based method, node2loc, to predict protein subcellular location. node2loc first learns distributed representations of proteins in a protein-protein network using node2vec, which acquires representations from unlabeled data for downstream tasks. The learned representations are then fed into a recurrent neural network (RNN) to predict subcellular locations. Considering the severe class imbalance among subcellular locations, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to artificially boost subcellular locations with few proteins. We construct a benchmark dataset with 16 subcellular locations and evaluate node2loc on this dataset. node2loc yields a Matthews correlation coefficient (MCC) value of 0.812, which outperforms other baseline methods. The results demonstrate that the learned representations from a protein-protein network have strong discriminative ability for classifying protein subcellular locations and that the RNN is a more powerful classifier than traditional machine learning models. node2loc is freely available at https://github.com/xypan1232/node2loc.
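The abstract reports a single MCC value for a 16-class problem but does not say how it was computed. The sketch below shows one common multi-class generalisation of the Matthews correlation coefficient, built from per-class counts. The function name `multiclass_mcc` and the example labels are illustrative assumptions and are not taken from the node2loc code.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from two label sequences.

    Illustrative helper (not from node2loc). Returns 0.0 when the
    denominator vanishes, e.g. when only one class is present.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correct predictions
    t_k = Counter(y_true)                             # true count per class
    p_k = Counter(y_pred)                             # predicted count per class
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    var_true = s * s - sum(v * v for v in t_k.values())
    var_pred = s * s - sum(v * v for v in p_k.values())
    denominator = sqrt(var_true * var_pred)
    return numerator / denominator if denominator else 0.0


# Hypothetical labels, for illustration only.
truth = ["nucleus", "cytosol", "nucleus", "membrane"]
guess = ["nucleus", "cytosol", "nucleus", "membrane"]
score = multiclass_mcc(truth, guess)  # perfect agreement
```

For two classes this expression reduces to the familiar binary MCC, which makes it a reasonable single-number summary on imbalanced data.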

2019 ◽  
Vol 14 (5) ◽  
pp. 406-421 ◽  
Author(s):  
Ting-He Zhang ◽  
Shao-Wu Zhang

Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for efficiently and effectively identifying protein subcellular locations. In particular, the rapidly increasing number of protein sequences entering the genome databases has called for the development of automated analysis methods. Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) Protein subcellular location benchmark dataset construction, ii) Protein feature representation and feature descriptors, iii) Common machine learning algorithms, iv) Cross-validation test methods and assessment metrics, v) Web servers. Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.


2019 ◽  
Vol 36 (6) ◽  
pp. 1908-1914 ◽  
Author(s):  
Ying-Ying Xu ◽  
Hong-Bin Shen ◽  
Robert F Murphy

Abstract Motivation: Systematic and comprehensive analysis of protein subcellular location as a critical part of proteomics (‘location proteomics’) has been studied for many years, but annotating protein subcellular locations and understanding variation of the location patterns across various cell types and states is still challenging. Results: In this work, we used immunohistochemistry images from the Human Protein Atlas as the source of subcellular location information, and built classification models for the complex protein spatial distribution in normal and cancerous tissues. The models can automatically estimate the fractions of protein in different subcellular locations, and can help to quantify the changes of protein distribution from normal to cancer tissues. In addition, we examined the extent to which different annotated protein pathways and complexes showed similarity in the locations of their member proteins, and then predicted new potential proteins for these networks. Availability and implementation: The dataset and code are available at: www.csbio.sjtu.edu.cn/bioinf/complexsubcellularpatterns. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Theodosios Theodosiou ◽  
Nikolaos Papanikolaou ◽  
Maria Savvaki ◽  
Giulia Bonetto ◽  
Stella Maxouri ◽  
...  

Abstract The in-depth study of protein–protein interactions (PPIs) is of key importance for understanding how cells operate. Therefore, in the past few years, many experimental as well as computational approaches have been developed for the identification and discovery of such interactions. Here, we present UniReD, a user-friendly computational prediction tool which analyses biomedical literature in order to extract known protein associations and suggest undocumented ones. As a proof of concept, we demonstrate its usefulness by experimentally validating six predicted interactions and by benchmarking it against public databases of experimentally validated PPIs, achieving high coverage. We believe that UniReD can become an important and intuitive resource for experimental biologists in their quest for novel associations within a protein network and a useful tool to complement experimental approaches (e.g. mass spectrometry) by producing sorted lists of candidate proteins for further experimental validation. UniReD is available at http://bioinformatics.med.uoc.gr/unired/


Author(s):  
João Botelho ◽  
Paulo Mascarenhas ◽  
José João Mendes ◽  
Vanessa Machado

Recent studies have supported a clinical association between Parkinson’s Disease (PD) and periodontitis. Hence, investigating possible protein interactions between these two conditions is of interest. In this study, we conducted a protein-protein network interaction analysis with recognized genes encoding proteins for PD and periodontitis. Genes of interest were collected via a GWAS database. Then, we conducted a protein interaction analysis using the STRING database, with the highest confidence cut-off of 0.9. Our protein network provided a comprehensive analysis of potential protein-protein interactions between PD and periodontitis. This analysis may provide valuable information on new candidate molecular mechanisms linking PD and periodontitis and may suggest new potential targets for research purposes. These results should be interpreted carefully given the limitations of this approach.
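The filtering step described above can be sketched as follows: keep only interactions at or above the 0.9 confidence cut-off, then pick out the edges that connect a gene from one condition to a gene from the other. This is a minimal sketch, assuming edges are available as `(protein_a, protein_b, score)` tuples with scores on a 0 to 1 scale. The function names and the gene identifiers are placeholders, not data from the study.

```python
def high_confidence_edges(edges, cutoff=0.9):
    """Keep (a, b, score) edges whose score meets the confidence cut-off."""
    return [(a, b, score) for a, b, score in edges if score >= cutoff]


def cross_condition_edges(edges, genes_a, genes_b):
    """Keep edges that link a gene in genes_a with a gene in genes_b."""
    cross = []
    for a, b, score in edges:
        if (a in genes_a and b in genes_b) or (a in genes_b and b in genes_a):
            cross.append((a, b, score))
    return cross


# Placeholder identifiers and scores, for illustration only.
pd_genes = {"PD_GENE_1", "PD_GENE_2"}
perio_genes = {"PERIO_GENE_1", "PERIO_GENE_2"}
edges = [
    ("PD_GENE_1", "PERIO_GENE_1", 0.95),
    ("PD_GENE_1", "PD_GENE_2", 0.99),
    ("PD_GENE_2", "PERIO_GENE_2", 0.70),
]

kept = high_confidence_edges(edges)                            # drops the 0.70 edge
bridging = cross_condition_edges(kept, pd_genes, perio_genes)  # PD-periodontitis links
```

If the interaction scores are stored on a different scale, the cut-off has to be rescaled accordingly before filtering.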


2019 ◽  
Author(s):  
Guillaume Marmier ◽  
Martin Weigt ◽  
Anne-Florence Bitbol

Abstract Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among the paralogs of ubiquitous prokaryotic protein families, starting from sequence data alone. Since DCA makes it possible to infer the three-dimensional structure of protein complexes, its success in predicting protein-protein interactions could be mainly based on contacting residues coevolving to remain physicochemically complementary. However, interacting proteins often possess similar evolutionary histories, which also gives rise to correlations among their sequences. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involves phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that only share evolutionary history. It performs as well as methods explicitly based on sequence similarity, and even slightly better with large and accurate training sets. We further demonstrate the ability of these various methods to correctly predict pairings among actual paralogous proteins with genome proximity but no known direct physical interaction, which illustrates the importance of phylogenetic correlations in real data. However, for actually interacting and strongly coevolving proteins, DCA and mutual information outperform sequence similarity.
Author summary Many biologically important protein-protein interactions are conserved over evolutionary time scales. This leads to two different signals that can be used to computationally predict interactions between protein families and to identify specific interaction partners. First, the shared evolutionary history leads to highly similar phylogenetic relationships between interacting proteins of the two families. Second, the need to keep the interaction surfaces of partner proteins biophysically compatible causes a correlated amino-acid usage of interface residues. Employing simulated data, we show that the shared history alone can be used to detect partner proteins. Similar accuracies are achieved by algorithms comparing phylogenetic relationships and by coevolutionary methods based on Direct Coupling Analysis, which are a priori designed to detect the second type of signal. Using real sequence data, we show that in cases with shared evolutionary history but without known physical interactions, both methods work with similar accuracy, while for physically interacting systems, methods based on correlated amino-acid usage outperform purely phylogenetic ones.
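To make "correlated amino-acid usage" concrete, the sketch below computes the plain mutual information between two alignment columns, one from each protein family, across paired sequences. It is only an illustration of the local statistic: it uses raw frequencies without pseudocounts or sequence reweighting, it is not the DCA model itself, and the function name and toy columns are assumptions rather than anything from the paper.

```python
from collections import Counter
from math import log


def column_mutual_information(col_a, col_b):
    """Mutual information (in nats) between two equally long alignment columns.

    col_a[i] and col_b[i] are the residues observed in the i-th paired
    sequence. Frequencies are raw counts; no pseudocount is applied.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    if n == 0:
        raise ValueError("columns must not be empty")
    joint = Counter(zip(col_a, col_b))   # counts of residue pairs
    freq_a = Counter(col_a)              # marginal counts, column A
    freq_b = Counter(col_b)              # marginal counts, column B
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Toy columns: perfectly coupled versus statistically independent.
coupled = column_mutual_information("AACC", "GGTT")
independent = column_mutual_information("AACC", "GTGT")
```

Here the coupled columns carry the maximum information for two equally frequent residues, while the independent columns carry none. As the abstract stresses, a high value on real data can arise from shared phylogeny as well as from physical contact.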


Biomolecules ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1107
Author(s):  
Youjun Zhang ◽  
Alisdair R. Fernie

Protein–protein assemblies are highly prevalent in all living cells. Considerable evidence has recently accumulated suggesting that the transient association and dissociation of proteins, in particular, represents an important means of regulating metabolism. This is true not only in the cytosol and organelle matrices, but also at membrane surfaces where, for example, receptor complexes, as well as those of key metabolic pathways, are common. Transporters also frequently appear in lists of interacting proteins, for example, binding proteins that catalyze the production of their substrates or that act as relays within signal transduction cascades. In this review, we provide an update on technologies that are used in the study of such interactions with mitochondrial transport proteins, highlighting the difficulties that arise in their use for membrane proteins and discussing our current understanding of the biological function of such interactions.


Author(s):  
Piyali Chatterjee ◽  
Subhadip Basu ◽  
Mahantapas Kundu ◽  
Mita Nasipuri ◽  
Dariusz Plewczynski

Abstract Protein-protein interactions (PPI) control most of the biological processes in a living cell. In order to fully understand protein functions, knowledge of protein-protein interactions is necessary. Prediction of PPI is challenging, especially when the three-dimensional structure of interacting partners is not known. Recently, a novel prediction method was proposed by exploiting physical interactions of constituent domains. We propose here a novel knowledge-based prediction method, namely PPI_SVM, which predicts interactions between two protein sequences by exploiting their domain information. We trained a two-class support vector machine on the benchmarking set of pairs of interacting proteins extracted from the Database of Interacting Proteins (DIP). The method considers all possible combinations of constituent domains between two protein sequences, unlike most of the existing approaches. Moreover, it deals with both single-domain proteins and multi-domain proteins; therefore, it can be applied to the whole proteome in high-throughput studies. Our machine learning classifier, following a brainstorming approach, achieves an accuracy of 86%, with specificity of 95% and sensitivity of 75%, which are better results than most previous methods that sacrifice recall values in order to boost the overall precision. Our method has on average better sensitivity combined with good selectivity on the benchmarking dataset. The PPI_SVM source code, train/test datasets and supplementary files are available freely in the public domain at: http://code.google.com/p/cmater-bioinfo/.
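The metrics quoted above (accuracy, specificity, sensitivity or recall, and precision) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and are not the PPI_SVM confusion matrix, and the function name is an illustrative assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: interacting pairs predicted as interacting / non-interacting.
    tn/fp: non-interacting pairs predicted as non-interacting / interacting.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
```

On a balanced test set, accuracy is simply the mean of sensitivity and specificity, which is why a classifier can report high specificity and still have modest recall.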


2010 ◽  
Vol 38 (4) ◽  
pp. 940-946 ◽  
Author(s):  
Parvez I. Haris

For most biophysical techniques, characterization of protein–protein interactions is challenging; this is especially true with methods that rely on a physical phenomenon that is common to both of the interacting proteins. Thus, for example, in IR spectroscopy, the carbonyl vibration (1600–1700 cm⁻¹) associated with the amide bonds from both of the interacting proteins will overlap extensively, making the interpretation of spectral changes very complicated. Isotope-edited infrared spectroscopy, where one of the interacting proteins is uniformly labelled with ¹³C or ¹³C,¹⁵N, has been introduced as a solution to this problem, enabling the study of protein–protein interactions using IR spectroscopy. The large shift of the amide I band (approx. 45 cm⁻¹ towards lower frequency) upon ¹³C labelling of one of the proteins reveals the amide I band of the unlabelled protein, enabling it to be used as a probe for monitoring conformational changes. With site-specific isotopic labelling, structural resolution at the level of individual amino acid residues can be achieved. Furthermore, the ability to record IR spectra of proteins in diverse environments means that isotope-edited IR spectroscopy can be used to structurally characterize difficult systems such as protein–protein complexes bound to membranes or large insoluble peptide/protein aggregates. In the present article, examples of application of isotope-edited IR spectroscopy for studying protein–protein interactions are provided.


2020 ◽  
pp. jbc.RA120.015452
Author(s):  
Eileen T. Burchfiel ◽  
Anniina Vihervaara ◽  
Michael J. Guertin ◽  
Rocio Gomez-Pastor ◽  
Dennis J. Thiele

Heat Shock Transcription Factor 1 (HSF1) orchestrates cellular stress protection by activating or repressing gene transcription in response to protein misfolding, oncogenic cell proliferation and other environmental stresses. HSF1 is tightly regulated via intramolecular repressive interactions, post-translational modifications, and protein-protein interactions. How these HSF1 regulatory protein interactions are altered in response to acute and chronic stress is largely unknown. To elucidate the profile of HSF1 protein interactions under normal growth, chronic and acutely stressful conditions, quantitative proteomics studies identified interacting proteins in response to heat shock or in the presence of a poly-glutamine aggregation protein cell-based model of Huntington’s Disease. These studies identified distinct protein interaction partners of HSF1 as well as changes in the magnitude of shared interactions as a function of each stressful condition. Several novel HSF1-interacting proteins were identified that encompass a wide variety of cellular functions, including roles in DNA repair, mRNA processing, regulation of RNA polymerase II and others. One HSF1 partner, CTCF, interacted with HSF1 in a stress-inducible manner and functions in repression of specific HSF1 target genes. Understanding how HSF1 regulates gene repression is a crucial question, given the dysregulation of HSF1 target genes in both cancer and neurodegeneration. These studies expand our understanding of HSF1-mediated gene repression and provide key insights into HSF1 regulation via protein-protein interactions.

