Identification of Essential Proteins by Using Complexes and Biological Information on Dynamic PPI Network

Author(s):  
Wei Liu ◽  
Liangyu Ma ◽  
Ling Chen
2021 ◽  
Author(s):  
Zhihong Zhang ◽  
Sai Hu ◽  
Wei Yan ◽  
Bihai Zhao ◽  
Lei Wang

Abstract Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain large numbers of false negatives and false positives. It is therefore necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results: In this paper, we propose a non-negative matrix factorization and multiple biological information based model (NDM) for identifying essential proteins. The first stage constructs a weighted PPI network by combining protein domain information, protein complex information and the topological characteristics of the original PPI network. Then, non-negative matrix factorization is used to reconstruct an optimized PPI network with a more complete set of edge weights. In the final stage, the ranking score of each protein is computed by the PageRank algorithm, with initial scores calculated from homology and subcellular localization information. To verify the effectiveness of the NDM method, we compared it with other state-of-the-art essential protein prediction methods. The comparison indicates that the NDM model performs better in predicting essential proteins. Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which in turn improves the performance of essential protein identification. This also provides a new perspective for other prediction tasks based on protein-protein interaction networks.
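The final ranking stage described above can be illustrated with a small sketch. It is not the NDM implementation: it assumes a generic personalized PageRank on a weighted undirected graph, where the hypothetical `prior` dictionary stands in for the initial scores that the abstract derives from homology and subcellular localization. The function name, the damping value and the toy data are all assumptions made for illustration.

```python
def pagerank_with_prior(weights, prior, damping=0.85, iterations=100, tol=1e-10):
    """Personalized PageRank on a weighted undirected graph (illustrative sketch).

    weights: dict mapping (u, v) node pairs to a non-negative edge weight.
    prior:   dict mapping every node to a non-negative initial score.
    """
    nodes = sorted(prior)
    # Build a symmetric adjacency map, ignoring zero-weight edges.
    adj = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        if w > 0:
            adj[u][v] = w
            adj[v][u] = w
    strength = {n: sum(adj[n].values()) for n in nodes}

    total = sum(prior.values())
    p = {n: prior[n] / total for n in nodes}  # normalized prior
    score = dict(p)

    for _ in range(iterations):
        # Score held by isolated nodes is redistributed according to the prior.
        dangling = sum(score[n] for n in nodes if strength[n] == 0)
        new = {}
        for n in nodes:
            incoming = sum(score[m] * w / strength[m] for m, w in adj[n].items())
            new[n] = (1 - damping) * p[n] + damping * (incoming + dangling * p[n])
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


# Toy star-shaped network: A interacts with B, C and D (hypothetical data).
edges = {("A", "B"): 1.0, ("A", "C"): 1.0, ("A", "D"): 1.0}
prior = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
scores = pagerank_with_prior(edges, prior)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With a uniform prior the hub `A` ranks first; raising the prior of another node shifts score toward it, which is the role the biological initial scores would play.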


Genes ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 177 ◽  
Author(s):  
Xiujuan Lei ◽  
Siguo Wang ◽  
Fang-Xiang Wu

Essential proteins are critical to the development and survival of cells. Identifying and analyzing essential proteins is vital to understanding the molecular mechanisms of living cells and to designing new drugs. With the development of high-throughput technologies, a large amount of protein–protein interaction (PPI) data has become available, which facilitates the study of essential proteins at the network level. Although various computational methods have been proposed, prediction precision still needs to be improved. In this paper, we propose a novel method, named HSEP, that applies Hyperlink-Induced Topic Search (HITS) to weighted PPI networks to detect essential proteins. First, the original undirected PPI network is transformed into a bidirectional PPI network. Then, both biological information and network topological characteristics are used to weight the PPI network. The biological information includes gene expression data, Gene Ontology (GO) annotation and subcellular localization. The edge clustering coefficient serves as the network topological characteristic and measures the closeness of two connected nodes. We conducted experiments on two species, Saccharomyces cerevisiae and Drosophila melanogaster, and the experimental results show that HSEP outperforms some state-of-the-art essential protein detection techniques.
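As a rough illustration of the topological term, the sketch below computes an edge clustering coefficient using a commonly cited definition: the number of triangles that contain the edge, divided by the largest number of triangles the edge could possibly belong to. The abstract does not state the exact formula HSEP uses, so this definition, the function name and the toy network are assumptions.

```python
def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = triangles through edge (u, v) / min(deg(u) - 1, deg(v) - 1).

    adj maps each node to the set of its neighbors (undirected graph).
    Returns 0.0 when either endpoint has no other neighbor.
    """
    common = len(adj[u] & adj[v])  # shared neighbors = triangles on the edge
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / possible if possible > 0 else 0.0


# Hypothetical toy network: triangle A-B-C plus a pendant node D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
ecc_ab = edge_clustering_coefficient(adj, "A", "B")
ecc_ad = edge_clustering_coefficient(adj, "A", "D")
```

The edge inside the triangle gets the maximum value, while the edge to the pendant node gets zero, matching the intuition that the coefficient measures how tightly two connected nodes are embedded in a shared neighborhood.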


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhihong Zhang ◽  
Meiping Jiang ◽  
Dongjie Wu ◽  
Wang Zhang ◽  
Wei Yan ◽  
...  

Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, there has been increasing interest in using computational methods to predict essential proteins based on protein–protein interaction (PPI) networks or on the fusion of multiple sources of biological information. However, existing PPI data have been observed to contain false negatives and false positives. Fusing multiple sources of biological information can reduce the influence of false data in PPI networks, but it inevitably introduces more noise at the same time. In this article, we propose a novel non-negative matrix tri-factorization (NMTF)-based model (NTMEP) to predict essential proteins. First, a weighted PPI network is established using only the topological features of the network, so as to avoid additional noise. To reduce the influence of false data in the PPI network on the identification of essential proteins, the NMTF technique, a widely used recommendation algorithm, is applied to reconstruct an optimized PPI network with more potential protein–protein interactions. Then, we use the PageRank algorithm to compute the final ranking score of each protein, with subcellular localization and homology information used to calculate the initial scores. In addition, extensive experiments are performed on publicly available datasets, and the results indicate that the NTMEP model performs better in predicting essential proteins than the state-of-the-art methods. In this investigation, we demonstrate that introducing non-negative matrix tri-factorization can effectively improve the quality of the protein–protein interaction network and thereby reduce the negative impact of noise on prediction. This finding also provides a new perspective for other applications based on protein–protein interaction networks.


2014 ◽  
Vol 644-650 ◽  
pp. 5202-5206
Author(s):  
Yan Li Zha ◽  
Wan Cheng Luo

Proteins differ in their importance to cellular function in living organisms, according to experimental results, and essential proteins are the most important kind. Many computational approaches have recently been proposed to predict essential proteins at the network level by combining network topology with biological information about proteins. However, essential proteins remain hard to identify because of the limitations of topological centralities and bioinformatic sources, and a further challenge is to perform better with fewer resources. Therefore, in this paper, we first examine the correlation between common topological centralities and essential proteins, choose a few particular centralities, and then build an SVM model, named TC-SVM, for predicting essential proteins. The new method has been applied to a yeast protein interaction network obtained from the BioGRID database. The ten-fold experimental results show that TC-SVM performs excellently in predicting essential proteins.


Author(s):  
Paolo Marcatili ◽  
Anna Tramontano

This chapter provides an overview of the current computational methods for PPI network cleansing. The authors first present the issue of identifying reliable PPIs from noisy and incomplete experimental data. Next, they address several questions: what results can be expected from the different experimental studies, what can be defined as a true interaction, which kinds of data should be integrated when assigning reliability levels to PPIs, and which gold standard should be used in training and testing PPI filtering methods. Finally, Marcatili and Tramontano describe the state of the art in the field, presenting the different classes of algorithms and comparing their results. The aim of the chapter is to guide the reader in the choice of the most convenient methods, experiments and integrative data, and to underline the most common biases and errors, in order to obtain a portrait of PINs that is not only reliable but also able to correctly retrieve the biological information contained in such data.


2014 ◽  
Vol 22 (03) ◽  
pp. 339-351 ◽  
Author(s):  
JIAWEI LUO ◽  
NAN ZHANG

Essential proteins are important for the survival and development of organisms. Many centrality algorithms based on network topology have been proposed to detect essential proteins, and they achieve good results. However, most of them focus only on the network topology and ignore the false positive (FP) interactions in the protein–protein interaction (PPI) network. In this paper, gene ontology (GO) information is used to measure the reliability of the edges in the PPI network, and we propose a novel algorithm for identifying essential proteins, named the EGC algorithm. The EGC algorithm integrates the topological characteristics of the PPI network with GO information. To validate the performance of the EGC algorithm, we use EGC and nine other methods (DC, BC, CC, SC, EC, LAC, NC, PEC and CoEWC) to identify essential proteins in two different yeast PPI networks: DIP and MIPS. The results show that EGC performs better than the other nine methods, which suggests that adding GO information can help in predicting essential proteins.


2021 ◽  
Vol 16 ◽  
Author(s):  
Chuanyan Wu ◽  
Bentao Lin ◽  
Kai Shi ◽  
Qingju Zhang ◽  
Rui Gao ◽  
...  

Background: Essential proteins play an important role in the process of life and can be identified by experimental methods and computational approaches. Experimental approaches to identifying essential proteins are highly accurate but time- and resource-consuming. Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning. Methods: Different features of proteins were extracted. Topological features were extracted from the Protein-Protein Interaction (PPI) network. From the protein sequence, graph theory-based features, information-based features, composition and physicochemical features, among others, were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, the ReliefF-based feature selection method was adopted to measure the weights of these features. As a result, 212 features were curated to train random forest classifiers. Finally, PEPRF obtained an AUC of 0.71 and an accuracy of 0.742. Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.


2010 ◽  
Vol 7 (3) ◽  
pp. 275-289 ◽  
Author(s):  
Vesna Memišević ◽  
Tijana Milenković ◽  
Nataša Pržulj

Summary Traditional approaches for homology detection rely on finding sufficient similarities between protein sequences. Motivated by studies demonstrating that non-sequence based sources of biological information, such as the secondary or tertiary molecular structure, can yield certain types of biological knowledge when sequence-based approaches fail, we hypothesize that protein-protein interaction (PPI) network topology and protein sequence might give insights into different slices of biological information. Since proteins aggregate to perform a function instead of acting in isolation, analyzing complex wirings around a protein in a PPI network could give deeper insights into the protein’s role in the inner working of the cell than analyzing sequences of individual genes. Hence, we believe that one could lose much information by focusing on sequence information alone. We examine whether, and to what extent, the information about homologous proteins captured by PPI network topology differs from the information captured by their sequences. We measure how similar the topology around homologous proteins in a PPI network is and show that such proteins have statistically significantly higher network similarity than nonhomologous proteins. We compare these network similarity trends of homologous proteins with the trends in their sequence identity and find that network similarities uncover almost as much homology as sequence identities. Although neither of the two methods, network topology and sequence identity, seems to capture homology information in its entirety, we demonstrate that the two might give insights into somewhat different types of biological information, as the overlap of the homology information that they uncover is relatively low.
Therefore, we conclude that similarities of proteins’ topological neighborhoods in a PPI network could be used as a complementary method to sequence-based approaches for identifying homologs, as well as for analyzing evolutionary distance and functional divergence of homologous proteins.


2020 ◽  
Author(s):  
Jiancheng Zhong ◽  
Chao Tang ◽  
Wei Peng ◽  
Minzhu Xie ◽  
Yusui Sun ◽  
...  

Abstract Background: Some proposed methods for identifying essential proteins achieve better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and PPI network data that calculates the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy of essential protein prediction. Results: In this paper, we propose a new measure named JDC, which is based on PPI network data and gene expression data. The JDC method uses a dynamic threshold to binarize gene expression data. It then combines degree centrality and the Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms and evaluate it using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis and accuracy analysis. The results show that JDC performs better than DC, IC, EC, SC, BC, CC, NC, PeC and WDC. We also compare JDC with the NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods given the same input. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.
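The three ideas listed above can be made concrete with a small sketch. The abstract does not give the JDC formula or its dynamic threshold, so the code below is only a JDC-like illustration under two loudly stated assumptions: each gene's threshold is the mean of its own expression profile, and a protein's score is the sum of Jaccard similarities between its active time points and those of each neighbor (so a higher degree can contribute more terms). All names and data are hypothetical.

```python
def binarize(profile, threshold):
    """Return the set of time-point indices where expression exceeds threshold."""
    return {t for t, x in enumerate(profile) if x > threshold}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jdc_like_score(adj, expression):
    """Sum of neighbor Jaccard similarities on binarized expression profiles.

    Assumption: the per-gene threshold is the mean of that gene's profile;
    the published JDC threshold rule may differ.
    """
    active = {g: binarize(p, sum(p) / len(p)) for g, p in expression.items()}
    return {g: sum(jaccard(active[g], active[n]) for n in adj[g]) for g in adj}


# Hypothetical toy data: A interacts with B and C; four expression time points.
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
expression = {
    "A": [1, 5, 5, 1],  # active at time points 1 and 2
    "B": [0, 4, 4, 0],  # active at time points 1 and 2, like A
    "C": [3, 0, 0, 3],  # active at time points 0 and 3, unlike A
}
jdc_scores = jdc_like_score(adj, expression)
```

In this toy case the A-B pair is fully co-active and the A-C pair never is, so only the A-B interaction contributes to the scores.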


2020 ◽  
Author(s):  
Shiyuan Li ◽  
Zhen Zhang ◽  
Xueyong Li ◽  
Yihong Tan ◽  
Lei Wang ◽  
...  

Abstract Background: Essential proteins have a great impact on cell survival and development, and play important roles in disease analysis and new drug design. However, since identifying essential proteins through biological experiments is inefficient and costly, there is an urgent need for automated and accurate detection methods. In recent years, the recognition of essential proteins in protein-protein interaction (PPI) networks has become a research hotspot, and many computational models for predicting essential proteins have been proposed. Results: To achieve higher prediction performance, this paper proposes a new prediction model called TGSO. In TGSO, a protein aggregation degree network is first constructed by adopting the node density measurement method for complex networks. At the same time, a protein co-expression interaction network is constructed by combining gene expression information with network connectivity, and a protein co-localization interaction network is constructed from subcellular localization data. By integrating these three newly constructed networks, a comprehensive protein-protein interaction network is obtained. Finally, based on homology information, scores are calculated iteratively for the proteins and used to estimate their importance. To evaluate the identification performance of TGSO, we compared it with 13 recent competing methods on three yeast databases. Experimental results show that TGSO achieves identification accuracies of 94%, 82% and 72% among the top 1%, 5% and 10% of candidate proteins respectively, which are to some degree superior to those of the state-of-the-art competing models.
Conclusion: We constructed a comprehensive interaction network based on multi-source data to reduce the noise and errors in the initial PPI network, and combined it with an iterative method to improve the accuracy of essential protein prediction, which suggests that TGSO may also be conducive to the future development of essential protein recognition.
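The "top 1%, 5% and 10%" figures quoted above refer to a common evaluation in this literature: rank all proteins by score, keep the top fraction, and report how many of those are known essential proteins. A minimal sketch of that calculation follows; the function name, the rounding rule for the cutoff and the toy data are assumptions, not details taken from the TGSO paper.

```python
def top_fraction_precision(scores, essential, fraction):
    """Share of known essential proteins among the top-ranked candidates.

    scores:    dict mapping protein id to predicted score.
    essential: set of protein ids known to be essential.
    fraction:  portion of the ranked list to keep, e.g. 0.01 for the top 1%.
    Assumption: the cutoff is rounded down, with at least one protein kept.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top = ranked[:k]
    return sum(1 for protein in top if protein in essential) / k


# Hypothetical toy data: ten proteins, p0 scored highest and p9 lowest.
toy_scores = {f"p{i}": 10 - i for i in range(10)}
toy_essential = {"p0", "p1", "p3"}
precision_top20 = top_fraction_precision(toy_scores, toy_essential, 0.2)
precision_top50 = top_fraction_precision(toy_scores, toy_essential, 0.5)
```

Here the top 20% contains only essential proteins, while widening the cutoff to 50% admits two non-essential ones, which mirrors the way the reported accuracies fall as the candidate list grows.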

