Identification of Essential Proteins Based on Improved HITS Algorithm

Genes ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 177 ◽  
Author(s):  
Xiujuan Lei ◽  
Siguo Wang ◽  
Fang-Xiang Wu

Essential proteins are critical to the development and survival of cells. Identifying and analyzing essential proteins is vital for understanding the molecular mechanisms of living cells and for designing new drugs. With the development of high-throughput technologies, a large amount of protein–protein interaction (PPI) data has become available, which facilitates the study of essential proteins at the network level. Although various computational methods have been proposed, their prediction precision still needs to be improved. In this paper, we propose a novel method, named HSEP, that applies Hyperlink-Induced Topic Search (HITS) to weighted PPI networks to detect essential proteins. First, the original undirected PPI network is transformed into a bidirectional PPI network. Then, both biological information and network topological characteristics are used to weight the PPI network. The biological information includes gene expression data, Gene Ontology (GO) annotation and subcellular localization. The edge clustering coefficient serves as the network topological characteristic and measures the closeness of two connected nodes. We conducted experiments on two species, namely Saccharomyces cerevisiae and Drosophila melanogaster, and the experimental results show that HSEP outperformed some state-of-the-art essential protein detection techniques.
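The two topological ingredients named in this abstract, the edge clustering coefficient and HITS on a bidirectional weighted network, can be sketched as follows. This is a minimal illustration, not the HSEP implementation: it assumes the commonly used definition ECC(u, v) = (triangles through the edge) / min(deg(u) - 1, deg(v) - 1), uses ECC alone as the edge weight (the biological terms are omitted), and runs on a made-up toy network. The function names are hypothetical.

```python
def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v): triangles through edge (u, v) over min(deg(u)-1, deg(v)-1)."""
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if denom <= 0:
        return 0.0  # an endpoint of degree 1 cannot lie on a triangle
    return len(adj[u] & adj[v]) / denom


def weighted_hits(adj, weight, iterations=100):
    """Power-iteration HITS where each directed edge (u, v) carries weight[(u, v)]."""
    nodes = sorted(adj)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of n: weighted hub scores flowing in from its neighbours.
        auth = {n: sum(weight[(m, n)] * hub[m] for m in adj[n]) for n in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub of n: weighted authority scores of the nodes it points to.
        hub = {n: sum(weight[(n, m)] * auth[m] for m in adj[n]) for n in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Toy undirected network (assumption: not real PPI data).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Bidirectional network: each undirected edge becomes two directed edges
# carrying the same ECC weight.
weight = {}
for u, v in edges:
    w = edge_clustering_coefficient(adj, u, v)
    weight[(u, v)] = weight[(v, u)] = w

hub, auth = weighted_hits(adj, weight)
ranking = sorted(adj, key=lambda n: auth[n], reverse=True)
```

Because the weights are symmetric in this sketch, hub and authority scores coincide after convergence; in the toy network, A and C rank highest and E, whose only edge has ECC 0, scores zero.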

2021 ◽  
Author(s):  
Zhihong Zhang ◽  
Sai Hu ◽  
Wei Yan ◽  
Bihai Zhao ◽  
Lei Wang

Abstract Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain a large number of false negatives and false positives. Therefore, it is necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results: In this paper, we propose a non-negative matrix factorization and multiple biological information based model (NDM) for identifying essential proteins. The first stage of this process is to construct a weighted PPI network by combining information on protein domains, protein complexes and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique is used to reconstruct an optimized PPI network with sufficiently weighted edges. In the final stage, the ranking score of each protein is computed by the PageRank algorithm, in which the initial scores are calculated from homologous and subcellular localization information. To verify the effectiveness of the NDM method, we compared NDM with other state-of-the-art essential protein prediction methods. The comparison indicates that our NDM model performs better in predicting essential proteins. Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which leads to better performance in essential protein identification. This also provides a new perspective for other prediction tasks based on protein-protein interaction networks.
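The final ranking stage described above, PageRank seeded with prior scores on a weighted network, might look roughly like the sketch below. The abstract does not give the exact update rule, so this assumes the standard personalized PageRank form in which the normalized prior serves both as the starting vector and as the restart distribution. The edge weights, the prior values and the function name are made up for illustration.

```python
def pagerank_with_prior(weights, prior, damping=0.85, iterations=100):
    """Weighted PageRank that restarts according to a prior score per node.

    weights: dict node -> dict neighbour -> edge weight (out-edges)
    prior:   dict node -> non-negative initial score (assumed given)
    """
    nodes = sorted(prior)
    total_prior = sum(prior.values())
    p0 = {n: prior[n] / total_prior for n in nodes}  # normalized prior
    out_weight = {n: sum(weights.get(n, {}).values()) for n in nodes}
    score = dict(p0)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Node without weighted edges: spread its mass along the prior.
                for n in nodes:
                    nxt[n] += damping * score[u] * p0[n]
            else:
                for v, w in weights[u].items():
                    nxt[v] += damping * score[u] * w / out_weight[u]
        score = nxt
    return score


# Toy weighted network; P4 has no interactions (assumption: illustrative only).
weights = {
    "P1": {"P2": 0.9, "P3": 0.4},
    "P2": {"P1": 0.9, "P3": 0.7},
    "P3": {"P1": 0.4, "P2": 0.7},
    "P4": {},
}
# Hypothetical prior, standing in for homology and localization scores.
prior = {"P1": 0.5, "P2": 0.3, "P3": 0.1, "P4": 0.1}

scores = pagerank_with_prior(weights, prior)
```

The scores stay a probability distribution across iterations, so proteins can be ranked directly by their final value.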


2014 ◽  
Vol 22 (03) ◽  
pp. 339-351 ◽  
Author(s):  
JIAWEI LUO ◽  
NAN ZHANG

Essential proteins are important for the survival and development of organisms. Many centrality algorithms based on network topology have been proposed to detect essential proteins and have achieved good results. However, most of them focus only on the network topology and ignore the false positive (FP) interactions in the protein–protein interaction (PPI) network. In this paper, gene ontology (GO) information is used to measure the reliability of the edges in the PPI network, and we propose a novel algorithm for identifying essential proteins, named EGC. The EGC algorithm integrates the topological characteristics of the PPI network with GO information. To validate the performance of EGC, we use EGC and nine other methods (DC, BC, CC, SC, EC, LAC, NC, PEC and CoEWC) to identify the essential proteins in two different yeast PPI networks: DIP and MIPS. The results show that EGC performs better than the other nine methods, which suggests that adding GO information can help in predicting essential proteins.


2020 ◽  
Author(s):  
Tang Zhang ◽  
Yao-Zong Guan ◽  
Hao Liu

Abstract Background: The study aimed to detect the shared differentially expressed genes (DEGs) and specific DEGs of arrhythmogenic right ventricular cardiomyopathy (ARVC) and dilated cardiomyopathy (DCM) as well as their pathways. Methods: The GSE29819 dataset was examined for the DEGs of ARVC vs. non-failing transplant donor hearts (NF), DCM vs. NF, and ARVC vs. DCM based on 6 patients with ARVC, 7 patients with DCM, and 6 non-failing transplant donor hearts that were never actually transplanted. The shared DEGs and specific DEGs were screened out using a Venn diagram. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, Gene Ontology (GO) annotation, and protein-protein interaction (PPI) of the DEGs were determined using online analytical tools. Then, the modules and hub genes were identified using Cytoscape software. Results: A total of 684 shared DEGs of ARVC vs. NF and DCM vs. NF, 1371 specific DEGs of ARVC vs. NF, and 1075 specific DEGs of DCM vs. NF were identified. The shared DEGs were enriched in 63 biological processes (BP), 11 molecular functions (MF), 10 cellular components (CC), and 25 KEGG pathways. The DEGs of ARVC vs. DCM were enriched in 71 BPs, 19 MFs, 14 CCs, and 26 KEGG pathways. A PPI network with 187 nodes, 700 edges, and 2 modules, and another PPI network with 575 nodes, 2834 edges, and 7 modules were constructed based on the shared and specific DEGs, respectively. The top ten hub genes CCR3, CCR5, CXCL2, CXCL10, CXCR4, FPR1, APLNR, PENK, BDKRB2, GRM8, and RPS8, RPS3A, RPS12, RPS14, RPS21, RPL14, RPL18A, RPL21, RPL31 were identified for the shared and specific PPI networks, respectively. Conclusions: Our findings may help further the understanding of both shared and specific potential molecular mechanisms of ARVC and DCM.
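The Venn-diagram screening step amounts to set operations on the two DEG lists. A minimal sketch, assuming each comparison has already produced a set of gene identifiers; the identifiers below are placeholders, not genes from the study.

```python
# Placeholder DEG sets for the two comparisons (assumption: toy identifiers).
arvc_vs_nf = {"G1", "G2", "G3", "G4"}
dcm_vs_nf = {"G3", "G4", "G5"}

shared_degs = arvc_vs_nf & dcm_vs_nf      # differentially expressed in both
arvc_specific = arvc_vs_nf - dcm_vs_nf    # only in ARVC vs. NF
dcm_specific = dcm_vs_nf - arvc_vs_nf     # only in DCM vs. NF
```

The three resulting sets are disjoint and together cover the union of both DEG lists, which is a quick sanity check on counts such as those reported in the Results.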


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhihong Zhang ◽  
Meiping Jiang ◽  
Dongjie Wu ◽  
Wang Zhang ◽  
Wei Yan ◽  
...  

Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, there has been increasing interest in using computational methods to predict essential proteins based on protein–protein interaction (PPI) networks or by fusing multiple sources of biological information. However, existing PPI data have been observed to contain false negatives and false positives. Fusing multiple sources of biological information can reduce the influence of false data in PPI networks, but it inevitably introduces more noise at the same time. In this article, we propose a novel non-negative matrix tri-factorization (NMTF)-based model (NTMEP) to predict essential proteins. First, a weighted PPI network is established using only the topological features of the network, so as to avoid introducing more noise. To reduce the influence of false data in the PPI network on the performance of essential protein identification, the NMTF technique, a widely used recommendation algorithm, is applied to reconstruct an optimized PPI network with more potential protein–protein interactions. Then, we use the PageRank algorithm to compute the final ranking score of each protein, with initial scores calculated from subcellular localization and homologous information. Extensive experiments on publicly available datasets indicate that our NTMEP model performs better in predicting essential proteins than state-of-the-art methods. In this investigation, we demonstrate that introducing non-negative matrix tri-factorization can effectively improve the quality of the protein–protein interaction network and thereby reduce the negative impact of noise on prediction. This finding also provides a new perspective for other applications based on protein–protein interaction networks.


2020 ◽  
Author(s):  
Tang Zhang ◽  
Yao-Zong Guan ◽  
Hao Liu

Abstract Background: The study aimed to detect the shared differentially expressed genes (DEGs) and specific DEGs of arrhythmogenic right ventricular cardiomyopathy (ARVC) and dilated cardiomyopathy (DCM) as well as their pathways. Methods: The GSE29819 dataset was examined for the DEGs of ARVC vs. non-failing transplant donor hearts (NF), DCM vs. NF, and ARVC vs. DCM based on 8 patients with ARVC, 7 patients with DCM, and 4 non-failing transplant donor hearts that were never actually transplanted. The shared DEGs and specific DEGs were screened out using a Venn diagram. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, Gene Ontology (GO) annotation, and protein-protein interaction (PPI) of the DEGs were determined using online analytical tools. Then, the modules and hub genes were identified using Cytoscape software. Results: A total of 684 shared DEGs of ARVC vs. NF and DCM vs. NF, 1371 specific DEGs of ARVC vs. NF, and 1075 specific DEGs of DCM vs. NF were identified. The shared DEGs were enriched in 63 biological processes (BP), 11 molecular functions (MF), 10 cellular components (CC), and 25 KEGG pathways. The DEGs of ARVC vs. DCM were enriched in 71 BPs, 19 MFs, 14 CCs, and 26 KEGG pathways. A PPI network with 187 nodes, 700 edges, and 2 modules, and another PPI network with 575 nodes, 2834 edges, and 7 modules were constructed based on the shared and specific DEGs, respectively. The top ten hub genes CCR3, CCR5, CXCL2, CXCL10, CXCR4, FPR1, APLNR, PENK, BDKRB2, GRM8, and RPS8, RPS3A, RPS12, RPS14, RPS21, RPL14, RPL18A, RPL21, RPL31 were identified for the shared and specific PPI networks, respectively. Conclusions: Our findings may help further the understanding of both shared and specific potential molecular mechanisms of ARVC and DCM.


2020 ◽  
Vol 34 (10) ◽  
pp. 2050090
Author(s):  
Pengli Lu ◽  
JingJuan Yu

Essential proteins play a crucial role in cellular life. The identification of essential proteins not only promotes the development of drug target technology, but also contributes to understanding the mechanisms of biological evolution. Many researchers have worked on discovering essential proteins from the topological structure of protein networks and from biological information, but the accuracy of essential protein identification still needs to be improved. In this paper, we propose a method that integrates the clustering coefficient in protein complexes with topological properties to determine the essentiality of proteins. First, we define the In-clustering coefficient (IC) to describe the properties of protein complexes. Then we propose a new method, the complex edge and node clustering (CENC) coefficient, to identify essential proteins. Different Protein–Protein Interaction (PPI) networks of Saccharomyces cerevisiae, MIPS and DIP, are used as experimental materials. Experiments with a logistic regression model show that CENC improves the ability to recognize essential proteins compared with the existing methods DC, BC, EC, SC, LAC, NC and the recent UC method.


2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Jie Zhao ◽  
Xiujuan Lei

Abstract Background Protein complexes are the cornerstones of many biological processes and assemble into various types of molecular machinery that perform a vast array of biological functions. In fact, a protein may belong to multiple protein complexes. Most existing protein complex detection algorithms cannot reflect overlapping protein complexes. To solve this problem, a novel algorithm for identifying overlapping protein complexes is proposed. Results In this paper, a new clustering algorithm based on an overlay network chain in quotient space, denoted ONCQS, is proposed to detect overlapping protein complexes in weighted PPI networks. In the quotient space, a multilevel overlay network is constructed by using maximal complete subgraphs to mine overlapping protein complexes. GO annotation data are used to weight the PPI network. The overlay network chain in quotient space is calculated according to the compatibility relation. The protein complexes are contained in the last level of the overlay network. Experiments were carried out on four PPI databases, and ONCQS was compared with five other state-of-the-art methods in the identification of protein complexes. Conclusions We applied ONCQS to four PPI databases (DIP, Gavin, Krogan and MIPS). The results show that it is superior to five existing algorithms (MCODE, MCL, CORE, ClusterONE and COACH) in detecting overlapping protein complexes.
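The building block mentioned in this abstract, the maximal complete subgraph (maximal clique), naturally yields overlapping groups because one protein can sit in several cliques. The sketch below enumerates maximal cliques with the basic Bron–Kerbosch recursion on a toy network. It illustrates only this one step, under the assumption of an unweighted graph; the quotient-space overlay chain and GO weighting of ONCQS are not reproduced, and the function name is hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))  # nothing can extend the clique
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}  # v has been fully explored
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy network: two triangles sharing protein C (assumption: illustrative only).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D"},
}

cliques = maximal_cliques(adj)
# Proteins that belong to more than one maximal clique.
overlapping = {p for p in adj if sum(p in c for c in cliques) > 1}
```

Here the two triangles are reported as separate maximal cliques and C is the only protein that belongs to both.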


2014 ◽  
Vol 934 ◽  
pp. 159-164
Author(s):  
Yun Yuan Dong ◽  
Xian Chun Zhang

Protein-protein interaction (PPI) networks provide a simplified overview of the web of interactions that take place inside a cell. According to the centrality-lethality rule, hub proteins (proteins with high degree) tend to be essential in the PPI network. There are also many low-degree proteins in the PPI network, and they differ in lethality: some are essential (essential-nonhub proteins) and the others are not (nonessential-nonhub proteins). To explain why nonessential-nonhub proteins are not essential, we propose a new measure, n-iep (the number of essential neighbors), and compare nonessential-nonhub proteins with essential-nonhub proteins from topological, evolutionary and functional views. The comparison shows that there are statistical differences between nonessential-nonhub proteins and essential-nonhub proteins in centrality measures, clustering coefficient, evolutionary rate and the number of essential neighbors. These differences help explain why nonessential-nonhub proteins are not lethal.
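The n-iep measure, defined above as the number of essential neighbors, can be computed directly from an adjacency list and a set of known essential proteins. A minimal sketch on made-up data; the function name and the toy network are assumptions.

```python
def essential_neighbor_count(adj, essential):
    """n-iep: for each protein, how many of its neighbours are essential."""
    return {p: len(neighbors & essential) for p, neighbors in adj.items()}


# Toy network and essential set (assumption: not real annotations).
adj = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1", "P4"},
    "P4": {"P1", "P3"},
}
essential = {"P1", "P4"}

n_iep = essential_neighbor_count(adj, essential)
```

A protein's own essentiality does not enter its count; only its neighbors do.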


2020 ◽  
Author(s):  
Jiancheng Zhong ◽  
Chao Tang ◽  
Wei Peng ◽  
Minzhu Xie ◽  
Yusui Sun ◽  
...  

Abstract Background: Some methods for identifying essential proteins achieve better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and PPI network data that calculates the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy of predicting essential proteins. Results: In this paper, we propose a new measure named JDC, which is based on PPI network data and gene expression data. The JDC method uses a dynamic threshold to binarize gene expression data. It then combines degree centrality and the Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms and evaluate it using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that JDC performs better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We also compare JDC with the NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.
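The ingredients listed in this abstract (binarized expression, Jaccard similarity, degree centrality) can be illustrated with a small sketch. The abstract does not state the JDC formula or the dynamic threshold, so two loud assumptions are made here: each gene is binarized against its own mean expression, and the score is the sum of Jaccard similarities between a protein and its neighbors, a Jaccard-weighted degree. Neither is claimed to be the published definition, and all names and values are hypothetical.

```python
def binarize(profile):
    """Mark time points as active (1) when above the gene's own mean.

    Assumption: the mean stands in for the paper's dynamic threshold.
    """
    threshold = sum(profile) / len(profile)
    return [1 if x > threshold else 0 for x in profile]


def jaccard(a, b):
    """Jaccard index of two binary activity vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def jaccard_weighted_degree(adj, expression):
    """Illustrative score: sum of Jaccard similarities to each neighbour."""
    active = {p: binarize(values) for p, values in expression.items()}
    return {p: sum(jaccard(active[p], active[q]) for q in adj[p]) for p in adj}


# Toy expression profiles over four time points (assumption: made-up values).
expression = {
    "P1": [1, 5, 6, 1],
    "P2": [2, 8, 9, 2],
    "P3": [7, 1, 1, 8],
}
adj = {"P1": {"P2", "P3"}, "P2": {"P1"}, "P3": {"P1"}}

jdc_like = jaccard_weighted_degree(adj, expression)
```

In the toy data, P1 and P2 are active at the same time points, so their edge contributes fully, while P3 is active only when P1 is not and contributes nothing.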

