A novel essential protein identification method based on PPI networks and gene expression data

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jiancheng Zhong ◽  
Chao Tang ◽  
Wei Peng ◽  
Minzhu Xie ◽  
Yusui Sun ◽  
...  

Abstract Background: Some methods for identifying essential proteins achieve better results by incorporating biological information, and gene expression data is commonly used for this purpose. However, gene expression data is prone to fluctuations, which may reduce the accuracy of essential protein identification. We therefore propose an essential protein identification method that uses gene expression data and PPI network data to calculate the similarity of the "active" and "inactive" states of gene expression within a cluster of the PPI network. Our experiments show that the method improves the accuracy of essential protein prediction. Results: We propose a new measure, named JDC, that is based on PPI network data and gene expression data. JDC uses a dynamic threshold to binarize gene expression data. It then combines degree centrality and the Jaccard similarity index to calculate a JDC score for each protein in the PPI network. We benchmark JDC on four organisms and evaluate it using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that JDC performs better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We also compare JDC with the NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods given the same input. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression within a cluster of the PPI network.
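The pipeline described in this abstract (binarize expression, then combine degree centrality with Jaccard similarity) can be illustrated with a minimal Python sketch. This is not the authors' JDC formula, which the abstract does not give. The per-gene mean used as the threshold, the rule of summing Jaccard similarity over a protein's neighbors, and the function names `binarize`, `jaccard`, and `jdc_like_scores` are all assumptions made for illustration.

```python
from statistics import mean


def binarize(profile):
    """Mark a time point active (1) when it exceeds the gene's own mean.

    Assumption: the per-gene mean stands in for the paper's dynamic
    threshold, whose exact form is not stated in the abstract.
    """
    threshold = mean(profile)
    return [1 if value > threshold else 0 for value in profile]


def jaccard(a, b):
    """Jaccard index of the active time points of two binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def jdc_like_scores(edges, expression):
    """Sum the Jaccard similarity between each protein and its neighbors.

    Summing over neighbors lets the score grow with degree, which is one
    hypothetical way to combine degree centrality with similarity.
    """
    states = {protein: binarize(p) for protein, p in expression.items()}
    scores = {protein: 0.0 for protein in expression}
    for u, v in edges:
        similarity = jaccard(states[u], states[v])
        scores[u] += similarity
        scores[v] += similarity
    return scores


# Toy data: protein A is co-active with B and anti-correlated with C.
toy_edges = [("A", "B"), ("A", "C")]
toy_expression = {
    "A": [1, 5, 5, 1],
    "B": [0, 4, 6, 0],
    "C": [9, 1, 1, 9],
}
toy_scores = jdc_like_scores(toy_edges, toy_expression)
```

In this toy network, A and B share the same active time points while C is active only when A is not, so the A-C edge contributes nothing to either protein's score.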

2020 ◽  
Author(s):  
Jiancheng Zhong ◽  
Chao Tang ◽  
Wei Peng ◽  
Minzhu Xie ◽  
Yusui Sun ◽  
...  

Abstract Background: Some proposed methods for identifying essential proteins have better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and the PPI network data to calculate the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy in predicting essential proteins. Results: In this paper, we propose a new measure named JDC, which is based on the PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines the degree centrality and Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms and evaluate our method by using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that the performance of JDC is better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We compare JDC with both NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.


2020 ◽  
Author(s):  
Jiancheng Zhong ◽  
Chao Tang ◽  
Wei Peng ◽  
Minzhu Xie ◽  
Yusui Sun ◽  
...  

Abstract Background: Some proposed methods for identifying essential proteins have better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method that uses gene expression and PPI network data to calculate the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that our method can improve the accuracy in predicting essential proteins. Results: In this paper, we propose a new measure, named JDC, based on the PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines the degree centrality and Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We perform experiments on yeast data and E. coli data and evaluate our method by using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that the performance of JDC is better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We compare JDC with both NF-PIN and TS-PIN methods, which predict essential proteins from active PPI networks constructed with dynamic gene expression. Conclusions: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.


Author(s):  
Olga Lazareva ◽  
Jan Baumbach ◽  
Markus List ◽  
David B Blumenthal

Abstract In network and systems medicine, active module identification methods (AMIMs) are widely used for discovering candidate molecular disease mechanisms. To this end, AMIMs combine network analysis algorithms with molecular profiling data, most commonly by projecting gene expression data onto generic protein–protein interaction (PPI) networks. Although active module identification has led to various novel insights into complex diseases, there is increasing awareness in the field that combining gene expression data with PPI networks is problematic because up-to-date PPI networks have a very small diameter and are subject to both technical and literature bias. In this paper, we report the results of an extensive study in which we analyzed for the first time whether widely used AMIMs really benefit from using PPI networks. Our results clearly show that, except for the recently proposed AMIM DOMINO, the tested AMIMs do not produce biologically more meaningful candidate disease modules on widely used PPI networks than on random networks with the same node degrees. AMIMs hence mainly learn from the node degrees and mostly fail to exploit the biological knowledge encoded in the edges of the PPI networks. This has far-reaching consequences for the field of active module identification. In particular, we suggest that novel algorithms are needed that overcome the degree bias of most existing AMIMs and/or work with customized, context-specific networks instead of generic PPI networks.
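One common way to obtain a random network with the same node degrees is repeated double edge swapping. The Python sketch below illustrates that general technique only. The abstract does not say which generator the study used, and the function names `degree_preserving_rewire` and `degrees` and the retry limit are assumptions.

```python
import random


def degrees(edges):
    """Count how many edges touch each node."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a simple undirected graph by double edge swaps.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and
    (c, b), so every node keeps its degree. Swaps that would create a
    self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(edge) for edge in edges]
    edge_set = {frozenset(edge) for edge in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # assumed cap so the loop always ends
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint would give a self-loop or no change
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Toy example: rewire an 8-node ring and confirm degrees are unchanged.
ring = [(i, (i + 1) % 8) for i in range(8)]
rewired = degree_preserving_rewire(ring, n_swaps=20, seed=1)
same_degrees = degrees(rewired) == degrees(ring)
```

Running a module detection method on both the original network and such a rewired copy is the kind of comparison that separates what a method learns from node degrees from what it learns from the actual edges.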


2019 ◽  
Vol 12 (03) ◽  
pp. 1950024
Author(s):  
Ping Huang ◽  
Peng Ge ◽  
Qing-Fen Tian ◽  
Guo-Bao Huang

Purpose: Burns are among the most common injuries in clinical practice. The use of transcription factors (TFs) has been reported to reverse the epigenetic rewiring process and holds great promise for skin regeneration. To better identify key TFs for skin reprogramming, we propose a predictive system that jointly analyzes gene expression data and regulatory network information. Methods: First, gene expression data from skin tissues were downloaded and the LIMMA package was used to identify differentially expressed genes (DEGs). Then, three approaches were used to identify the key TFs related to skin regeneration: identification of TFs among the DEGs, enrichment analysis of TFs by Fisher’s test, and direct and network-based influence degree analysis of TFs. Finally, to obtain the most comprehensive combination of TFs, the coverage of all the TFs was analyzed using Venn diagrams. Results: The top 30 TF combinations with higher coverage were obtained. In particular, TFAP2A, ZEB1, and NFKB1 exerted greater regulatory influence on other DEGs in the local network and had relatively higher degrees in the protein–protein interaction (PPI) networks. Conclusion: The identification of these TFs could give a deeper understanding of the molecular mechanism of cell trans-differentiation and provide a reference for skin regeneration and burn treatment.
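The enrichment step can be illustrated with a one-sided Fisher's exact test, which is equivalent to a hypergeometric tail probability. The Python sketch below is a generic version of that test, not the authors' pipeline. The function name `enrichment_p_value` and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_degs, n_targets, n_overlap):
    """One-sided Fisher's exact test for TF target enrichment among DEGs.

    Returns the probability of seeing at least `n_overlap` DEGs among the
    `n_targets` targets of a TF when `n_degs` of the `n_genes` background
    genes are DEGs and targets are drawn at random.
    """
    largest_overlap = min(n_degs, n_targets)
    total = comb(n_genes, n_targets)
    tail = sum(
        comb(n_degs, k) * comb(n_genes - n_degs, n_targets - k)
        for k in range(n_overlap, largest_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 DEGs, a TF with 5 targets,
# all 5 of which are DEGs. Only one of the 252 possible target sets does this.
p_all_degs = enrichment_p_value(10, 5, 5, 5)
```

A small p-value suggests that the TF's targets are over-represented among the DEGs. Real analyses would typically also correct for testing many TFs at once.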


2020 ◽  
Vol 15 ◽  
Author(s):  
Weimiao Sun ◽  
Lei Wang ◽  
Jiaxin Peng ◽  
Zhen Zhang ◽  
Tingrui Pei ◽  
...  

Background: Research has shown that essential proteins play important roles in the development and survival of organisms. Because of the high costs of traditional biological experiments, several computational prediction methods based on known protein-protein interactions (PPIs) have been recently proposed to detect essential proteins. Objective: Here, a novel prediction model called IoMCD is proposed to identify essential proteins by combining known PPIs with a variety of biological information about proteins, including gene expression data and homologous information of proteins. Methods: Compared to the traditional state-of-the-art prediction models, IoMCD involves two kinds of weights that are obtained, respectively, by extracting topological features of proteins from the original known protein–protein interaction (PPI) networks and calculating the Pearson correlation coefficients (PCCs) between the gene expression data of proteins. Based on these two kinds of weights and adopting a cross-entropy method, a unique weight is assigned to each protein. Subsequently, the homologous information of proteins is used to calculate an initial score for each protein. Finally, based on the unique weights and initial score of proteins, an iterative method is designed to measure the essentialities of proteins. Results: Intensive experiments were performed, and simulation results showed that the prediction accuracy of IoMCD, based on the dataset downloaded from the DIP and Gavin databases, was 92.16% and 89.71%, respectively, in the top 1% of the predicted essential proteins. Conclusion: Both simulation results demonstrated that IoMCD can achieve excellent prediction accuracy and could be an effective method for essential protein prediction.
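The PCC-based weights mentioned in the Methods can be illustrated with a short Python sketch that computes the Pearson correlation between the expression profiles of two interacting proteins. It covers only that one ingredient, not the IoMCD model. The function names `pearson` and `pcc_edge_weights`, the toy profiles, and the choice to return 0.0 for a constant profile are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention: a flat profile carries no signal
    return covariance / sqrt(var_x * var_y)


def pcc_edge_weights(edges, expression):
    """Attach the PCC of the two endpoint profiles to every PPI edge."""
    return {(u, v): pearson(expression[u], expression[v]) for u, v in edges}


# Toy profiles: P1 and P2 rise together, P3 falls as P1 rises.
toy_expression = {"P1": [1, 2, 3], "P2": [2, 4, 6], "P3": [3, 2, 1]}
toy_weights = pcc_edge_weights([("P1", "P2"), ("P1", "P3")], toy_expression)
```

Here the P1-P2 edge gets a weight near 1 and the P1-P3 edge a weight near -1. How such weights are combined with topological features is specific to the model and is not reproduced here.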

