A mixed clustering coefficient centrality for identifying essential proteins

Essential proteins play a crucial role in cellular life. Identifying them not only advances drug-target discovery but also contributes to understanding the mechanisms of biological evolution. Many researchers have sought to discover essential proteins from the topological structure of protein networks and from biological information, yet the accuracy of identification still needs to be improved. In this paper, we propose a method that integrates the clustering coefficient in protein complexes with topological properties to determine the essentiality of proteins. First, we define the In-clustering coefficient (IC) to describe the properties of protein complexes. Then we propose a new method, the complex edge and node clustering (CENC) coefficient, to identify essential proteins. Different Protein–Protein Interaction (PPI) networks of Saccharomyces cerevisiae, MIPS and DIP, are used as experimental data. Experiments with a logistic regression model show that CENC improves the recognition of essential proteins compared with the existing methods DC, BC, EC, SC, LAC and NC and the recent UC method.

A Non-negative Matrix Factorization Based Method for Identifying Essential Proteins

10.21203/rs.3.rs-537545/v1 ◽

2021 ◽

Author(s):

Zhihong Zhang ◽

Sai Hu ◽

Wei Yan ◽

Bihai Zhao ◽

Lei Wang

Keyword(s):

Protein Interaction ◽

Matrix Factorization ◽

Biological Data ◽

Protein Domain ◽

Biological Information ◽

PPI Network ◽

Essential Proteins ◽

Protein Protein Interaction ◽

PPI Networks ◽

Non Negative Matrix Factorization

Abstract Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain a large number of false negatives and false positives. It is therefore necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results: In this paper, we propose a non-negative matrix factorization and multiple biological information based model (NDM) for identifying essential proteins. The first stage of the process constructs a weighted PPI network by combining information on protein domains, protein complexes and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique is used to reconstruct an optimized PPI network with sufficiently complete edge weights. In the final stage, the ranking score of each protein is computed by the PageRank algorithm, with initial scores calculated from homologous and subcellular localization information. To verify the effectiveness of NDM, we compared it with other state-of-the-art essential protein prediction methods. The comparison indicates that the NDM model performs better in predicting essential proteins. Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which led to improved performance in essential protein identification. This also provides a new perspective for other predictions based on protein-protein interaction networks.
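The reconstruction step described above can be illustrated with a minimal sketch. It factorizes a small weighted adjacency matrix with the standard Lee–Seung multiplicative updates and multiplies the factors back together to obtain a dense, reconstructed weight matrix. The toy matrix, the rank, the iteration count and every function name here are assumptions made for illustration; the abstract does not specify how NDM builds its weights or which factorization variant it uses.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical 4-protein weighted PPI network (symmetric, non-negative).
toy_weights = [
    [0.0, 0.9, 0.8, 0.0],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.3],
    [0.0, 0.0, 0.3, 0.0],
]
w, h = nmf(toy_weights, rank=2)
reconstructed = matmul(w, h)  # dense matrix; zero entries may gain weight
```

In a sketch like this, entries of `reconstructed` that were zero in the input can become positive, which is one way a factorization may suggest interactions missing from the original data.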

PREDICTION OF ESSENTIAL PROTEINS BASED ON EDGE CLUSTERING COEFFICIENT AND GENE ONTOLOGY INFORMATION

Journal of Biological Systems ◽

10.1142/s0218339014500119 ◽

2014 ◽

Vol 22 (03) ◽

pp. 339-351 ◽

Cited By ~ 3

Author(s):

JIAWEI LUO ◽

NAN ZHANG

Keyword(s):

Gene Ontology ◽

Network Topology ◽

Clustering Coefficient ◽

PPI Network ◽

Essential Proteins ◽

Protein Protein Interaction ◽

PPI Networks ◽

Survival And Development ◽

Gene Ontology Information ◽

Better Than

Essential proteins are important for the survival and development of organisms. Many centrality algorithms based on network topology have been proposed to detect essential proteins, with good results. However, most of them focus only on network topology and ignore the false positive (FP) interactions in the protein–protein interaction (PPI) network. In this paper, we use gene ontology (GO) information to measure the reliability of the edges in the PPI network and propose a novel algorithm for identifying essential proteins, named EGC. The EGC algorithm integrates the topological characteristics of the PPI network with GO information. To validate its performance, we use EGC and nine other methods (DC, BC, CC, SC, EC, LAC, NC, PEC and CoEWC) to identify essential proteins in two different yeast PPI networks: DIP and MIPS. The results show that EGC is better than the other nine methods, which means that adding GO information can help in predicting essential proteins.

Identification of Essential Proteins Based on Improved HITS Algorithm

Genes ◽

10.3390/genes10020177 ◽

2019 ◽

Vol 10 (2) ◽

pp. 177 ◽

Cited By ~ 2

Author(s):

Xiujuan Lei ◽

Siguo Wang ◽

Fang-Xiang Wu

Keyword(s):

Molecular Mechanisms ◽

Clustering Coefficient ◽

New Drugs ◽

Biological Information ◽

PPI Network ◽

GO Annotation ◽

Detection Techniques ◽

Essential Proteins ◽

PPI Networks ◽

Topological Characteristics

Essential proteins are critical to the development and survival of cells. Identifying and analyzing essential proteins is vital for understanding the molecular mechanisms of living cells and for designing new drugs. With the development of high-throughput technologies, a large amount of protein–protein interaction (PPI) data has become available, which facilitates the study of essential proteins at the network level. Although various computational methods have been proposed, prediction precision still needs to be improved. In this paper, we propose a novel method, named HSEP, that applies Hyperlink-Induced Topic Search (HITS) to weighted PPI networks to detect essential proteins. First, the original undirected PPI network is transformed into a bidirectional PPI network. Then, both biological information and network topological characteristics are taken into account to weight the PPI network. The biological information includes gene expression data, Gene Ontology (GO) annotation and subcellular localization. The edge clustering coefficient serves as the network topological characteristic and measures the closeness of two connected nodes. We conducted experiments on two species, Saccharomyces cerevisiae and Drosophila melanogaster, and the experimental results show that HSEP outperforms some state-of-the-art essential protein detection techniques.
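To make the topological term concrete, here is a small sketch of the edge clustering coefficient in its commonly used form: the number of triangles that contain an edge, divided by the largest number of triangles that edge could belong to, min(d_u - 1, d_v - 1). The abstract does not state which formula HSEP uses, so this definition, the toy graph and the function name are assumptions for illustration only.

```python
def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) over the maximum possible number.

    adj maps each node to the set of its neighbours (undirected graph).
    Returns 0.0 when either endpoint has degree 1, since such an edge
    cannot belong to any triangle.
    """
    triangles = len(adj[u] & adj[v])          # common neighbours of u and v
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / max_triangles if max_triangles > 0 else 0.0

# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
toy_adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

ecc_ab = edge_clustering_coefficient(toy_adj, "A", "B")  # edge inside the triangle
ecc_cd = edge_clustering_coefficient(toy_adj, "C", "D")  # pendant edge
```

Edges inside dense neighbourhoods score close to 1, while pendant edges score 0, which is why the measure is read as the closeness of two connected nodes.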

Identifying Essential Proteins in Dynamic PPI Network with Improved FOA

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.3.3285 ◽

2018 ◽

Vol 13 (3) ◽

pp. 365-382 ◽

Cited By ~ 1

Author(s):

Xiujuan Lei ◽

Siguo Wang ◽

Linqiang Pan

Keyword(s):

Fruit Fly ◽

Biological Information ◽

Detection Methods ◽

Fruit Fly Optimization Algorithm ◽

Fruit Fly Optimization ◽

Essential Proteins ◽

Protein Protein Interaction ◽

Cellular Life ◽

PPI Networks ◽

Ranking Score

Identification of essential proteins plays an important role in understanding cellular life activity and development in the postgenomic era. Identifying essential proteins from protein-protein interaction (PPI) networks has become a hot topic in recent years. In this work, the fruit fly optimization algorithm (FOA) is extended for identifying essential proteins. The extended algorithm, called EPFOA, merges FOA with topological properties and biological information. EPFOA has the advantage of identifying multiple essential proteins simultaneously rather than relying entirely on ranking each protein by an individual score. Its performance is analyzed on dynamic PPI networks, which are constructed by incorporating gene expression data. The experimental results demonstrate that EPFOA is more efficient in detecting essential proteins than state-of-the-art essential protein detection methods.

A Novel Method for Identifying Essential Proteins Based on Non-negative Matrix Tri-Factorization

Frontiers in Genetics ◽

10.3389/fgene.2021.709660 ◽

2021 ◽

Vol 12 ◽

Author(s):

Zhihong Zhang ◽

Meiping Jiang ◽

Dongjie Wu ◽

Wang Zhang ◽

Wei Yan ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interactions ◽

Negative Impact ◽

False Negative ◽

Interaction Network ◽

Biological Information ◽

PPI Network ◽

Essential Proteins ◽

Protein Protein Interaction ◽

PPI Networks

Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, there has been increasing interest in using computational methods to predict essential proteins based on protein–protein interaction (PPI) networks or on the fusion of multiple sources of biological information. However, existing PPI data are known to contain false negatives and false positives. Fusing multiple sources of biological information can reduce the influence of false data in PPI networks, but it inevitably introduces more noise at the same time. In this article, we propose a novel non-negative matrix tri-factorization (NMTF)-based model (NTMEP) to predict essential proteins. First, a weighted PPI network is established using only the topological features of the network, so as to avoid additional noise. To reduce the influence of false data in the PPI network on the performance of essential protein identification, the NMTF technique, a widely used recommendation algorithm, is applied to reconstruct an optimized PPI network with more potential protein–protein interactions. Then, we use the PageRank algorithm to compute the final ranking score of each protein, with initial scores calculated from subcellular localization and homologous information. Extensive experiments on publicly available datasets indicate that the NTMEP model performs better in predicting essential proteins than state-of-the-art methods. In this investigation, we demonstrate that non-negative matrix tri-factorization can effectively improve the quality of the protein–protein interaction network and thereby reduce the negative impact of noise on prediction. This finding also provides a new perspective for other applications based on protein–protein interaction networks.

Spectral clustering for detecting protein complexes in protein–protein interaction (PPI) networks

Mathematical and Computer Modelling ◽

10.1016/j.mcm.2010.06.015 ◽

2010 ◽

Vol 52 (11-12) ◽

pp. 2066-2074 ◽

Cited By ~ 25

Author(s):

Guimin Qin ◽

Lin Gao

Keyword(s):

Protein Interaction ◽

Spectral Clustering ◽

Protein Complexes ◽

Protein Protein Interaction ◽

PPI Networks

Functional geometry of protein interactomes

Bioinformatics ◽

10.1093/bioinformatics/btz146 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3727-3734 ◽

Cited By ~ 2

Author(s):

Noël Malod-Dognin ◽

Nataša Pržulj

Keyword(s):

Functional Organization ◽

Protein Complexes ◽

Simplicial Complexes ◽

Higher Order ◽

Biological Information ◽

Supplementary Information ◽

Data Types ◽

PPI Networks ◽

Functional Geometry ◽

Better Than

Abstract Motivation: Protein–protein interactions (PPIs) are usually modeled as networks. These networks have extensively been studied using graphlets, small induced subgraphs capturing the local wiring patterns around nodes in networks. They revealed that proteins involved in similar functions tend to be similarly wired. However, such simple models can only represent pairwise relationships and cannot fully capture the higher-order organization of protein interactomes, including protein complexes. Results: To model the multi-scale organization of these complex biological systems, we utilize simplicial complexes from computational geometry. The question is how to mine these new representations of protein interactomes to reveal additional biological information. To address this, we define simplets, a generalization of graphlets to simplicial complexes. By using simplets, we define a sensitive measure of similarity between simplicial complex representations that allows for clustering them according to their data types better than clustering them by using other state-of-the-art measures, e.g. spectral distance, or facet distribution distance. We model human and baker’s yeast protein interactomes as simplicial complexes that capture PPIs and protein complexes as simplices. On these models, we show that our newly introduced simplet-based methods cluster proteins by function better than the clustering methods that use the standard PPI networks, uncovering the new underlying functional organization of the cell. We demonstrate the existence of the functional geometry in the protein interactome data and the superiority of our simplet-based methods to effectively mine for new biological information hidden in the complexity of the higher-order organization of protein interactomes. Availability and implementation: Codes and datasets are freely available at http://www0.cs.ucl.ac.uk/staff/natasa/Simplets/. Supplementary information: Supplementary data are available at Bioinformatics online.

Nonessential-Nonhub Proteins in the Protein-Protein Interaction Network

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.934.159 ◽

2014 ◽

Vol 934 ◽

pp. 159-164

Author(s):

Yun Yuan Dong ◽

Xian Chun Zhang

Keyword(s):

Protein Interaction ◽

Interaction Network ◽

Clustering Coefficient ◽

Centrality Measures ◽

PPI Network ◽

Protein Protein Interaction ◽

PPI Networks ◽

Comparison Results ◽

A Cell ◽

High Degree

Protein-protein interaction (PPI) networks provide a simplified overview of the web of interactions that take place inside a cell. According to the centrality-lethality rule, hub proteins (proteins with high degree) tend to be essential in the PPI network. There are also many low-degree proteins in the PPI network, and they differ in lethality: some are essential (essential-nonhub proteins) and the others are not (nonessential-nonhub proteins). To explain why nonessential-nonhub proteins are not essential, we propose a new measure, n-iep (the number of essential neighbors), and compare nonessential-nonhub proteins with essential-nonhub proteins from topological, evolutionary and functional views. The comparison results show statistical differences between nonessential-nonhub proteins and essential-nonhub proteins in centrality measures, clustering coefficient, evolutionary rate and the number of essential neighbors. These differences are the reasons why nonessential-nonhub proteins are not lethal.
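The n-iep measure, as the abstract describes it, is simply a count of a protein's neighbors that are known to be essential. A minimal sketch follows; the toy network, the set of essential proteins and the function name are hypothetical and not taken from the paper.

```python
def essential_neighbor_count(adj, essential):
    """For each protein, count how many of its neighbours are essential.

    adj maps each protein to the set of its interaction partners;
    essential is the set of proteins labelled essential.
    """
    return {protein: sum(1 for nbr in nbrs if nbr in essential)
            for protein, nbrs in adj.items()}

# Hypothetical toy network and essentiality labels.
toy_adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
toy_essential = {"A", "C"}

n_iep = essential_neighbor_count(toy_adj, toy_essential)
```

Comparing the distribution of these counts between the two nonhub groups would then call for a statistical test, which is left out of this sketch.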

A METHOD BASED ON LOCAL DENSITY AND RANDOM WALKS FOR COMPLEXES DETECTION IN PROTEIN INTERACTION NETWORKS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720010005191 ◽

2010 ◽

Vol 08 (supp01) ◽

pp. 47-62 ◽

Cited By ~ 6

Author(s):

LIANG YU ◽

LIN GAO ◽

KUI LI

Keyword(s):

Random Walks ◽

Protein Interaction ◽

Local Density ◽

Protein Complexes ◽

Biological Significance ◽

Protein Protein Interaction ◽

PPI Networks ◽

Comprehensive Comparison ◽

Attachment Proteins ◽

Two Stages

In this paper, we present a method based on local density and random walks (LDRW) for detecting core-attachment complexes in protein-protein interaction (PPI) networks, whether weighted or unweighted. Our LDRW method consists of two stages. First, it finds all the protein-complex cores based on the local density of subnetworks. Then it uses random walks with restarts to find the attachment proteins of each detected core and form complexes. We evaluate the effectiveness of our method using two different yeast PPI networks and validate the biological significance of the predicted protein complexes using known complexes in the Munich Information Center for Protein Sequences (MIPS) and Gene Ontology (GO) databases. We also perform a comprehensive comparison between our method and other existing methods. The results show that our method finds more protein complexes with high biological significance and achieves a significant improvement. Furthermore, our method is able to identify biologically significant overlapping protein complexes.
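The second stage relies on random walks with restart, which can be sketched as a simple power iteration: at each step the walker follows an edge with probability 1 - r (in proportion to edge weight) or jumps back to the seed set with probability r. The sketch below assumes a connected weighted graph with no isolated nodes; the toy network, the restart probability of 0.3, the uniform restart over the core and all names are assumptions, since the abstract does not give LDRW's parameters.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk restarting at `seeds`.

    adj maps each node to a dict {neighbour: edge weight}; every node is
    assumed to have at least one neighbour, and every seed must be a node.
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}      # restart mass
        for u in nodes:
            total = sum(adj[u].values())
            for v, weight in adj[u].items():           # spread along edges
                nxt[v] += (1 - restart) * p[u] * weight / total
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical toy network: core {A, B}, then a chain C - D - E leading away.
toy_graph = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 1.0},
    "D": {"C": 1.0, "E": 1.0},
    "E": {"D": 1.0},
}
scores = random_walk_with_restart(toy_graph, seeds={"A", "B"})
```

Proteins with high scores outside the core (here C) would be the natural attachment candidates; how LDRW thresholds those scores is not stated in the abstract.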

Heuristic Modularity for Complex Identification in Protein-Protein Interaction Networks

Iraqi Journal of Science ◽

10.24996/ijs.2019.60.8.22 ◽

2019 ◽

pp. 1846-1859

Author(s):

Amenah H. H. Abdulateef ◽

Bara'a A. Attea ◽

Ahmed N. Rashid

Keyword(s):

Protein Interaction ◽

Protein Complexes ◽

Building Blocks ◽

Detection Accuracy ◽

Protein Protein Interaction ◽

Protein Levels ◽

Cellular Processes ◽

PPI Networks ◽

Protein Protein Interaction Networks ◽

Different Levels

Because of its significant role in understanding cellular processes, the decomposition of Protein-Protein Interaction (PPI) networks into essential building blocks, or complexes, has received much attention in functional bioinformatics research in recent years. One of the well-known bi-clustering descriptors for identifying communities and complexes in complex networks, such as PPI networks, is the modularity function. The contribution of this paper is to introduce heuristic optimization models that can collaborate with the modularity function to improve its detection ability. The formulated heuristics are defined in terms of nodes and different levels of their neighbor properties. The modularity function and the formulated heuristics are then injected into a single-objective Evolutionary Algorithm (EA) tailored specifically to the problem of identifying possible complexes in PPI networks. In the experiments, different overlapping scores are used to evaluate the detection accuracy at both the complex and protein levels. According to the evaluation metrics, the results reveal that the introduced heuristics can improve the accuracy of the existing modularity function in identifying protein complexes in the tested PPI networks.
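For reference, the modularity function that such an EA would optimize can be computed as in the sketch below. It uses the standard Newman form for an unweighted, undirected graph with a disjoint partition, Q = sum over communities of (L_c / m - (d_c / 2m)^2), where L_c is the number of edges inside community c, d_c the total degree of its nodes and m the number of edges. The toy graph and names are hypothetical, the graph is assumed to have at least one edge, and the paper's heuristics are not reproduced here.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a disjoint partition.

    edges is a list of undirected (u, v) pairs, each listed once;
    community_of maps every node to its community label.
    """
    m = len(edges)
    internal = {}     # community -> number of edges with both ends inside
    degree_sum = {}   # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical toy network: two triangles joined by the single edge C-D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]
two_complexes = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_complex = {node: 0 for node in "ABCDEF"}

q_split = modularity(toy_edges, two_complexes)   # 5/14, about 0.357
q_single = modularity(toy_edges, one_complex)    # 0 for a single community
```

Splitting the toy network into its two triangles scores higher than lumping everything together, which is the behaviour an optimizer exploits when searching for candidate complexes.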
