A Novel Method for Identifying Essential Proteins Based on Non-negative Matrix Tri-Factorization

2021 ◽  
Vol 12 ◽  
Author(s):  
Zhihong Zhang ◽  
Meiping Jiang ◽  
Dongjie Wu ◽  
Wang Zhang ◽  
Wei Yan ◽  
...  

Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, there has been increasing interest in using computational methods to predict essential proteins based on protein–protein interaction (PPI) networks or on the fusion of multiple sources of biological information. However, existing PPI data are known to contain false negatives and false positives. Fusing multiple sources of biological information can reduce the influence of false data in PPI networks, but it inevitably introduces more noise at the same time. In this article, we propose a novel non-negative matrix tri-factorization (NMTF)-based model (NTMEP) to predict essential proteins. First, a weighted PPI network is established using only the topological features of the network, so as to avoid introducing more noise. To reduce the influence of false data in the PPI network on the performance of essential protein identification, the NMTF technique, which is widely used in recommendation algorithms, is applied to reconstruct an optimized PPI network with more potential protein–protein interactions. Then, we use the PageRank algorithm to compute the final ranking score of each protein, with subcellular localization and homology information used to calculate the initial scores. Extensive experiments on publicly available datasets indicate that our NTMEP model performs better in predicting essential proteins than state-of-the-art methods. In this investigation, we demonstrate that introducing non-negative matrix tri-factorization can effectively improve the quality of the protein–protein interaction network and thereby reduce the negative impact of noise on prediction. This finding also provides a new perspective for other applications based on protein–protein interaction networks.
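The ranking step described above can be illustrated with a small sketch. The abstract does not give the exact update rule, so the code below is only a generic PageRank by power iteration in which a prior vector serves both as the starting scores and as the teleportation distribution. The function name `pagerank_with_prior`, the damping value, and the toy network are assumptions made for illustration, not the NTMEP implementation.

```python
def pagerank_with_prior(weights, prior, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a weighted, undirected network.

    weights: dict node -> {neighbor: edge weight}, assumed symmetric.
    prior:   dict node -> non-negative initial score (a hypothetical
             stand-in for localization/homology-based scores).
    """
    nodes = list(weights)
    total = sum(prior[n] for n in nodes)
    # Normalise the prior so it sums to 1.
    p = {n: prior[n] / total for n in nodes}
    score = dict(p)
    strength = {n: sum(weights[n].values()) for n in nodes}
    for _ in range(iters):
        # Teleportation term, distributed according to the prior.
        new = {n: (1.0 - damping) * p[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Isolated node: spread its mass according to the prior.
                for v in nodes:
                    new[v] += damping * score[u] * p[v]
            else:
                # Pass score to neighbors in proportion to edge weight.
                for v, w in weights[u].items():
                    new[v] += damping * score[u] * w / strength[u]
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


# Toy star-shaped network: protein "A" interacts with "B", "C" and "D".
toy_network = {
    "A": {"B": 1.0, "C": 1.0, "D": 1.0},
    "B": {"A": 1.0},
    "C": {"A": 1.0},
    "D": {"A": 1.0},
}
toy_scores = pagerank_with_prior(toy_network, {n: 1.0 for n in toy_network})
```

With a uniform prior the hub protein receives the highest score; a non-uniform prior shifts the ranking toward proteins with stronger prior evidence.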

2021 ◽  
Author(s):  
Zhihong Zhang ◽  
Sai Hu ◽  
Wei Yan ◽  
Bihai Zhao ◽  
Lei Wang

Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain a large number of false negatives and false positives. It is therefore necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results: In this paper, we propose a model based on non-negative matrix factorization and multiple biological information (NDM) for identifying essential proteins. The first stage of this process is to construct a weighted PPI network by combining information on protein domains, protein complexes and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique is used to reconstruct an optimized PPI network with sufficiently weighted edges. In the final stage, the ranking score of each protein is computed by the PageRank algorithm, with the initial scores calculated from homology and subcellular localization information. To verify the effectiveness of the NDM method, we compared it with other state-of-the-art essential protein prediction methods. The comparison indicates that our NDM model performs better in predicting essential proteins. Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which in turn improves the performance of essential protein identification. This also provides a new perspective for other predictions based on protein-protein interaction networks.
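As a rough illustration of the reconstruction step, the following sketch factorizes a small non-negative matrix with the standard Lee and Seung multiplicative updates and multiplies the factors back together. This is plain two-factor NMF under a squared-error objective, chosen here as an assumption; the abstract does not state the objective, rank, or update rule actually used in NDM, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy weighted network with two groups of proteins and one missing edge (0-2).
toy_matrix = [
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
]
w, h = nmf(toy_matrix, rank=2)
reconstructed = matmul(w, h)  # low-rank estimate of the edge weights
```

Entries of `reconstructed` that are zero in the input but clearly positive in the low-rank estimate can be read as candidate interactions, which is the general idea behind using factorization to denoise a network.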


2015 ◽  
Vol 4 (4) ◽  
pp. 35-51 ◽  
Author(s):  
Bandana Barman ◽  
Anirban Mukhopadhyay

Identifying protein interaction networks is very important for finding the cell signaling pathway of a particular disease. The authors identified the differentially expressed genes between two HIV-1 sample groups: wild-type HIV-1 Vpr and mutant HIV-1 Vpr. They performed a statistical t-test and computed the false discovery rate (FDR) to identify genes with increased expression (up-regulated) or decreased expression (down-regulated). In the test, the authors computed q-values, that is, the minimum FDR at which each test can be called significant. As a result, they found 172 differentially expressed genes between wild-type HIV-1 Vpr and the HIV-1 Vpr mutant R80A: 68 up-regulated genes and 104 down-regulated genes. From these 172 differentially expressed genes, the authors built a protein-protein interaction network with string-db and then clustered the PPI network into subnetworks with Cytoscape 3.0. Lastly, the authors assessed the significance of the subnetworks through gene ontology analysis and studied the KEGG pathways of those subnetworks.
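To make the FDR step concrete, here is a minimal sketch of one common way to turn per-gene p-values into FDR-adjusted values, the Benjamini-Hochberg procedure. The abstract does not say which q-value estimator the authors used, so this choice, the function name `benjamini_hochberg`, and the example p-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four per-gene t-tests.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [i for i, q in enumerate(example_q) if q <= 0.05]
```

Genes whose adjusted value falls below the chosen FDR threshold would then be split into up-regulated and down-regulated sets by the sign of the expression difference.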


2014 ◽  
Vol 934 ◽  
pp. 159-164
Author(s):  
Yun Yuan Dong ◽  
Xian Chun Zhang

Protein-protein interaction (PPI) networks provide a simplified overview of the web of interactions that take place inside a cell. According to the centrality-lethality rule, hub proteins (proteins with high degree) tend to be essential in the PPI network. There are also many low-degree proteins in the PPI network, but they differ in lethality: some of them are essential (essential-nonhub proteins) and the others are not (nonessential-nonhub proteins). To explain why nonessential-nonhub proteins are not essential, we propose a new measure, n-iep (the number of essential neighbors), and compare nonessential-nonhub proteins with essential-nonhub proteins from topological, evolutionary and functional views. The comparison results show that there are statistical differences between nonessential-nonhub proteins and essential-nonhub proteins in centrality measures, clustering coefficient, evolutionary rate and the number of essential neighbors. These differences help explain why nonessential-nonhub proteins are not lethal.
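The n-iep measure is defined in the abstract as the number of essential neighbors of a protein, which can be sketched directly. The function name, the adjacency-dict representation, and the toy data below are assumptions for illustration.

```python
def essential_neighbor_counts(adjacency, essential):
    """Count, for every protein, how many of its neighbors are essential.

    adjacency: dict protein -> set of interacting proteins.
    essential: set of proteins known to be essential.
    """
    return {
        protein: sum(1 for neighbor in neighbors if neighbor in essential)
        for protein, neighbors in adjacency.items()
    }


# Toy network: P1 is a hub; P4 is a low-degree protein next to an essential one.
toy_ppi = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}
toy_essential = {"P1", "P3"}
n_iep = essential_neighbor_counts(toy_ppi, toy_essential)
```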


2020 ◽  
Author(s):  
Brennan Klein ◽  
Ludvig Holmér ◽  
Keith M. Smith ◽  
Mackenzie M. Johnson ◽  
Anshuman Swain ◽  
...  

Abstract: Protein-protein interaction (PPI) networks represent complex intra-cellular protein interactions, and the presence or absence of such interactions can lead to biological changes in an organism. Recent network-based approaches have shown that a phenotype’s PPI network’s resilience to environmental perturbations is related to its placement in the tree of life, though we still do not know how or why certain intra-cellular factors can bring about this resilience. One such factor is gene expression, which controls the simultaneous presence of proteins for allowed extant interactions and the possibility of novel associations. Here, we explore the influence of gene expression and network properties on a PPI network’s resilience, focusing especially on ribosomal proteins, vital molecular complexes involved in protein synthesis, which have been extensively and reliably mapped in many species. Using publicly available data of ribosomal PPIs for E. coli, S. cerevisiae, and H. sapiens, we compute changes in network resilience as new nodes (proteins) are added to the networks under three node addition mechanisms: random, degree-based, and gene-expression-based attachment. By calculating the resilience of the resulting networks, we estimate the effectiveness of these node addition mechanisms. We demonstrate that adding nodes with gene-expression-based preferential attachment (as opposed to random or degree-based) preserves and can increase the original resilience of the PPI network. This holds in all three species regardless of their distributions of gene expression or their network community structure. These findings introduce a general notion of prospective resilience, which highlights the key role of network structures in understanding the evolvability of phenotypic traits.

Author Summary: Proteins in organismal cells are present at different levels of concentration and interact with other proteins to provide specific functional roles. When lists of all of these interactions are accumulated, complex networks of protein interactions become apparent. This allows us to begin asking whether there are network-level mechanisms at play guiding the evolution of biological systems. Here, using this network perspective, we address two important themes in evolutionary biology: (i) How are biological systems able to successfully incorporate novelty? (ii) What is the evolutionary role of biological noise in evolutionary novelty? We consider novelty to be the introduction of a new protein, represented as a new “node”, into a network. We simulate the incorporation of novel proteins into protein-protein interaction (PPI) networks in different ways and analyse how the resilience of the PPI network changes. We find that novel interactions guided by gene expression (indicative of concentration levels of proteins) create a more resilient network than either uniformly random interactions or interactions guided solely by the network structure (preferential attachment). Moreover, simulated biological noise in the gene expression increases network resilience. We suggest that biological noise induces novel structure in the PPI network, which has the effect of making it more resilient.


2019 ◽  
Author(s):  
David Armanious ◽  
Jessica Schuster ◽  
George F. Tollefson ◽  
Anthony Agudelo ◽  
Andrew T. DeWan ◽  
...  

Background: Data analysis has become crucial in the post-genomic era, where the accumulation of genomic information is mounting exponentially. Analyzing protein-protein interactions in the context of the interactome is a powerful approach to understanding disease phenotypes. Results: We describe Proteinarium, a multi-sample protein-protein interaction network analysis and visualization tool. Proteinarium can be used to analyze data for samples with dichotomous phenotypes, multiple samples from a single phenotype or a single sample. Then, by similarity clustering, the network-based relations of samples are identified and clusters of related samples are presented as a dendrogram. Each branch of the dendrogram is built based on network similarities of the samples. The protein-protein interaction networks can be analyzed and visualized on any branch of the dendrogram. Proteinarium’s input can be derived from transcriptome analysis, whole exome sequencing data or any high-throughput screening approach. Its strength lies in the use of gene lists for each sample as a distinct input, which are further analyzed through protein interaction analyses. Proteinarium output includes the gene lists of visualized networks and PPI files with which users can analyze the network(s) on other platforms such as Cytoscape. In addition, since the dendrogram is written in Newick tree format, users can visualize it in other software platforms such as Dendroscope or ITOL. Conclusions: Proteinarium, through the analysis and visualization of PPI networks, allows researchers to make important observations on high-throughput data for a variety of research questions. Proteinarium identifies significant clusters of patients based on their shared network similarity for the disease of interest and the associated genes. Proteinarium is a command-line tool written in Java with no external dependencies and it is freely available at https://github.com/Armanious/Proteinarium.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Qiguo Dai ◽  
Maozu Guo ◽  
Yingjie Guo ◽  
Xiaoyan Liu ◽  
Yang Liu ◽  
...  

A protein complex, formed by a group of physically interacting proteins, plays a crucial role in cell activities. Great effort has been made to computationally identify protein complexes from protein-protein interaction (PPI) networks. However, the accuracy of the prediction is still far from satisfactory, because the topological structures of protein complexes in the PPI network are too complicated. This paper proposes a novel optimization framework, named PLSMC, to detect complexes from PPI networks. The method is based on the fact that if two proteins are in a common complex, they are likely to interact. PLSMC employs this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms other methods. In particular, complexes predicted by PLSMC match known complexes with higher accuracy than those of other methods. Furthermore, the predicted complexes have high functional homogeneity.


2019 ◽  
Author(s):  
JE Tomkins ◽  
R Ferrari ◽  
N Vavouraki ◽  
J Hardy ◽  
RC Lovering ◽  
...  

The past decade has seen the rise of omics data for the understanding of biological systems in health and disease. This wealth of data includes protein-protein interaction (PPI) data derived from both low- and high-throughput assays, which is curated into multiple databases that capture the extent of available information from the peer-reviewed literature. Although these curation efforts are extremely useful, reliably downloading and integrating PPI data from the variety of available repositories is challenging and time consuming. We here present a novel user-friendly web resource called PINOT (Protein Interaction Network Online Tool; available at http://www.reading.ac.uk/bioinf/PINOT/PINOT_form.html) to optimise the collection and processing of PPI data from the IMEx consortium associated repositories (members and observers) and from WormBase for constructing, respectively, human and C. elegans PPI networks. Users submit a query containing a list of proteins of interest for which PINOT will mine PPIs. PPI data is downloaded, merged, quality checked, and confidence scored based on the number of distinct methods and publications in which each interaction has been reported. Examples of PINOT applications are provided to highlight the performance, the ease of use and the potential applications of this tool. PINOT is a tool that allows users to survey the literature, extracting PPI data for a list of proteins of interest. The comparison with analogous tools showed that PINOT was able to extract similar numbers of PPIs while incorporating a set of innovative features. PINOT processes both small and large queries, downloads PPIs live through PSICQUIC and applies quality control filters on the downloaded PPI annotations (i.e. removing the need for manual inspection by the user). PINOT provides the user with information on detection methods and publication history for each downloaded interaction entry and provides results in a table format that can be easily customised further and/or directly uploaded into network visualization software.
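The merging and confidence-scoring idea can be illustrated with a short sketch that groups annotation records by unordered protein pair and counts distinct detection methods and publications. The record layout, the function `score_interactions`, and the choice of summing the two counts into one score are assumptions; the abstract does not give PINOT's actual scoring formula or data model.

```python
from collections import defaultdict


def score_interactions(records):
    """Merge PPI annotation records and score each interaction.

    records: iterable of (protein_a, protein_b, method, publication_id).
    Returns {pair: (n_methods, n_publications, score)}, where the score is
    simply n_methods + n_publications (an illustrative choice).
    """
    methods = defaultdict(set)
    publications = defaultdict(set)
    for a, b, method, publication in records:
        pair = tuple(sorted((a, b)))  # A-B and B-A are the same interaction
        methods[pair].add(method)
        publications[pair].add(publication)
    return {
        pair: (len(methods[pair]), len(publications[pair]),
               len(methods[pair]) + len(publications[pair]))
        for pair in methods
    }


# Hypothetical records; protein names, methods and IDs are placeholders.
example_records = [
    ("P1", "P2", "two hybrid", "pub-1"),
    ("P2", "P1", "pull down", "pub-2"),
    ("P1", "P2", "two hybrid", "pub-2"),
    ("P1", "P3", "two hybrid", "pub-1"),
]
example_scores = score_interactions(example_records)
```

Interactions supported by a single method in a single publication get the lowest score and could be filtered out before network construction.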


Author(s):  
Gaston K Mazandu ◽  
Christopher Hooper ◽  
Kenneth Opap ◽  
Funmilayo Makinde ◽  
Victoria Nembaware ◽  
...  

Advances in high-throughput sequencing technologies have resulted in an exponential growth of publicly accessible biological datasets. In the ‘big data’ driven ‘post-genomic’ context, much work is being done to explore human protein–protein interactions (PPIs) for systems-level analysis, to uncover useful signals, gain more insights, advance current knowledge and answer specific biological and health questions. These PPIs are experimentally determined or computationally predicted and stored in different online databases, and some of these PPI resources are updated regularly. As with many biological datasets, such regular updates continuously render older PPI datasets potentially outdated. Moreover, while many of these interactions are shared between these online resources, each resource includes its own identified PPIs and none of these databases exhaustively contains all existing human PPI maps. In this context, it is essential to enable the integration of interaction datasets from different resources, to generate a PPI map with increased coverage and confidence. To allow researchers to produce integrated human PPI datasets in real time, we introduce the integrated human protein–protein interaction network generator (IHP-PING) tool. IHP-PING is a flexible Python package which generates a human PPI network from freely available online resources. This tool extracts and integrates heterogeneous PPI datasets to generate a unified PPI network, which is stored locally for further applications.


2016 ◽  
Vol 113 (18) ◽  
pp. 4976-4981 ◽  
Author(s):  
Arunachalam Vinayagam ◽  
Travis E. Gibson ◽  
Ho-Joon Lee ◽  
Bahar Yilmazel ◽  
Charles Roesel ◽  
...  

The protein–protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as “indispensable,” “neutral,” or “dispensable,” which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network’s control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
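The classification described above can be sketched with the standard structural-controllability result that the minimum number of driver nodes equals the number of nodes left unmatched by a maximum matching of the directed network (and is at least one). The code below uses a simple augmenting-path matching and reclassifies each node after removing it. The function names and toy networks are assumptions, and this is an illustration of the idea rather than the authors' pipeline.

```python
def max_matching(nodes, edges):
    """Size of a maximum matching in the bipartite view of a directed graph,
    where each edge u -> v links 'u as source' to 'v as target'."""
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)
    matched_by = {}  # target node -> source node currently matched to it

    def augment(u, seen):
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def driver_node_count(nodes, edges):
    """Minimum number of driver nodes: unmatched nodes, but at least one."""
    return max(len(nodes) - max_matching(nodes, edges), 1)


def classify(nodes, edges):
    """Label each node by how its removal changes the driver node count."""
    base = driver_node_count(nodes, edges)
    labels = {}
    for x in nodes:
        rest = [n for n in nodes if n != x]
        kept = [(u, v) for u, v in edges if x not in (u, v)]
        after = driver_node_count(rest, kept)
        if after > base:
            labels[x] = "indispensable"
        elif after < base:
            labels[x] = "dispensable"
        else:
            labels[x] = "neutral"
    return labels


path_labels = classify([1, 2, 3], [(1, 2), (2, 3)])   # chain 1 -> 2 -> 3
star_labels = classify([1, 2, 3], [(1, 2), (1, 3)])   # 1 -> 2 and 1 -> 3
```

In the chain, removing the middle node breaks the single control path, so it is indispensable; in the star, removing a leaf lowers the number of driver nodes needed, so the leaves are dispensable.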


Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 153 ◽  
Author(s):  
Wei Dai ◽  
Qi Chang ◽  
Wei Peng ◽  
Jiancheng Zhong ◽  
Yongjiang Li

Essential genes are a group of genes that are indispensable for cell survival and cell fertility. Studying human essential genes not only helps scientists reveal the underlying biological mechanisms of a human cell but also guides disease treatment. Recently, the publication of human essential gene data has made it possible for researchers to train a machine-learning classifier on features of the known human essential genes and to use the classifier to predict new human essential genes. Previous studies have found that the essentiality of genes closely relates to their properties in the protein–protein interaction (PPI) network. In this work, we propose a novel supervised method to predict human essential genes by network embedding of the PPI network. Our approach implements a biased random walk on the network to obtain the network context of each node. Then, the node pairs are input into an artificial neural network to learn representation vectors that maximally preserve the network structure and the properties of the nodes in the network. Finally, the features are fed into an SVM classifier to predict human essential genes. The prediction results on two human PPI networks show that our method achieves better performance than methods that use either the genes’ sequence information or the genes’ centrality properties in the network as input features. Moreover, it also outperforms methods that represent the PPI network by other previous approaches.
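The walk-generation step can be illustrated with a node2vec-style second-order random walk. The abstract only says that a biased random walk is used, so the return parameter `p`, the in-out parameter `q`, the function name `biased_walk`, and the toy network are assumptions borrowed from node2vec and not confirmed details of the authors' method.

```python
import random


def biased_walk(adjacency, start, length, p=1.0, q=1.0, rng=None):
    """Generate one second-order biased random walk of up to `length` nodes.

    Weight 1/p for returning to the previous node, 1 for moving to a common
    neighbor of the previous node, and 1/q for moving further away.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(adjacency[current])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for n in neighbors:
            if n == previous:
                weights.append(1.0 / p)
            elif n in adjacency[previous]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy PPI network: a triangle (g1, g2, g3) with a tail (g3 - g4 - g5).
toy_graph = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4"},
}
example_walk = biased_walk(toy_graph, "g1", 10, p=0.5, q=2.0, rng=random.Random(42))
```

Pairs of nodes that co-occur within a window along such walks are the kind of input a skip-gram style network would use to learn node representations.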

