Using a Genetic Algorithm and Markov Clustering on Protein–Protein Interaction Graphs

2013 ◽  
pp. 805-816
Author(s):  
Charalampos Moschopoulos ◽  
Grigorios Beligiannis ◽  
Spiridon Likothanassis ◽  
Sophia Kossida

In this paper, a Genetic Algorithm is applied to the filter of the Enhanced Markov Clustering algorithm to optimize the selection of clusters that have a high probability of representing protein complexes. The filter was applied to the results, obtained from experiments on five different yeast datasets, of three algorithms known for their efficiency in detecting protein complexes in protein interaction graphs. The results are compared with those of three popular clustering algorithms, demonstrating the efficiency of the proposed method on metrics such as successful prediction rate and geometrical accuracy.
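
For orientation, the following is a minimal sketch of plain Markov Clustering (alternating expansion and inflation on a column-stochastic matrix). It is not the Enhanced Markov Clustering filter or the Genetic Algorithm described in the abstract. The function names, the inflation value of 2.0, the added self-loops, and the toy graph are all assumptions chosen for illustration.

```python
def normalize_columns(matrix):
    """Scale each column so it sums to 1 (column-stochastic)."""
    n = len(matrix)
    out = [row[:] for row in matrix]
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = matrix[i][j] / total
    return out


def expand(matrix):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Plain MCL sketch; returns clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Self-loops are a common MCL convention (an assumption here).
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Each row with non-negligible entries lists the members of one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Toy graph (hypothetical): two disconnected triangles.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

A filter of the kind the abstract describes would then score and select among such clusters; that scoring is not specified here, so it is left out of the sketch.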


2020 ◽  
Vol 18 (03) ◽  
pp. 2040010 ◽  
Author(s):  
Heng Yao ◽  
Jihong Guan ◽  
Tianying Liu

Identifying protein complexes is an important issue in computational biology, as it benefits the understanding of cellular functions and the design of drugs. In the past decades, many computational methods have been proposed that mine dense subgraphs in Protein–Protein Interaction Networks (PINs). However, the high rate of false positive and false negative interactions in PINs prevents complexes from being detected accurately in the raw PINs. In this paper, we propose a denoising approach for protein complex detection that uses a variational graph auto-encoder. First, we embed a PIN into a vector space with a stacked graph convolutional network (GCN), then decide which interactions in the PIN are credible. If the probability of an interaction being credible is less than a threshold, we delete the interaction. In this way, we reconstruct a reliable PIN. Following that, we detect protein complexes in the reconstructed PIN using several typical detection methods, including CPM, Coach, DPClus, GraphEntropy, IPCA and MCODE, and compare the results with those obtained directly from the original PIN. We conduct the empirical evaluation on four yeast PPI datasets (Gavin, Krogan, DIP and Wiphi) and two human PPI datasets (Reactome and Reactomekb), against two yeast complex benchmarks (CYC2008 and MIPS) and three human complex benchmarks (REACT, REACT_uniprotkb and CORE_COMPLEX_human), respectively. Experimental results show that with the reconstructed PINs obtained by our denoising approach, complex detection performance improves, in most cases by over 5% and sometimes by as much as 200%. Furthermore, we compare our approach with two existing denoising methods (RWS and RedNemo) while varying the matching rate on separate complex distributions. Our results show that in most cases (over 2/3), the proposed approach outperforms the existing methods.
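
The thresholding step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the embeddings are made-up values standing in for a trained encoder's output, the credibility score uses the inner-product decoder of the standard variational graph auto-encoder (sigmoid of the dot product of two node embeddings), and the names `edge_probability` and `denoise` and the 0.5 threshold are hypothetical.

```python
import math


def edge_probability(z_u, z_v):
    """Sigmoid of the inner product of two node embeddings."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


def denoise(edges, embeddings, threshold=0.5):
    """Keep only interactions whose credibility reaches the threshold."""
    kept = []
    for u, v in edges:
        if edge_probability(embeddings[u], embeddings[v]) >= threshold:
            kept.append((u, v))
    return kept


# Hypothetical 2-dimensional embeddings for three proteins.
embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.5],
    "C": [-1.0, 0.0],
}
edges = [("A", "B"), ("A", "C"), ("B", "C")]

print(denoise(edges, embeddings))  # [('A', 'B')]
```

Any complex detection method can then be run on the kept edges in place of the raw network.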


2015 ◽  
Vol 63 (3) ◽  
pp. 181-189 ◽  
Author(s):  
Konstantinos Theofilatos ◽  
Niki Pavlopoulou ◽  
Christoforos Papasavvas ◽  
Spiros Likothanassis ◽  
Christos Dimitrakopoulos ◽  
...  

2009 ◽  
Vol 07 (01) ◽  
pp. 217-242 ◽  
Author(s):  
LIN GAO ◽  
PENG-GANG SUN ◽  
JIA SONG

Protein–Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. When studying the workings of a biological cell, it is useful to be able to detect known protein complexes and predict still undiscovered ones within the cell's PPI networks. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitates a fast, accurate approach to biological complex identification. Because of the importance of this task in the study of protein interaction networks, different models and algorithms have been developed for identifying functional modules in PPI networks. In this paper, we review some representative algorithms, focusing on the algorithms underlying the approaches and how the algorithms relate to each other. In particular, a comparison is given based on the properties of the algorithms. Since PPI networks are noisy and still incomplete, some methods that use additional properties to preprocess and purify PPI data are presented. We also discuss the functional annotation and validation of protein complexes. Finally, new progress and future research directions are discussed from the computational viewpoint.


2021 ◽  
Author(s):  
Nazar Zaki ◽  
Harsh Singh

Protein complexes are groups of two or more polypeptide chains that join together to build noncovalent networks of protein interactions. A number of computational methods have been developed to identify protein complexes and their members from these interaction networks. While most of the existing methods identify protein complexes from protein-protein interaction networks (PPIs) reasonably well, the applicability of advanced graph network methods has not yet been adequately investigated. In this paper, we propose several graph convolutional network (GCN) methods to improve the detection of functional protein complexes. We first formulate the protein complex detection problem as a node classification problem. Second, the Neural Overlapping Community Detection (NOCD) model is applied to cluster the nodes (proteins) using a complex affiliation matrix. A representation learning approach, which combines the multi-class GCN feature extractor (to obtain the features of the nodes) and the mean shift clustering algorithm (to perform clustering), is also presented. We have also improved the efficiency of the multi-class GCN network, reducing its space and time complexity by converting the dense-dense matrix operations into dense-sparse or sparse-sparse matrix operations. This proposed solution significantly improves the scalability of the existing GCN network. Finally, we apply clustering aggregation to find the best protein complexes. A grid search was performed over the complexes detected by three well-known protein complex detection methods, namely ClusterONE, CMC, and PEWCC, with the help of the Meta-Clustering Algorithm (MCLA) and the Hybrid Bipartite Graph Formulation (HBGF) algorithm. The proposed GCN-based methods were tested on various publicly available datasets and provided significantly better performance than the previous state-of-the-art methods.
The code and data used in this study are available from https://github.com/Analystharsh/GCN_complex_detection.
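
To illustrate why replacing dense-dense products with sparse-dense ones saves work, here is a minimal sketch of the neighbourhood aggregation at the heart of a GCN layer, written against a sparse adjacency structure. It is not taken from the linked repository. The row-of-dictionaries storage, the function name `sparse_dense_matmul`, and the toy graph are assumptions, and normalization and the learned weight matrix are omitted.

```python
def sparse_dense_matmul(sparse_rows, dense):
    """Multiply a sparse matrix (list of {column: value} dicts) by a dense one.

    Only stored (non-zero) entries are visited, so the cost grows with the
    number of edges rather than with the square of the number of nodes.
    """
    n_features = len(dense[0])
    result = []
    for row in sparse_rows:
        out = [0.0] * n_features
        for col, weight in row.items():
            for f in range(n_features):
                out[f] += weight * dense[col][f]
        result.append(out)
    return result


# Path graph 0 - 1 - 2 with self-loops, stored sparsely (hypothetical data).
adjacency = [
    {0: 1.0, 1: 1.0},
    {0: 1.0, 1: 1.0, 2: 1.0},
    {1: 1.0, 2: 1.0},
]
# One two-dimensional feature vector per node.
features = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

print(sparse_dense_matmul(adjacency, features))
# [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
```

Each output row is the sum of the feature vectors of a node and its neighbours; a dense implementation would also multiply through every zero entry of the adjacency matrix.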


2020 ◽  
pp. mcp.RA120.002275
Author(s):  
R. Greg Stacey ◽  
Michael A. Skinnider ◽  
Leonard J. Foster

Biological functions emerge from complex and dynamic networks of protein-protein interactions. Because these protein-protein interaction networks, or interactomes, represent pairwise connections within a hierarchically organized system, it is often useful to identify higher-order associations embedded within them, such as multi-member protein complexes. Graph-based clustering techniques are widely used to accomplish this goal, and dozens of field-specific and general clustering algorithms exist. However, interactomes can be prone to errors, especially when inferred from high-throughput biochemical assays. Therefore, robustness to network-level noise is an important criterion for any clustering algorithm that aims to generate robust, reproducible clusters. Here, we tested the robustness of a range of graph-based clustering algorithms in the presence of noise, including algorithms common across domains and those specific to protein networks. Strikingly, we found that all of the clustering algorithms tested here markedly amplified noise within the underlying protein interaction network. Randomly rewiring only 1% of network edges yielded more than a 50% change in clustering results, indicating that clustering markedly amplified network-level noise. Moreover, we found the impact of network noise on individual clusters was not uniform: some clusters were consistently robust to injected noise while others were not. To assist in assessing this, we developed the clust.perturb R package and Shiny web application to measure the reproducibility of clusters by randomly perturbing the network. We show that clust.perturb results are predictive of real-world cluster stability: poorly reproducible clusters as identified by clust.perturb are significantly less likely to be reclustered across experiments. 
We conclude that graph-based clustering amplifies noise in protein interaction networks, but quantifying the robustness of a cluster to network noise can separate stable protein complexes from spurious associations.
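
The perturbation idea described above can be sketched in a few lines. This is not the clust.perturb package or its API. It is an illustrative Python outline under assumptions: connected components stand in for a real clustering algorithm, the Jaccard index is used as the cluster similarity, and the names `rewire`, `reproducibility`, and all parameter values are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two node sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def rewire(edges, nodes, fraction, rng):
    """Remove a fraction of edges (at least one) and add as many random new ones."""
    current = {frozenset(e) for e in edges}
    k = max(1, round(fraction * len(current)))
    for e in rng.sample(sorted(current, key=sorted), k):
        current.discard(e)
    added = 0
    while added < k:
        candidate = frozenset(rng.sample(nodes, 2))
        if candidate not in current:
            current.add(candidate)
            added += 1
    return current


def connected_components(edges, nodes):
    """Stand-in clustering: each connected component is one cluster."""
    neighbours = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, clusters = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(neighbours[x] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


def reproducibility(cluster_fn, edges, nodes, fraction=0.01, trials=10, seed=0):
    """Mean best-match Jaccard of each original cluster across perturbed runs."""
    rng = random.Random(seed)
    original = cluster_fn(edges, nodes)
    totals = [0.0] * len(original)
    for _ in range(trials):
        perturbed = cluster_fn(rewire(edges, nodes, fraction, rng), nodes)
        for i, cluster in enumerate(original):
            totals[i] += max(jaccard(cluster, p) for p in perturbed)
    return [t / trials for t in totals]


# Toy network (hypothetical): two triangles.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
scores = reproducibility(connected_components, edges, nodes, fraction=0.2, trials=5)
print(scores)
```

A score near 1 means a cluster is recovered almost unchanged after perturbation; a low score flags a cluster that depends on a few specific edges.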

