Identifying Protein Complexes in Protein-protein Interaction Data using Graph Convolution Network

2021
Author(s):
Nazar Zaki, Harsh Singh

Protein complexes are groups of two or more polypeptide chains that join together to build noncovalent networks of protein interactions. A number of computational methods have been developed to identify protein complexes and their members from these interaction networks. While most existing methods identify protein complexes from protein-protein interaction (PPI) networks reasonably well, the applicability of advanced graph network methods has not yet been adequately investigated. In this paper, we propose several graph convolutional network (GCN) methods to improve the detection of functional protein complexes. We first formulate protein complex detection as a node classification problem. Second, we apply the Neural Overlapping Community Detection (NOCD) model to cluster the nodes (proteins) using a complex affiliation matrix. We also present a representation learning approach that combines a multi-class GCN feature extractor (to obtain node features) with the mean shift clustering algorithm (to perform clustering). We further improve the efficiency of the multi-class GCN by converting dense-dense matrix operations into dense-sparse or sparse-sparse matrix operations, which reduces space and time complexity and significantly improves the scalability of the existing GCN. Finally, we apply clustering aggregation to find the best protein complexes: a grid search was performed over the complexes detected by three well-known protein complex detection methods, namely ClusterONE, CMC, and PEWCC, with the help of the Meta-Clustering Algorithm (MCLA) and the Hybrid Bipartite Graph Formulation (HBGF) algorithm. The proposed GCN-based methods were tested on various publicly available datasets and provided significantly better performance than the previous state-of-the-art methods.
The code and data used in this study are available from https://github.com/Analystharsh/GCN_complex_detection.
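
To illustrate the dense-to-sparse conversion mentioned in the abstract, the following is a minimal sketch of a single GCN propagation step in which the normalized adjacency matrix is stored in sparse form. It assumes the widely used propagation rule ReLU(Â X W) with Â = D^{-1/2}(A + I)D^{-1/2}; the abstract does not state which rule the authors use, so this should not be read as their implementation. The function names `normalized_adjacency` and `gcn_layer`, the toy edge list, and the feature and weight matrices are all hypothetical, and the sketch assumes an undirected PPI graph without self-interactions.

```python
import numpy as np
import scipy.sparse as sp


def normalized_adjacency(edges, num_nodes):
    """Build the sparse matrix D^{-1/2} (A + I) D^{-1/2} for an undirected graph.

    Assumes `edges` holds (u, v) pairs with u != v (no self-interactions).
    """
    # Store each undirected edge in both directions.
    rows = [u for u, v in edges] + [v for u, v in edges]
    cols = [v for u, v in edges] + [u for u, v in edges]
    data = np.ones(len(rows), dtype=np.float64)
    adj = sp.coo_matrix((data, (rows, cols)), shape=(num_nodes, num_nodes)).tocsr()
    adj.data[:] = 1.0  # duplicate edges were summed by tocsr(); reset them to weight 1
    adj = adj + sp.identity(num_nodes, format="csr")  # add self-loops
    deg = np.asarray(adj.sum(axis=1)).ravel()  # every degree >= 1 because of self-loops
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    return (d_inv_sqrt @ adj @ d_inv_sqrt).tocsr()


def gcn_layer(a_hat, features, weights):
    """One propagation step, ReLU(A_hat @ X @ W), with A_hat kept sparse.

    Computing X @ W first keeps the sparse-dense product as small as possible.
    """
    return np.maximum(a_hat @ (features @ weights), 0.0)


if __name__ == "__main__":
    # Hypothetical toy network: four proteins connected in a chain.
    toy_edges = [(0, 1), (1, 2), (2, 3)]
    a_hat = normalized_adjacency(toy_edges, num_nodes=4)

    rng = np.random.default_rng(0)
    x = np.eye(4)                 # one-hot node features (an assumption)
    w = rng.normal(size=(4, 2))   # randomly initialised, untrained weights
    hidden = gcn_layer(a_hat, x, w)
    print(hidden.shape)
```

With a sparse Â, the cost of the propagation step scales with the number of interactions rather than with the square of the number of proteins, which is the kind of saving the abstract attributes to the dense-sparse formulation.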

Author(s):  
Hugo Willy

Recent breakthroughs in high-throughput experiments to determine protein-protein interactions have generated a vast amount of protein interaction data. However, most of these experiments can only answer whether two proteins interact, not the mechanisms by which they interact. Knowing these mechanisms is crucial for understanding the protein interactions of an organism as a whole (the interactome) and even for predicting novel protein interactions. Protein interaction usually occurs at specific sites on the proteins and, given their importance, these sites are usually well conserved throughout the evolution of proteins of the same family. Based on this observation, a number of works on finding protein patterns/motifs conserved in interacting proteins have emerged in the last few years. Such motifs are collectively termed interaction motifs. This chapter reviews the different approaches to finding interaction motifs and discusses their implications, potential, and possible areas for future improvement.

