On the robustness of graph-based clustering to random network alterations

Biological functions emerge from complex and dynamic networks of protein-protein interactions. Because these protein-protein interaction networks, or interactomes, represent pairwise connections within a hierarchically organized system, it is often useful to identify higher-order associations embedded within them, such as multi-member protein complexes. Graph-based clustering techniques are widely used to accomplish this goal, and dozens of field-specific and general clustering algorithms exist. However, interactomes can be prone to errors, especially when inferred from high-throughput biochemical assays. Therefore, robustness to network-level noise is an important criterion for any clustering algorithm that aims to generate reproducible clusters. Here, we tested the robustness of a range of graph-based clustering algorithms in the presence of noise, including algorithms common across domains and those specific to protein networks. Strikingly, we found that all of the clustering algorithms tested here markedly amplified noise within the underlying protein interaction network: randomly rewiring only 1% of network edges yielded more than a 50% change in clustering results. Moreover, we found that the impact of network noise on individual clusters was not uniform: some clusters were consistently robust to injected noise while others were not. To help assess the robustness of individual clusters, we developed the clust.perturb R package and Shiny web application, which measure the reproducibility of clusters by randomly perturbing the network. We show that clust.perturb results are predictive of real-world cluster stability: poorly reproducible clusters as identified by clust.perturb are significantly less likely to be reclustered across experiments.
We conclude that graph-based clustering amplifies noise in protein interaction networks, but quantifying the robustness of a cluster to network noise can separate stable protein complexes from spurious associations.
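The noise-injection step described above (rewiring a small fraction of network edges) can be illustrated with a minimal sketch. The abstract does not specify the authors' exact rewiring procedure, so the scheme below is an assumption: a chosen fraction of edges is removed and replaced by the same number of random edges that were absent from the original network. The names `normalize` and `rewire` are hypothetical and are not part of clust.perturb.

```python
import random


def normalize(edge):
    """Store an undirected edge as an ordered pair so (a, b) == (b, a)."""
    a, b = edge
    return (a, b) if a <= b else (b, a)


def rewire(edges, fraction, seed=0):
    """Replace `fraction` of the edges with random edges absent from the input.

    Assumed scheme (not taken from the paper): removed edges are replaced by
    uniformly sampled node pairs that were not edges in the original network.
    The network must be sparse enough to contain that many non-edges.
    """
    rng = random.Random(seed)
    original = sorted({normalize(e) for e in edges})
    nodes = sorted({n for e in original for n in e})
    n_rewire = round(fraction * len(original))
    removed = set(rng.sample(original, n_rewire))
    kept = set(original) - removed
    while len(kept) < len(original):
        candidate = normalize(tuple(rng.sample(nodes, 2)))
        # Accept only pairs that are new relative to the original network.
        if candidate not in kept and candidate not in removed:
            kept.add(candidate)
    return sorted(kept)


# Toy network: a ring of 20 nodes, with 10% of its edges rewired.
ring = [(i, (i + 1) % 20) for i in range(20)]
noisy = rewire(ring, fraction=0.10, seed=1)
```

Clustering both `ring` and `noisy` with the same algorithm and comparing the two sets of clusters would then show how strongly that algorithm amplifies the injected noise.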

10.1101/2020.04.24.059758 ◽

2020 ◽

Author(s):

R. Greg Stacey ◽

Michael A. Skinnider ◽

Leonard J. Foster

Keyword(s):

Protein Interactions ◽

Web Application ◽

Clustering Algorithm ◽

Protein Complexes ◽

Random Network ◽

Clustering Algorithms ◽

Dynamic Networks ◽

R Package ◽

The Impact ◽

Graph Based Clustering

ABSTRACT

Biological functions emerge from complex and dynamic networks of protein-protein interactions. Because these protein-protein interaction networks, or interactomes, represent pairwise connections within a hierarchically organized system, it is often useful to identify higher-order associations embedded within them, such as multi-member protein complexes. Graph-based clustering techniques are widely used to accomplish this goal, and dozens of field-specific and general clustering algorithms exist. However, interactomes can be prone to errors, especially those inferred from high-throughput biochemical assays. Therefore, robustness to network-level variability is an important criterion for any clustering algorithm that aims to generate reproducible clusters. Here, we tested the robustness of a range of graph-based clustering algorithms in the presence of network-level noise, including algorithms common across domains and those specific to protein networks. We found that the results of all clustering algorithms tested were profoundly sensitive to injected network noise. Randomly rewiring 1% of network edges yielded up to a 57% change in clustering results, indicating that clustering markedly amplified network-level noise. However, the impact of network noise on individual clusters was not uniform: some clusters were consistently robust to injected network noise while others were not. Therefore, we developed the clust.perturb R package and Shiny web application, which measure the reproducibility of clusters by randomly perturbing the network. We show that clust.perturb results are predictive of real-world cluster stability: poorly reproducible clusters as identified by clust.perturb are significantly less likely to be reclustered across experiments.
We conclude that quantifying the robustness of a cluster to network noise, as implemented in clust.perturb, provides a powerful tool for ranking the reproducibility of clusters, and separating stable protein complexes from spurious associations.
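As a minimal sketch of how per-cluster reproducibility might be quantified, the snippet below scores each original cluster by its best Jaccard overlap with any cluster found in each perturbed network, averaged over perturbations. This scoring rule is an assumption chosen for illustration; the abstract does not state the metric that clust.perturb reports, and the function names `jaccard` and `reproducibility` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two node sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reproducibility(original_clusters, perturbed_clusterings):
    """Mean best-match Jaccard per original cluster (assumed scoring rule).

    `perturbed_clusterings` holds one list of clusters per noise iteration.
    A score near 1 means the cluster reappears almost unchanged after
    perturbation; a score near 0 means it is not recovered.
    """
    if not perturbed_clusterings:
        raise ValueError("need at least one perturbed clustering")
    scores = []
    for cluster in original_clusters:
        best = [
            max((jaccard(cluster, c) for c in clustering), default=0.0)
            for clustering in perturbed_clusterings
        ]
        scores.append(sum(best) / len(best))
    return scores


# Toy example: two original clusters, two noise iterations.
original = [{"A", "B", "C"}, {"D", "E"}]
perturbed = [
    [{"A", "B", "C"}, {"D", "F"}],
    [{"A", "B"}, {"C", "D", "E"}],
]
scores = reproducibility(original, perturbed)
```

In this toy case the first cluster is recovered more faithfully than the second, so ranking clusters by such a score would place it higher.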
