Impact of low-confidence interactions on computational identification of protein complexes
Protein complexes are the cornerstones of most biological processes. Identifying protein complexes is crucial for understanding the principles of cellular organization and has several important applications, including disease diagnosis. Several computational techniques have been developed to identify protein complexes from protein–protein interaction (PPI) data (equivalently, from PPI networks). However, PPI data contain a significant fraction of false positives, which is a bottleneck in identifying protein complexes correctly. Gene Ontology (GO)-based semantic similarity measures can be used to assign a confidence score to each PPI, and PPIs with low confidence scores are highly likely to be false positives. In this paper, we systematically study the impact of low-confidence PPIs, as identified by GO-based semantic similarity measures, on the performance of complex detection methods. We consider five state-of-the-art complex detection algorithms and nine GO-based similarity measures in the evaluation. We find that each complex detection algorithm improves its performance significantly after PPIs with low similarity scores are filtered out. We also observe that the percentage improvement is highly correlated with the filtration percentage, that is, the percentage of low-confidence PPIs removed.
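The filtering step and the correlation analysis described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline. It assumes the GO-based semantic similarity score of each interaction has already been computed by some measure. It treats "low confidence" as a score below a user-chosen `threshold` (a hypothetical parameter), and it uses the plain Pearson coefficient as one possible notion of correlation. The function names `filter_low_confidence` and `pearson`, as well as the toy scores, are hypothetical.

```python
from typing import Dict, List, Tuple

# A PPI edge: (protein_a, protein_b). Scores are ASSUMED to be precomputed
# by some GO-based semantic similarity measure; none is implemented here.
Edge = Tuple[str, str]


def filter_low_confidence(
    scores: Dict[Edge, float], threshold: float
) -> Tuple[List[Edge], float]:
    """Keep edges scoring at or above `threshold` (hypothetical cutoff).

    Returns the retained edges and the filtration percentage, i.e. the
    percentage of input edges that were removed as low-confidence.
    """
    if not scores:
        return [], 0.0
    kept = [edge for edge, s in scores.items() if s >= threshold]
    removed_pct = 100.0 * (len(scores) - len(kept)) / len(scores)
    return kept, removed_pct


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Toy network with made-up scores, for illustration only.
    toy = {("A", "B"): 0.9, ("B", "C"): 0.2, ("C", "D"): 0.6, ("A", "D"): 0.1}
    kept, pct = filter_low_confidence(toy, threshold=0.5)
    print(kept, pct)  # [('A', 'B'), ('C', 'D')] 50.0
```

In a study like the one described, the filtered edge list would be passed to each complex detection algorithm. The resulting percentage improvements, paired with the filtration percentages across thresholds or measures, would then be the two samples fed to a correlation function such as `pearson`.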