Better Network Modeling For Link Prediction In Protein-Protein Interaction Networks
Abstract

Background: Protein-protein interaction (PPI) data are an important resource in functional genomics. However, inaccuracies in high-throughput experiments often leave PPI data incomplete. Computational techniques are therefore used to infer missing interactions and to assign confidence scores. Link prediction is one such approach: it uses the structure of the network of currently known PPIs to identify good candidates for missing PPIs. Recently, the L3 principle introduced biological motivation into PPI link prediction, yielding predictors that outperform general-purpose link predictors for complex networks. However, the previously developed L3-based link predictors implement the L3 principle only approximately. As a result, the full potential of the L3 principle is not realized, and candidate PPIs that otherwise fit the principle may even be penalized.

Results: In this article, we propose a formulation of link predictors without approximation, which we call ExactL3 (L3E), by addressing elements missing from L3 predictors from the perspective of network modeling. Using statistical and biological metrics, we show that L3E predictors generally perform better than previously proposed methods on seven datasets across two organisms (human and yeast), within a reasonable amount of computation time. Besides finding that L3E ranks PPIs more accurately, we also found that L3-based predictors, including L3E, predict a different pool of real PPIs than general-purpose link predictors do. This suggests that different types of PPIs can be predicted from different topological assumptions, and that even better PPI link predictors may be obtained in the future through improved network modeling.
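To make the idea of topology-based link prediction concrete, the following is a minimal sketch of a degree-normalized length-3 path score, the form commonly associated with the earlier, approximate L3 predictors. It is not the L3E formulation proposed in this article, and the abstract itself does not state this formula; the scoring rule, the function name `l3_scores`, and the toy network are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def l3_scores(edges):
    """Score non-adjacent node pairs by degree-normalized paths of length 3.

    Assumed scoring rule (illustrative, not taken from the abstract):
        score(x, y) = sum over paths x-u-v-y of 1 / sqrt(deg(u) * deg(v))
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already a known interaction, not a candidate
        total = 0.0
        for u in adj[x]:
            for v in adj[y]:
                if v in adj[u]:  # x-u-v-y is a path of length 3
                    total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
        if total > 0:
            scores[(x, y)] = total
    return scores


# Toy network (hypothetical protein names): a simple chain a-b-c-d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
toy_scores = l3_scores(toy_edges)
# Rank candidate interactions from most to least likely under this score.
ranked = sorted(toy_scores.items(), key=lambda kv: kv[1], reverse=True)
```

On the toy chain, the only candidate pair that receives a score is the one joined by a path of length 3; pairs that merely share a neighbor are not scored, which is the sense in which such predictors rest on a different topological assumption than common-neighbor methods.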