Abstract
Despite technological advances in proteomics, incomplete coverage and inconsistency issues persist, resulting in “data holes”. These data holes cause the missing protein problem (MPP), in which relevant proteins are persistently unobserved or only sporadically observed across samples. This hinders biomarker and drug discovery from proteomics data. Network-based approaches are powerful: the Functional Class Scoring (FCS) method, which uses protein complexes, was able to readily recover missed proteins with weak or partial support. However, there are limitations. The verification approach used to determine missing-protein recovery is potentially biased, as the test data were based on relatively outdated Data-Dependent Acquisition (DDA) proteomics, and FCS does not provide a scoring scheme for individual protein components within significant complexes. To address these issues, we first devised a more rigorous evaluation of FCS based on same-sample technical replicates. Second, we evaluated FCS using data from more recent Data-Independent Acquisition (DIA) technologies (viz. SWATH). Although cross-replicate examination reveals some inconsistencies amongst same-class samples, tissue-differentiating signal is nonetheless strongly conserved. This confirms FCS as a viable method that selects biologically meaningful networks. We also report that predicted missing proteins are statistically significant based on FCS p-values. Although cross-replicate verification rates are not spectacular, the predicted missing proteins as a whole have higher peptide support than non-predicted proteins. FCS also has the capacity to predict missing proteins that are often lost due to weak specific peptide support. As a yet unresolved limitation, we find that FCS cannot assign meaningful probabilities to individual protein components (there is no relationship between the actual probability of verification and the FCS-assigned probability), as it only provides a p-value at the level of complexes.