PRER: A Patient Representation with Pairwise Relative Expression of Proteins on Biological Networks
Abstract
Changes in protein and gene expression levels are often used as features in predictive models, for example in survival prediction. A common strategy for aggregating information on individual proteins is to integrate the expression information with biological networks. In this work, we propose a novel patient representation that integrates proteins' expression levels with protein-protein interaction (PPI) networks. This representation, PRER (Pairwise Relative Expressions with Random walks), uses the neighborhood of a protein to capture dysregulation patterns in protein abundance. Specifically, PRER computes a feature vector for a patient by comparing the expression level of a source protein with the levels of the other proteins in its neighborhood. The neighborhood of the source protein is derived using a biased random-walk strategy on the network. We test PRER's performance on a survival prediction task in 10 different cancers using random forest survival models. The PRER representation yields a statistically significant improvement in predictive performance in 9 out of 10 cancer types when compared to a representation based on individual protein expression. We also identify proteins that are not important in the models trained with individual expression values but emerge as predictive in models trained with PRER features. The set of identified relations provides a valuable collection of biomarkers with high prognostic value. The PRER representation can be used for other complex diseases and prediction tasks that take molecular expression profiles as input. PRER is freely available at: https://github.com/hikuru/PRER
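To make the feature construction concrete, the following is a minimal Python sketch of the idea, not the released implementation. It rests on several assumptions that the abstract does not state: the random walk is biased in a node2vec style with hypothetical return and in-out parameters `p` and `q`, the neighborhood is taken to be the `top_k` most frequently visited proteins, and "relative expression" is taken to be a simple difference between the source and neighbor levels. The toy network `ppi`, the values in `patient`, and all function names are illustrative only.

```python
import random
from collections import Counter


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One second-order (node2vec-style) walk; p and q are assumed bias knobs."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for n in nbrs:
            if n == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif n in adj[prev]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def neighborhood(adj, source, num_walks=20, walk_length=5, top_k=3,
                 p=1.0, q=1.0, seed=0):
    """Rank proteins by visit count over repeated walks from `source`."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_walks):
        for node in biased_walk(adj, source, walk_length, p, q, rng)[1:]:
            if node != source:
                counts[node] += 1
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts, key=lambda n: (-counts[n], n))
    return ranked[:top_k]


def prer_features(expression, adj, **kwargs):
    """Map (source, neighbor) pairs to an assumed difference-based feature."""
    feats = {}
    for source in sorted(adj):
        for nb in neighborhood(adj, source, **kwargs):
            feats[(source, nb)] = expression[source] - expression[nb]
    return feats


# Toy PPI network (undirected adjacency sets) and one patient's expression.
ppi = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
patient = {"A": 1.2, "B": -0.3, "C": 0.5, "D": 2.0}

features = prer_features(patient, ppi, top_k=2)
for pair, value in sorted(features.items()):
    print(pair, round(value, 3))
```

Each patient then receives one value per (source, neighbor) pair, and the resulting vector could be passed to a survival model in place of the individual expression values. The neighborhoods depend only on the network, so in practice they would be computed once and reused across patients.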