The effect of statistical normalisation on network propagation scores

Bioinformatics ◽

10.1093/bioinformatics/btaa896 ◽

2020 ◽

Author(s):

Sergio Picart-Armada ◽

Wesley K Thompson ◽

Alfonso Buil ◽

Alexandre Perera-Lluna

Keyword(s):

Protein Function ◽

Diffusion Processes ◽

Protein Function Prediction ◽

Interaction Network ◽

Mean Value ◽

Statistical Properties ◽

Label Propagation ◽

Supplementary Information ◽

Module Discovery ◽

Permutation Analysis

Abstract
Motivation: Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, owing to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties of such diffusion processes, and of whether they are biased, in each of their applications. In this work, we characterised some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.
Results: Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalisation. Parametric and non-parametric normalisation addressed both points by being codification-independent and by equalising the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalising the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalisation was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
Availability: The code is publicly available at https://github.com/b2slab/diffuBench.
Supplementary information: Supplementary data are available at Bioinformatics online.
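The parametric normalisation summarised above can be illustrated with a minimal sketch. It assumes a regularized Laplacian kernel K = (I + λL)^-1 with L = D - A (one common diffusion kernel; λ = 1 is an arbitrary choice), raw scores f = K y, and a null model in which the input labels y are permuted uniformly at random across nodes. Under that null, E[f_i] = ȳ Σ_j K_ij and Var(f_i) = s² (Σ_j K_ij² - (Σ_j K_ij)²/n), where s² is the sample variance of y, so the z-score removes both the mean and the variance bias. All function names and the toy graph are hypothetical, and this is not the authors' diffuBench code. The brute-force check enumerates all n! permutations, which is only feasible on toy graphs; a non-parametric approach instead draws a random subset of permutations.

```python
import itertools
import math


def regularized_laplacian_kernel(adj, lam=1.0):
    """K = (I + lam * L)^-1 with L = D - A, for a small dense 0/1 adjacency matrix."""
    n = len(adj)
    # Build I + lam * L, augmented with the identity for Gauss-Jordan inversion.
    aug = []
    for i in range(n):
        row = [(1.0 + lam * sum(adj[i]) if i == j else 0.0) - lam * adj[i][j]
               for j in range(n)]
        aug.append(row + [1.0 if i == j else 0.0 for j in range(n)])
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * v for x, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def diffuse(K, y):
    """Raw diffusion scores f = K y."""
    return [sum(k * v for k, v in zip(row, y)) for row in K]


def null_moments(K, y):
    """Closed-form E[f_i] and Var(f_i) when y is permuted uniformly at random."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)  # sample variance of the labels
    means, variances = [], []
    for row in K:
        r = sum(row)
        means.append(ybar * r)
        variances.append(s2 * (sum(k * k for k in row) - r * r / n))
    return means, variances


def z_scores(K, y):
    """Parametric normalisation: z_i = (f_i - E[f_i]) / sqrt(Var(f_i))."""
    f = diffuse(K, y)
    mu, var = null_moments(K, y)
    return [(fi - m) / math.sqrt(v) for fi, m, v in zip(f, mu, var)]


def exhaustive_moments(K, y):
    """Brute-force check: exact mean and variance of f_i over all n! permutations of y."""
    scores = [diffuse(K, perm) for perm in itertools.permutations(y)]
    n_perm = len(scores)
    means = [sum(s[i] for s in scores) / n_perm for i in range(len(y))]
    variances = [sum((s[i] - means[i]) ** 2 for s in scores) / n_perm
                 for i in range(len(y))]
    return means, variances


# Hypothetical toy example: a 5-node path graph 0-1-2-3-4 with two positive labels.
adj = [[0, 1, 0, 0, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 1],
       [0, 0, 0, 1, 0]]
y = [1, 1, 0, 0, 0]
K = regularized_laplacian_kernel(adj, lam=1.0)
print([round(z, 3) for z in z_scores(K, y)])
```

With this particular kernel every row of K sums to 1 (L·1 = 0 implies (I + λL)·1 = 1), so the null mean is ȳ at every node and nodes differ only through the variance term; kernels whose rows do not all sum to the same value also have a node-dependent null mean.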

Protein Function Prediction by Clustering of Protein-Protein Interaction Network

Advances in Intelligent and Soft Computing - ICT Innovations 2011 ◽

10.1007/978-3-642-28664-3_4 ◽

2012 ◽

pp. 39-49 ◽

Cited By ~ 1

Author(s):

Ivana Cingovska ◽

Aleksandra Bogojeska ◽

Kire Trivodaliev ◽

Slobodan Kalajdziski

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network

Protein function prediction using neighbor relativity in protein–protein interaction network

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2012.12.003 ◽

2013 ◽

Vol 43 ◽

pp. 11-16 ◽

Cited By ~ 16

Author(s):

Sobhan Moosavi ◽

Masoud Rahgozar ◽

Amir Rahimi

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network

Computational Method for Protein Function Prediction by Constructing Protein Interaction Network Dictionary

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001406004661 ◽

2006 ◽

Vol 20 (02) ◽

pp. 285-295 ◽

Cited By ~ 2

Author(s):

Hee-Jeong Jin ◽

Hwan-Gue Cho

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Computational Method ◽

Chi Square ◽

Protein Protein Interaction ◽

Protein Functions ◽

Novel Method

In the post-genomic era, predicting protein function is a challenging problem. Unravelling the functions of a protein by wet-lab experiments alone is difficult and burdensome. In this paper, we propose a novel method to predict protein functions by building a "Protein Interaction Network Dictionary (PIND)". This method deduces protein functions by searching for the most similar "words" (an anagram of the functions of neighboring proteins on a protein–protein interaction graph) using global alignments. An evaluation of sensitivity and specificity shows that the PIND approach outperforms previous approaches such as Majority Rule and the Chi-Square measure, and that it is competitive with the recently introduced Random Markov Model approach.
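The Majority Rule baseline that the abstract compares PIND against predicts a protein's function from the annotations of its direct interaction partners. The sketch below is a minimal illustration assuming an undirected edge list and a dict of known annotations; all protein names and labels are hypothetical, keeping every label tied at the cut-off is one possible tie-handling choice, and the sketch does not implement PIND's dictionary or its global alignments.

```python
from collections import Counter


def majority_rule(edges, annotations, protein, top_k=1):
    """Predict functions of `protein` from the annotations of its direct neighbors.

    edges: iterable of undirected (a, b) interactions.
    annotations: dict mapping known proteins to sets of function labels.
    Returns the labels whose neighbor count reaches the top_k-th highest count
    (ties at the cut-off are all kept), sorted alphabetically.
    """
    neighbors = set()
    for a, b in edges:
        if a == protein:
            neighbors.add(b)
        elif b == protein:
            neighbors.add(a)
    # Count how many neighbors carry each function label.
    counts = Counter(label for nb in neighbors for label in annotations.get(nb, ()))
    if not counts:
        return []
    ranked = counts.most_common()
    cutoff = ranked[min(top_k, len(ranked)) - 1][1]
    return sorted(label for label, c in ranked if c >= cutoff)


# Hypothetical toy network: P1 is unannotated; its partners are P2, P3 and P4.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "signaling"},
    "P4": {"signaling", "transport"},
    "P5": {"metabolism"},
}
print(majority_rule(edges, annotations, "P1"))  # ['transport']
```

Counting labels over neighbors is the simplest neighborhood-based predictor; the Chi-Square measure mentioned alongside it additionally compares these counts with their expected frequencies.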

Protein Function Prediction Using Function Associations in Protein–Protein Interaction Network

IEEE Access ◽

10.1109/access.2018.2806478 ◽

2018 ◽

Vol 6 ◽

pp. 30892-30902 ◽

Cited By ~ 3

Author(s):

Pingping Sun ◽

Xian Tan ◽

Sijia Guo ◽

Jingbo Zhang ◽

Bojian Sun ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network

Protein Function Prediction Using Neighbor Counting with Dynamic Threshold from Protein-Protein Interaction Network

Computational Biology and Bioinformatics ◽

10.11648/j.cbb.20150301.11 ◽

2015 ◽

Vol 3 (1) ◽

pp. 1

Author(s):

Md. Khaled Ben Islam

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Dynamic Threshold ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network

Target Protein Function Prediction by Identification of Essential Proteins in Protein-Protein Interaction Network

Communications in Computer and Information Science - Computational Intelligence, Communications, and Business Analytics ◽

10.1007/978-981-13-8581-0_18 ◽

2019 ◽

pp. 219-231

Author(s):

Soukhindra Nath Basak ◽

Ankur Kumar Biswas ◽

Sovan Saha ◽

Piyali Chatterjee ◽

Subhadip Basu ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Target Protein ◽

Essential Proteins ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network

Protein function prediction from protein–protein interaction network using gene ontology based neighborhood analysis and physico-chemical features

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018500257 ◽

2018 ◽

Vol 16 (06) ◽

pp. 1850025 ◽

Cited By ~ 5

Author(s):

Sovan Saha ◽

Abhimanyu Prasad ◽

Piyali Chatterjee ◽

Subhadip Basu ◽

Mita Nasipuri

Keyword(s):

Functional Groups ◽

Protein Function ◽

Functional Group ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Protein Protein Interaction ◽

Physico Chemical ◽

Go Terms ◽

Protein Protein Interaction Network

Protein function prediction from a Protein–Protein Interaction Network (PPIN) and physico-chemical features, using the Gene Ontology (GO) classification, is very useful for assigning biological or biochemical functions to a protein. It can also identify significant proteins responsible for diseases for which drugs have yet to be discovered. The prediction of GO functional terms from PPINs and sequence is therefore an important field of study. In this work, we propose a methodology, Multi Label Protein Function Prediction (ML_PFP), which combines neighborhood analysis with the physico-chemical features of constituent amino acids to predict the functional groups of unannotated proteins. A protein does not act in isolation; rather, it performs functions in a group by interacting with others. A protein is therefore involved in many functions, i.e. it may be associated with multiple functional groups, labels or GO terms. Although the functional groups of known interacting partners and their physico-chemical features provide useful information, assigning multiple labels to an unannotated protein is a very challenging task. Here, we used the human (Homo sapiens) and yeast (Saccharomyces cerevisiae) PPINs, along with their GO terms, to predict the functional groups or GO terms of unannotated proteins. The task is challenging because both the human and yeast protein datasets are voluminous and complex, and multi-label functional group assignment adds a further dimension to the challenge. Our algorithm achieved better performance in Cellular Function, Molecular Function and Biological Process on both the yeast and human networks when compared with other existing state-of-the-art methodologies, as discussed in detail in the Results section.

Gene co-expression in the interactome: moving from correlation toward causation via an integrated approach to disease module discovery

npj Systems Biology and Applications ◽

10.1038/s41540-020-00168-0 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Paola Paci ◽

Giulia Fiscon ◽

Federica Conte ◽

Rui-Sheng Wang ◽

Lorenzo Farina ◽

...

Keyword(s):

Network Analysis ◽

Interaction Network ◽

Integrated Approach ◽

State Transitions ◽

Disease Genes ◽

Human Interactome ◽

Protein Protein Interaction ◽

Interactome Network ◽

Module Discovery ◽

Specific Disorders

Abstract
In this study, we integrate the outcomes of co-expression network analysis with the human interactome network to predict novel putative disease genes and modules. We first apply the SWItch Miner (SWIM) methodology, which predicts important (switch) genes within the co-expression network that regulate disease state transitions, then map them to the human protein–protein interaction network (PPI, or interactome) to predict novel disease–disease relationships (i.e., a SWIM-informed diseasome). Although the relevance of switch genes to an observed phenotype has been recently assessed, their performance at the system or network level constitutes a new, potentially fascinating territory yet to be explored. Quantifying the interplay between switch genes and human diseases in the interactome network, we found that switch genes associated with specific disorders are closer to each other than to other nodes in the network, and tend to form localized connected subnetworks. These subnetworks overlap between similar diseases and are situated in different neighborhoods for pathologically distinct phenotypes, consistent with the well-known topological proximity property of disease genes. These findings allow us to demonstrate how SWIM-based correlation network analysis can serve as a useful tool for efficient screening of potentially new disease gene associations. When integrated with an interactome-based network analysis, it not only identifies novel candidate disease genes, but also may offer testable hypotheses by which to elucidate the molecular underpinnings of human disease and reveal commonalities between seemingly unrelated diseases.
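The finding that switch genes of a given disorder "tend to form localized connected subnetworks" is commonly quantified by the size of the largest connected component (LCC) induced by the gene set in the interactome, compared against random gene sets of the same size. The sketch below assumes an undirected edge list and uniform random sampling of comparison sets (degree-preserving sampling is a common refinement); the gene names are hypothetical, and whether the authors used exactly this statistic is an assumption here.

```python
import random
from collections import defaultdict, deque


def induced_lcc_size(edges, genes):
    """Size of the largest connected component of the subgraph induced by `genes`."""
    genes = set(genes)
    adj = defaultdict(set)
    for a, b in edges:
        if a in genes and b in genes:
            adj[a].add(b)
            adj[b].add(a)
    seen, best = set(), 0
    for start in genes:
        if start in seen:
            continue
        # Breadth-first search over one component of the induced subgraph.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def lcc_zscore(edges, genes, n_random=1000, seed=0):
    """Observed LCC and its z-score against uniformly sampled gene sets of equal size."""
    nodes = sorted({n for e in edges for n in e})
    rng = random.Random(seed)
    observed = induced_lcc_size(edges, genes)
    k = len(set(genes))
    null = [induced_lcc_size(edges, rng.sample(nodes, k)) for _ in range(n_random)]
    mean = sum(null) / n_random
    sd = (sum((x - mean) ** 2 for x in null) / (n_random - 1)) ** 0.5
    if sd == 0:
        return observed, float("nan")
    return observed, (observed - mean) / sd


# Hypothetical toy interactome: a 6-node ring A-B-C-D-E-F plus the chord A-D.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("F", "A"), ("A", "D")]
print(induced_lcc_size(edges, ["A", "B", "C", "E"]))  # 3 (A-B-C connected, E isolated)
print(lcc_zscore(edges, ["A", "B", "C", "E"], n_random=200, seed=1))
```

A large positive z-score indicates that the gene set is more connected in the interactome than random sets of the same size would be.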
