The effect of statistical normalisation on network propagation scores

Bioinformatics ◽

10.1093/bioinformatics/btaa896 ◽

2020 ◽

Author(s):

Sergio Picart-Armada ◽

Wesley K Thompson ◽

Alfonso Buil ◽

Alexandre Perera-Lluna

Keyword(s):

Protein Function ◽

Diffusion Processes ◽

Protein Function Prediction ◽

Interaction Network ◽

Mean Value ◽

Statistical Properties ◽

Label Propagation ◽

Supplementary Information ◽

Module Discovery ◽

Permutation Analysis

Abstract

Motivation: Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties of such diffusion processes, and of the presence of bias, in each of their applications. In this work, we characterised some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.

Results: Diffusion scores starting from binary labels were affected by the label codification, and exhibited a problem-dependent topological bias that could be removed by the statistical normalisation. Parametric and non-parametric normalisation addressed both points by being codification-independent and by equalising the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalising the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalisation was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.

Availability: The code is publicly available at https://github.com/b2slab/diffuBench

Supplementary information: Supplementary data are available at Bioinformatics online.
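
To make the idea of normalising diffusion scores against a permutation null more concrete, here is a minimal sketch. It is not the authors' implementation and does not reproduce their seven scores. It assumes a regularised Laplacian kernel K = (I + L)^-1, assumes that labels are permuted uniformly over all nodes, and uses the standard moments of a linear permutation statistic. All function names are hypothetical.

```python
import numpy as np


def regularised_laplacian_kernel(adj: np.ndarray) -> np.ndarray:
    """Kernel K = (I + L)^-1, with L = D - A the unnormalised graph Laplacian.

    The kernel choice is an assumption made for this illustration.
    """
    laplacian = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(adj.shape[0]) + laplacian)


def raw_scores(kernel: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Unnormalised diffusion scores f = K y."""
    return kernel @ labels


def null_moments(kernel: np.ndarray, labels: np.ndarray):
    """Per-node mean and variance of f = K y when the entries of y are
    permuted uniformly at random over all nodes (assumed null model)."""
    n = labels.size
    row_sum = kernel.sum(axis=1)
    row_sq = (kernel ** 2).sum(axis=1)
    mu = labels.mean()
    sigma2 = labels.var()  # population variance (ddof=0)
    mean = mu * row_sum
    var = sigma2 * n / (n - 1) * (row_sq - row_sum ** 2 / n)
    return mean, var


def z_scores(kernel: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Parametric normalisation: standardise each node's score by its null moments."""
    mean, var = null_moments(kernel, labels)
    return (raw_scores(kernel, labels) - mean) / np.sqrt(var)


# Toy graph with 5 nodes and edges 0-1, 0-2, 0-3, 3-4 (made-up example data).
adj = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0

kernel = regularised_laplacian_kernel(adj)
labels_01 = np.array([0.0, 1.0, 0.0, 0.0, 1.0])  # positives coded as 1, others as 0
labels_pm = 2.0 * labels_01 - 1.0                # same labels coded as +1 / -1

print(raw_scores(kernel, labels_01))  # depends on the label codification
print(z_scores(kernel, labels_01))    # identical for both codifications
print(z_scores(kernel, labels_pm))
```

In this sketch the raw scores change when the same binary labels are recoded from 0/1 to -1/+1, whereas the z-scores do not, which illustrates the codification-independence mentioned in the abstract. The per-node null variance depends on the kernel rows, so it differs between nodes according to their position in the graph.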

The effect of statistical normalisation on network propagation scores

10.1101/2020.01.20.911842 ◽

2020 ◽

Author(s):

Sergio Picart-Armada ◽

Wesley K. Thompson ◽

Alfonso Buil ◽

Alexandre Perera-Lluna

Keyword(s):

Protein Function ◽

Diffusion Processes ◽

Protein Function Prediction ◽

Interaction Network ◽

Mean Value ◽

Statistical Properties ◽

Label Propagation ◽

Protein Protein Interaction ◽

Module Discovery ◽

Permutation Analysis

Abstract

Motivation: Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This opens the question of the statistical properties and the presence of bias of such diffusion processes in each of its applications. In this work, we characterised some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.

Results: Diffusion scores starting from binary labels were affected by the label codification, and exhibited a problem-dependent topological bias that could be removed by the statistical normalisation. Parametric and non-parametric normalisation addressed both points by being codification-independent and by equalising the bias. We identified and quantified two sources of bias -mean value and variance- that yielded performance differences when normalising the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Despite none of the proposed scores systematically outperformed the others, normalisation was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.

Availability: The code is publicly available at https://github.com/b2slab/[email protected]

Protein Function Prediction by Clustering of Protein-Protein Interaction Network

Advances in Intelligent and Soft Computing - ICT Innovations 2011 ◽

10.1007/978-3-642-28664-3_4 ◽

2012 ◽

pp. 39-49 ◽

Cited By ~ 1

Author(s):

Ivana Cingovska ◽

Aleksandra Bogojeska ◽

Kire Trivodaliev ◽

Slobodan Kalajdziski

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function

Bioinformatics ◽

10.1093/bioinformatics/btaa701 ◽

2020 ◽

Cited By ~ 1

Author(s):

Amelia Villegas-Morcillo ◽

Stavros Makrodimitris ◽

Roeland C H J van Ham ◽

Angel M Gomez ◽

Victoria Sanchez ◽

...

Keyword(s):

Protein Function ◽

Prediction Models ◽

Protein Function Prediction ◽

3D Structure ◽

Function Prediction ◽

Feature Representation ◽

Training Data ◽

Supplementary Information ◽

Molecular Function ◽

Structure Information

Abstract

Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not available for this task. However, a very large amount of protein sequences without functional labels is available.

Results: We applied an existing deep sequence model that had been pretrained in an unsupervised setting on the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining.

Availability and implementation: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function.

Supplementary information: Supplementary data are available at Bioinformatics online.
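
For readers unfamiliar with the hand-crafted baselines named in the abstract, the sketch below shows what one-hot encoding of amino acids and k-mer counts can look like. It is an illustration only, not code from the linked repository. The function names, the 20-letter alphabet ordering and the default k = 2 are assumptions.

```python
from collections import Counter
from itertools import product

import numpy as np

# The 20 standard amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> np.ndarray:
    """Return a (len(seq), 20) matrix with a single 1 per residue.

    Non-standard residues (e.g. X or U) are not handled and raise KeyError.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        mat[pos, index[aa]] = 1.0
    return mat


def kmer_counts(seq: str, k: int = 2) -> np.ndarray:
    """Return a fixed-length vector (20**k entries) of overlapping k-mer counts."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return np.array([counts[kmer] for kmer in vocab], dtype=float)


features_a = one_hot("ACD")          # per-residue representation, length-dependent
features_b = kmer_counts("ACAC", 2)  # per-protein representation, fixed length
```

The one-hot matrix grows with sequence length, while the k-mer vector has a fixed size regardless of the protein, which is one reason the two are used differently as model inputs.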

Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences

Bioinformatics ◽

10.1093/bioinformatics/bty704 ◽

2018 ◽

Vol 35 (5) ◽

pp. 753-759 ◽

Cited By ~ 8

Author(s):

Aashish Jain ◽

Daisuke Kihara

Keyword(s):

Protein Function ◽

Transfer Functions ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Prediction Method ◽

Query Protein ◽

Function Prediction ◽

Homology Search ◽

Supplementary Information ◽

Phylogenetic Distance

Abstract

Motivation: Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from top hits of a homology search; however, this approach has substantial shortcomings, including a low coverage in genome annotation.

Results: Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with a low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to its predecessor, PFP, which was among the top ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform the prediction programs that were ranked top in CAFA2.

Availability and implementation: The Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php.

Supplementary information: Supplementary data are available at Bioinformatics online.

Protein function prediction using neighbor relativity in protein–protein interaction network

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2012.12.003 ◽

2013 ◽

Vol 43 ◽

pp. 11-16 ◽

Cited By ~ 16

Author(s):

Sobhan Moosavi ◽

Masoud Rahgozar ◽

Amir Rahimi

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network

Peer Review #2 of "FunPred 3.0: improved protein function prediction using protein interaction network (v0.3)"

10.7287/peerj.6830v0.3/reviews/2 ◽

2019 ◽

Keyword(s):

Peer Review ◽

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction

COMPUTATIONAL METHOD FOR PROTEIN FUNCTION PREDICTION BY CONSTRUCTING PROTEIN INTERACTION NETWORK DICTIONARY

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001406004661 ◽

2006 ◽

Vol 20 (02) ◽

pp. 285-295 ◽

Cited By ~ 2

Author(s):

HEE-JEONG JIN ◽

HWAN-GUE CHO

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Computational Method ◽

Chi Square ◽

Protein Protein Interaction ◽

Protein Functions ◽

Novel Method

In the post-genomic era, predicting protein function is a challenging problem. Unravelling the functions of a protein by wet-lab experiments alone is difficult and burdensome. In this paper, we propose a novel method to predict protein functions by building a "Protein Interaction Network Dictionary (PIND)". This method deduces protein functions by searching for the most similar "words" (an anagram of functions in neighbor proteins on a protein–protein interaction graph) using global alignments. An evaluation of sensitivity and specificity shows that the PIND approach outperforms previous approaches such as Majority Rule and the Chi-Square measure, and that it competes with the recently introduced Random Markov Model approach.
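
As context for the comparison above, the Majority Rule baseline is usually understood as neighbour counting: a protein is assigned the functions that occur most often among its interaction partners. The sketch below illustrates that baseline only, not PIND itself. The function name, the dictionary-based data layout and the example proteins are hypothetical.

```python
from collections import Counter


def majority_rule(neighbours, annotations, protein, top_k=1):
    """Return the top_k functions most frequent among a protein's partners.

    neighbours:  dict mapping a protein to an iterable of interaction partners
    annotations: dict mapping a protein to a set of known functions
    Partners without annotations contribute nothing; a protein with no
    annotated partners gets an empty prediction.
    """
    counts = Counter()
    for partner in neighbours.get(protein, ()):
        counts.update(annotations.get(partner, ()))
    return [func for func, _ in counts.most_common(top_k)]


# Made-up example data for illustration.
neighbours = {"P1": ["P2", "P3", "P4"]}
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "signalling"},
    "P4": {"metabolism"},
}

prediction = majority_rule(neighbours, annotations, "P1")
```

Here "transport" is annotated on two of the three partners of P1 and every other function on one, so it is the single top-ranked prediction.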

FunPred 3.0: improved protein function prediction using protein interaction network

PeerJ ◽

10.7717/peerj.6830 ◽

2019 ◽

Vol 7 ◽

pp. e6830 ◽

Cited By ~ 1

Author(s):

Sovan Saha ◽

Piyali Chatterjee ◽

Subhadip Basu ◽

Mita Nasipuri ◽

Dariusz Plewczynski

Keyword(s):

Protein Interaction ◽

Protein Interactions ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Experimental Studies ◽

Interaction Network ◽

Function Prediction ◽

The Self ◽

Functional Annotations

Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is performed routinely at the population scale for a variety of species. The challenging problem is to determine, on a massive scale, the functions of proteins that are not yet characterized by detailed experimental studies. Identifying protein functions experimentally is a laborious and time-consuming task that requires many resources. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein–protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset include 4,554 unique proteins in 13,528 protein–protein interactions after the elimination of the self-replicating and the self-interacting protein pairs. Using the developed FunPred 3.0 tool, we achieve mean precision, recall and F-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.
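
The three reported metrics can be related through the usual definition of the F-score as the harmonic mean of precision and recall. The short check below is an illustration, not the authors' evaluation code, and the function name is hypothetical. It assumes the balanced F1 form; note that an average of per-item F-scores need not equal the F-score of the averaged precision and recall in general.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported mean precision and recall from the abstract.
value = f_score(0.55, 0.82)
print(round(value, 2))
```

With the reported precision of 0.55 and recall of 0.82, the harmonic mean is about 0.658, which rounds to the reported 0.66.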

Peer Review #2 of "FunPred 3.0: improved protein function prediction using protein interaction network (v0.4)"

10.7287/peerj.6830v0.4/reviews/2 ◽

2019 ◽

Keyword(s):

Peer Review ◽

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction

A Survey of Computational Intelligence Techniques in Protein Function Prediction

International Journal of Proteomics ◽

10.1155/2014/845479 ◽

2014 ◽

Vol 2014 ◽

pp. 1-22 ◽

Cited By ~ 22

Author(s):

Arvind Kumar Tiwari ◽

Rajeev Srivastava

Keyword(s):

Gene Expression ◽

Computational Intelligence ◽

Protein Function ◽

Rna Binding ◽

Protein Function Prediction ◽

Interaction Network ◽

Heterogeneous Data ◽

Function Prediction ◽

Ensemble Classifiers ◽

The Past

With the advancement of high-throughput microarray technologies, knowledge of previously unknown proteins has grown massively. Protein function prediction is the most challenging problem in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they failed when a new protein differed from previously characterised ones. Therefore, to alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a comprehensive state-of-the-art review of computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network and gene expression data. These techniques are used in a wide range of applications, such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers who applied computational intelligence techniques with appropriate datasets to improve prediction performance on these problems. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction.
