scholarly journals Protein Function Prediction Based on PPI Networks: Network Reconstruction vs Edge Enrichment

2021 ◽  
Vol 12 ◽  
Author(s):  
Jiaogen Zhou ◽  
Wei Xiong ◽  
Yang Wang ◽  
Jihong Guan

Over the past decades, massive amounts of protein-protein interaction (PPI) data have been accumulated due to the advancement of high-throughput technologies, and but data quality issues (noise or incompleteness) of PPI have been still affecting protein function prediction accuracy based on PPI networks. Although two main strategies of network reconstruction and edge enrichment have been reported on the effectiveness of boosting the prediction performance in numerous literature studies, there still lack comparative studies of the performance differences between network reconstruction and edge enrichment. Inspired by the question, this study first uses three protein similarity metrics (local, global and sequence) for network reconstruction and edge enrichment in PPI networks, and then evaluates the performance differences of network reconstruction, edge enrichment and the original networks on two real PPI datasets. The experimental results demonstrate that edge enrichment work better than both network reconstruction and original networks. Moreover, for the edge enrichment of PPI networks, the sequence similarity outperformes both local and global similarity. In summary, our study can help biologists select suitable pre-processing schemes and achieve better protein function prediction for PPI networks.

Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Abstract Motivation Protein function prediction is one of the major tasks of bioinformatics that can help in wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence based features, protein–protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features and predicting protein functions may require a lot of time. Results We developed a novel method for predicting protein functions from sequence alone which combines deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins. Availability and implementation http://deepgoplus.bio2vec.net/. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

ABSTRACTProtein function prediction is one of the major tasks of bioinformatics that can help in wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence based features, protein–protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features and predicting protein functions may require a lot of time.We developed a novel method for predicting protein functions from sequence alone which combines deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins. We evaluate the performance of DeepGOPlus on the CAFA3 dataset and significantly improve the performance of predictions of biological processes and cellular components with Fmax of 0.47 and 0.70, respectively, using only the amino acid sequence of proteins as input. DeepGOPlus can annotate around 40 protein sequences per second, thereby making fast and accurate function predictions available for a wide range of proteins.


2020 ◽  
Author(s):  
Meet Barot ◽  
Vladimir Gligorijevic ◽  
Kyunghyun Cho ◽  
Richard Bonneau

Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to proteome and biological network functional annotation use sequence similarity to transfer knowledge between species. These similarity-based approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular or organismal context for meaningful function prediction. In order to supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, the majority of these methods are tied to a network for a single species, and many species lack biological networks. In this work, we integrate sequence and network information across multiple species by applying an IsoRank-derived network alignment algorithm to create a meta-network profile of the proteins of multiple species. We then use this integrated multispecies meta-network as input features to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and more diverse examples from multiple organisms, and consequently leads to significant improvements in function prediction performance. Further, we evaluate our approach in a setting in which an organism's PPI network is left out, using other organisms' network information and sequence homology in order to make predictions for the left-out organism, to simulate cases in which a newly sequenced species has no network information available.


2018 ◽  
Vol 35 (5) ◽  
pp. 753-759 ◽  
Author(s):  
Aashish Jain ◽  
Daisuke Kihara

Abstract Motivation Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from top hits of a homology search; however, this approach has substantial short comings including a low coverage in genome annotation. Results Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with a low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to the Phylo-PFP’s predecessor, PFP, which was among the top ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform prediction programs to date that were ranked top in CAFA2. Availability and implementation Phylo-PFP web server is available for at http://kiharalab.org/phylo_pfp.php. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 43 (W1) ◽  
pp. W134-W140 ◽  
Author(s):  
Damiano Piovesan ◽  
Manuel Giollo ◽  
Emanuela Leonardi ◽  
Carlo Ferrari ◽  
Silvio C.E. Tosatto

2007 ◽  
Vol 05 (01) ◽  
pp. 1-30 ◽  
Author(s):  
TROY HAWKINS ◽  
DAISUKE KIHARA

Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that utilize the overwhelmingly large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because they incorporate structural and experimental data that is not used in sequence-based methods, they can provide additional accuracy and reliability to protein function prediction. Here, first we review the definition of protein function. Then the recent developments of these methods are introduced with special focus on the type of predictions that can be made. The need for further development of comprehensive systems biology techniques that can utilize the ever-increasing data presented by the genomics and proteomics communities is emphasized. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.


2014 ◽  
Vol 8 (1) ◽  
pp. 35 ◽  
Author(s):  
Wei Peng ◽  
Jianxin Wang ◽  
Juan Cai ◽  
Lu Chen ◽  
Min Li ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document