Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks

Protein function prediction is a complex multiclass, multilabel classification problem, characterized by several issues: the incompleteness of the available annotations, the integration of multiple sources of high-dimensional biomolecular data, the imbalance of many functional classes, and the difficulty of unambiguously determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. In this general approach, a separate learning machine is trained for each functional term, and the resulting predictions are then combined into a “consensus” ensemble decision that takes into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Finally, open problems in this research area of computational biology are considered, outlining novel perspectives for future research.
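The abstract describes training one learner per functional term and then reconciling the per-term predictions with the class hierarchy. As a minimal illustration, assuming each term already has a flat score in [0, 1] and the hierarchy is given as a parent map over a DAG, the sketch below applies one simple consistency rule in the style of top-down hierarchical corrections: each term's score is capped by the corrected scores of its parents, so a protein is never predicted more confidently for a term than for any of its ancestors. The function name `top_down_correction` and the input format are hypothetical, and this is not presented as any specific method from the review.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def top_down_correction(flat_scores, parents):
    """Hypothetical top-down correction of flat per-term scores on a DAG.

    flat_scores: dict term -> score in [0, 1] from an independent per-term learner.
    parents: dict term -> list of parent terms (root terms map to []).
    Assumes every term that appears as a parent also has a flat score.
    Returns corrected scores in which no term exceeds any of its parents.
    """
    # TopologicalSorter expects node -> predecessors; using parents as
    # predecessors guarantees parents are visited before their children.
    graph = {term: parents.get(term, []) for term in flat_scores}
    corrected = {}
    for term in TopologicalSorter(graph).static_order():
        term_parents = parents.get(term, [])
        if term_parents:
            # Cap the child's score at the lowest corrected parent score.
            cap = min(corrected[p] for p in term_parents)
            corrected[term] = min(flat_scores[term], cap)
        else:
            corrected[term] = flat_scores[term]
    return corrected
```

For example, with scores `{"root": 0.9, "A": 0.4, "B": 0.7, "C": 0.6}` and term `C` having parents `A` and `B`, the corrected score of `C` drops to 0.4, the corrected score of its weaker parent. Capping by the minimum over parents respects the DAG structure, in which a term holds only if all of its ancestors hold.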

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Genome Biology, 2016, Vol 17 (1). DOI: 10.1186/s13059-016-1037-6. Cited By ~ 180

Author(s): Yuxiang Jiang, Tal Ronnen Oron, Wyatt T. Clark, Asma R. Bankapur, Daniel D’Andrea, ...

Keyword(s): Protein Function; Protein Function Prediction; Function Prediction; Prediction Methods

Plant miRNA function prediction based on functional similarity network and transductive multi-label classification algorithm

Neurocomputing, 2016, Vol 179, pp. 283-289. DOI: 10.1016/j.neucom.2015.12.011. Cited By ~ 9

Author(s): Jun Meng, Guan-Li Shi, Yu-Shi Luan

Keyword(s): Function Prediction; Classification Algorithm; Functional Similarity; Similarity Network; Plant miRNA

NPF: Network propagation for protein function prediction

2020. DOI: 10.21203/rs.3.rs-16452/v2

Author(s): Bihai Zhao, Zhihong Zhang, Meiping Jiang, Sai Hu, Yingchun Luo, ...

Keyword(s): Protein Interaction; Protein Function; Cross Validation; Function Prediction; Protein Interaction Networks; Functional Similarity; Interaction Networks; Omics Data; Protein Functions; Network Propagation

Background: The accurate annotation of protein functions is of great significance for elucidating biological phenomena, treating disease, and developing new drugs. Various methods have been developed to predict functions by combining protein interaction networks (PINs) with multi-omics data. However, how to make full use of multiple biological data sources to improve function annotation remains an open problem.

Results: We present NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners whose functions are similar to those of target proteins. NPF leverages the architecture of the protein interaction network and multi-omics data, such as domain annotations and protein complex information, to augment protein-protein functional similarity through propagation. We have verified the potential of NPF for accurately inferring protein functions: a comprehensive evaluation shows that NPF achieved higher performance than competing methods under both leave-one-out cross-validation and ten-fold cross-validation.

Conclusions: We demonstrated that network propagation combined with multi-omics data can not only discover more partners with similar functions but also effectively escape the constraints imposed by the "small-world" property of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether proper functional similarity information can be extracted from protein correlations and exploited.
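The abstract does not spell out NPF's propagation algorithm. As a generic illustration of network propagation on a protein interaction network, the sketch below uses random walk with restart (a common propagation scheme, assumed here rather than taken from NPF) to spread a target protein's signal over the network, then scores candidate functions by summing the propagated mass of annotated proteins. The function names, the default restart probability of 0.5, and the integer node indexing are all hypothetical choices.

```python
import numpy as np


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart from node `seed` (hypothetical helper).

    adj: symmetric (n x n) adjacency matrix of the interaction network.
    Returns the propagated score vector p, iterating
    p <- (1 - restart) * W @ p + restart * p0 until the L1 change is below tol.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    # Isolated nodes keep an all-zero column; set the divisor to 1 to avoid 0/0.
    col_sums[col_sums == 0] = 1.0
    W = adj / col_sums  # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def score_functions(adj, annotations, target, restart=0.5):
    """Score functional terms for `target` by propagated neighbor evidence.

    annotations: dict node index -> set of term identifiers (hypothetical format).
    Each annotated protein votes for its terms with its propagated score.
    """
    p = random_walk_with_restart(adj, target, restart)
    scores = {}
    for protein, terms in annotations.items():
        if protein == target:
            continue  # do not let the target vote for its own terms
        for term in terms:
            scores[term] = scores.get(term, 0.0) + float(p[protein])
    return scores
```

On a path network 0-1-2-3 with target 0, protein 1 receives more propagated mass than protein 3, so a term annotated on protein 1 outranks a term annotated only on protein 3. The restart probability controls how local the propagation stays: larger values keep more mass near the target.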

NPF: Network propagation for protein function prediction

2020. DOI: 10.21203/rs.3.rs-16452/v1

Author(s): Bihai Zhao, Zhihong Zhang, Meiping Jiang, Sai Hu, Yingchun Luo, ...

Keyword(s): Protein Interaction; Protein Function; Function Prediction; Biological Data; Protein Interaction Networks; Functional Similarity; Interaction Networks; Omics Data; Protein Functions; Network Propagation

Extensive complementarity between gene function prediction methods

Bioinformatics, 2016, pp. btw532. DOI: 10.1093/bioinformatics/btw532. Cited By ~ 2

Author(s): Vedrana Vidulin, Tomislav Šmuc, Fran Supek

Keyword(s): Gene Function; Function Prediction; Prediction Methods; Gene Function Prediction

fusionDB: assessing microbial diversity and environmental preferences via functional similarity networks

Nucleic Acids Research, 2017, Vol 46 (D1), pp. D535-D541. DOI: 10.1093/nar/gkx1060. Cited By ~ 8

Author(s): Chengsheng Zhu, Yannick Mahlich, Maximilian Miller, Yana Bromberg

Keyword(s): Microbial Diversity; Functional Similarity; Environmental Preferences; Similarity Networks

Improving protein function prediction with synthetic feature samples created by generative adversarial networks

2019. DOI: 10.1101/730143

Author(s): Cen Wan, David T. Jones

Keyword(s): Protein Function; Data Augmentation; Protein Function Prediction; Function Prediction; Generative Adversarial Networks; Prediction Methods; High Quality; Adversarial Networks; Synthetic Protein; Protein Feature

Protein function prediction is a challenging but important task in bioinformatics. Many prediction methods have been developed, but they remain limited by the small number of available training samples. It is therefore valuable to develop a data augmentation method that can generate high-quality synthetic samples to further improve the accuracy of prediction methods. In this work, we propose FFPred-GAN, a novel method based on generative adversarial networks that accurately learns the high-dimensional distributions of protein sequence-based biophysical features and generates high-quality synthetic protein feature samples. The experimental results suggest that augmenting the original training protein feature samples with these synthetic samples improves prediction accuracy for all three domains of the Gene Ontology.