Large-scale protein function prediction using heterogeneous ensembles

Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor for a given problem is unclear. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess the ability of a variety of heterogeneous ensemble methods to improve PFP across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
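
Stacking, as named above, trains a second-level model on the outputs of the base predictors. The following is a minimal, dependency-free sketch for a single GO term, assuming each base predictor has already produced a score in [0, 1] for every protein (ideally out-of-fold scores, so the meta-learner never sees scores produced on its own training labels). The function names, learning rate, epoch count and toy scores are hypothetical and are not taken from LargeGOPred.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stacker(
    base_scores: Sequence[Sequence[float]],  # one row per protein, one column per base predictor
    labels: Sequence[int],                   # 1 if the protein has the GO term, else 0
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit a logistic regression meta-learner by batch gradient descent on the log-loss."""
    n_feat = len(base_scores[0])
    w = [0.0] * n_feat
    b = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(base_scores, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gj / n for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_stacked(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Ensemble probability that a protein carries the GO term.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy scores from three hypothetical base predictors: the first is informative,
# the second is noisy, the third is constant (uninformative).
train_x = [
    [0.9, 0.6, 0.5], [0.8, 0.4, 0.5], [0.7, 0.7, 0.5], [0.85, 0.3, 0.5],
    [0.2, 0.5, 0.5], [0.1, 0.6, 0.5], [0.3, 0.4, 0.5], [0.15, 0.7, 0.5],
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_stacker(train_x, train_y)
print([round(wi, 2) for wi in weights], round(bias, 2))
print(round(predict_stacked(weights, bias, [0.9, 0.5, 0.5]), 3))
```

A logistic regression meta-learner keeps the second level simple, and its learned weights can be read as how much the ensemble trusts each base predictor for that term.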

Wei2GO: weighted sequence similarity-based protein function prediction

10.1101/2020.04.24.059501

2020

Author(s): Maarten J.M.F Reijnders

Keyword(s): Gene Ontology, Open Source, Protein Function, Large Scale, Sequence Similarity, Protein Function Prediction, Function Prediction, Computational Time, Web Servers, Weighted Sequence

Background: Protein function prediction is an important part of bioinformatics and genomics studies. Many different predictors are available; however, most of them are offered as web servers rather than as open-source, locally installable software. Such local versions are necessary for large-scale genomics studies because web servers impose limitations such as queues, limited prediction speed, and restricted updatability of databases.

Methods: This paper describes Wei2GO, an open-source, Python-based protein function prediction tool built on weighted sequence similarity. It uses DIAMOND and HMMScan sequence alignment searches against the UniProtKB and Pfam databases, respectively, transfers Gene Ontology terms from the reference proteins to the query protein, and uses a weighting algorithm to calculate a score for each Gene Ontology annotation.

Results: Wei2GO is compared against the Argot2 and Argot2.5 web servers, which use a similar concept, and against DeepGOPlus, which serves as a reference. Wei2GO shows improved performance according to precision-recall curves, Fmax scores, and Smin scores for the biological process and molecular function ontologies. Compared to Argot2 and Argot2.5, computational time is reduced from several hours to several minutes.

Availability: Wei2GO is written in Python 3 and can be found at https://gitlab.com/mreijnders/Wei2GO
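
The transfer-and-score idea in the Methods, and the Fmax metric used in the Results, can be illustrated with a small sketch. It assumes alignment hits have already been parsed into (reference protein, score) pairs and that reference annotations are available as a dictionary. The scoring rule here (a term's score is the fraction of the query's total hit score contributed by hits carrying that term) is a simplified, hypothetical stand-in, not Wei2GO's actual weighting algorithm. The Fmax function follows the usual CAFA-style protein-centric definition: precision averaged over proteins with at least one prediction above the threshold, recall averaged over all benchmark proteins, and the F-measure maximized over thresholds.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def transfer_go_terms(
    hits: List[Tuple[str, float]],     # (reference protein ID, alignment score) for one query
    annotations: Dict[str, Set[str]],  # reference protein ID -> annotated GO terms
) -> Dict[str, float]:
    """Hypothetical weighting: a term's score is the share of total hit score that carries it."""
    total = sum(score for _, score in hits)
    if total == 0:
        return {}
    term_scores: Dict[str, float] = defaultdict(float)
    for ref, score in hits:
        for term in annotations.get(ref, set()):
            term_scores[term] += score
    return {term: s / total for term, s in term_scores.items()}


def fmax(
    predictions: Dict[str, Dict[str, float]],  # protein -> {GO term: score}
    truth: Dict[str, Set[str]],                # protein -> known GO terms (assumed non-empty)
    thresholds: List[float],
) -> float:
    """Protein-centric Fmax as used in CAFA-style evaluations."""
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {g for g, s in predictions.get(protein, {}).items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:  # precision only over proteins with at least one prediction
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(true_terms))  # recall over all benchmark proteins
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy data with placeholder term IDs (GO:A, GO:B, GO:C).
refs = {"R1": {"GO:A", "GO:B"}, "R2": {"GO:A"}, "R3": {"GO:C"}}
query_hits = [("R1", 300.0), ("R2", 150.0), ("R3", 50.0)]
scores = transfer_go_terms(query_hits, refs)
print(scores)

thresholds = [i / 100 for i in range(1, 101)]
print(fmax({"Q1": scores}, {"Q1": {"GO:A", "GO:B"}}, thresholds))
```

In this toy case GO:A is supported by two hits and receives the highest score (0.9), while GO:C, backed only by the weakest hit, scores 0.1; any threshold between 0.1 and 0.6 recovers exactly the two true terms, giving an Fmax of 1.0.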

deepNF: Deep network fusion for protein function prediction

10.1101/223339

2017

Cited By ~ 2

Author(s): Vladimir Gligorijević, Meet Barot, Richard Bonneau

Keyword(s): Protein Function, Large Scale, Protein Function Prediction, Predictive Performance, Substantial Improvement, Function Prediction, Interaction Networks, Highly Nonlinear, High Level, String Networks

The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations of genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most existing approaches to network integration use shallow models that cannot capture complex, highly nonlinear network structures. We therefore propose deepNF, a network fusion method based on multimodal deep autoencoders, to extract high-level protein features from multiple heterogeneous interaction networks. We apply this method to combine STRING networks into a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with that of state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods on both human and yeast STRING networks. We also show a substantial improvement in our method's performance in predicting GO terms of varying type and specificity.

Availability: deepNF is freely available at https://github.com/VGligorijevic/deepNF
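
The architecture described above (separate early layers per network type, merged into a single bottleneck) can be sketched as a forward pass. This is an untrained NumPy illustration with arbitrary layer sizes, random weights and random inputs, assuming each network has already been turned into a per-protein feature matrix; it shows the data flow and the reconstruction loss such an autoencoder would minimize, not deepNF's actual implementation, preprocessing or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in: int, n_out: int):
    # Randomly initialized fully connected layer: (weights, bias).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


n_proteins = 50
net_dims = [50, 50, 50]      # one input matrix per network type (n_proteins x dim)
hidden, bottleneck = 16, 8   # arbitrary sizes for illustration

# Hypothetical inputs: one feature matrix per network, values in [0, 1).
inputs = [rng.random((n_proteins, d)) for d in net_dims]

# Per-network encoders, a shared bottleneck, a shared expansion layer, per-network decoders.
encoders = [dense(d, hidden) for d in net_dims]
W_bn, b_bn = dense(hidden * len(net_dims), bottleneck)
W_up, b_up = dense(bottleneck, hidden * len(net_dims))
decoders = [dense(hidden, d) for d in net_dims]


def forward(xs):
    # 1) Separate early layers for each network type.
    hs = [relu(x @ W + b) for x, (W, b) in zip(xs, encoders)]
    # 2) Concatenate and compress into one bottleneck: the fused protein features.
    z = relu(np.concatenate(hs, axis=1) @ W_bn + b_bn)
    # 3) Expand again and split back into per-network branches for reconstruction.
    up = relu(z @ W_up + b_up)
    parts = np.split(up, len(net_dims), axis=1)
    recons = [sigmoid(p @ W + b) for p, (W, b) in zip(parts, decoders)]
    return z, recons


features, reconstructions = forward(inputs)
# Reconstruction loss that training would minimize (summed over networks).
loss = sum(np.mean((x - r) ** 2) for x, r in zip(inputs, reconstructions))
print(features.shape, loss)
```

After training, each row of `features` would be a low-dimensional protein representation that a separate per-GO-term classifier could consume.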

Learning Kernel Matrix from Gene Ontology and Annotation Data for Protein Function Prediction

Advances in Neural Networks – ISNN 2009, Lecture Notes in Computer Science

10.1007/978-3-642-01513-7_76

2009

pp. 694-703

Author(s): Yiming Chen, Zhoujun Li, Junwan Liu

Keyword(s): Gene Ontology, Protein Function, Protein Function Prediction, Function Prediction, Kernel Matrix, Annotation Data

Gene Ontology GAN (GOGAN): a novel architecture for protein function prediction

Soft Computing

10.1007/s00500-021-06707-z

2022

Author(s): Musadaq Mansoor, Mohammad Nauman, Hafeez Ur Rehman, Alfredo Benso

Keyword(s): Gene Ontology, Protein Function, Protein Function Prediction, Function Prediction

A large-scale evaluation of computational protein function prediction

Nature Methods

10.1038/nmeth.2340

2013

Vol 10 (3)

pp. 221-227

Cited By ~ 521

Author(s): Predrag Radivojac, Wyatt T Clark, Tal Ronnen Oron, Alexandra M Schnoes, Tobias Wittkop, ...

Keyword(s): Protein Function, Large Scale, Protein Function Prediction, Function Prediction, Scale Evaluation

Hierarchical Ensemble Methods for Protein Function Prediction

ISRN Bioinformatics

10.1155/2014/901419

2014

Vol 2014

pp. 1-34

Cited By ~ 22

Author(s): Giorgio Valentini

Keyword(s): Protein Function, Protein Function Prediction, Ensemble Methods, Function Prediction, Future Research, Prediction Methods, Multiple Sources, Open Problems, Functional Classes, Hierarchical Relationships

Protein function prediction is a complex multiclass, multilabel classification problem characterized by multiple issues, such as the incompleteness of the available annotations, the integration of multiple sources of high-dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of unambiguously determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. In this general approach, a separate learning machine is trained for each functional term, and the resulting predictions are then combined into a “consensus” ensemble decision that takes the hierarchical relationships between classes into account. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Finally, open problems in this exciting research area of computational biology are considered, outlining novel perspectives for future research.
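
As an illustration of combining per-term predictions while respecting the hierarchy, the sketch below applies one simple top-down rule: visiting terms from the root downward, each term's score is capped at the minimum corrected score of its parents, so no term is predicted more confidently than its ancestors (in line with the true path rule of GO). This follows the spirit of the top-down step in hierarchical ensembles such as HTD-DAG, but it is a simplified sketch; the toy DAG and flat scores are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def top_down_consistent(
    flat_scores: Dict[str, float],  # per-term scores from independent "flat" classifiers
    parents: Dict[str, Set[str]],   # term -> parent terms (empty set for the root)
) -> Dict[str, float]:
    """Cap each term's score by its parents' already-corrected scores, root first."""
    corrected: Dict[str, float] = {}
    # TopologicalSorter treats parents as predecessors, so every term is
    # visited only after all of its parents have been corrected.
    for term in TopologicalSorter(parents).static_order():
        score = flat_scores[term]
        if parents.get(term):
            score = min(score, min(corrected[p] for p in parents[term]))
        corrected[term] = score
    return corrected


# Toy DAG: C has two parents, A and B, both children of the root.
toy_parents = {"root": set(), "A": {"root"}, "B": {"root"}, "C": {"A", "B"}}
toy_flat = {"root": 0.9, "A": 0.8, "B": 0.3, "C": 0.7}
result = top_down_consistent(toy_flat, toy_parents)
print(result)
```

Here C's flat score of 0.7 is lowered to 0.3 because one of its parents, B, scores only 0.3; the other terms are already consistent and keep their scores.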

Diffusion Kernel-Based Logistic Regression Models for Protein Function Prediction

OMICS: A Journal of Integrative Biology

10.1089/omi.2006.10.40

2006

Vol 10 (1)

pp. 40-55

Cited By ~ 65

Author(s): Hyunju Lee, Zhidong Tu, Minghua Deng, Fengzhu Sun, Ting Chen

Keyword(s): Logistic Regression, Protein Function, Regression Models, Protein Function Prediction, Function Prediction, Diffusion Kernel, Logistic Regression Models

A Bayesian approach to construct Context-Specific Gene Ontology: Application to protein function prediction

2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)

10.1109/cibcb.2016.7758127

2016

Cited By ~ 1

Author(s): Hasna Njah, Salma Jamoussi, Walid Mahdi, Mohamed Elati

Keyword(s): Gene Ontology, Bayesian Approach, Protein Function, Protein Function Prediction, Function Prediction, Specific Gene, Ontology Application, Context Specific

A look back at the quality of Protein Function Prediction tools in CAFA

10.7287/peerj.preprints.27161

2018

Author(s): Morteza Pourreza Shahri, Madhusudan Srinivasan, Diane Bimczok, Upulee Kanewala, Indika Kahanda

Keyword(s): Protein Function, Large Scale, Computational Models, Protein Function Prediction, Function Prediction, Test Case, Test Cases, Metamorphic Testing, Main Challenge, Scale Experiment

The Critical Assessment of protein Function Annotation algorithms (CAFA) is a large-scale experiment for assessing computational models for automated function prediction (AFP). The models presented in CAFA have shown excellent promise in terms of prediction accuracy, but relatively little attention has been paid to quality assurance. The main challenge in systematically testing AFP software is the lack of a test oracle, which determines whether a test case passes or fails; unfortunately, the exact expected outcomes are not well defined for the AFP task, so AFP tools face the oracle problem. Metamorphic testing (MT) is a technique for testing programs that face the oracle problem by using metamorphic relations (MRs). An MR determines whether a test has passed or failed by specifying how the output should change in response to a specific change made to the input. In this work, we use MT to test nine CAFA2 AFP tools by defining a set of MRs that apply input transformations at the protein level. In our initial testing, we observe that several tools fail all the test cases, while two tools pass all the test cases on different GO ontologies.
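
A metamorphic relation can be checked without knowing the correct output. The sketch below uses one hypothetical protein-level MR, namely that each protein's predictions should not change when the order of the input proteins is reversed, and applies it to two stand-in predictors. The predictors, the MR and the function names are illustrative assumptions, not the MRs or CAFA2 tools used in the study.

```python
from typing import Callable, Dict, List, Tuple

Proteins = List[Tuple[str, str]]                  # (protein ID, sequence)
Predictions = Dict[str, Dict[str, float]]         # protein ID -> {GO term: score}
Predictor = Callable[[Proteins], Predictions]


def toy_predictor(proteins: Proteins) -> Predictions:
    # Stand-in AFP tool: scores a placeholder term from the fraction of K residues.
    return {pid: {"GO:X": seq.count("K") / len(seq)} for pid, seq in proteins}


def buggy_predictor(proteins: Proteins) -> Predictions:
    # Deliberately order-dependent: later proteins receive lower scores.
    return {pid: {"GO:X": 1.0 / (i + 1)} for i, (pid, _) in enumerate(proteins)}


def mr_order_invariance(predict: Predictor, proteins: Proteins) -> bool:
    """MR: reversing the input order must not change any protein's predictions."""
    source_output = predict(proteins)
    follow_up_output = predict(list(reversed(proteins)))
    # Pass iff the source and follow-up outputs agree protein by protein.
    return source_output == follow_up_output


test_proteins = [("P1", "MKKLV"), ("P2", "MAAAK"), ("P3", "MSTV")]
print(mr_order_invariance(toy_predictor, test_proteins))
print(mr_order_invariance(buggy_predictor, test_proteins))
```

Neither check needs the true GO annotations of P1 to P3: the relation between the source and follow-up outputs acts as the oracle, which is what makes MT applicable when expected outcomes are undefined.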
