A look back at the quality of Protein Function Prediction tools in CAFA

The Critical Assessment of protein Function Annotation algorithms (CAFA) is a large-scale experiment for assessing computational models for automated function prediction (AFP). The models presented in CAFA have shown excellent promise in terms of prediction accuracy, but their quality assurance has received relatively little attention. The main challenge in systematically testing AFP software is the lack of a test oracle, which determines whether a test case passes or fails; the exact expected outcomes are not well defined for the AFP task. AFP tools therefore face the oracle problem. Metamorphic testing (MT) is a technique for testing programs that face the oracle problem by using metamorphic relations (MRs). An MR determines whether a test has passed or failed by specifying how the output should change in response to a specific change made to the input. In this work, we use MT to test nine CAFA2 AFP tools by defining a set of MRs that apply input transformations at the protein level. Our initial testing shows that several tools fail all the test cases, while two tools pass all the test cases on different GO ontologies.
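To make the idea of a metamorphic relation concrete, the following Python sketch checks one hypothetical protein-level MR: adding a copy of an input protein under a new ID should give the copy the same GO terms as the original and leave all other predictions unchanged. The predictors (`toy_predictor`, `order_sensitive_predictor`), the `_copy` ID suffix and the MR itself are illustrative assumptions, not the MRs or tools used in the study.

```python
from typing import Callable, Dict, List, Set, Tuple

# A protein is (ID, amino-acid sequence); a prediction maps protein ID -> GO term IDs.
Protein = Tuple[str, str]
Prediction = Dict[str, Set[str]]


def toy_predictor(proteins: List[Protein]) -> Prediction:
    """Hypothetical stand-in for an AFP tool. Each protein's terms depend only
    on its own sequence (illustrative rules, not real biology)."""
    preds: Prediction = {}
    for pid, seq in proteins:
        terms: Set[str] = set()
        if "K" in seq:
            terms.add("GO:0003677")  # DNA binding
        if seq.startswith("M"):
            terms.add("GO:0005515")  # protein binding
        preds[pid] = terms
    return preds


def order_sensitive_predictor(proteins: List[Protein]) -> Prediction:
    """Deliberately faulty hypothetical tool: only the first protein gets a term."""
    return {pid: ({"GO:0005515"} if i == 0 else set())
            for i, (pid, _) in enumerate(proteins)}


def mr_duplicate_protein(predict: Callable[[List[Protein]], Prediction],
                         proteins: List[Protein], index: int = 0) -> bool:
    """Assumed MR: appending a copy of one protein (same sequence, new ID) must
    give the copy the original's terms and leave all other outputs unchanged.
    Returns True if the test case passes."""
    pid, seq = proteins[index]
    copy_id = pid + "_copy"  # hypothetical naming; assumed not to collide
    source_output = predict(proteins)
    follow_up_output = predict(proteins + [(copy_id, seq)])
    # No oracle needed: only the source and follow-up outputs are compared.
    copy_matches = follow_up_output[copy_id] == follow_up_output[pid]
    others_unchanged = all(follow_up_output[p] == source_output[p] for p, _ in proteins)
    return copy_matches and others_unchanged


proteins = [("P1", "MKV"), ("P2", "AAA"), ("P3", "MAA")]
print(mr_duplicate_protein(toy_predictor, proteins))              # True
print(mr_duplicate_protein(order_sensitive_predictor, proteins))  # False
```

Neither check needs to know the correct GO terms; the verdict comes only from comparing the source and follow-up outputs, which is what lets MT sidestep the oracle problem.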

De-novo protein function prediction using DNA binding and RNA binding proteins as a test case

Nature Communications, Vol 7 (1), 2016. DOI: 10.1038/ncomms13424. Cited by ~9.

Author(s): Sapir Peled, Olga Leiderman, Rotem Charar, Gilat Efroni, Yaron Shav-Tal, ...

Keyword(s): DNA Binding, Protein Function, Binding Proteins, RNA Binding, De Novo, RNA Binding Proteins, Protein Function Prediction, Function Prediction, Test Case

deepNF: Deep network fusion for protein function prediction

2017. DOI: 10.1101/223339. Cited by ~2.

Author(s): Vladimir Gligorijević, Meet Barot, Richard Bonneau

Keyword(s): Protein Function, Large Scale, Protein Function Prediction, Predictive Performance, Substantial Improvement, Function Prediction, Interaction Networks, Highly Nonlinear, High Level, STRING Networks

The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most existing approaches to network integration use shallow models that cannot capture complex, highly nonlinear network structures. We therefore propose deepNF, a network fusion method based on Multimodal Deep Autoencoders that extracts high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks into a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, and later connect all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with that of state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show a substantial improvement in the performance of our method in predicting GO terms of varying type and specificity.

Availability: deepNF is freely available at https://github.com/VGligorijevic/deepNF
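The architecture described above (separate early layers per network type feeding a single shared bottleneck) can be sketched as a forward pass in plain Python. This is a minimal, untrained sketch under assumed choices: one sigmoid layer per network, each protein represented by its row in each network's matrix, and arbitrary toy dimensions. It is not the deepNF code, and training (minimizing reconstruction error) is omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def dense(x: Vector, layer: Layer) -> Vector:
    """Fully connected layer followed by a sigmoid activation."""
    weights, biases = layer
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class MultimodalAutoencoder:
    def __init__(self, input_dims: List[int], hidden_dim: int,
                 bottleneck_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n = len(input_dims)
        self.hidden_dim = hidden_dim
        # Early stage: a separate encoder layer for each network type.
        self.encoders = [init_layer(d, hidden_dim, rng) for d in input_dims]
        # Shared bottleneck fed by the concatenated per-network encodings.
        self.bottleneck = init_layer(n * hidden_dim, bottleneck_dim, rng)
        # Decoder mirrors the encoder: shared expansion, then per-network outputs.
        self.expand = init_layer(bottleneck_dim, n * hidden_dim, rng)
        self.decoders = [init_layer(hidden_dim, d, rng) for d in input_dims]

    def encode(self, inputs: List[Vector]) -> Vector:
        """Protein features = bottleneck activations."""
        concatenated: Vector = []
        for x, encoder in zip(inputs, self.encoders):
            concatenated.extend(dense(x, encoder))
        return dense(concatenated, self.bottleneck)

    def reconstruct(self, inputs: List[Vector]) -> List[Vector]:
        """Per-network reconstructions that training would push toward the inputs."""
        expanded = dense(self.encode(inputs), self.expand)
        h = self.hidden_dim
        return [dense(expanded[i * h:(i + 1) * h], decoder)
                for i, decoder in enumerate(self.decoders)]


# One protein's rows in two hypothetical 5-protein networks.
net_a_row = [0.0, 1.0, 0.0, 1.0, 0.0]
net_b_row = [0.2, 0.0, 0.9, 0.0, 0.4]
model = MultimodalAutoencoder(input_dims=[5, 5], hidden_dim=4, bottleneck_dim=3)
features = model.encode([net_a_row, net_b_row])
print(len(features))  # 3
```

After training, `encode` would supply the low-dimensional protein features that a downstream classifier uses to predict GO terms.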

Metamorphic Testing for Quality Assurance of Protein Function Prediction Tools

2019 IEEE International Conference on Artificial Intelligence Testing (AITest), 2019. DOI: 10.1109/aitest.2019.00017. Cited by ~2.

Author(s): Morteza Pourreza Shahri, Madhusudan Srinivasan, Gillian Reynolds, Diane Bimczok, Indika Kahanda, ...

Keyword(s): Quality Assurance, Protein Function, Protein Function Prediction, Function Prediction, Metamorphic Testing, Prediction Tools

A large-scale evaluation of computational protein function prediction

Nature Methods, Vol 10 (3), pp. 221-227, 2013. DOI: 10.1038/nmeth.2340. Cited by ~521.

Author(s): Predrag Radivojac, Wyatt T Clark, Tal Ronnen Oron, Alexandra M Schnoes, Tobias Wittkop, ...

Keyword(s): Protein Function, Large Scale, Protein Function Prediction, Function Prediction, Scale Evaluation

Large-scale protein function prediction using heterogeneous ensembles

F1000Research, Vol 7, pp. 1577, 2018. DOI: 10.12688/f1000research.16415.1. Cited by ~4.

Author(s): Linhua Wang, Jeffrey Law, Shiv D. Kale, T. M. Murali, Gaurav Pandey

Keyword(s): Gene Ontology, Logistic Regression, Protein Function, Large Scale, Protein Function Prediction, Ensemble Methods, Function Prediction, Data Type, Heterogeneous Ensemble, The Ideal

Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability for a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
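Stacking feeds the scores of several base predictors into a second-level model; with Logistic Regression as that model, it learns how much to trust each base predictor for a given GO term. The sketch below fits such a meta-learner by batch gradient descent on toy scores for a single GO term. The data, learning rate and epoch count are assumptions, and this is not the LargeGOPred implementation; in practice, the base predictors' training scores should come from held-out folds so that the meta-learner does not learn from overfit outputs.

```python
import math
from typing import List, Tuple

Vector = List[float]
Model = Tuple[Vector, float]  # (weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X: List[Vector], y: List[int],
                            lr: float = 0.5, epochs: int = 2000) -> Model:
    """Batch gradient descent on the mean log-loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def stacked_score(model: Model, base_scores: Vector) -> float:
    """Meta-learner probability for one protein, given its base-predictor scores."""
    w, b = model
    return sigmoid(sum(wj * sj for wj, sj in zip(w, base_scores)) + b)


# Toy data for ONE GO term: each row holds the scores that three hypothetical
# base predictors (e.g. trained on different data types) gave one protein.
base_scores = [
    [0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.6],  # annotated (label 1)
    [0.2, 0.3, 0.1], [0.1, 0.4, 0.2], [0.3, 0.1, 0.3],  # not annotated (label 0)
]
labels = [1, 1, 1, 0, 0, 0]
meta_model = fit_logistic_regression(base_scores, labels)
print(round(stacked_score(meta_model, [0.85, 0.7, 0.8]), 3))
```

In a large-scale setting, one such meta-learner would be trained per GO term, which is why the work above emphasizes HPC-enabled code.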

Computational Models or Methods for Protein Function Prediction

Current Proteomics, Vol 16 (5), pp. 352-353, 2019. DOI: 10.2174/157016461605190510114117.

Author(s): Guohua Huang

Keyword(s): Protein Function, Computational Models, Protein Function Prediction, Function Prediction

GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank

Bioinformatics, Vol 34 (14), pp. 2465-2473, 2018. DOI: 10.1093/bioinformatics/bty130. Cited by ~40.

Author(s): Ronghui You, Zihan Zhang, Yi Xiong, Fengzhu Sun, Hiroshi Mamitsuka, ...

Keyword(s): Protein Function, Large Scale, Protein Function Prediction, Learning To Rank, Function Prediction

Wei2GO: weighted sequence similarity-based protein function prediction

2020. DOI: 10.1101/2020.04.24.059501.

Author(s): Maarten J.M.F Reijnders

Keyword(s): Gene Ontology, Open Source, Protein Function, Large Scale, Sequence Similarity, Protein Function Prediction, Function Prediction, Computational Time, Web Servers, Weighted Sequence

Background: Protein function prediction is an important part of bioinformatics and genomics studies. Many different predictors are available; however, most of them are offered as web servers rather than as open-source, locally installable versions. Local versions are necessary for large-scale genomics studies because web servers impose limitations such as queues, prediction speed, and the updatability of databases.

Methods: This paper describes Wei2GO, open-source, Python-based protein function prediction software built on weighted sequence similarity. It uses DIAMOND and HMMScan sequence alignment searches against the UniProtKB and Pfam databases, respectively, transfers Gene Ontology terms from the reference protein to the query protein, and uses a weighting algorithm to calculate a score for the Gene Ontology annotations.

Results: Wei2GO is compared against the Argot2 and Argot2.5 web servers, which use a similar concept, and against DeepGOPlus, which acts as a reference. Wei2GO shows increased performance according to precision-recall curves, Fmax scores, and Smin scores for the biological process and molecular function ontologies. Compared with Argot2 and Argot2.5, computational time is reduced from several hours to several minutes.

Availability: Wei2GO is written in Python 3 and can be found at https://gitlab.com/mreijnders/Wei2GO
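The transfer-and-weight step can be illustrated with a simplified sketch that starts from already computed alignment hits (the DIAMOND and HMMScan searches are not reproduced). The weighting used here, a capped -log10(e-value) per hit with each GO term scored by its share of the total hit weight, is a hypothetical scheme chosen for illustration; it is not claimed to be Wei2GO's actual weighting algorithm. Reference IDs (`REF1` to `REF3`) are placeholders.

```python
import math
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, float]  # (reference protein ID, e-value)


def hit_weight(evalue: float, cap: float = 250.0) -> float:
    """Assumed weight: -log10(e-value), capped; e-values >= 1 give 0 (no evidence)."""
    if evalue <= 0.0:
        return cap  # alignment tools can report an e-value of exactly 0
    return max(0.0, min(cap, -math.log10(evalue)))


def transfer_go_terms(hits: List[Hit],
                      annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Transfer each reference's GO terms to the query and score every term by
    the fraction of total hit weight that supports it (hypothetical scheme)."""
    scores: Dict[str, float] = {}
    total = 0.0
    for ref_id, evalue in hits:
        w = hit_weight(evalue)
        total += w
        for term in annotations.get(ref_id, set()):
            scores[term] = scores.get(term, 0.0) + w
    if total == 0.0:
        return {}
    return {term: s / total for term, s in scores.items()}


hits = [("REF1", 1e-50), ("REF2", 1e-10), ("REF3", 2.0)]
annotations = {
    "REF1": {"GO:0003677", "GO:0005515"},
    "REF2": {"GO:0003677"},
    "REF3": {"GO:0005515"},
}
print(transfer_go_terms(hits, annotations))
```

With these placeholder hits, GO:0003677 is supported by both informative hits and scores 1.0, GO:0005515 scores about 0.83, and REF3 (e-value 2.0) contributes no weight.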

Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks

2018. DOI: 10.1101/499244.

Author(s): Cen Wan, Domenico Cozzetto, Rui Fa, David T. Jones

Keyword(s): Neural Networks, Protein Interaction, Protein Interactions, Protein Function, Large Scale, Protein Function Prediction, Function Prediction, Network Embedding, Protein Protein Interactions, Functional Representations

Protein-protein interaction network data provide valuable information from which direct links between genes and their biological roles can be inferred. This information supports a fundamental hypothesis for protein function prediction: interacting proteins tend to have similar functions. With the help of recently developed network embedding feature generation methods and deep maxout neural networks, it is possible to extract functional representations that encode direct links between protein-protein interaction information and protein function. Our novel method, STRING2GO, successfully adopts deep maxout neural networks to learn functional representations that simultaneously encode both protein-protein interaction and functional predictive information. The experimental results show that STRING2GO outperforms other network embedding-based prediction methods and one benchmark method adopted in a recent large-scale protein function prediction competition.
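A maxout unit outputs the maximum over several learned linear functions of its input, which lets a deep maxout network learn piecewise-linear activations. The sketch below implements one maxout layer applied to a protein's network embedding. The 2-dimensional embedding and the hand-set weights are hypothetical; the embedding method and the full STRING2GO architecture are not reproduced.

```python
from typing import List

Vector = List[float]


def maxout_layer(x: Vector, weights: List[List[Vector]],
                 biases: List[Vector]) -> Vector:
    """Maxout layer: unit i outputs max_j (w_ij . x + b_ij) over its linear pieces.

    weights[i][j] is the weight vector of piece j of unit i; biases[i][j] its bias.
    """
    outputs = []
    for unit_weights, unit_biases in zip(weights, biases):
        pieces = [sum(w * xi for w, xi in zip(w_vec, x)) + b
                  for w_vec, b in zip(unit_weights, unit_biases)]
        outputs.append(max(pieces))
    return outputs


# Hypothetical 2-dimensional network embedding of one protein.
embedding = [1.0, 2.0]
# Two maxout units with two linear pieces each (hand-set for illustration).
weights = [
    [[1.0, 0.0], [0.0, 1.0]],    # unit 0: picks the larger coordinate
    [[0.5, 0.5], [-1.0, -1.0]],  # unit 1
]
biases = [[0.0, 0.0], [0.0, 0.0]]
print(maxout_layer(embedding, weights, biases))  # [2.0, 1.5]
```

Fixing one piece of a unit to zero weights and zero bias turns that unit into a ReLU, max(w . x + b, 0), which shows that maxout generalizes this more common activation.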
