Wei2GO: weighted sequence similarity-based protein function prediction

Mapping Intimacies ◽

10.1101/2020.04.24.059501 ◽

2020 ◽

Author(s):

Maarten J.M.F Reijnders

Keyword(s):

Gene Ontology ◽

Open Source ◽

Protein Function ◽

Large Scale ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Function Prediction ◽

Computational Time ◽

Web Servers ◽

Weighted Sequence

Abstract

Background: Protein function prediction is an important part of bioinformatics and genomics studies. Many predictors are available, but most are offered as web servers rather than as open-source, locally installable software. Local versions are necessary for large-scale genomics studies because web servers impose limitations such as queues, prediction speed, and the updatability of databases.

Methods: This paper describes Wei2GO, an open-source, Python-based protein function prediction tool built on weighted sequence similarity. It runs DIAMOND and HMMScan sequence alignment searches against the UniProtKB and Pfam databases, respectively, transfers Gene Ontology terms from the reference proteins to the query protein, and uses a weighting algorithm to calculate a score for each Gene Ontology annotation.

Results: Wei2GO is compared against the Argot2 and Argot2.5 web servers, which use a similar concept, and against DeepGOPlus, which acts as a reference. Wei2GO shows improved performance according to precision-recall curves, Fmax scores, and Smin scores for the biological process and molecular function ontologies. Computational time is reduced from several hours for Argot2 and Argot2.5 to several minutes.

Availability: Wei2GO is written in Python 3 and can be found at https://gitlab.com/mreijnders/Wei2GO
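The idea of transferring Gene Ontology terms from similar reference proteins and weighting them can be illustrated with a minimal sketch. This is not Wei2GO's actual scoring algorithm, which the abstract does not specify; the function name `score_go_terms`, the use of alignment bit scores as weights, and the normalisation by total hit weight are all assumptions made for illustration, and the protein identifiers and annotations are invented.

```python
from collections import defaultdict


def score_go_terms(hits, annotations):
    """Score GO terms for one query protein (illustrative assumption, not Wei2GO's formula).

    hits: list of (reference_id, weight) pairs, e.g. alignment bit scores.
    annotations: dict mapping reference_id to a set of GO term identifiers.
    Returns a dict mapping each GO term to the fraction of the total hit
    weight contributed by references annotated with that term.
    """
    total_weight = sum(weight for _, weight in hits)
    if total_weight <= 0:
        return {}
    support = defaultdict(float)
    for reference_id, weight in hits:
        # Transfer every GO term of the reference to the query, weighted by the hit.
        for go_term in annotations.get(reference_id, ()):
            support[go_term] += weight
    return {term: value / total_weight for term, value in support.items()}


# Hypothetical hits and annotations for a single query protein.
hits = [("P1", 200.0), ("P2", 100.0), ("P3", 100.0)]
annotations = {
    "P1": {"GO:0003677", "GO:0006355"},
    "P2": {"GO:0003677"},
    "P3": {"GO:0005524"},
}
scores = score_go_terms(hits, annotations)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(term, score)
```

In this sketch a term supported by several strong hits ends up with a higher score than a term supported by a single weak hit, which is the general intuition behind weighted similarity-based annotation transfer.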

Large-scale protein function prediction using heterogeneous ensembles

F1000Research ◽

10.12688/f1000research.16415.1 ◽

2018 ◽

Vol 7 ◽

pp. 1577 ◽

Cited By ~ 4

Author(s):

Linhua Wang ◽

Jeffrey Law ◽

Shiv D. Kale ◽

T. M. Murali ◽

Gaurav Pandey

Keyword(s):

Gene Ontology ◽

Logistic Regression ◽

Protein Function ◽

Large Scale ◽

Protein Function Prediction ◽

Ensemble Methods ◽

Function Prediction ◽

Data Type ◽

Heterogeneous Ensemble ◽

The Ideal

Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
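As a rough illustration of stacking with logistic regression, the sketch below trains a meta-learner on the scores of two base predictors for a single functional term. It is not the LargeGOPred implementation: the base scores, labels, and hyperparameters are invented, the logistic regression is a small hand-written gradient-descent version, and a real stacking setup would train the meta-learner on base predictions from held-out folds, which is omitted here for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def stacked_probability(model, base_scores):
    """Combine base predictor scores into one probability with the meta-learner."""
    weights, bias = model
    return sigmoid(bias + sum(w * s for w, s in zip(weights, base_scores)))


# Hypothetical scores from two base predictors for six proteins,
# with labels saying whether each protein has the functional term.
base_scores = [[0.9, 0.8], [0.8, 0.3], [0.7, 0.9],
               [0.2, 0.1], [0.3, 0.6], [0.1, 0.2]]
labels = [1, 1, 1, 0, 0, 0]

meta_model = fit_logistic_regression(base_scores, labels)
print(stacked_probability(meta_model, [0.85, 0.7]))
print(stacked_probability(meta_model, [0.15, 0.2]))
```

The meta-learner assigns a weight to each base predictor, so a predictor that separates positives from negatives more reliably contributes more to the combined score.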

deepNF: Deep network fusion for protein function prediction

10.1101/223339 ◽

2017 ◽

Cited By ~ 2

Author(s):

Vladimir Gligorijević ◽

Meet Barot ◽

Richard Bonneau

Keyword(s):

Protein Function ◽

Large Scale ◽

Protein Function Prediction ◽

Predictive Performance ◽

Substantial Improvement ◽

Function Prediction ◽

Interaction Networks ◽

Highly Nonlinear ◽

High Level ◽

String Networks

Abstract

The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most existing approaches to network integration use shallow models that cannot capture complex and highly nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders that extracts high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks into a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity.

Availability: deepNF is freely available at: https://github.com/VGligorijevic/deepNF

Learning Kernel Matrix from Gene Ontology and Annotation Data for Protein Function Prediction

Advances in Neural Networks – ISNN 2009 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01513-7_76 ◽

2009 ◽

pp. 694-703

Author(s):

Yiming Chen ◽

Zhoujun Li ◽

Junwan Liu

Keyword(s):

Gene Ontology ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Kernel Matrix ◽

Annotation Data

Gene Ontology GAN (GOGAN): a novel architecture for protein function prediction

Soft Computing ◽

10.1007/s00500-021-06707-z ◽

2022 ◽

Author(s):

Musadaq Mansoor ◽

Mohammad Nauman ◽

Hafeez Ur Rehman ◽

Alfredo Benso

Keyword(s):

Gene Ontology ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction

Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences

Bioinformatics ◽

10.1093/bioinformatics/bty704 ◽

2018 ◽

Vol 35 (5) ◽

pp. 753-759 ◽

Cited By ~ 8

Author(s):

Aashish Jain ◽

Daisuke Kihara

Keyword(s):

Protein Function ◽

Transfer Functions ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Prediction Method ◽

Query Protein ◽

Function Prediction ◽

Homology Search ◽

Supplementary Information ◽

Phylogenetic Distance

Abstract

Motivation: Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from the top hits of a homology search; however, this approach has substantial shortcomings, including low coverage in genome annotation.

Results: Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to its predecessor, PFP, which was among the top-ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform the prediction programs that were ranked top in CAFA2.

Availability and implementation: The Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php.

Supplementary information: Supplementary data are available at Bioinformatics online.

A large-scale evaluation of computational protein function prediction

Nature Methods ◽

10.1038/nmeth.2340 ◽

2013 ◽

Vol 10 (3) ◽

pp. 221-227 ◽

Cited By ~ 521

Author(s):

Predrag Radivojac ◽

Wyatt T Clark ◽

Tal Ronnen Oron ◽

Alexandra M Schnoes ◽

Tobias Wittkop ◽

...

Keyword(s):

Protein Function ◽

Large Scale ◽

Protein Function Prediction ◽

Function Prediction ◽

Scale Evaluation

A Bayesian approach to construct Context-Specific Gene Ontology: Application to protein function prediction

2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) ◽

10.1109/cibcb.2016.7758127 ◽

2016 ◽

Cited By ~ 1

Author(s):

Hasna Njah ◽

Salma Jamoussi ◽

Walid Mahdi ◽

Mohamed Elati

Keyword(s):

Gene Ontology ◽

Bayesian Approach ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Specific Gene ◽

Ontology Application ◽

Context Specific

INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity

Nucleic Acids Research ◽

10.1093/nar/gkv523 ◽

2015 ◽

Vol 43 (W1) ◽

pp. W134-W140 ◽

Cited By ~ 52

Author(s):

Damiano Piovesan ◽

Manuel Giollo ◽

Emanuela Leonardi ◽

Carlo Ferrari ◽

Silvio C.E. Tosatto

Keyword(s):

Protein Function ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Function Prediction ◽

Interaction Networks

A look back at the quality of Protein Function Prediction tools in CAFA

10.7287/peerj.preprints.27161 ◽

2018 ◽

Author(s):

Morteza Pourreza Shahri ◽

Madhusudan Srinivasan ◽

Diane Bimczok ◽

Upulee Kanewala ◽

Indika Kahanda

Keyword(s):

Protein Function ◽

Large Scale ◽

Computational Models ◽

Protein Function Prediction ◽

Function Prediction ◽

Test Case ◽

Test Cases ◽

Metamorphic Testing ◽

Main Challenge ◽

Scale Experiment

The Critical Assessment of protein Function Annotation algorithms (CAFA) is a large-scale experiment for assessing computational models for automated function prediction (AFP). The models presented in CAFA have shown excellent promise in terms of prediction accuracy, but quality assurance has received relatively little attention. The main challenge in systematically testing AFP software is the lack of a test oracle, which determines whether a test case passes or fails; the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the oracle problem. Metamorphic testing (MT) is a technique for testing programs that face the oracle problem using metamorphic relations (MRs). An MR determines whether a test has passed or failed by specifying how the output should change in response to a specific change made to the input. In this work, we use MT to test nine CAFA2 AFP tools by defining a set of MRs that apply input transformations at the protein level. According to our initial testing, several tools fail all the test cases and two tools pass all the test cases on different GO ontologies.
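A small sketch may help show what a protein-level metamorphic relation looks like in practice. The two relations below (predictions should not depend on input order, and removing one protein should not change predictions for the others) are illustrative assumptions and are not claimed to be the MRs used in the study; `toy_predictor`, `order_dependent_predictor`, and the motif-to-term mapping are hypothetical stand-ins for a real AFP tool and are not biologically validated.

```python
def toy_predictor(proteins):
    """Hypothetical AFP tool: maps {protein_id: sequence} to {protein_id: set of GO terms}."""
    motif_to_term = {"GKT": "GO:0005524", "HEL": "GO:0003677"}  # illustrative only
    return {
        protein_id: {term for motif, term in motif_to_term.items() if motif in sequence}
        for protein_id, sequence in proteins.items()
    }


def order_dependent_predictor(proteins):
    """Deliberately faulty variant: the first input protein receives an extra term."""
    predictions = toy_predictor(proteins)
    first_id = next(iter(proteins))
    predictions[first_id] = predictions[first_id] | {"GO:0008150"}
    return predictions


def check_permutation_relation(predictor, proteins):
    """MR (assumed): reordering the input must not change any protein's predictions."""
    baseline = predictor(proteins)
    reordered = dict(reversed(list(proteins.items())))
    return predictor(reordered) == baseline


def check_subset_relation(predictor, proteins, removed_id):
    """MR (assumed): removing one protein must not change predictions for the rest."""
    baseline = predictor(proteins)
    reduced = {pid: seq for pid, seq in proteins.items() if pid != removed_id}
    follow_up = predictor(reduced)
    return all(follow_up[pid] == baseline[pid] for pid in reduced)


proteins = {"A": "MAGKTLL", "B": "MHELPPK", "C": "MSSSS"}
print(check_permutation_relation(toy_predictor, proteins))
print(check_subset_relation(toy_predictor, proteins, "C"))
print(check_permutation_relation(order_dependent_predictor, proteins))
```

Neither check needs to know the correct GO terms for any protein; each only compares the outputs of two related runs, which is how metamorphic testing sidesteps the oracle problem.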

FUNCTION PREDICTION OF UNCHARACTERIZED PROTEINS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720007002503 ◽

2007 ◽

Vol 05 (01) ◽

pp. 1-30 ◽

Cited By ~ 60

Author(s):

TROY HAWKINS ◽

DAISUKE KIHARA

Keyword(s):

Protein Function ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Function Prediction ◽

Biological Research ◽

Special Focus ◽

Genomic Context ◽

Uncharacterized Protein ◽

Genomics And Proteomics ◽

Proteomics Experiment

Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that utilize the overwhelmingly large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because they incorporate structural and experimental data that are not used in sequence-based methods, they can provide additional accuracy and reliability to protein function prediction. Here, we first review the definition of protein function. We then introduce recent developments in these methods, with special focus on the types of predictions that can be made. We emphasize the need for further development of comprehensive systems biology techniques that can utilize the ever-increasing data presented by the genomics and proteomics communities. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.
