scholarly journals DeepGOPlus: improved protein function prediction from sequence

Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Abstract Motivation Protein function prediction is one of the major tasks of bioinformatics that can help in wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence based features, protein–protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features and predicting protein functions may require a lot of time. Results We developed a novel method for predicting protein functions from sequence alone which combines deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins. Availability and implementation http://deepgoplus.bio2vec.net/. Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

ABSTRACTProtein function prediction is one of the major tasks of bioinformatics that can help in wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence based features, protein–protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features and predicting protein functions may require a lot of time.We developed a novel method for predicting protein functions from sequence alone which combines deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins. We evaluate the performance of DeepGOPlus on the CAFA3 dataset and significantly improve the performance of predictions of biological processes and cellular components with Fmax of 0.47 and 0.70, respectively, using only the amino acid sequence of proteins as input. DeepGOPlus can annotate around 40 protein sequences per second, thereby making fast and accurate function predictions available for a wide range of proteins.


2013 ◽  
Vol 11 (04) ◽  
pp. 1350008 ◽  
Author(s):  
JINGYU HOU ◽  
YONGQING JIANG

The availability of large amounts of protein–protein interaction (PPI) data makes it feasible to use computational approaches to predict protein functions. The base of existing computational approaches is to exploit the known function information of annotated proteins in the PPI data to predict functions of un-annotated proteins. However, these approaches consider the prediction domain (i.e. the set of proteins from which the functions are predicted) as unchangeable during the prediction procedure. This may lead to valuable information being overwhelmed by the unavoidable noise information in the PPI data when predicting protein functions, and in turn, the prediction results will be distorted. In this paper, we propose a novel method to dynamically predict protein functions from the PPI data. Our method regards the function prediction as a dynamic process of finding a suitable prediction domain, from which representative functions of the domain are selected to predict functions of un-annotated proteins. Our method exploits the topological structural information of a PPI network and the semantic relationship between protein functions to measure the relationship between proteins, dynamically select a suitable prediction domain and predict functions. The evaluation on real PPI datasets demonstrated the effectiveness of our proposed method, and generated better prediction results.


2018 ◽  
Vol 35 (5) ◽  
pp. 753-759 ◽  
Author(s):  
Aashish Jain ◽  
Daisuke Kihara

Abstract Motivation Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from top hits of a homology search; however, this approach has substantial short comings including a low coverage in genome annotation. Results Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with a low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to the Phylo-PFP’s predecessor, PFP, which was among the top ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform prediction programs to date that were ranked top in CAFA2. Availability and implementation Phylo-PFP web server is available for at http://kiharalab.org/phylo_pfp.php. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
HEE-JEONG JIN ◽  
HWAN-GUE CHO

In the post-genomic era, predicting protein function is a challenging problem. It is difficult and burdensome work to unravel the functions of a protein by wet experiments only. In this paper, we propose a novel method to predict protein functions by building a "Protein Interaction Network Dictionary (PIND)". This method deduces the protein functions by searching the most similar "words"(an anagram of functions in neighbor proteins on a protein–protein interaction graph) using global alignments. An evaluation of sensitivity and specificity shows that this PIND approach outperforms previous approaches such as Majority Rule and Chi-Square measure, and that it competes with the recently introduced Random Markov Model approach.


Author(s):  
Hon Nian Chua ◽  
Limsoon Wong

Functional characterization of genes and their protein products is essential to biological and clinical research. Yet, there is still no reliable way of assigning functional annotations to proteins in a high-throughput manner. In this article, the authors provide an introduction to the task of automated protein function prediction. They discuss about the motivation for automated protein function prediction, the challenges faced in this task, as well as some approaches that are currently available. In particular, they take a closer look at methods that use protein-protein interaction for protein function prediction, elaborating on their underlying techniques and assumptions, as well as their strengths and limitations.


Author(s):  
Hon Nian Chua ◽  
Limsoon Wong

Functional characterization of genes and their protein products is essential to biological and clinical research. Yet, there is still no reliable way of assigning functional annotations to proteins in a high-throughput manner. In this chapter, the authors provide an introduction to the task of automated protein function prediction. They discuss about the motivation for automated protein function prediction, the challenges faced in this task, as well as some approaches that are currently available. In particular, they take a closer look at methods that use protein-protein interaction for protein function prediction, elaborating on their underlying techniques and assumptions, as well as their strengths and limitations.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jiaogen Zhou ◽  
Wei Xiong ◽  
Yang Wang ◽  
Jihong Guan

Over the past decades, massive amounts of protein-protein interaction (PPI) data have been accumulated due to the advancement of high-throughput technologies, and but data quality issues (noise or incompleteness) of PPI have been still affecting protein function prediction accuracy based on PPI networks. Although two main strategies of network reconstruction and edge enrichment have been reported on the effectiveness of boosting the prediction performance in numerous literature studies, there still lack comparative studies of the performance differences between network reconstruction and edge enrichment. Inspired by the question, this study first uses three protein similarity metrics (local, global and sequence) for network reconstruction and edge enrichment in PPI networks, and then evaluates the performance differences of network reconstruction, edge enrichment and the original networks on two real PPI datasets. The experimental results demonstrate that edge enrichment work better than both network reconstruction and original networks. Moreover, for the edge enrichment of PPI networks, the sequence similarity outperformes both local and global similarity. In summary, our study can help biologists select suitable pre-processing schemes and achieve better protein function prediction for PPI networks.


2022 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Motivation: Protein functions are often described using the Gene Ontology (GO) which is an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology and a variety of machine learning methods have been developed for this purpose. However, these methods usually require significant amount of training data and cannot make predictions for GO classes which have only few or no experimental annotations. Results: We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted. Availability: http://github.com/bio-ontology-research-group/deepgozero


Sign in / Sign up

Export Citation Format

Share Document