Integrated Network Approach to Protein Function Prediction

2018 ◽  
Vol 21 ◽  
pp. 98-103
Author(s):  
Natalia Novoselova ◽  
Igar Tom

One of the main problems in functional genomics is predicting unknown gene and protein functions. With the rapid spread of high-throughput technologies, a vast amount of biological data describing different aspects of cellular functioning has become available, and these data can serve as additional information sources that improve the accuracy of function prediction. In our research, we describe an approach to protein function prediction based on the integration of several biological datasets. Initially, each dataset is represented as a graph (or network) in which the nodes represent genes or their products and the edges represent physical, functional or chemical relationships between nodes. The integration process makes it possible to estimate the importance of each network for predicting a particular function while taking into account the imbalance between functional annotations, notably the disproportion between positively and negatively annotated proteins. Protein function prediction then consists of applying the label propagation algorithm to the integrated biological network in order to annotate unknown proteins or to assign new functions to already known proteins. A comparative analysis of prediction efficiency across several integration schemes shows a positive effect in terms of several performance measures.
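
The two steps named in the abstract above, network integration and label propagation, can be illustrated with a small sketch. It is not the authors' method: the fixed network weights, the normalisation `S = D^-1/2 A D^-1/2`, the update rule `f <- alpha * S f + (1 - alpha) * y`, and the function names `integrate` and `propagate` are all assumptions chosen for illustration, and the sketch does not model the class-imbalance handling the abstract mentions.

```python
from math import sqrt


def integrate(networks, weights):
    """Weighted sum of same-sized adjacency matrices (lists of lists).

    The weights are given by hand here; the abstract estimates network
    importance per function, which this sketch does not attempt.
    """
    n = len(networks[0])
    return [[sum(w * net[i][j] for net, w in zip(networks, weights))
             for j in range(n)] for i in range(n)]


def propagate(adj, labels, alpha=0.8, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y, S = D^-1/2 A D^-1/2."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # A nonzero edge implies both endpoint degrees are positive.
    s = [[adj[i][j] / sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(labels)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n))
             + (1 - alpha) * labels[i] for i in range(n)]
    return f


# Toy data (hypothetical): four proteins, two networks.
ppi = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
coexpr = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
combined = integrate([ppi, coexpr], [0.7, 0.3])

# +1 = annotated with the function, -1 = annotated without it, 0 = unknown.
scores = propagate(combined, [1.0, 0.0, 0.0, -1.0])
```

In this toy example the unlabeled protein next to the positive one ends with a positive score and the one next to the negative protein with a negative score; thresholding such scores is one simple way to turn them into predictions.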

Author(s):  
Hon Nian Chua ◽  
Limsoon Wong

Functional characterization of genes and their protein products is essential to biological and clinical research. Yet there is still no reliable way of assigning functional annotations to proteins in a high-throughput manner. In this article, the authors provide an introduction to the task of automated protein function prediction. They discuss the motivation for automated protein function prediction, the challenges faced in this task, and some of the approaches that are currently available. In particular, they take a closer look at methods that use protein-protein interactions for protein function prediction, elaborating on their underlying techniques and assumptions, as well as their strengths and limitations.
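
As a concrete instance of an interaction-based method of the kind surveyed above, a common baseline is neighbour voting ("guilt by association"): a protein is assigned the functions that occur most often among its direct interaction partners. The sketch below is an illustration only; the abstract does not say which methods are covered, and the function name `neighbor_vote`, the protein identifiers, and the function labels are hypothetical.

```python
from collections import Counter


def neighbor_vote(ppi, annotations, protein, top_k=1):
    """Return the top_k functions most frequent among direct partners."""
    counts = Counter()
    for partner in ppi.get(protein, ()):
        # Unannotated partners contribute no votes.
        counts.update(annotations.get(partner, ()))
    return [function for function, _ in counts.most_common(top_k)]


# Toy data (hypothetical): P1 is unannotated and has three partners.
ppi = {"P1": {"P2", "P3", "P4"}}
annotations = {
    "P2": {"kinase activity"},
    "P3": {"kinase activity", "transport"},
    "P4": set(),
}

predicted = neighbor_vote(ppi, annotations, "P1")
```

The main assumption behind this baseline is that interacting proteins tend to share functions, so it inherits the noise and incompleteness of the interaction data.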


Author(s):  
Hon Nian Chua ◽  
Limsoon Wong

Functional characterization of genes and their protein products is essential to biological and clinical research. Yet there is still no reliable way of assigning functional annotations to proteins in a high-throughput manner. In this chapter, the authors provide an introduction to the task of automated protein function prediction. They discuss the motivation for automated protein function prediction, the challenges faced in this task, and some of the approaches that are currently available. In particular, they take a closer look at methods that use protein-protein interactions for protein function prediction, elaborating on their underlying techniques and assumptions, as well as their strengths and limitations.


2022 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Motivation: Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only a few or no experimental annotations. Results: We developed DeepGOZero, a machine learning model that improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted. Availability: http://github.com/bio-ontology-research-group/deepgozero


2014 ◽  
Vol 2014 ◽  
pp. 1-9
Author(s):  
Jaehee Jung ◽  
Heung Ki Lee ◽  
Gangman Yi

Automated protein function prediction is the assignment of functions to uncharacterized proteins by computational methods. This technique is useful for automatically assigning gene functional annotations to undefined sequences in next generation genome analysis (NGS). NGS is a popular research method because high-throughput technologies such as DNA sequencing and microarrays have created large sets of genes. These large sequence sets have greatly increased the need for analysis. Previous research has been based on sequence similarity, as this is strongly related to functional homology. This study instead aimed to assign protein functions automatically by utilizing InterPro (IPR), which can represent the properties of protein families and groups of protein functions. Moreover, we used the Gene Ontology (GO), the controlled vocabulary used to comprehensively describe protein function. To define the relationship between IPR and GO terms, three pattern recognition techniques were employed under different conditions, such as feature selection and the use of weighted values instead of binary ones.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Naihui Zhou ◽  
Yuxiang Jiang ◽  
Timothy R. Bergquist ◽  
Alexandra J. Lee ◽  
Balint Z. Kacsoh ◽  
...  

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


2020 ◽  
Author(s):  
Bihai Zhao ◽  
Zhihong Zhang ◽  
Meiping Jiang ◽  
Sai Hu ◽  
Yingchun Luo ◽  
...  

Background: The accurate annotation of protein functions is of great significance in elucidating the phenomena of life, disease treatment and new drug development. Various methods have been developed to facilitate the prediction of functions by combining protein interaction networks (PINs) with multi-omics data. However, how to make full use of multiple biological data sources to improve the performance of function annotation remains an open problem. Results: We present NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners with functions similar to those of target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity in a propagation manner. We have verified the great potential of NPF for accurately inferring protein functions. Comprehensive evaluation indicates that NPF achieved higher performance than competing methods under leave-one-out cross-validation and ten-fold cross-validation. Conclusions: We demonstrate that network propagation combined with multi-omics data can not only discover more partners with similar functions, but is also effectively freed from the constraints of the "small-world" feature of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information from protein correlations.
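
Network propagation, the core idea named in the abstract above, is often implemented as a random walk with restart. The sketch below shows that generic technique only; it is not NPF itself, whose propagation rule and use of domain and complex data are not specified in the abstract. The function name `random_walk_with_restart`, the restart probability, and the toy network are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 with W column-normalised."""
    n = len(adj)
    col_sum = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column j of W spreads the mass at node j evenly over its edges.
    w = [[adj[i][j] / col_sum[j] if col_sum[j] else 0.0
          for j in range(n)] for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy network (hypothetical): a chain of four proteins 0 - 1 - 2 - 3.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
affinity = random_walk_with_restart(chain, seeds={0})
```

The resulting vector ranks every protein by its proximity to the seed, so candidate partners need not be direct neighbours; in this chain the scores decrease with distance from protein 0.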


2015 ◽  
Vol 2015 ◽  
pp. 1-9
Author(s):  
Jian-Sheng Wu ◽  
Hai-Feng Hu ◽  
Shan-Cheng Yan ◽  
Li-Hua Tang

Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we showed that the protein function prediction problem is naturally and inherently a Multi-Instance Multilabel (MIML) learning task. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete; that is, there are no missing labels. In practice, however, only a subset of the functions of a protein is known, and whether the protein has other functions is unknown. Protein function prediction tasks therefore suffer from the weak-label problem, and protein function prediction with incomplete annotation matches well with the MIML with weak-label learning framework. In this paper, we apply the state-of-the-art MIML with weak-label learning algorithm MIMLwel to predict protein functions in two typical real-world electricigen organisms that have been widely used in microbial fuel cell (MFC) research. Our experimental results validate the effectiveness of the MIMLwel algorithm in predicting protein functions with incomplete annotation.


2013 ◽  
Vol 11 (04) ◽  
pp. 1350008 ◽  
Author(s):  
Jingyu Hou ◽  
Yongqing Jiang

The availability of large amounts of protein–protein interaction (PPI) data makes it feasible to use computational approaches to predict protein functions. Existing computational approaches exploit the known function information of annotated proteins in the PPI data to predict the functions of un-annotated proteins. However, these approaches treat the prediction domain (i.e. the set of proteins from which the functions are predicted) as fixed during the prediction procedure. As a result, valuable information may be overwhelmed by the unavoidable noise in the PPI data, which in turn distorts the prediction results. In this paper, we propose a novel method to dynamically predict protein functions from PPI data. Our method regards function prediction as a dynamic process of finding a suitable prediction domain, from which representative functions of the domain are selected to predict the functions of un-annotated proteins. Our method exploits the topological structural information of a PPI network and the semantic relationship between protein functions to measure the relationship between proteins, dynamically select a suitable prediction domain and predict functions. Evaluation on real PPI datasets demonstrated the effectiveness of the proposed method, which generated better prediction results.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Vladimir Gligorijević ◽  
P. Douglas Renfrew ◽  
Tomasz Kosciolek ◽  
Julia Koehler Leman ◽  
Daniel Berenberg ◽  
...  

The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping allows function predictions at an unprecedented resolution, allowing site-specific annotations at the residue-level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/.
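
To make the term "Graph Convolutional Network" concrete, the sketch below implements one generic graph convolution step over a residue contact map, using the widely used normalised form `H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)`. This is an assumption for illustration and not DeepFRI's actual layer, which the abstract does not specify; the function name `gcn_layer`, the contact map, the feature values, and the weight matrix are hypothetical.

```python
from math import sqrt


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each residue keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n))
            for c in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][d] for c in range(n_in)))
             for d in range(n_out)] for i in range(n)]


# Toy input (hypothetical): three residues with contacts 0-1 and 1-2.
contacts = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
residue_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # stand-in for per-residue sequence features
identity = [[1.0, 0.0], [0.0, 1.0]]

hidden = gcn_layer(contacts, residue_feats, identity)
```

Each output row mixes a residue's own features with those of the residues it contacts, which is how structural neighbourhood information enters the per-residue representation.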

