scholarly journals Gene Function Prediction in Five Model Eukaryotes Exclusively Based on Gene Relative Location Through Machine Learning

Author(s):  
Flavio Pazos Obregón ◽  
Diego Silvera ◽  
Pablo Soto ◽  
Patricio Yankilevich ◽  
Gustavo Guerberoff ◽  
...  

Abstract The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene’s function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subjected to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models exclusively trained with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best performing method to automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from Biological Process and Cellular Component Ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.

2021 ◽  
Author(s):  
Flavio Pazos Obregón ◽  
Diego Silvera ◽  
Pablo Soto ◽  
Patricio Yankilevich ◽  
Gustavo Guerberoff ◽  
...  

Motiviation: The function of most genes is unknown. The best results in gene function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location relay on sequence identity between genes of different organisms and are thus subjected to the limitations of the relationship between sequence and function. Results: Here we predict thousands of gene functions in five eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models trained with features derived from the location of genes in the genomes to which they belong. To the best of our knowledge this is the first work in which gene function prediction is successfully achieved in eukaryotic genomes using predictive features derived exclusively from the relative location of the genes. Contact: [email protected] Supplementary information: http://gfpml.bnd.edu.uy


2019 ◽  
Author(s):  
Xiuru Dai ◽  
Zheng Xu ◽  
Zhikai Liang ◽  
Xiaoyu Tu ◽  
Silin Zhong ◽  
...  

AbstractAdvances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non-homology gene features. Among the eight supervised classification algorithms evaluated, random forest-based prediction consistently provided the most accurate gene function prediction. Non-homology-based functional annotation provides complementary strengths to homology-based annotation, with higher average performance in Biological Process GO terms, the domain where homology-based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology-based functional annotation is highest. Non-homology-based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology-based functional annotations.


Sign in / Sign up

Export Citation Format

Share Document