Graph-based sequence annotation using a data integration approach

Summary: The automated annotation of data from high-throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for expert curators and to predict the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database AraCyc), which has been established in the ONDEX data integration system. We also compare different methods for integrating GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system, which is freely available from http://ondex.sf.net/.

Noise tolerance of Multiple Classifier Systems in data integration-based gene function prediction

Journal of Integrative Bioinformatics, 2010, Vol 7 (3), pp. 346-362, DOI 10.1515/jib-2010-139

Author(s): Matteo Rè, Giorgio Valentini

Keyword(s): Data Integration, Gene Function, Noisy Data, Function Prediction, System Level, Multiple Classifier Systems, Classifier Systems, Gene Function Prediction, Multiple Classifier, The Impact

Summary: The availability of various high-throughput experimental and computational methods developed in the last decade has allowed molecular biologists to investigate the functions of genes at the system level, opening unprecedented research opportunities. Although the automated prediction of gene functions is among the most difficult problems in bioinformatics, several recently published works have shown that consistent improvements in prediction performance can be obtained by integrating heterogeneous data sources. Nevertheless, very few works have investigated the impact of noisy data on the prediction performance achievable with data integration approaches. In this contribution we investigated the tolerance of multiple classifier systems (MCS) to noisy data in gene function prediction experiments based on data integration methods. The experimental results show that the performance of MCS does not decay significantly when noisy data sets are added. In addition, we show that in this task MCS are competitive with kernel fusion, one of the most widely applied techniques for data integration in gene function prediction problems.
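
As a rough illustration of how a multiple classifier system can combine evidence from several data sources, the sketch below averages per-source classifier scores for one gene and one functional class. The combination rule (weighted averaging), the function name `combine_scores`, and all scores and weights are assumptions made for illustration; the abstract does not state which combination rules the authors used.

```python
def combine_scores(scores, weights=None):
    """Combine per-source scores for one gene into a single ensemble score.

    scores  -- one score in [0, 1] per data source (hypothetical values)
    weights -- optional per-source weights; equal weights if omitted
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average: each source votes in proportion to its weight.
    return sum(w * s for w, s in zip(weights, scores)) / total


# Three informative sources plus one uninformative (noisy) source at 0.5.
scores = [0.9, 0.8, 0.85, 0.5]

unweighted = combine_scores(scores)
# Hypothetical weights that down-weight the noisy source.
weighted = combine_scores(scores, weights=[1.0, 1.0, 1.0, 0.2])

print(round(unweighted, 4), round(weighted, 4))
```

With these toy numbers the unweighted average is 0.7625, and down-weighting the noisy source raises the ensemble score to about 0.828, closer to what the informative sources agree on.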

Gene function prediction based on combining gene ontology hierarchy with multi-instance multi-label learning

RSC Advances, 2018, Vol 8 (50), pp. 28503-28509, DOI 10.1039/c8ra05122d, Cited by ~3

Author(s): Zejun Li, Bo Liao, Yun Li, Wenhua Liu, Min Chen, ...

Keyword(s): Gene Ontology, Gene Function, Genome Annotation, Function Prediction, Gene Function Prediction, Function Annotation, P Gene, Main Challenge, Gene Function Annotation

Gene function annotation, an important part of genome annotation, is the main challenge in the post-genome era.

Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana

2017, DOI 10.1101/181396

Author(s): Bjoern Oest Hansen, Etienne H. Meyer, Camilla Ferrari, Neha Vaid, Sara Movahedi, ...

Keyword(s): Arabidopsis Thaliana, Gene Function, Complex I, Prediction Method, Function Prediction, Mitochondrial Complex, Gene Function Prediction, Inference Methods, User Friendly

Despite increasing availability of sequenced genomes, accurate characterization of gene functions is needed to close the genotype-phenotype gap. Recent advances in gene function prediction rely on ensemble approaches that integrate the results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We present Neighbor Counting Ensemble, a gene function prediction method which integrates eleven gene co-function networks for Arabidopsis thaliana, and produces more accurate gene function predictions for a larger fraction of genes with unknown function. We used these predictions to identify genes involved in mitochondrial complex I formation, and for five of them we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet, available at http://www.gene2function.de/ensemblenet.html.
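
The general idea of neighbor counting across several co-function networks can be sketched as follows: for a gene of unknown function, each annotated neighbor in each network casts one vote for each of its functional terms, and the votes are summed over all networks. This is a generic sketch, not the authors' implementation; the function name `neighbor_counting`, the gene identifiers, the network contents and the annotations are all hypothetical.

```python
from collections import Counter


def neighbor_counting(gene, networks, annotations):
    """Count functional terms among a gene's neighbors across networks.

    networks    -- list of dicts mapping a gene to a set of neighbor genes
    annotations -- dict mapping annotated genes to a set of functional terms
    """
    votes = Counter()
    for network in networks:
        for neighbor in network.get(gene, ()):
            # Unannotated neighbors contribute no votes.
            for term in annotations.get(neighbor, ()):
                votes[term] += 1
    return votes


# Two toy co-function networks (hypothetical data).
networks = [
    {"geneX": {"geneA", "geneB"}},
    {"geneX": {"geneA", "geneC"}},
]
annotations = {
    "geneA": {"complex I assembly"},
    "geneB": {"complex I assembly"},
    "geneC": {"photosynthesis"},
}

votes = neighbor_counting("geneX", networks, annotations)
top_term, top_count = votes.most_common(1)[0]
print(top_term, top_count)
```

In this toy example "complex I assembly" collects three votes (two from the first network, one from the second) against one for "photosynthesis", so it would be the top-ranked prediction for `geneX`. A gene that appears as a neighbor in more than one network votes once per network, which is one simple way of rewarding agreement between networks.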
