Gene Prediction, Ab Initio (Intrinsic Gene Prediction, Template Gene Prediction)

Author(s):  
Roderic Guigó
2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Tyler Alioto ◽  
Ernesto Picardi ◽  
Roderic Guigó ◽  
Graziano Pesole

New genomes are being sequenced at an increasingly rapid rate, far outpacing the rate at which manual gene annotation can be performed. Automated genome annotation is thus necessitated by this growth in genome projects; however, full-fledged annotation systems are usually home-grown and customized to a particular genome. There is thus a renewed need for accurate ab initio gene prediction methods. However, it is apparent that fully ab initio methods fall short of the required level of sensitivity and specificity for a quality annotation. Evidence in the form of expressed sequences gives the single biggest improvement in accuracy when used to inform gene predictions. Here, we present a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The two main components are ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which we have modified to take intron annotations as evidence. The introns output by ASPic CDS predictions are given to GeneID to constrain the exon-chaining process and produce predictions consistent with the underlying EST alignments. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.
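The idea of letting intron evidence constrain exon chaining can be illustrated with a toy dynamic program. This is only a sketch under simplifying assumptions, not GeneID's actual algorithm: the function name `chain_exons`, the exon scores, and the `bonus` parameter are hypothetical, and reading frame, strand, and exon type are ignored. Candidate exons that overlap an evidence intron are discarded, and a pair of consecutive exons is rewarded when the gap between them matches an evidence intron exactly.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int, float]  # (start, end, score), 1-based inclusive
Intron = Tuple[int, int]       # (start, end), 1-based inclusive


def chain_exons(exons: List[Exon], introns: Set[Intron],
                bonus: float = 5.0) -> Tuple[float, List[Exon]]:
    """Return (score, chain) for the best-scoring chain of non-overlapping exons.

    Hypothetical scoring: sum of exon scores, plus `bonus` for every pair of
    consecutive exons whose implied intron is present in the evidence set.
    """
    # Discard candidate exons that overlap any evidence intron.
    kept = [e for e in exons
            if not any(e[0] <= i_end and i_start <= e[1]
                       for i_start, i_end in introns)]
    kept.sort(key=lambda e: (e[1], e[0]))

    best: List[Tuple[float, List[Exon]]] = []  # best chain ending at kept[k]
    for k, (start, _end, score) in enumerate(kept):
        top_score, top_chain = score, [kept[k]]
        for j in range(k):
            prev_end = kept[j][1]
            if prev_end >= start - 1:
                continue  # exons overlap or leave no room for an intron
            implied = (prev_end + 1, start - 1)
            candidate = best[j][0] + score + (bonus if implied in introns else 0.0)
            if candidate > top_score:
                top_score, top_chain = candidate, best[j][1] + [kept[k]]
        best.append((top_score, top_chain))

    return max(best, key=lambda item: item[0]) if best else (0.0, [])


candidates = [(1, 100, 3.0), (201, 300, 2.0), (150, 250, 4.0), (401, 500, 1.0)]
evidence = {(101, 200)}
best_score, best_chain = chain_exons(candidates, evidence)
```

In this example the high-scoring candidate (150, 250) is rejected because it overlaps the evidence intron, so the returned chain is consistent with the intron annotation.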


2019 ◽  
Vol 20 (S15) ◽  
Author(s):  
Prapaporn Techa-Angkoon ◽  
Kevin L. Childs ◽  
Yanni Sun

Abstract Background: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of the availability of homologous sequences. A number of ab initio gene prediction tools exist, and they have been widely used for gene annotation in various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy of predicting genes with GC gradients. Results: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool, which is optimized for finding genes with highly variable GC content, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes missed by existing tools because of their highly variable GC content. Conclusions: GPRED-GC can effectively predict genes with highly variable GC content without manual intervention. It provides a useful complementary tool to existing ones such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Nicolas Scalzitti ◽  
Anne Jeannin-Girardon ◽  
Pierre Collet ◽  
Olivier Poch ◽  
Julie D. Thompson

2005 ◽  
Vol 57 (3) ◽  
pp. 445-460 ◽  
Author(s):  
Hong Yao ◽  
Ling Guo ◽  
Yan Fu ◽  
Lisa A. Borsuk ◽  
Tsui-Jung Wen ◽  
...  

2021 ◽  
Author(s):  
Richard Finkers ◽  
Martijn P.W. van Kaauwen ◽  
Kai Ament ◽  
Karin Burger-Meijer ◽  
Raymond J. Egging ◽  
...  

Onion is an important vegetable crop with an estimated genome size of 16 Gb. We describe the de novo assembly and ab initio annotation of the genome of a doubled haploid onion line, DHCU066619, which resulted in a final assembly of 14.9 Gb with an N50 of 461 Kb. Of this, 2.2 Gb was ordered into 8 pseudomolecules using five genetic linkage maps. The remainder of the genome is available in 89.8 K scaffolds. Analysis of this genome shows that at least 72.4% of the genome is repetitive and consists, to a large extent, of (retro)transposons. Many (retro)transposons were already quite old and had accumulated many mutations, which facilitated their assembly but hampered their identification. The draft ab initio gene prediction indicated 540,925 putative gene models, which is far more than expected, possibly due to the presence of pseudogenes. Of these, 86,073 models showed similarity to published proteins (UNIPROT). No gene-rich regions were found; genes are uniformly distributed over the genome. Analysis of synteny with A. sativum (garlic) showed collinearity but also major rearrangements between the two species. Notwithstanding, this assembly is the first high-quality draft genome sequence available for the study of onion and will be a valuable resource for further research.
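For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name `n50` and the example lengths are illustrative and are not taken from the onion assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    return ordered[-1]  # unreachable for non-negative lengths; kept for safety


example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Here the lengths sum to 54, and the three longest sequences (10, 9, 8) reach 27, so the N50 is 8.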


Author(s):  
Enrique Blanco ◽  
Josep F. Abril ◽  
Roderic Guigó

2019 ◽  
Author(s):  
Nicolas Scalzitti ◽  
Anne Jeannin-Girardon ◽  
Pierre Collet ◽  
Olivier Poch ◽  
Julie Dawn Thompson

Abstract Background: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. Results: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically dispersed organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, and protein length. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. Conclusions: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.
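As a rough illustration of how the accuracy of a gene predictor can be scored, the sketch below computes exon-level sensitivity and specificity under the convention common in the gene-finding literature, where "specificity" is the fraction of predicted exons that are correct (what is elsewhere called precision). This is an assumption for illustration only: the metrics actually used in G3PO may differ, and the function name `exon_level_accuracy` and the coordinates are hypothetical. An exon counts as correct only if both boundaries match exactly.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def exon_level_accuracy(predicted: Iterable[Interval],
                        annotated: Iterable[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level."""
    pred, ref = set(predicted), set(annotated)
    true_positives = len(pred & ref)  # exons with both boundaries correct
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


predicted_exons = [(1, 100), (201, 300), (400, 500)]
annotated_exons = [(1, 100), (201, 300), (401, 500), (600, 700)]
sn, sp = exon_level_accuracy(predicted_exons, annotated_exons)
```

In this example two of the four annotated exons are recovered exactly, and two of the three predictions are correct; the prediction (400, 500) misses the annotated exon (401, 500) by one base and is therefore counted as wrong.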


2008 ◽  
Vol 18 (12) ◽  
pp. 1979-1990 ◽  
Author(s):  
V. Ter-Hovhannisyan ◽  
A. Lomsadze ◽  
Y. O. Chernoff ◽  
M. Borodovsky

Author(s):  
Akhilesh Mishra ◽  
Priyanka Siwach ◽  
Poonam Singhal ◽  
B. Jayaram
