Gene prediction by multiple syntenic alignment

Summary: Given the increasing number of available genomic sequences, one now faces the task of identifying their functional parts, such as the protein-coding regions. The gene prediction problem can be addressed in several ways. One of the most promising methods uses similarity information between the genomic DNA and previously annotated sequences (proteins, cDNAs and ESTs). Recently, given the large number of newly sequenced genomes, new similarity-based methods have been applied successfully to gene prediction. These so-called comparative-based methods rely on the similarities shared by regions of two evolutionarily related genomic sequences. Despite the number of different gene prediction approaches in the literature, the problem remains challenging. In this paper we present a new comparative-based approach to the gene prediction problem. It is based on a syntenic alignment of three or more genomic sequences. By syntenic alignment we mean an alignment constructed to take into account the fact that the sequences involved contain conserved regions separated by unconserved ones. We have implemented the proposed algorithm in a computer program and confirm the validity of the approach on a benchmark of triples of human, mouse and rat genomic sequences.


[10] Finding protein coding regions in genomic sequences

Methods in Enzymology ◽

10.1016/0076-6879(90)83012-x ◽

1990 ◽

pp. 163-180 ◽

Cited By ~ 43

Author(s):

Rodger Staden

Keyword(s):

Genomic Sequences ◽

Protein Coding ◽

Coding Regions


Fast Algorithm for Identifying Protein-Coding Regions

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.647.471 ◽

2013 ◽

Vol 647 ◽

pp. 471-475

Author(s):

Ling Ke Wang ◽

Ji Xian Meng ◽

Hai Peng Zhu ◽

Xin Zhong Lu

Keyword(s):

Fourier Transform ◽

Discrete Fourier Transform ◽

Research Area ◽

Genomic Sequences ◽

Protein Coding ◽

Signal Noise ◽

Coding Regions ◽

Noise Ratio

Identification of protein-coding regions is currently an active research area. Once mapping methods have turned symbolic genomic sequences into numeric sequences, a transform is needed to expose the period-3 component of protein-coding regions. In this paper, we present two fast algorithms, relying on the discrete Fourier transform and an extension of the signal-to-noise ratio, to compute the period-3 component of protein-coding regions.
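As a rough illustration of the period-3 idea described in this abstract, the sketch below maps a DNA string to four binary indicator sequences (one per base) and evaluates the Fourier sum at frequency 1/3. It is not the paper's fast algorithm. The function names `period3_power` and `period3_snr` are hypothetical, and the ratio used here (peak power divided by the average spectral power, obtained through Parseval's identity) is only one simple assumed form of a signal-to-noise measure.

```python
import cmath

BASES = "ACGT"


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four indicator sequences.

    Assumption: binary indicator mapping (one 0/1 sequence per base).
    """
    seq = seq.upper()
    # exp(-2*pi*i*n/3) depends only on n % 3, so three roots of unity suffice.
    roots = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    total = 0.0
    for base in BASES:
        # Fourier sum of the indicator sequence of `base` at frequency 1/3.
        coeff = sum(roots[n % 3] for n, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total


def period3_snr(seq):
    """Period-3 power divided by the average power over the whole spectrum.

    Assumption: by Parseval's identity the average power of the four
    indicator sequences equals the number of A/C/G/T symbols in `seq`.
    """
    seq = seq.upper()
    n_valid = sum(seq.count(base) for base in BASES)
    if n_valid == 0:
        return 0.0
    return period3_power(seq) / n_valid


if __name__ == "__main__":
    print(period3_snr("ATG" * 10))   # strongly 3-periodic toy string
    print(period3_snr("ACGT" * 3))   # period 4, no period-3 component
```

A perfectly 3-periodic toy string gives a ratio equal to one third of its length, while a string whose bases are spread evenly over the three codon positions gives a ratio near zero. In practice such a measure would be evaluated over a sliding window along the sequence.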


A Fast DFT based Gene Prediction Algorithm for Identification of Protein Coding Regions

Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. ◽

10.1109/icassp.2005.1416388 ◽

2006 ◽

Cited By ~ 22

Author(s):

S. Datta ◽

A. Asif

Keyword(s):

Gene Prediction ◽

Prediction Algorithm ◽

Protein Coding ◽

Coding Regions


Gene Prediction by Spectral Rotation Measure: A New Method for Identifying Protein-Coding Regions

Genome Research ◽

10.1101/gr.1261703 ◽

2003 ◽

Cited By ~ 14

Author(s):

D. Kotlar

Keyword(s):

Gene Prediction ◽

New Method ◽

Protein Coding ◽

Coding Regions ◽

Rotation Measure


Identification of Protein Coding Regions In Genomic DNA

Journal of Molecular Biology ◽

10.1006/jmbi.1995.0198 ◽

1995 ◽

Vol 248 (1) ◽

pp. 1-18 ◽

Cited By ~ 120

Author(s):

Eric E. Snyder ◽

Gary D. Stormo

Keyword(s):

Genomic DNA ◽

Protein Coding ◽

Coding Regions


WISCOD: A Statistical Web-Enabled Tool for the Identification of Significant Protein Coding Regions

BioMed Research International ◽

10.1155/2014/282343 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10

Author(s):

Mireia Vilardell ◽

Genis Parra ◽

Sergi Civit

Keyword(s):

Gene Prediction ◽

Software Tool ◽

Local Mode ◽

Protein Coding ◽

Computational Costs ◽

Additional Information ◽

Coding Regions ◽

Specificity And Sensitivity ◽

Exon Prediction ◽

Eukaryotic Genomes

Classically, gene prediction programs are based on detecting signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure. Although it is now possible to improve their performance with additional information from related species and/or cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD, a web-enabled tool for the identification of significant protein coding regions, a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD can detect real exons in large lists of potential exons, and it provides an easy-to-use global P value, called the expected probability of being a false exon (EPFE), that is useful for ranking potential exons in a probabilistic framework without additional computational cost. The advantage of our approach is that it significantly increases the specificity and sensitivity (both between 80% and 90%) in comparison to other ab initio methods (where they are in the range of 70–75%). WISCOD is written in Java and R and can be downloaded and run locally on Linux and Windows platforms.


Neural networks for identification of protein coding regions in genomic DNA sequences

Handbook of Neural Computation ◽

10.1887/0750303123/b365c114 ◽

2004 ◽

Cited By ~ 1

Author(s):

E E Snyder ◽

Gary D Stormo

Keyword(s):

Neural Networks ◽

DNA Sequences ◽

Genomic DNA ◽

Protein Coding ◽

Coding Regions


Gene Prediction by the Scale-limited Gabor Wavelet Transform for Identifying the Protein Coding Regions

2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV) ◽

10.1109/icarcv50220.2020.9305490 ◽

2020 ◽

Author(s):

Qian Zheng ◽

Wenxiang Zhou ◽

Tao Chen ◽

Lei Xie ◽

Hong Ye Su

Keyword(s):

Wavelet Transform ◽

Gene Prediction ◽

Gabor Wavelet ◽

Protein Coding ◽

Gabor Wavelet Transform ◽

Coding Regions


Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric

Current Topics in Medicinal Chemistry ◽

10.2174/1568026613666131204110022 ◽

2014 ◽

Vol 14 (3) ◽

pp. 407-417

Author(s):

Robersy Sanchez

Keyword(s):

Genetic Code ◽

Evolutionary Analysis ◽

Protein Coding ◽

Coding Regions


Analysis of HLA-G long-read genomic sequences in mother–offspring pairs with preeclampsia

Scientific Reports ◽

10.1038/s41598-020-77081-3 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Ayako Nishizawa ◽

Kazuki Kumada ◽

Keiko Tateno ◽

Maiko Wagata ◽

Sakae Saito ◽

...

Keyword(s):

Single Molecule ◽

Gene Polymorphisms ◽

Genomic DNA ◽

Genomic Sequences ◽

Genomic Sequencing ◽

Public Database ◽

Coding Sequences ◽

PacBio RS II ◽

Potential Association ◽

Long Read

Preeclampsia is a pregnancy-induced disorder that is characterized by hypertension and is a leading cause of perinatal and maternal–fetal morbidity and mortality. HLA-G is thought to play important roles in maternal–fetal immune tolerance, and the associations between HLA-G gene polymorphisms and the onset of pregnancy-related diseases have been explored extensively. Because contiguous genomic sequencing is difficult, the association between the HLA-G genotype and preeclampsia onset is controversial. In this study, genomic sequences of the HLA-G region (5.2 kb) from 31 pairs of mother–offspring genomic DNA samples (18 pairs from normal pregnancies/births and 13 from preeclampsia births) were obtained by single-molecule real-time sequencing using the PacBio RS II platform. The HLA-G alleles identified in our cohort matched seven known HLA-G alleles, but we also identified two new HLA-G alleles at the fourth-field resolution and compared them with nucleotide sequences from a public database that consisted of coding sequences that cover the 3.1-kb HLA-G gene span. Intriguingly, a potential association between preeclampsia onset and the poly T stretch within the downstream region of the HLA-G*01:01:01:01 allele was found. Our study suggests that long-read sequencing of HLA-G will provide clues for characterizing HLA-G variants that are involved in the pathophysiology of preeclampsia.
