Gene Prediction by Spectral Rotation Measure: A New Method for Identifying Protein-Coding Regions

Summary: Given the increasing number of available genomic sequences, one now faces the task of identifying their functional parts, such as the protein-coding regions. The gene prediction problem can be addressed in several ways. One of the most promising methods makes use of similarity information between the genomic DNA and previously annotated sequences (proteins, cDNAs and ESTs). More recently, with the large number of newly sequenced genomes, new similarity-based methods have been applied successfully to gene prediction. These so-called comparative-based methods rely on the similarities shared by regions of two evolutionarily related genomic sequences. Despite the number of different gene prediction approaches in the literature, the problem remains challenging. In this paper we present a new comparative-based approach to the gene prediction problem. It is based on a syntenic alignment of three or more genomic sequences. By syntenic alignment we mean an alignment constructed to take into account the fact that the sequences involved contain conserved regions interspersed with unconserved ones. We have implemented the proposed algorithm in a computer program and confirmed the validity of the approach on a benchmark including triples of human, mouse and rat genomic sequences.

Identification of Protein Coding Regions of Rice Genes Using Alternative Spectral Rotation Measure and Linear Discriminant Analysis

Genomics Proteomics & Bioinformatics ◽

10.1016/s1672-0229(04)02022-4 ◽

2004 ◽

Vol 2 (3) ◽

pp. 167-173 ◽

Cited By ~ 3

Author(s):

Jiao Jin

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Protein Coding ◽

Linear Discriminant ◽

Coding Regions ◽

Rotation Measure

A Fast DFT based Gene Prediction Algorithm for Identification of Protein Coding Regions

Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. ◽

10.1109/icassp.2005.1416388 ◽

2006 ◽

Cited By ~ 22

Author(s):

S. Datta ◽

A. Asif

Keyword(s):

Gene Prediction ◽

Prediction Algorithm ◽

Protein Coding ◽

Coding Regions

WISCOD: A Statistical Web-Enabled Tool for the Identification of Significant Protein Coding Regions

BioMed Research International ◽

10.1155/2014/282343 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10

Author(s):

Mireia Vilardell ◽

Genis Parra ◽

Sergi Civit

Keyword(s):

Gene Prediction ◽

Software Tool ◽

Local Mode ◽

Protein Coding ◽

Computational Costs ◽

Additional Information ◽

Coding Regions ◽

Specificity And Sensitivity ◽

Exon Prediction ◽

Eukaryotic Genomes

Classically, gene prediction programs are based on detecting signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure. Although it is now possible to improve their performance with additional information from related species and/or cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD (a web-enabled tool for the identification of significant protein coding regions), a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD can detect real exons in large lists of potential exons, and it provides an easy-to-use global P value, called the expected probability of being a false exon (EPFE), that is useful for ranking potential exons in a probabilistic framework without additional computational costs. The advantage of our approach is that it significantly increases specificity and sensitivity (both between 80% and 90%) in comparison to other ab initio methods (where they are in the range of 70–75%). WISCOD is written in Java and R and is available to download and run in local mode on Linux and Windows platforms.

Gene Prediction by the Scale-limited Gabor Wavelet Transform for Identifying the Protein Coding Regions

2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV) ◽

10.1109/icarcv50220.2020.9305490 ◽

2020 ◽

Author(s):

Qian Zheng ◽

Wenxiang Zhou ◽

Tao Chen ◽

Lei Xie ◽

Hong Ye Su

Keyword(s):

Wavelet Transform ◽

Gene Prediction ◽

Gabor Wavelet ◽

Protein Coding ◽

Gabor Wavelet Transform ◽

Coding Regions

Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric

Current Topics in Medicinal Chemistry ◽

10.2174/1568026613666131204110022 ◽

2014 ◽

Vol 14 (3) ◽

pp. 407-417

Author(s):

Robersy Sanchez

Keyword(s):

Genetic Code ◽

Evolutionary Analysis ◽

Protein Coding ◽

Coding Regions

The open targets post-GWAS analysis pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa020 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2936-2937 ◽

Cited By ~ 4

Author(s):

Gareth Peat ◽

William Jones ◽

Michael Nuhn ◽

José Carlos Marugán ◽

William Newell ◽

...

Keyword(s):

Drug Targets ◽

Gene Expression Regulation ◽

Association Studies ◽

Genome Wide Association Studies ◽

Protein Coding ◽

Data Resource ◽

Coding Regions ◽

Genome Wide ◽

Causal Genes ◽

Interactive Data

Abstract. Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Novel exon 1 protein‐coding regions N‐terminally extend human KCNE3 and KCNE4

The FASEB Journal ◽

10.1096/fj.201600467r ◽

2016 ◽

Vol 30 (8) ◽

pp. 2959-2969 ◽

Cited By ~ 8

Author(s):

Geoffrey W. Abbott

Keyword(s):

Protein Coding ◽

Coding Regions ◽

Exon 1 ◽

Novel Exon

Protein-coding structured RNAs: A computational survey of conserved RNA secondary structures overlapping coding regions in drosophilids

Biochimie ◽

10.1016/j.biochi.2011.07.023 ◽

2011 ◽

Vol 93 (11) ◽

pp. 2019-2023 ◽

Cited By ~ 8

Author(s):

Sven Findeiß ◽

Jan Engelhardt ◽

Sonja J. Prohaska ◽

Peter F. Stadler

Keyword(s):

Secondary Structures ◽

Protein Coding ◽

RNA Secondary Structures ◽

Coding Regions

Structure and expression of canary myc family genes

Molecular and Cellular Biology ◽

10.1128/mcb.11.3.1770-1776.1991 ◽

1991 ◽

Vol 11 (3) ◽

pp. 1770-1776

Author(s):

R G Collum ◽

D F Clayton ◽

F W Alt

Keyword(s):

Untranslated Region ◽

Untranslated Regions ◽

Coding Region ◽

Protein Coding ◽

Coding Regions ◽

Neuronal Precursors ◽

Myc Gene ◽

Mature Neurons

We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons.
