Identification of Protein Coding Regions In Genomic DNA

1995 ◽  
Vol 248 (1) ◽  
pp. 1-18 ◽  
Author(s):  
Eric E. Snyder ◽  
Gary D. Stormo
2005 ◽  
Vol 2 (1) ◽  
pp. 38-47
Author(s):  
Said S. Adi ◽  
Carlos E. Ferreira

Summary: Given the growing number of available genomic sequences, one now faces the task of identifying their functional parts, such as the protein-coding regions. The gene prediction problem can be addressed in several ways. One of the most promising methods uses similarity between the genomic DNA and previously annotated sequences (proteins, cDNAs and ESTs). More recently, with the large number of newly sequenced genomes, new similarity-based methods have been applied successfully to gene prediction. These so-called comparative-based methods rely on the similarities shared by regions of two evolutionarily related genomic sequences. Despite the number of gene prediction approaches in the literature, the problem remains challenging. In this paper we present a new comparative-based approach to the gene prediction problem. It is based on a syntenic alignment of three or more genomic sequences. By syntenic alignment we mean an alignment constructed to take into account the fact that the sequences involved contain conserved regions interrupted by unconserved ones. We have implemented the proposed algorithm in a computer program and confirm the validity of the approach on a benchmark of triples of human, mouse and rat genomic sequences.
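As a rough illustration of conserved regions shared by three genomic sequences, the minimal sketch below lists the k-mers present in every one of three toy sequences. It is not the syntenic alignment algorithm of the paper, only a much simpler stand-in. The helper names `kmers` and `shared_kmers`, the toy sequences and the choice k = 4 are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs, k):
    """Return the k-mers that occur in every sequence in seqs (hypothetical helper)."""
    sets = [kmers(s.upper(), k) for s in seqs]
    return set.intersection(*sets) if sets else set()


# Toy sequences (made up, not real genomic data): each contains the block
# "ATGGCC" flanked by unrelated bases.
human = "TTTATGGCCAAA"
mouse = "CCATGGCCTTGA"
rat = "GATGGCCGGG"

common = sorted(shared_kmers([human, mouse, rat], 4))
print(common)
```

Shared k-mers only flag candidate conserved blocks. A real comparative method would also have to order and align them and score the unconserved stretches in between.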


2020 ◽  
Vol 36 (9) ◽  
pp. 2936-2937 ◽  
Author(s):  
Gareth Peat ◽  
William Jones ◽  
Michael Nuhn ◽  
José Carlos Marugán ◽  
William Newell ◽  
...  

Abstract. Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions and presumably influence the regulation of gene expression. Identifying potential drug targets, i.e. causal protein-coding genes, therefore requires cross-referencing the genetic results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. The output of this pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


Genetics ◽  
1997 ◽  
Vol 147 (3) ◽  
pp. 1213-1224
Author(s):  
Jean-Philippe Charles ◽  
Carol Chihara ◽  
Shamim Nejad ◽  
Lynn M Riddiford

A 36-kb genomic DNA segment of the Drosophila melanogaster genome containing 12 clustered cuticle genes has been mapped and partially sequenced. The cluster maps at 65A 5-6 on the left arm of the third chromosome, in agreement with the previously determined location of a putative cluster encompassing the genes for the third instar larval cuticle proteins LCP5, LCP6 and LCP8. This cluster is the largest cuticle gene cluster discovered to date and shows a number of surprising features that explain in part the genetic complexity of the LCP5, LCP6 and LCP8 loci. The genes encoding LCP5 and LCP8 are multiple copy genes and the presence of extensive similarity in their coding regions gives the first evidence for gene conversion in cuticle genes. In addition, five genes in the cluster are intronless. Four of these five have arisen by retroposition. The other genes in the cluster have a single intron located at an unusual location for insect cuticle genes.


Biochimie ◽  
2011 ◽  
Vol 93 (11) ◽  
pp. 2019-2023 ◽  
Author(s):  
Sven Findeiß ◽  
Jan Engelhardt ◽  
Sonja J. Prohaska ◽  
Peter F. Stadler

1991 ◽  
Vol 11 (3) ◽  
pp. 1770-1776
Author(s):  
R G Collum ◽  
D F Clayton ◽  
F W Alt

We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons.


2010 ◽  
Vol 11 (3) ◽  
pp. 243
Author(s):  
Saber Jelokhani-Niaraki ◽  
Majid Esmaelizad ◽  
Morteza Daliri ◽  
Rasoul Vaez-Torshizi ◽  
Morteza Kamalzadeh ◽  
...  

2016 ◽  
Vol 4 (6) ◽  
Author(s):  
Xuehua Wan ◽  
Shaobin Hou ◽  
Kazukuni Hayashi ◽  
James Anderson ◽  
Stuart P. Donachie

Rheinheimera salexigens KH87ᵀ is an obligately halophilic gammaproteobacterium. The strain’s draft genome sequence, generated by the Roche 454 GS FLX+ platform, comprises two scaffolds of ~3.4 Mbp and ~3 kbp, with 3,030 protein-coding sequences and 58 tRNA coding regions. The G+C content is 42 mol%.
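For readers unfamiliar with the quantity, G+C content is the percentage of G and C bases among all A, C, G and T bases in a sequence. The minimal sketch below computes it for a toy sequence. The function name `gc_content` and the example sequence are assumptions for illustration and are unrelated to the genome described above.

```python
def gc_content(seq):
    """Return the G+C content of seq as a percentage of its A, C, G and T bases."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Toy sequence (made up): 7 of its 20 bases are G or C.
toy = "ATGCGATATTAACGTTAGCA"
print(f"{gc_content(toy):.1f} mol%")
```

Ambiguous bases such as N are left out of the denominator here, which is one common convention; other tools may count them differently.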

