A Database of Potential Reading Frame Shifts in Coding Sequences from Different Eukaryotic Genomes

BIOPHYSICS ◽  
2019 ◽  
Vol 64 (3) ◽  
pp. 339-348
Author(s):  
Yu. M. Suvorova ◽  
V. M. Pugacheva ◽  
E. V. Korotkov
2015 ◽  
Author(s):  
Malgorzata Habich ◽  
Sergej Djuranovic ◽  
Pawel Szczesny

A recent addition to the repertoire of gene expression regulatory mechanisms is polyadenylate (polyA) tracks encoding poly-lysine runs in protein sequences. Such tracks stall the translation apparatus and induce frameshifting independently of the effects of the charged nascent poly-lysine sequence on the ribosome exit channel. As such, they substantially influence the stability of mRNA and the amount of protein produced from a given transcript. Single base changes in these regions are enough to exert a measurable response on both protein and mRNA abundance, which makes each of these sequences a potentially interesting case study for the effects of synonymous mutation, gene dosage balance and natural frameshifting. Here we present PATACSDB, a resource that contains a comprehensive list of polyA tracks from over 250 eukaryotic genomes. Our data are based on the Ensembl genomic database of coding sequences, filtered with the 12A-1 algorithm, which selects polyA tracks with a minimal length of 12 A's, allowing for one mismatched base. The PATACSDB database is accessible at http://sysbio.ibb.waw.pl/patacsdb. Source code is available for download from the GitHub repository at http://github.com/habich/PATACSDB, including the scripts to recreate the database from scratch on the user's own computer.
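The 12A-1 criterion described above can be illustrated with a minimal sketch. It assumes the rule means "any window of 12 nucleotides containing at most one non-A base"; the function name `find_polya_tracks` and its parameters are hypothetical and are not taken from the PATACSDB source code, which may define the filter differently (for example, by merging overlapping windows).

```python
def find_polya_tracks(seq, min_len=12, max_mismatch=1):
    """Return (start, window) pairs for windows that look like polyA tracks.

    Assumption: a track is any window of `min_len` bases with at most
    `max_mismatch` bases other than 'A'. Overlapping hits are reported
    separately; no merging is attempted.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - min_len + 1):
        window = seq[start:start + min_len]
        # Count every base in the window that is not an adenine.
        mismatches = sum(1 for base in window if base != "A")
        if mismatches <= max_mismatch:
            hits.append((start, window))
    return hits


# Hypothetical coding-sequence fragment with one interrupted polyA run.
example = "GCC" + "AAAAAGAAAAAA" + "GCC"
tracks = find_polya_tracks(example)
```

A window-based scan keeps the sketch short; a production filter would probably extend each hit to the full run and report it once.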


1999 ◽  
Vol 10 (04) ◽  
pp. 635-643 ◽  
Author(s):  
AGNIESZKA GIERLIK ◽  
PAWEŁ MACKIEWICZ ◽  
MARIA KOWALCZUK ◽  
STANISŁAW CEBRAT ◽  
MIROSŁAW R. DUDEK

Coding sequences of DNA generate Open Reading Frames (ORFs) inside them with much higher frequency than random DNA sequences do, especially in the antisense strand. This is a specific feature of the genetic code. Since coding sequences are selected for their length, the generated ORFs are indirect results of this selection and their length is also influenced by selection. That is why ORFs found in any genome, even ones much longer than those spontaneously generated in random DNA sequences, should be considered as two different sets of ORFs: the first coding for proteins, the second generated by the coding ORFs. Even intergenic sequences possess a greater capacity for generating ORFs than random DNA sequences of the same nucleotide composition, which suggests that intergenic sequences were generated from coding sequences by recombinational mechanisms.
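The kind of ORF counting discussed above can be sketched as follows. This is an illustration only, not the authors' procedure: it assumes an ORF runs from the first ATG to the next in-frame stop codon of the standard genetic code, and the helper names `reverse_complement` and `orf_lengths` are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orf_lengths(seq, min_codons=1):
    """Lengths (in codons, stop excluded) of ATG..stop ORFs in the
    three reading frames of one strand."""
    seq = seq.upper()
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    lengths.append(n_codons)
                start = None  # close the ORF at the stop codon
    return lengths


sense = "ATGAAATTTTAA"
sense_orfs = orf_lengths(sense)
antisense_orfs = orf_lengths(reverse_complement(sense))
```

Running `orf_lengths` on a coding sequence, on its reverse complement, and on shuffled sequences of the same composition would give the three ORF sets that the comparison above relies on.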


2011 ◽  
Vol 11 ◽  
pp. 842-854 ◽  
Author(s):  
Larisa Fedorova ◽  
Alexei Fedorov

Multicellular eukaryotic genomes are replete with nonprotein coding sequences, both within genes (introns) and between them (intergenic regions). Excluding the well-recognized functional elements within these sequences (ncRNAs, transcription factor binding sites, intronic enhancers/silencers, etc.), the remaining portion is made up of so-called “dark” DNA, which still occupies the majority of the genome. This dark DNA has a profound nonrandomness in its sequence composition seen at different scales, from a few nucleotides to regions that span over hundreds of thousands of nucleotides. At the mid-range scale (from 30 up to 10,000 nt), this nonrandomness is manifested in base compositional extremes detected for each of four nucleotides (A, G, T, or C) or any of their combinations. Examples of such compositional nonrandomness are A-rich, purine-rich, or G+T-rich regions. Almost every combination of nucleotides has such enriched regions. We refer to these regions as being “inhomogeneous”. These regions are associated with unusual DNA conformations and/or particular DNA properties. In particular, mid-range inhomogeneous regions have complex arrangements relative to each other and to specific genomic sites, such as centromeres, telomeres, and promoters, pointing to their important role in genomic functioning and organization.
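A sliding-window scan is one simple way to picture the compositional extremes described above. The sketch below is an assumption-laden illustration, not the authors' method: the function name `enriched_windows`, the 30 nt window and the 0.8 threshold are all hypothetical choices.

```python
def enriched_windows(seq, bases="A", window=30, threshold=0.8):
    """Return (start, fraction) for each window in which the chosen
    bases (e.g. "A", "AG" for purines, "GT") reach the threshold."""
    seq = seq.upper()
    targets = set(bases.upper())
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        # Fraction of the window occupied by the bases of interest.
        fraction = sum(1 for base in chunk if base in targets) / window
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits


# Hypothetical sequence: a 30 nt A-run flanked by alternating G and C.
example = "GC" * 10 + "A" * 30 + "GC" * 10
a_rich = enriched_windows(example, bases="A", window=30, threshold=1.0)
```

Passing `bases="AG"` or `bases="GT"` reuses the same scan for purine-rich or G+T-rich regions.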


2004 ◽  
Vol 2 (1) ◽  
pp. 24-31 ◽  
Author(s):  
Bin Li ◽  
Qingyou Xia ◽  
Cheng Lu ◽  
Zeyang Zhou ◽  
Zhonghuai Xiang

1986 ◽  
Vol 6 (1) ◽  
pp. 168-182
Author(s):  
D D Loeb ◽  
R W Padgett ◽  
S C Hardies ◽  
W R Shehee ◽  
M B Comer ◽  
...  

The complete nucleotide sequence of a 6,851-base pair (bp) member of the L1Md repetitive family from a selected random isolate of the BALB/c mouse genome is reported here. Five kilobases of the element contains two overlapping reading frames of 1,137 and 3,900 bp. The entire 3,900-bp frame and the 3' 600 bp of the 1,137-bp frame, when compared with a composite consensus primate L1 sequence, show a ratio of replacement to silent site differences characteristic of protein coding sequences. This more closely defines the protein coding capacity of this repetitive family, which was previously shown to possess a large open reading frame of undetermined extent. The relative organization of the 1,137- and 3,900-bp reading frames, which overlap by 14 bp, bears resemblance to protein-coding, mobile genetic elements. Homology can be found between the amino acid sequence of the 3,900-bp frame and selected domains of several reverse transcriptases. The 5' ends of the two L1Md elements described in this report have multiple copies, 4 2/3 copies and 1 2/3 copy, of a 208-bp direct tandem repeat. The sequence of this 208-bp element differs from the sequence of a previously defined 5' end for an L1Md element, indicating that there are at least two different 5' end motifs for L1Md.
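The figures quoted above can be checked with a little arithmetic. The sketch below assumes that both frame lengths are measured in the same way and that the 14 bp overlap joins the 3' end of the 1,137-bp frame to the 5' end of the 3,900-bp frame; the function names are hypothetical.

```python
def combined_span(len_a, len_b, overlap):
    """Total number of bases covered by two overlapping frames."""
    return len_a + len_b - overlap


def relative_frame(len_a, overlap):
    """Frame offset (0, 1 or 2) of the second frame relative to the
    first, given where the second frame starts inside the first."""
    return (len_a - overlap) % 3


span = combined_span(1137, 3900, 14)   # bases covered by both frames
offset = relative_frame(1137, 14)      # frame shift between them
```

Under these assumptions the two frames cover 5,023 bp, consistent with the "five kilobases" in the abstract, and the second frame is offset by one base relative to the first, as expected for overlapping frames that cannot share a reading frame.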


2000 ◽  
Vol 74 (14) ◽  
pp. 6581-6591 ◽  
Author(s):  
Pamela J. Glass ◽  
Laura J. White ◽  
Judith M. Ball ◽  
Isabelle Leparc-Goffart ◽  
Michele E. Hardy ◽  
...  

ABSTRACT Norwalk virus (NV) is a causative agent of acute epidemic nonbacterial gastroenteritis in humans. The inability to cultivate NV has required the use of molecular techniques to examine the genome organization and functions of the viral proteins. The function of the NV protein encoded by open reading frame 3 (ORF 3) has been unknown. In this paper, we report the characterization of the NV ORF 3 protein expressed in a cell-free translation system and in insect cells and show its association with recombinant virus-like particles (VLPs) and NV virions. Expression of the ORF 3 coding region in rabbit reticulocyte lysates resulted in the production of a single protein with an apparent molecular weight of 23,000 (23K protein), which is not modified by N-linked glycosylation. The ORF 3 protein was expressed in insect cells by using two different baculovirus recombinants; one recombinant contained the entire 3′ end of the genome beginning with the ORF 2 coding sequences (ORFs 2+3), and the second recombinant contained ORF 3 alone. Expression from the construct containing both ORF 2 and ORF 3 resulted in the expression of a single protein (23K protein) detected by Western blot analysis with ORF 3-specific peptide antisera. However, expression from a construct containing only the ORF 3 coding sequences resulted in the production of multiple forms of the ORF 3 protein ranging in size from 23,000 to 35,000. Indirect-immunofluorescence studies using an ORF 3 peptide antiserum showed that the ORF 3 protein is localized to the cytoplasm of infected insect cells. The 23K ORF 3 protein was consistently associated with recombinant VLPs purified from the media of insect cells infected with a baculovirus recombinant containing the entire 3′ end of the NV genome. 
Western blot analysis of NV purified from the stools of NV-infected volunteers revealed the presence of a 35K protein as well as multiple higher-molecular-weight bands specifically recognized by an ORF 3 peptide antiserum. These results indicate that the ORF 3 protein is a minor structural protein of the virion.

