YeATS - a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut

F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 155 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Monica Britton ◽  
Jill Wegrzyn ◽  
Timothy Butterfield ◽  
Pedro José Martínez-García ◽  
...  

The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves existing methodologies, and implements a workflow for error estimation and correction followed by genome annotation and transcript abundance estimation for RNA-seq derived transcriptome sequences (YeATS - Yet Another Tool Suite for analyzing RNA-seq derived transcriptome). A unique feature of YeATS is the upfront determination of the errors in the sequencing or transcript assembly process by analyzing open reading frames of transcripts. YeATS identifies transcripts that have not been merged, result in broken open reading frames or contain long repeats as erroneous transcripts. We present the YeATS workflow using a representative sample of the transcriptome from the tissue at the heartwood/sapwood transition zone in black walnut. A novel feature of the transcriptome that emerged from our analysis was the identification of a highly abundant transcript that had no known homologous genes (GenBank accession: KT023102). The amino acid composition of the longest open reading frame of this gene classifies this as a putative extensin. Also, we corroborated the transcriptional abundance of proline-rich proteins, dehydrins, senescence-associated proteins, and the DNAJ family of chaperone proteins. Thus, YeATS presents a workflow for analyzing RNA-seq data with several innovative features that differentiate it from existing software.
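The abstract above describes analyzing the open reading frames of assembled transcripts. As a rough illustration only, the following sketch finds the longest ATG-to-stop ORF across all six reading frames of a transcript. It is not the YeATS implementation; the function names (`reverse_complement`, `orfs_in_frame`, `longest_orf`) are hypothetical, and it assumes an unambiguous A/C/G/T alphabet and the standard stop codons.

```python
# Illustrative sketch only; names are hypothetical and this is not YeATS code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) for each ATG..stop ORF in one forward frame.

    `end` is exclusive and includes the stop codon.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def longest_orf(seq):
    """Return the longest complete ORF over both strands and all frames."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for start, end in orfs_in_frame(strand, frame):
                if end - start > len(best):
                    best = strand[start:end]
    return best
```

A transcript whose longest ORF is unexpectedly short, or which carries two sizeable ORFs in different frames, would be a candidate for closer inspection. The thresholds for such a check are not given in the abstract and would have to be chosen by the user.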


1988 ◽  
Vol 8 (9) ◽  
pp. 3827-3836
Author(s):  
N P Williams ◽  
P P Mueller ◽  
A G Hinnebusch

Translational control of GCN4 expression in the yeast Saccharomyces cerevisiae is mediated by multiple AUG codons present in the leader of GCN4 mRNA, each of which initiates a short open reading frame of only two or three codons. Upstream AUG codons 3 and 4 are required to repress GCN4 expression in normal growth conditions; AUG codons 1 and 2 are needed to overcome this repression in amino acid starvation conditions. We show that the regulatory function of AUG codons 1 and 2 can be qualitatively mimicked by the AUG codons of two heterologous upstream open reading frames (URFs) containing the initiation regions of the yeast genes PGK and TRP1. These AUG codons inhibit GCN4 expression when present singly in the mRNA leader; however, they stimulate GCN4 expression in derepressing conditions when inserted upstream from AUG codons 3 and 4. This finding supports the idea that AUG codons 1 and 2 function in the control mechanism as translation initiation sites and further suggests that suppression of the inhibitory effects of AUG codons 3 and 4 is a general consequence of the translation of URF 1 and 2 sequences upstream. Several observations suggest that AUG codons 3 and 4 are efficient initiation sites; however, these sequences do not act as positive regulatory elements when placed upstream from URF 1. This result suggests that efficient translation is only one of the important properties of the 5' proximal URFs in GCN4 mRNA. We propose that a second property is the ability to permit reinitiation following termination of translation and that URF 1 is optimized for this regulatory function.


2009 ◽  
Vol 90 (6) ◽  
pp. 1505-1514 ◽  
Author(s):  
Asieh Rasoolizadeh ◽  
Catherine Béliveau ◽  
Don Stewart ◽  
Conrad Cloutier ◽  
Michel Cusson

The endoparasitic wasp Tranosema rostrale transmits an ichnovirus to its lepidopteran host, Choristoneura fumiferana, during parasitization. As shown for other ichnoviruses, the segmented dsDNA genome of the T. rostrale ichnovirus (TrIV) features several multi-gene families, including the repeat element (rep) family, whose products display no known similarity to non-ichnovirus proteins, except for a homologue encoded by the genome of the Helicoverpa armigera granulovirus; their functions remain unknown. This study applied linear regression of efficiency analysis to real-time PCR quantification of transcript abundance for all 17 TrIV rep open reading frames (ORFs) in parasitized and virus-injected C. fumiferana larvae, as well as in T. rostrale ovaries and head–thorax complexes. Although transcripts were detected for most rep ORFs in infected caterpillars, two of them clearly outnumbered the others in whole larvae, with a tendency for levels to drop over time after infection. The genome segments bearing the three most highly expressed rep genes in parasitized caterpillars were present in higher proportions than other rep-bearing genome segments in TrIV DNA, suggesting a possible role for gene dosage in the regulation of transcription level. TrIV rep genes also showed important differences in the relative abundance of their transcripts in specific tissues (cuticular epithelium, the fat body, haemocytes and the midgut), implying tissue-specific roles for individual members of this gene family. Significantly, no rep transcripts were detected in T. rostrale head–thorax complexes, whereas some were abundant in ovaries. There, the transcription pattern was completely different from that observed in infected caterpillars, suggesting that some rep genes have wasp-specific functions.


1999 ◽  
Vol 10 (04) ◽  
pp. 635-643 ◽  
Author(s):  
AGNIESZKA GIERLIK ◽  
PAWEŁ MACKIEWICZ ◽  
MARIA KOWALCZUK ◽  
STANISŁAW CEBRAT ◽  
MIROSŁAW R. DUDEK

Coding sequences of DNA generate Open Reading Frames (ORFs) inside them with much higher frequency than random DNA sequences do, especially in the antisense strand. This is a specific feature of the genetic code. Since coding sequences are selected for their length, the generated ORFs are indirect results of this selection and their length is also influenced by selection. That is why ORFs found in any genome, even much longer ones than those spontaneously generated in random DNA sequences, should be considered as two different sets of ORFs: The first one coding for proteins, the second one generated by the coding ORFs. Even intergenic sequences possess greater capacity for generating ORFs than random DNA sequences of the same nucleotide composition, which seems to be a premise that intergenic sequences were generated from coding sequences by recombinational mechanisms.
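The comparison with random DNA of the same nucleotide composition can be sketched as follows. This is a minimal illustration, not the authors' method: it measures the longest stop-free stretch in the forward frames of a sequence and compares it with a baseline from shuffled copies, which preserve composition exactly. All function names are hypothetical, and the antisense strand emphasized in the abstract would need the same measurement on the reverse complement.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def longest_open_run(seq, frame=0):
    """Length, in codons, of the longest stop-free stretch in one frame."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def shuffled(seq, rng):
    """Random sequence with exactly the same nucleotide composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def shuffled_baseline(seq, trials=100, seed=0):
    """Mean longest open run (best of three frames) over shuffled copies."""
    rng = random.Random(seed)
    runs = [
        max(longest_open_run(shuffled(seq, rng), frame) for frame in range(3))
        for _ in range(trials)
    ]
    return sum(runs) / trials
```

If the observed value for a real sequence is well above `shuffled_baseline` for that sequence, the sequence generates longer open stretches than composition alone would predict, which is the kind of excess the abstract reports.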


2004 ◽  
Vol 78 (21) ◽  
pp. 11544-11550 ◽  
Author(s):  
Paul Kraft ◽  
Andrea Oeckinghaus ◽  
Daniel Kümmel ◽  
George H. Gauss ◽  
John Gilmore ◽  
...  

ABSTRACT Sulfolobus spindle-shaped viruses (SSVs), or Fuselloviridae, are ubiquitous crenarchaeal viruses found in high-temperature acidic hot springs around the world (pH ≤4.0; temperature of ≥70°C). Because they are relatively easy to isolate, they represent the best studied of the crenarchaeal viruses. This is particularly true for the type virus, SSV1, which contains a double-stranded DNA genome of 15.5 kilobases, encoding 34 putative open reading frames. Interestingly, the genome shows little sequence similarity to organisms other than its SSV homologues. Together, sequence similarity and biochemical analyses have suggested functions for only 6 of the 34 open reading frames. Thus, even though SSV1 is the best-studied crenarchaeal virus, functions for most (28) of its open reading frames remain unknown. We have undertaken biochemical and structural studies for the gene product of open reading frame F-93. We find that F-93 exists as a homodimer in solution and that a tight dimer is also present in the 2.7-Å crystal structure. Further, the crystal structure reveals a fold that is homologous to the SlyA and MarR subfamilies of winged-helix DNA binding proteins. This strongly suggests that F-93 functions as a transcription factor that recognizes a (pseudo-)palindromic DNA target sequence.


2008 ◽  
Vol 82 (17) ◽  
pp. 8917-8921 ◽  
Author(s):  
Christopher J. McCormick ◽  
Omar Salim ◽  
Paul R. Lambden ◽  
Ian N. Clarke

ABSTRACT A generally accepted view of norovirus replication is that capsid expression requires production of a subgenomic transcript, the presence of capsid often being used as a surrogate marker to indicate the occurrence of viral replication. Using a polymerase II-based baculovirus delivery system, we observed capsid expression following introduction of a full-length genogroup 3 norovirus genome into HepG2 cells. However, capsid expression occurred as a result of a novel translation termination/reinitiation event between the nonstructural-protein and capsid open reading frames, a feature that may be unique to genogroup 3 noroviruses.


1990 ◽  
Vol 10 (1) ◽  
pp. 28-36 ◽  
Author(s):  
C I Brannan ◽  
E C Dees ◽  
R S Ingram ◽  
S M Tilghman

The mouse H19 gene was identified as an abundant hepatic fetal-specific mRNA under the transcriptional control of a trans-acting locus termed raf. The protein this gene encoded was not apparent from an analysis of its nucleotide sequence, since the mRNA contained multiple translation termination signals in all three reading frames. As a means of assessing which of the 35 small open reading frames might be important to the function of the gene, the human H19 gene was cloned and sequenced. Comparison of the two homologs revealed no conserved open reading frame. Cellular fractionation showed that H19 RNA is cytoplasmic but not associated with the translational machinery. Instead, it is located in a particle with a sedimentation coefficient of approximately 28S. Despite the fact that it is transcribed by RNA polymerase II and is spliced and polyadenylated, we suggest that the H19 RNA is not a classical mRNA. Instead, the product of this unusual gene may be an RNA molecule.


1998 ◽  
Vol 72 (2) ◽  
pp. 1482-1490 ◽  
Author(s):  
Lin-Fa Wang ◽  
Wojtek P. Michalski ◽  
Meng Yu ◽  
L. Ian Pritchard ◽  
Gary Crameri ◽  
...  

ABSTRACT In 1994, a new member of the family Paramyxoviridae isolated from fatal cases of respiratory disease in horses and humans was shown to be distantly related to morbilliviruses and provisionally called equine morbillivirus (K. Murray et al., Science 268:94–97, 1995). To facilitate characterization and classification, the virus was purified, viral proteins were identified, and the P/V/C gene was cloned and sequenced. The coding strategy of the gene is similar to that of Sendai and measles viruses, members of the Paramyxovirus and Morbillivirus genera, respectively, in the subfamily Paramyxovirinae. The P/V/C gene contains four open reading frames, three of which, P, C, and V, have Paramyxovirinae counterparts. The P and C proteins are larger and smaller, respectively, than are cognate proteins in members of the subfamily, and the V protein is made as a result of a single G insertion during transcription. The P/V/C gene has two unique features. (i) A fourth open reading frame is located between those of the C and V proteins and potentially encodes a small basic protein similar to those found in some members of the Rhabdoviridae and Filoviridae families. (ii) There is also a long untranslated 3′ sequence, a feature common in Filoviridae members. Sequence comparisons confirm that although the virus is a member of the Paramyxovirinae subfamily, it displays only low levels of homology with paramyxoviruses and morbilliviruses and negligible homologies with rubulaviruses.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jing Li ◽  
Urminder Singh ◽  
Zebulun Arendsee ◽  
Eve Syrkin Wurtele

The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.


2018 ◽  
Author(s):  
Anica Scholz ◽  
Florian Eggenhofer ◽  
Rick Gelhausen ◽  
Björn Grüning ◽  
Kathi Zarnack ◽  
...  

Abstract Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS) (aka 5’ untranslated region) preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools identifies uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs, which have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing times and resources.
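The ratio-based assessment described above can be illustrated with a small calculation. The sketch below is not taken from uORF-Tools; the function name `main_to_uorf_ratio_change`, its parameters, and the pseudocount are assumptions made for illustration. It computes the log2 change, between a control and a stimulated condition, in the ratio of RPF counts in a main ORF to RPF counts in its associated uORF.

```python
import math


def main_to_uorf_ratio_change(ctrl_main, ctrl_uorf, stim_main, stim_uorf,
                              pseudocount=1.0):
    """Log2 change of the main-ORF/uORF RPF ratio, stimulus vs. control.

    All four arguments are RPF read counts (hypothetical inputs). The
    pseudocount is an assumption added here to avoid division by zero.
    """
    ctrl_ratio = (ctrl_main + pseudocount) / (ctrl_uorf + pseudocount)
    stim_ratio = (stim_main + pseudocount) / (stim_uorf + pseudocount)
    return math.log2(stim_ratio / ctrl_ratio)


# Example: main-ORF reads halve under stimulus while uORF reads stay constant.
change = main_to_uorf_ratio_change(100, 50, 50, 50, pseudocount=0.0)
```

In this example the ratio falls from 2 to 1, so `change` is -1. A real analysis would also normalize for library size and ORF length and test for significance across replicates, none of which this sketch attempts.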

