Accurate detection of short and long active ORFs using Ribo-seq data

2019 ◽  
Vol 36 (7) ◽  
pp. 2053-2059 ◽  
Author(s):  
Saket Choudhary ◽  
Wenzheng Li ◽  
Andrew D. Smith

Abstract Motivation Ribo-seq, a technique for deep-sequencing ribosome-protected mRNA fragments, has enabled transcriptome-wide monitoring of translation in vivo. It has opened avenues for re-evaluating the coding potential of open reading frames (ORFs), including many short ORFs that were previously presumed to be non-translating. However, the detection of translating ORFs, specifically short ORFs, from Ribo-seq data, remains challenging due to its high heterogeneity and noise. Results We present ribotricer, a method for detecting actively translating ORFs by directly leveraging the three-nucleotide periodicity of Ribo-seq data. Ribotricer demonstrates higher accuracy and robustness compared with other methods at detecting actively translating ORFs including short ORFs on multiple published datasets across species inclusive of Arabidopsis, Caenorhabditis elegans, Drosophila, human, mouse, rat, yeast and zebrafish. Availability and implementation Ribotricer is available at https://github.com/smithlabcode/ribotricer. All analysis scripts and results are available at https://github.com/smithlabcode/ribotricer-results. Supplementary information Supplementary data are available at Bioinformatics online.
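The three-nucleotide periodicity mentioned in the abstract above can be illustrated with a toy calculation. The sketch below is not ribotricer's scoring method; it is a simplified stand-in that assumes a per-nucleotide read-count profile over a candidate ORF and reports how strongly reads concentrate in one codon frame. The function names `frame_fractions` and `dominant_frame_score` are hypothetical.

```python
def frame_fractions(profile):
    """Return the fraction of reads in each of the three codon frames.

    `profile` is an assumed list of per-nucleotide read counts,
    starting at the first nucleotide of the candidate ORF.
    """
    totals = [0, 0, 0]
    for position, count in enumerate(profile):
        totals[position % 3] += count
    total = sum(totals)
    if total == 0:
        # No reads at all: no evidence for any frame.
        return [0.0, 0.0, 0.0]
    return [t / total for t in totals]


def dominant_frame_score(profile):
    """Toy periodicity score: share of reads in the best-supported frame.

    Values near 1/3 suggest no frame preference; values near 1 suggest
    strong three-nucleotide periodicity. This is an illustrative
    heuristic only, not the published method.
    """
    return max(frame_fractions(profile))
```

For example, a profile that repeats the counts `[9, 1, 0]` over every codon yields a score close to 0.9, whereas a flat profile yields a score close to 1/3.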

2006 ◽  
Vol 80 (8) ◽  
pp. 4179-4182 ◽  
Author(s):  
Pierre Rivailler ◽  
Amitinder Kaur ◽  
R. Paul Johnson ◽  
Fred Wang

ABSTRACT A pathogenic isolate of rhesus cytomegalovirus (rhCMV 180.92) was cloned, sequenced, and annotated. Comparisons with the published rhCMV 68.1 genome revealed 8 open reading frames (ORFs) in isolate 180.92 that are absent in 68.1, 10 ORFs in 68.1 that are absent in 180.92, and 34 additional ORFs that were not previously annotated. Most of the differences appear to be due to genetic rearrangements in both isolates from a region that is frequently altered in human CMV (hCMV) during in vitro passage. These results indicate that the rhCMV ORF repertoire is larger than previously recognized. As with hCMV, understanding of the complete coding capacity of rhCMV is complicated by genomic instability and may require comparisons with additional isolates in vitro and in vivo.


1987 ◽  
Vol 7 (12) ◽  
pp. 4266-4272 ◽  
Author(s):  
L W Stanton ◽  
J M Bishop

NMYC is a gene whose amplification and overexpression have been implicated in the generation of certain human malignancies. Little is known of how the expression of NMYC is normally controlled. We have therefore characterized transcription from the gene and the structure and stability of the resulting mRNAs. Transcription from NMYC is exceptionally complex: it initiates at numerous sites that may be grouped under the control of two promoters, and the multiplicity of initiation sites combines with alternative splicing to engender two forms of mRNA. The mRNAs have different 5' leader sequences (alternative first exons of the gene) but identical bodies (the second and third exons of the gene). Both forms of mRNA are unstable, with half-lives of ca. 15 min. Both encode the previously identified 65,000 and 67,000-dalton products of NMYC. However, the alternative first exons contain distinctive open reading frames that may diversify the coding potential of NMYC. The complexities in transcription of NMYC expand the means by which expression of the gene might be controlled.


2008 ◽  
Vol 190 (18) ◽  
pp. 6111-6118 ◽  
Author(s):  
P. Rousseau ◽  
C. Loot ◽  
C. Turlan ◽  
S. Nolivos ◽  
M. Chandler

ABSTRACT IS911 is a bacterial insertion sequence composed of two consecutive overlapping open reading frames (ORFs [orfA and orfB]) encoding the transposase (OrfAB) as well as a regulatory protein (OrfA). These ORFs are bordered by terminal left and right inverted repeats (IRL and IRR, respectively) with several differences in nucleotide sequence. IS911 transposition is asymmetric: each end is cleaved on one strand to generate a free 3′-OH, which is then used as the nucleophile in attacking the opposite insertion sequence (IS) end to generate a free IS circle. This will be inserted into a new target site. We show here that the ends exhibit functional differences which, in vivo, may favor the use of one compared to the other during transposition. Electromobility shift assays showed that a truncated form of the transposase [OrfAB(1-149)] exhibits higher affinity for IRR than for IRL. While there was no detectable difference in IR activities during the early steps of transposition, IRR was more efficient during the final insertion steps. We show here that the differential activities between the two IRs correlate with the different affinities of OrfAB(1-149) for the IRs during assembly of the nucleoprotein complexes leading to transposition. We conclude that the two inverted repeats are not equivalent during IS911 transposition and that this asymmetry may intervene to determine the ordered assembly of the different protein-DNA complexes involved in the reaction.


2021 ◽  
Author(s):  
Yanyi Jiang ◽  
Xiaofan Chen ◽  
Wei Zhang

Abstract In the RNA field, the demarcation between coding and non-coding has been renegotiated by the recent discovery of occasionally translated circular RNAs (circRNAs). Although they lack a 5′ cap structure, circRNAs can be translated cap-independently. Complementary intron-mediated overexpression is one of the most utilized methodologies for circRNA research, but it has drawn persistent skepticism for its poorly defined mechanism and latent coexistent side products. In this study, leveraging such a circRNA overexpression system, we have interrogated the protein-coding potential of 30 human circRNAs containing infinite open reading frames in HEK293T cells. Surprisingly, pervasive translation signals are detected by immunoblotting. However, intensive mutagenesis reveals that numerous translation signals are generated independently of circRNA synthesis. We have developed a dual tag strategy to isolate translation noise and directly demonstrate that the fallacious translation signals originate from cryptically spliced linear transcripts. The concomitant linear RNA byproducts, presumably concatemers, can be translated to produce pseudo rolling circle translation signals, and can contain the backsplicing junction (BSJ), which disqualifies BSJ-based evidence for circRNA translation. We also find that non-AUG start codons may engage in the translation initiation of circRNAs. Taken together, our systematic evaluation sheds light on heterogeneous translational outputs from circRNA overexpression vectors and comes with a caveat that the ectopic overexpression technique necessitates extremely rigorous controls in circRNA translation and functional investigation.


1988 ◽  
Vol 8 (12) ◽  
pp. 5439-5447
Author(s):  
P P Mueller ◽  
B M Jackson ◽  
P F Miller ◽  
A G Hinnebusch

The third and fourth AUG codons in GCN4 mRNA efficiently repress translation of the GCN4-coding sequences under normal growth conditions. The first AUG codon is approximately 30-fold less inhibitory and is required under amino acid starvation conditions to override the repressing effects of AUG codons 3 and 4. lacZ fusions constructed to functional, elongated versions of the first and fourth upstream open reading frames (URFs) were used to show that AUG codons 1 and 4 function similarly as efficient translational start sites in vivo, raising the possibility that steps following initiation distinguish the regulatory properties of URFs 1 and 4. In accord with this idea, we observed different consequences of changing the length and termination site of URF1 versus changing those of URFs 3 and 4. The latter were lengthened considerably, with little or no effect on regulation. In fact, the function of URFs 3 and 4 was partially reconstituted with a completely heterologous URF. By contrast, certain mutations that lengthen URF1 impaired its positive regulatory function nearly as much as removing its AUG codon did. The same mutations also made URF1 a much more inhibitory element when it was present alone in the mRNA leader. These results strongly suggest that URFs 1 and 4 both function in regulation as translated coding sequences. To account for the phenotypes of the URF1 mutations, we suggest that most ribosomes normally translate URF1 and that the mutations reduce the number of ribosomes that are able to complete URF1 translation and resume scanning downstream. This effect would impair URF1 positive regulatory function if ribosomes must first translate URF1 in order to overcome the strong translational block at the 3'-proximal URFs. Because URF1-lacZ fusions were translated at the same rate under repressing and derepressing conditions, it appears that modulating initiation at URF1 is not the means that is used to restrict the regulatory consequences of URF1 translation to starvation conditions.


Blood ◽  
1999 ◽  
Vol 93 (9) ◽  
pp. 2936-2944 ◽  
Author(s):  
Ramachandran Ramalingam ◽  
Shahin Rafii ◽  
Stefan Worgall ◽  
Douglas E. Brough ◽  
Ronald G. Crystal

Abstract Although endothelial cells are quiescent and long-lived in vivo, when they are removed from blood vessels and cultured in vitro they die within days to weeks. In studies of the interaction of E1−E4+ replication–deficient adenovirus (Ad) vectors and human endothelium, the cells remained quiescent and were viable for prolonged periods. Evaluation of these cultures showed that E1−E4+ Ad vectors provide an “antiapoptotic” signal that, in association with an increase in the ratio of Bcl2 to Bax levels, induces the endothelial cells to enter a state of “suspended animation,” remaining viable for at least 30 days, even in the absence of serum and growth factors. Although the mechanisms initiating these events are unclear, the antiapoptotic signal requires the presence of E4 genes in the vector genome, suggesting that one or more E4 open reading frames of subgroup C Ad initiate a “pro-life” program that modifies cultured endothelial cells to survive for prolonged periods.


2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Corrine Corrina R. Hartford ◽  
Ashish Lal

ABSTRACT Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.


2003 ◽  
Vol 77 (20) ◽  
pp. 11268-11273 ◽  
Author(s):  
Nikolai Klymiuk ◽  
Mathias Müller ◽  
Gottfried Brem ◽  
Bernhard Aigner

ABSTRACT Endogenous retrovirus (ERV) sequences have been found in all mammals. In vitro and in vivo experiments revealed ERV activation and cross-species infection in several species. Sheep (Ovis aries) are used for various biotechnological purposes; however, they have not yet been comprehensively screened for ERV sequences. Therefore, the aim of the study was to classify the ERV sequences in the ovine genome (OERV) by analyzing the retroviral pro-pol sequences. Three OERV β families and nine OERV γ families were revealed. Novel open reading frames (ORF) in the amplified proviral fragment were found in one OERV β family and two OERV γ families. Hybrid OERV produced by putative recombination events were not detected. Quantitative analysis of the OERV sequences in the ovine genome revealed no relevant variations in the endogenous retroviral loads of different breeds. Expression analysis of different tissues from fetal and pregnant sheep detected mRNA from both gammaretrovirus families, showing ORF fragments. Thus, the release of retroviruses from sheep cells cannot be excluded.


2020 ◽  
Vol 36 (12) ◽  
pp. 3645-3651
Author(s):  
Lyam Baudry ◽  
Gaël A Millot ◽  
Agnes Thierry ◽  
Romain Koszul ◽  
Vittore F Scolari

Abstract Motivation Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep sequencing. Differential analyses of these maps enable downstream biological interpretations. However, the multi-fractal nature of the chromatin polymer inside the cellular envelope results in contact frequency values spanning several orders of magnitude: contacts between loci pairs separated by large genomic distances are much sparser than closer pairs. The same is true for poorly covered regions, such as repeated sequences. Both distant and poorly covered regions translate into low signal-to-noise ratios. There is no clear consensus to address this limitation. Results We present Serpentine, a fast, flexible procedure operating on raw data, which considers the contacts in each region of a contact map. Binning is performed only when necessary on noisy regions, preserving informative ones. This results in high-quality, low-noise contact maps that can be conveniently visualized for rigorous comparative analyses. Availability and implementation Serpentine is available on the PyPI repository and https://github.com/koszullab/serpentine; documentation and tutorials are provided at https://serpentine.readthedocs.io/en/latest/. Supplementary information Supplementary data are available at Bioinformatics online.
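The idea of binning only where the signal is noisy, described in the abstract above, can be sketched in one dimension. This is not the Serpentine algorithm, which operates on two-dimensional contact maps; it is a simplified illustration that assumes a list of per-bin contact counts and a hypothetical threshold `min_count`, and merges adjacent bins until each merged bin reaches that threshold. The function name `merge_sparse_bins` is hypothetical.

```python
def merge_sparse_bins(counts, min_count):
    """Merge adjacent bins until each merged bin holds >= min_count.

    Returns a list of (width, total) tuples, where `width` is the number
    of original bins merged and `total` is their summed count. Bins that
    already meet the threshold are left at width 1, so well-covered
    regions keep their original resolution.
    """
    merged = []
    current_total = 0
    current_width = 0
    for count in counts:
        current_total += count
        current_width += 1
        if current_total >= min_count:
            merged.append((current_width, current_total))
            current_total, current_width = 0, 0
    if current_width:
        # Leftover sparse bins at the end: fold them into the last
        # merged bin, or keep them as one bin if nothing was emitted.
        if merged:
            width, total = merged[-1]
            merged[-1] = (width + current_width, total + current_total)
        else:
            merged.append((current_width, current_total))
    return merged
```

With counts `[10, 1, 2, 3, 12, 0, 1]` and `min_count=5`, the first bin is kept as is, the next three sparse bins are merged into one, and the trailing sparse bins are folded into the last well-covered bin.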

