Annotating high-impact 5′ untranslated region variants with the UTRannotator

Bioinformatics

10.1093/bioinformatics/btaa783

2020

Author(s): Xiaolei Zhang, Matthew Wakeling, James Ware, Nicola Whiffin

Keyword(s): Open Reading Frames; Supplementary Information; Untranslated Regions; Protein Coding; Pathogenic Variants; Uncertain Significance; Upstream Open Reading Frames; The Impact; Reading Frames

Abstract

Summary: Current tools to annotate the predicted effect of genetic variants are heavily biased towards protein-coding sequence. Variants outside of these regions may have a large impact on protein expression and/or structure and can lead to disease, but this effect can be challenging to predict. Consequently, these variants are poorly annotated using standard tools. We have developed a plugin to the Ensembl Variant Effect Predictor, the UTRannotator, that annotates variants in 5′ untranslated regions (5′UTRs) that create or disrupt upstream open reading frames. We investigate the utility of this tool using the ClinVar database, providing an annotation for 31.9% of all 5′UTR (likely) pathogenic variants, and highlighting 31 variants of uncertain significance as candidates for further follow-up. We will continue to update the UTRannotator as we gain new knowledge on the impact of variants in UTRs. Availability and implementation: UTRannotator is freely available on GitHub: https://github.com/ImperialCardioGenetics/UTRannotator. Supplementary information: Supplementary data are available at Bioinformatics online.
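The central operation described here, deciding whether a 5′UTR variant creates or removes an upstream start codon, can be illustrated with a short Python sketch. It compares upstream ATGs in the reference and alternate UTR for a single-base substitution and reports whether a new uATG is in frame with the coding sequence. This is not the UTRannotator's code: the function names are hypothetical, and the sketch assumes a transcript-strand UTR with the CDS starting immediately after it, considers only ATGs lying wholly inside the UTR, and ignores stop codons, indels and frameshifts.

```python
# Hypothetical sketch, not the UTRannotator's implementation: flag single-base
# 5'UTR variants that create or remove an upstream ATG (uATG).
# Assumes the UTR is given 5'->3' on the transcript strand and that the main
# CDS start codon begins immediately after the last UTR base.


def atg_positions(seq: str) -> set[int]:
    """0-based start positions of every ATG lying entirely within seq."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"}


def apply_snv(utr: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base at pos after checking the reference allele."""
    if utr[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {utr[pos]} != {ref}")
    return utr[:pos] + alt.upper() + utr[pos + 1:]


def uaug_changes(utr: str, pos: int, ref: str, alt: str) -> dict[str, list[int]]:
    """Compare uATG positions in the reference and alternate UTR."""
    before = atg_positions(utr)
    after = atg_positions(apply_snv(utr, pos, ref, alt))
    return {"gained": sorted(after - before), "lost": sorted(before - after)}


def frame_relative_to_cds(utr_len: int, atg_start: int) -> str:
    """A uATG is in frame with the CDS if its distance to the CDS start is a multiple of 3."""
    return "in-frame" if (utr_len - atg_start) % 3 == 0 else "out-of-frame"


if __name__ == "__main__":
    utr = "CCACGCC"
    print(uaug_changes(utr, 3, "C", "T"))      # {'gained': [2], 'lost': []}
    print(frame_relative_to_cds(len(utr), 2))  # out-of-frame
```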

Annotating high-impact 5′ untranslated region variants with the UTRannotator

10.1101/2020.06.03.132266

2020

Cited By ~ 1

Author(s): Xiaolei Zhang, Matthew Wakeling, James Ware, Nicola Whiffin

Keyword(s): Open Reading Frames; Supplementary Information; Untranslated Regions; Protein Coding; Pathogenic Variants; Uncertain Significance; Upstream Open Reading Frames; The Impact; Reading Frames

Abstract

Summary: Current tools to annotate the predicted effect of genetic variants are heavily biased towards protein-coding sequence. Variants outside of these regions may have a large impact on protein expression and/or structure and can lead to disease, but this effect can be challenging to predict. Consequently, these variants are poorly annotated using standard tools. We have developed a plugin to the Ensembl Variant Effect Predictor, the UTRannotator, that annotates variants in 5′ untranslated regions (5′UTRs) that create or disrupt upstream open reading frames (uORFs). We investigate the utility of this tool using the ClinVar database, providing an annotation for 30.8% of all 5′UTR (likely) pathogenic variants, and highlighting 31 variants of uncertain significance as candidates for further follow-up. We will continue to update the UTRannotator as we gain new knowledge on the impact of variants in UTRs. Availability and implementation: UTRannotator is freely available on GitHub: https://github.com/ImperialCardioGenetics/UTRannotator. Supplementary information: Supplementary data are available at bioRxiv.

RNA G-quadruplexes mark repressive upstream open reading frames in human mRNAs

10.1101/223073

2017

Cited By ~ 1

Author(s): Pierre Murat, Giovanni Marsico, Barbara Herdy, Avazeh Ghanbarian, Guillem Portella, ...

Keyword(s): Secondary Structures; Ribosome Profiling; Open Reading Frames; Untranslated Regions; Translation Regulation; Physical Interaction; Protein Coding; Upstream Open Reading Frames; Nucleotide Resolution; Reading Frames

Abstract

RNA secondary structures in the 5′ untranslated regions (UTRs) of mRNAs have been characterised as key determinants of translation initiation. However, the role of non-canonical secondary structures, such as RNA G-quadruplexes (rG4s), in modulating translation of human mRNAs, and the associated mechanisms, remain largely unappreciated. Here we use a ribosome profiling strategy to investigate the translational landscape of human mRNAs with structured 5′ untranslated regions (5′-UTRs). We found that inefficiently translated mRNAs containing rG4-forming sequences in their 5′-UTRs have an accumulation of ribosome footprints in their 5′-UTRs. We show that rG4-forming sequences are determinants of 5′-UTR translation, suggesting that the folding of rG4 structures thwarts the translation of protein coding sequences (CDS) by stimulating the translation of repressive upstream open reading frames (uORFs). To support our model, we demonstrate that depletion of two rG4-specialised DEAH-box helicases, DHX36 and DHX9, shifts translation towards rG4-containing uORFs, reducing the translation of selected transcripts comprising proto-oncogenes, transcription factors and epigenetic regulators. Transcriptome-wide identification of DHX9 binding sites using individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) demonstrates that translation regulation is mediated through direct physical interaction between the helicase and its rG4 substrate. Our findings unveil a previously unknown role for non-canonical structures in governing 5′-UTR translation and suggest that the interaction of helicases with rG4s could be considered as a target for future therapeutic intervention.
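A common way to shortlist rG4-forming sequences computationally is a pattern search for four runs of at least three guanines separated by loops of one to seven nucleotides. The Python sketch below applies that heuristic to a 5′UTR; it is an assumption for illustration, not necessarily the criterion used in the study, and `find_pqs` is a hypothetical name. Because loops may themselves contain G and regular-expression matches do not overlap, it returns leftmost non-overlapping matches rather than enumerating every possible quadruplex arrangement.

```python
import re

# Hypothetical sketch: locate putative G-quadruplex-forming sequences (PQS) with
# the common "G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+" heuristic. This is not
# necessarily the criterion used in the study above.
PQS = re.compile(r"(?:G{3,}[ACGU]{1,7}){3}G{3,}")


def find_pqs(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) spans (end exclusive) of non-overlapping PQS matches.

    DNA input is converted to RNA (T -> U) so either alphabet can be used.
    """
    rna = seq.upper().replace("T", "U")
    return [m.span() for m in PQS.finditer(rna)]


if __name__ == "__main__":
    print(find_pqs("AAGGGAGGGAGGGAGGGAA"))  # [(2, 17)]
    print(find_pqs("GGGAGGGAGGG"))          # [] (only three G-runs)
```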

Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation

10.1101/021501

2015

Cited By ~ 2

Author(s): David E Weinberg, Premal Shah, Stephen W Eichhorn, Jeffrey A Hussmann, Joshua B Plotkin, ...

Keyword(s): Translational Control; Nucleotide Composition; Ribosome Profiling; Open Reading Frames; Untranslated Regions; Coding Regions; Genome Wide; Upstream Open Reading Frames; Improved Methods; Reading Frames

Ribosome-footprint profiling provides genome-wide snapshots of translation, but technical challenges can confound its analysis. Here, we use improved methods to obtain ribosome-footprint profiles and mRNA abundances that more faithfully reflect gene expression in Saccharomyces cerevisiae. Our results support proposals that both the beginning of coding regions and codons matching rare tRNAs are more slowly translated. They also indicate that emergent polypeptides with as few as three basic residues within a 10-residue window tend to slow translation. With the improved mRNA measurements, the variation attributable to translational control in exponentially growing yeast was less than previously reported, and most of this variation could be predicted with a simple model that considered mRNA abundance, upstream open reading frames, cap-proximal structure and nucleotide composition, and lengths of the coding and 5′-untranslated regions. Collectively, our results reveal key features of translational control in yeast and provide a framework for executing and interpreting ribosome-profiling studies.
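The translational control discussed above is usually quantified as translational efficiency (TE): ribosome-footprint density on a gene's coding sequence divided by its mRNA density. The Python sketch below assumes RPKM normalisation for both libraries; the function names are hypothetical and the read counts in the example are toy values, not data from the study. Because both densities use the same CDS length, length cancels, so TE is effectively a ratio of library-size-normalised counts.

```python
import math

# Hypothetical sketch of the usual ribosome-profiling definition of translational
# efficiency (TE): footprint density over the CDS divided by mRNA density over the
# same CDS, each normalised as reads per kilobase per million mapped reads (RPKM).


def rpkm(count: float, length_nt: int, total_mapped: float) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return count / (length_nt / 1e3) / (total_mapped / 1e6)


def translational_efficiency(rpf_count: float, mrna_count: float, cds_len: int,
                             rpf_total: float, mrna_total: float) -> float:
    """TE = RPKM(ribosome footprints) / RPKM(mRNA) for one gene's CDS."""
    return rpkm(rpf_count, cds_len, rpf_total) / rpkm(mrna_count, cds_len, mrna_total)


if __name__ == "__main__":
    # Toy counts: 500 footprints (10 M mapped) and 250 mRNA reads (20 M mapped).
    te = translational_efficiency(500, 250, 1500, 10e6, 20e6)
    print(round(te, 6), round(math.log2(te), 6))  # 4.0 2.0
```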

Deep learning of the regulatory grammar of yeast 5’ untranslated regions from 500,000 random sequences

10.1101/137547

2017

Cited By ~ 3

Author(s): Josh Cuperus, Benjamin Groves, Anna Kuchina, Alexander B. Rosenberg, Nebojsa Jojic, ...

Keyword(s): Great Majority; Open Reading Frames; Translational Efficiency; Sequence Composition; Growth Selection; Random Library; Upstream Open Reading Frames; The Impact; Reading Frames; Parallel Growth

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the translational efficiency of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on translation of Kozak sequence composition, upstream open reading frames (uORFs) and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the translational efficiency of both a held-out set of the random 5′ UTRs and native S. cerevisiae 5′ UTRs. The model was additionally used to computationally evolve highly translating 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences lead to higher translation rates than the starting sequences, demonstrating the predictive power of this model.
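Training a convolutional network on random 5′ UTRs first requires turning each sequence into a numeric matrix; one-hot encoding (one column per base) is a common choice. The Python sketch below assumes that encoding and is not claimed to be the authors' exact preprocessing; `one_hot` is a hypothetical name. Padding or truncating on the 5′ side is a further assumption made here so that the base adjacent to the start codon always occupies the last row, and ambiguous bases such as N become all-zero rows.

```python
# Hypothetical sketch: one-hot encode fixed-length 5'UTRs as CNN input.
# Padding/truncation on the 5' side is an assumption made here so that the
# base adjacent to the start codon always occupies the last row.
BASES = "ACGT"


def one_hot(seq: str, length: int = 50) -> list[list[int]]:
    """Encode seq as a length x 4 matrix; columns are A, C, G, T.

    Ambiguous bases (e.g. N) become all-zero rows. U is treated as T.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    seq = seq.upper().replace("U", "T")[-length:]  # keep the 3'-most bases
    rows = [[0, 0, 0, 0] for _ in range(length - len(seq))]  # 5' padding
    for base in seq:
        row = [0, 0, 0, 0]
        if base in BASES:
            row[BASES.index(base)] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in one_hot("ACGN", length=6):
        print(row)
```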

uORF-Tools – Workflow for the determination of translation-regulatory upstream open reading frames

10.1101/415018

2018

Cited By ~ 1

Author(s): Anica Scholz, Florian Eggenhofer, Rick Gelhausen, Björn Grüning, Kathi Zarnack, ...

Keyword(s): Ribosome Profiling; Open Reading Frames; Annotation File; Inhibitory Effects; Protein Coding; Reading Frame; Upstream Open Reading Frames; Induced Changes; Reading Frames

Abstract

Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome-protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS, also known as the 5′ untranslated region) preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools identifies uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs that have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed, and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing time and resources.
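The activity measure described for uORF-Tools, a stimulus-induced change in the ratio of main-ORF to uORF ribosome-protected fragments, reduces to simple arithmetic once RPF counts are available. The Python sketch below assumes a pseudocount of 1 and a log2 fold change; both are illustrative choices rather than the workflow's documented statistics, and the function names and counts are hypothetical.

```python
import math

# Hypothetical sketch of the quantity described for uORF-Tools: the ratio of
# ribosome-protected fragments (RPFs) in a main ORF to those in its uORF, and how
# that ratio changes between a control and a stimulated condition. The pseudocount
# and the log2 fold change are assumptions, not the workflow's documented statistics.


def orf_ratio(main_rpf: float, uorf_rpf: float, pseudocount: float = 1.0) -> float:
    """Main-ORF RPFs relative to uORF RPFs, with a pseudocount to avoid division by zero."""
    return (main_rpf + pseudocount) / (uorf_rpf + pseudocount)


def log2_ratio_change(control: tuple[float, float], treated: tuple[float, float],
                      pseudocount: float = 1.0) -> float:
    """log2(treated ratio / control ratio); each tuple is (main_rpf, uorf_rpf).

    Positive values mean the main ORF gained ribosomes relative to its uORF after the
    stimulus, which would fit a relief of uORF-mediated inhibition.
    """
    r_treated = orf_ratio(*treated, pseudocount=pseudocount)
    r_control = orf_ratio(*control, pseudocount=pseudocount)
    return math.log2(r_treated / r_control)


if __name__ == "__main__":
    # Toy counts: the main ORF doubles relative to an unchanged uORF.
    print(log2_ratio_change(control=(99, 9), treated=(199, 9)))  # 1.0
```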

Analysis of plant mRNA upstream open reading frames

Chinese Journal of Agricultural Biotechnology

10.1079/cjb200554

2005

Vol 2 (1), pp. 59–66

Author(s): Jin Yong-Feng, Jin Hui-Qing, Zhou Ping, Bian Teng-Fei

Keyword(s): Plant Species; Open Reading Frames; Untranslated Regions; Translation Efficiency; Consensus Sequences; Codon Context; Eukaryotic mRNAs; Upstream Open Reading Frames; Regulatory Functions; Reading Frames

Abstract

Upstream open reading frames (uORFs) in 5′-untranslated regions (5′-UTRs) of eukaryotic mRNAs play an important role in translation efficiency. A computational analysis of upstream ATGs (uATGs) and uORFs in the 5′-UTRs of plant mRNAs obtained from the nucleotide sequence databank was carried out. Statistical analysis revealed that up to 18% of 5′-UTRs contain a uATG, which is much higher than the earlier estimate. Among these, about 50% of genes have one uATG and nearly 20% have two. About 85% of uORFs are non-overlapping. Thirty per cent of uORF peptides are 1–5 aa long, and about 80% of uORFs encode fewer than 20 aa. Sequences flanking the uATG codon differ strikingly from those flanking the functional initiation codon, and uATGs are more frequently located in a non-optimal context. The consensus sequences around the initiation ATG are similar in mRNAs with and without uATGs, but the initiation ATG of mRNAs without uATGs is more frequently in an optimal context than that of mRNAs with uATGs. Most mRNAs with uATGs may be related to regulatory functions. In addition, most uORFs show no sequence similarity between plant species, whereas a few are highly conserved. For example, the uORFs of mRNAs encoding S-adenosyl-L-methionine decarboxylase (AdoMetDC) share 75–100% homology between plant species, making them much more conserved than the AdoMetDC protein itself.
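Counts such as the fraction of non-overlapping uORFs and the distribution of uORF peptide lengths come from scanning each upstream ATG to its first in-frame stop codon. The Python sketch below does that for one transcript; `find_uorfs` and its output fields are hypothetical, and it assumes DNA letters on the transcript strand with the CDS directly following the UTR. An in-frame uATG with no upstream stop is labeled "overlapping" here, although such cases are often treated separately as N-terminal extensions of the main protein. The number of records returned is the transcript's uATG count.

```python
# Hypothetical sketch: enumerate uORFs that start at an ATG inside a 5'UTR and
# classify each by where its first in-frame stop codon falls. Sequences are
# assumed to be DNA letters on the transcript strand, with the CDS (starting at
# its ATG) directly following the UTR.
STOPS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr: str, cds: str) -> list[dict]:
    """Return one record per upstream ATG lying entirely within the UTR.

    kind:      "non-overlapping" if the stop codon lies wholly in the UTR,
               "overlapping" if it lies (partly) in the CDS, "no_stop" otherwise.
    length_aa: codons from the ATG up to, but excluding, the stop (None if no stop).
    """
    utr, cds = utr.upper(), cds.upper()
    mrna = utr + cds
    uorfs = []
    for start in range(len(utr) - 2):
        if mrna[start:start + 3] != "ATG":
            continue
        stop_at = None
        # Walk codon by codon in the uATG's frame, into the CDS if needed.
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOPS:
                stop_at = i
                break
        if stop_at is None:
            kind, length_aa = "no_stop", None
        else:
            kind = "non-overlapping" if stop_at + 3 <= len(utr) else "overlapping"
            length_aa = (stop_at - start) // 3
        uorfs.append({"start": start, "kind": kind, "length_aa": length_aa})
    return uorfs


if __name__ == "__main__":
    for record in find_uorfs("CCATGAAATAGCC", "ATGCCC"):
        print(record)  # {'start': 2, 'kind': 'non-overlapping', 'length_aa': 2}
```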

Identification and characterization of upstream open reading frames (uORF) in the 5′ untranslated regions (UTR) of genes in Saccharomyces cerevisiae

Current Genetics

10.1007/s00294-005-0001-x

2005

Vol 48 (2), pp. 77–87

Cited By ~ 43

Author(s): Zhihong Zhang, Fred S. Dietrich

Keyword(s): Saccharomyces cerevisiae; Open Reading Frames; Untranslated Regions; Upstream Open Reading Frames; Identification And Characterization; Reading Frames

A community-driven roadmap to advance research on translated open reading frames detected by Ribo-seq

10.1101/2021.06.10.447896

2021

Author(s): Jonathan M Mudge, Jorge Ruiz-Orera, John R Prensner, Marie A Brunet, Jose Manuel Gonzalez, ...

Keyword(s): Gene Annotation; Ribosome Profiling; Open Reading Frames; Untranslated Regions; Biological Databases; Protein Coding; Circular Problem; Advance Research; Non-coding RNAs; Reading Frames

Ribosome profiling (Ribo-seq) has catalyzed a paradigm shift in our understanding of the translational vocabulary of the human genome, discovering thousands of translated open reading frames (ORFs) within long non-coding RNAs and presumed untranslated regions of protein-coding genes. However, reference gene annotation projects have been circumspect in their incorporation of these ORFs due to uncertainties about their experimental reproducibility and physiological roles. Yet, it is indisputable that certain Ribo-seq ORFs make stable proteins, others mediate gene regulation, and many have medical implications. Ultimately, the absence of standardized ORF annotation has created a circular problem: while Ribo-seq ORFs remain unannotated by reference biological databases, this lack of characterisation will thwart research efforts examining their roles. Here, we outline the initial stages of a community-led effort supported by GENCODE / Ensembl, HGNC and UniProt to produce a consolidated catalog of human Ribo-seq ORFs.

utr.annotation: a tool for annotating genomic variants that could influence post-transcriptional regulation

Bioinformatics

10.1093/bioinformatics/btab635

2021

Author(s): Yating Liu, Joseph D Dougherty

Keyword(s): R Package; Open Reading Frames; Supplementary Information; Translation Start; Upstream Open Reading Frames; Mouse Species; Translational Regulators; Post Transcriptional Regulation; Annotated Translation; Reading Frames

Abstract

Summary: Whole-genome sequencing of patient populations is identifying thousands of new variants in untranslated regions (UTRs). While the consequences of UTR mutations are not as easily predicted from primary sequence as those of coding mutations, some known features of UTRs modulate their function. utr.annotation is an R package that can be used to annotate potentially deleterious variants in UTRs for both human and mouse. Given a variant file in CSV or VCF format, utr.annotation reports for each variant whether and how it alters known translational regulators, including upstream open reading frames (uORFs), upstream Kozak sequences, polyA signals, the Kozak sequence at the annotated translation start site, start codons, and stop codons; the conservation score at the variant position; and whether and how it changes ribosome loading, based on a model derived from empirical data. Availability: utr.annotation is freely available on Bitbucket (https://bitbucket.org/jdlabteam/utr.annotation/src/master/) and CRAN (https://cran.r-project.org/web/packages/utr.annotation/index.html). Supplementary information: Supplementary data are available at https://wustl.box.com/s/yye99bryfin89nav45gv91l5k35fxo7z.
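One of the features utr.annotation reports is the Kozak context of start codons. A widely used simplification scores the two most influential positions, counting the A of ATG as +1: a purine at −3 and a G at +4 ("strong" with both, "adequate" with one, "weak" with neither). The Python sketch below applies that scheme before and after a substitution; the three-tier rule and the function names are assumptions for illustration and are not taken from utr.annotation's source.

```python
# Hypothetical sketch: a common three-tier Kozak classification based on the two
# most influential positions around an ATG (A of ATG = +1): a purine (A/G) at -3
# and G at +4. "strong" = both, "adequate" = one, "weak" = neither. This is an
# assumed scheme for illustration, not necessarily utr.annotation's rule set.


def kozak_strength(seq: str, atg: int) -> str:
    """Classify the context of the ATG starting at 0-based index atg."""
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError(f"no ATG at position {atg}")
    minus3 = seq[atg - 3] if atg >= 3 else ""
    plus4 = seq[atg + 3] if atg + 3 < len(seq) else ""
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def kozak_change(seq: str, atg: int, pos: int, alt: str) -> tuple[str, str]:
    """Kozak class of the ATG before and after substituting alt at pos.

    Raises ValueError if the substitution destroys the ATG itself.
    """
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return kozak_strength(seq, atg), kozak_strength(mutated, atg)


if __name__ == "__main__":
    print(kozak_change("GCCACCATGG", atg=6, pos=3, alt="T"))  # ('strong', 'adequate')
```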

Regulatory start-stop elements in 5' untranslated regions pervasively modulate translation

10.1101/2021.07.26.453809

2021

Author(s): Justin Rendleman, Mahabub Pasha Mohammad, Matthew Pressler, Shuvadeep Maity, Vladislava Hronova, ...

Keyword(s): Transcription Factor; Stop Codon; Open Reading Frames; Untranslated Regions; Sequence Element; Activating Transcription Factor 4; Upstream Open Reading Frames; Activating Transcription Factor; Transcription Factor 4; Reading Frames

Translation includes initiation, elongation, and termination, followed by ribosome recycling. We characterize a new sequence element in 5′ untranslated regions that consists of a start codon immediately followed by a stop codon and therefore excludes elongation. In these start-stop elements, an initiating ribosome is simultaneously positioned for termination without having translocated. Using activating transcription factor 4 (ATF4) as an example, we demonstrate that start-stops modify downstream re-initiation, thereby repressing translation of upstream open reading frames and enhancing ATF4 inducibility under stress. Start-stop elements are abundant in both mammals and yeast and affect key regulators such as DROSHA and the oncogenic transcription factor NFIA. They provide a unique regulatory layer that impedes ribosome scanning without the energy-expensive peptide production that accompanies upstream open reading frames.
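A start-stop element, as defined above, is a start codon immediately followed by a stop codon, so candidates can be located with a simple pattern scan of a 5′UTR. The Python sketch below is a hypothetical helper, not the authors' pipeline. It uses a zero-width lookahead so that elements sharing a base (for example ATGTGATGTAA, where the A of TGA begins the next ATG) are all reported.

```python
import re

# Hypothetical sketch: find "start-stop" elements, i.e. an ATG immediately followed
# by a stop codon, in a 5'UTR. The lookahead makes each match zero-width, so
# elements that share a base are all reported. RNA input (U) is converted to T.
START_STOP = re.compile(r"(?=ATG(?:TAA|TAG|TGA))")


def find_start_stops(utr: str) -> list[int]:
    """0-based positions of the A in each ATG-stop pair."""
    seq = utr.upper().replace("U", "T")
    return [m.start() for m in START_STOP.finditer(seq)]


if __name__ == "__main__":
    print(find_start_stops("CCATGTGACCAUGUAGG"))  # [2, 10]
    print(find_start_stops("ATGTGATGTAA"))        # [0, 5]
```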
