Frequent translation of small open reading frames in evolutionary conserved lncRNA regions

SUMMARY: The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes. Although many of these transcripts show homology between human and mouse, only a small proportion of them have been functionally characterized. Here we use ribosome profiling data to identify translated open reading frames, as well as non-ribosomal protein-RNA interactions, in evolutionarily conserved and non-conserved transcripts. We find that conserved regions are subject to significant evolutionary constraints and are enriched in translated open reading frames, as well as non-ribosomal protein-RNA interaction signatures, when compared to non-conserved regions. Translated ORFs can be divided into two classes: those encoding functional micropeptides and those that show no evidence of protein functionality. This study underscores the importance of combining evolutionary and biochemical measurements to advance toward a more complete understanding of the transcriptome.

Disrupting upstream translation in mRNAs is associated with human disease

Nature Communications | 10.1038/s41467-021-21812-1 | 2021 | Vol 12 (1)

Author(s): David S. M. Lee, Joseph Park, Andrew Kromer, Aris Baras, Daniel J. Rader, ...

Keyword(s): Protein Expression, Biological Significance, Ribosome Profiling, Open Reading Frames, Protein Coding, Stop Codons, Human Genes, Strong Negative Selection, Disease Associations, Reading Frames

Abstract: Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’ UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.

Conserved regions in long non-coding RNAs contain abundant translation and protein–RNA interaction signatures

NAR Genomics and Bioinformatics | 10.1093/nargab/lqz002 | 2019 | Vol 1 (1) | pp. e2-e2 | Cited By ~ 4

Author(s): Jorge Ruiz-Orera, M Mar Albà

Keyword(s): Biological Significance, Ribosome Profiling, Evolutionary Analysis, Amino Acid Sequence Level, Protein Coding, Conserved Regions, Non Coding Rnas, Rna Interaction, Regulatory Functions, Sequence Constraints

Abstract: The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes and that are known as long non-coding RNAs (lncRNAs). A handful of lncRNAs have well-characterized regulatory functions but the biological significance of the majority of them is not well understood. LncRNAs that are conserved between mice and humans are likely to be enriched in functional sequences. Here, we investigate the presence of different types of ribosome profiling signatures in lncRNAs and how they relate to sequence conservation. We find that lncRNA-conserved regions contain three times more ORFs with translation evidence than non-conserved ones, and identify nine cases that display significant sequence constraints at the amino acid sequence level. The study also reveals that conserved regions in intergenic lncRNAs are significantly enriched in protein–RNA interaction signatures when compared to non-conserved ones; this includes sites in well-characterized lncRNAs, such as Cyrano, Malat1, Neat1 and Meg3, as well as in tens of lncRNAs of unknown function. This work illustrates how the analysis of ribosome profiling data coupled with evolutionary analysis provides new opportunities to explore the lncRNA functional landscape.

Accurate Annotation of Protein‐coding Small Open Reading Frames in the Human Genome

The FASEB Journal | 10.1096/fasebj.2020.34.s1.03051 | 2020 | Vol 34 (S1) | pp. 1-1

Author(s): Thomas F. Martinez, Qian Chu, Cynthia Donaldson, Dan Tan, Maxim N. Shokhirev, ...

Keyword(s): Human Genome, Open Reading Frames, Protein Coding, Reading Frames, Small Open Reading Frames

uORF-Tools – Workflow for the determination of translation-regulatory upstream open reading frames

10.1101/415018 | 2018 | Cited By ~ 1

Author(s): Anica Scholz, Florian Eggenhofer, Rick Gelhausen, Björn Grüning, Kathi Zarnack, ...

Keyword(s): Ribosome Profiling, Open Reading Frames, Annotation File, Inhibitory Effects, Protein Coding, Reading Frame, Upstream Open Reading Frames, Induced Changes, Reading Frames

Abstract: Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome-protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS), also known as the 5’ untranslated region, preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools identifies uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs that have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing time and resources.
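
The ratio comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the uORF-Tools implementation: the function names, the pseudocount, and the use of raw read counts are assumptions made here for illustration, and a real analysis would presumably work on normalized counts from replicated samples.

```python
import math


def orf_ratio(main_orf_rpf, uorf_rpf, pseudocount=1.0):
    """Ratio of RPFs in the main ORF to RPFs in its associated uORF.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return (main_orf_rpf + pseudocount) / (uorf_rpf + pseudocount)


def log2_ratio_change(control, treated, pseudocount=1.0):
    """Log2 change of the main-ORF/uORF ratio between two conditions.

    `control` and `treated` are (main_orf_rpf, uorf_rpf) tuples.
    A positive value means that, after the stimulus, relatively more
    RPFs fall in the main ORF than in the uORF.
    """
    ratio_control = orf_ratio(*control, pseudocount=pseudocount)
    ratio_treated = orf_ratio(*treated, pseudocount=pseudocount)
    return math.log2(ratio_treated / ratio_control)


# Hypothetical counts: (RPFs in main ORF, RPFs in uORF)
change = log2_ratio_change(control=(99, 9), treated=(199, 9))
print(change)
```

With these hypothetical counts the main-ORF/uORF ratio doubles after the stimulus, so the log2 change is 1. Under the assumptions above, a shift of this kind is the sort of signal that would flag a candidate regulatory uORF for closer inspection.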

Autonomous functionality of an upstream open reading frame in polycistronic mammalian mRNAs

10.1101/325571 | 2018

Author(s): Shohei Kitano, Gabriel Pratt, Keizo Takao, Yasunori Aizawa

Keyword(s): Brain Regions, Ribosome Profiling, Open Reading Frames, Upstream Open Reading Frame, Translation Regulation, Reading Frame, Eukaryotic Translation, Upstream Open Reading Frames, Human And Mouse, Reading Frames

SUMMARY: Upstream open reading frames (uORFs) are established as cis-acting elements for eukaryotic translation of annotated ORFs (anORFs) located on the same mRNAs. Here, we identified a mammalian uORF with functions that are independent from anORF translation regulation. Bioinformatics screening using ribosome profiling data of human and mouse brains yielded 308 neurologically vital genes from which anORF and uORFs are polycistronically translated in both species. Among them, Arhgef9 contains a uORF named SPICA, which is highly conserved among vertebrates and stably translated only in specific brain regions of mice. Disruption of SPICA translation by ATG-to-TAG substitutions did not perturb translation or function of its anORF product, collybistin. SPICA-null mice displayed abnormal maternal reproductive performance and enhanced anxiety-like behavior, characteristic of ARHGEF9-associated neurological disorders. This study demonstrates that mammalian uORFs can be independent genetic units, revising the prevailing dogma of the monocistronic gene in mammals, and even eukaryotes.

RNA G-quadruplexes mark repressive upstream open reading frames in human mRNAs

10.1101/223073 | 2017 | Cited By ~ 1

Author(s): Pierre Murat, Giovanni Marsico, Barbara Herdy, Avazeh Ghanbarian, Guillem Portella, ...

Keyword(s): Secondary Structures, Ribosome Profiling, Open Reading Frames, Untranslated Regions, Translation Regulation, Physical Interaction, Protein Coding, Upstream Open Reading Frames, Nucleotide Resolution, Reading Frames

ABSTRACT: RNA secondary structures in the 5’ untranslated regions (UTRs) of mRNAs have been characterised as key determinants of translation initiation. However, the role of non-canonical secondary structures, such as RNA G-quadruplexes (rG4s), in modulating translation of human mRNAs and the associated mechanisms remain largely unappreciated. Here we use a ribosome profiling strategy to investigate the translational landscape of human mRNAs with structured 5’ untranslated regions (5’-UTRs). We found that inefficiently translated mRNAs, containing rG4-forming sequences in their 5’-UTRs, have an accumulation of ribosome footprints in their 5’-UTRs. We show that rG4-forming sequences are determinants of 5’-UTR translation, suggesting that the folding of rG4 structures thwarts the translation of protein coding sequences (CDS) by stimulating the translation of repressive upstream open reading frames (uORFs). To support our model, we demonstrate that depletion of two rG4s-specialised DEAH-box helicases, DHX36 and DHX9, shifts translation towards rG4-containing uORFs, reducing the translation of selected transcripts comprising proto-oncogenes, transcription factors and epigenetic regulators. Transcriptome-wide identification of DHX9 binding sites using individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) demonstrates that translation regulation is mediated through direct physical interaction between the helicase and its rG4 substrate. Our findings unveil a previously unknown role for non-canonical structures in governing 5’-UTR translation and suggest that the interaction of helicases with rG4s could be considered as a target for future therapeutic intervention.

Accurate annotation of human protein-coding small open reading frames

Nature Chemical Biology | 10.1038/s41589-019-0425-0 | 2019 | Vol 16 (4) | pp. 458-468 | Cited By ~ 16

Author(s): Thomas F. Martinez, Qian Chu, Cynthia Donaldson, Dan Tan, Maxim N. Shokhirev, ...

Keyword(s): Open Reading Frames, Human Protein, Protein Coding, Reading Frames, Small Open Reading Frames

SmProt: a reliable repository with comprehensive annotation of small proteins identified from ribosome profiling

10.1101/2021.04.29.441405 | 2021

Author(s): Yanyan Li, Honghong Zhou, Xiaomin Chen, Yu Zheng, Quan Kang, ...

Keyword(s): Genetic Variants, Rattus Norvegicus, Homo Sapiens, Ribosome Profiling, Open Reading Frames, Small Proteins, Data Volume, Reading Frames, Disease Specific, Small Open Reading Frames

Small proteins are proteins of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotation. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was developed to provide valuable information on small proteins for the scientific community. Here we present the update of SmProt, which emphasizes reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and significantly increased data volume. More components such as non-AUG translation initiation, function, and new sources are also included. SmProt incorporates 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets and collected from the literature and other sources originating from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were collected. All datasets in SmProt are free to access and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.
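
The length-based definition of a small protein given here can be made concrete with a short Python sketch. It only illustrates the definition and is not how SmProt derives its records, which the abstract attributes to Ribo-seq-based prediction and literature curation. The function name `find_sorfs` is hypothetical, and the restriction to ATG start codons and to the forward strand are simplifying assumptions (the abstract itself notes that non-AUG initiation occurs).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_aa=99, min_aa=1):
    """Scan the three forward frames of a DNA sequence for small ORFs.

    Returns (start, end, n_aa) tuples, where start/end are 0-based,
    end-exclusive coordinates including the stop codon, and n_aa is the
    number of encoded amino acids (stop codon not counted).
    Assumptions of this sketch: ATG starts only, forward strand only,
    and the first in-frame ATG is taken as the start of each ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    orfs.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence: ATG AAA TGA in the third frame encodes a 2-residue peptide.
print(find_sorfs("CCATGAAATGA"))
```

The default `max_aa=99` encodes the "fewer than 100 amino acids" cutoff; ORFs encoding longer products are skipped rather than reported.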

Faculty Opinions recommendation of Accurate annotation of human protein-coding small open reading frames.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature | 10.3410/f.737056462.793568443 | 2019

Author(s): Rami Hannoush

Keyword(s): Open Reading Frames, Human Protein, Protein Coding, Reading Frames, Small Open Reading Frames

Thousands of novel unannotated proteins expand the MHC I immunopeptidome in cancer

10.1101/2020.02.12.945840 | 2020 | Cited By ~ 6

Author(s): Tamara Ouspenskaia, Travis Law, Karl R. Clauser, Susan Klaeger, Siranush Sarkizova, ...

Keyword(s): Somatic Mutations, Tumor Antigens, Ribosome Profiling, Lymphocytic Leukemia, Open Reading Frames, Specific Expression, Protein Coding, Mhc I, Coding Regions, Reading Frames

Abstract: Tumor epitopes, peptides that are presented on surface-bound MHC I proteins, provide targets for cancer immunotherapy and have been identified extensively in the annotated protein-coding regions of the genome. Motivated by the recent discovery of translated novel unannotated open reading frames (nuORFs) using ribosome profiling (Ribo-seq), we hypothesized that cancer-associated processes could generate nuORFs that can serve as a new source of tumor antigens that harbor somatic mutations or show tumor-specific expression. To identify cancer-specific nuORFs, we generated Ribo-seq profiles for 29 malignant and healthy samples, developed a sensitive analytic approach for hierarchical ORF prediction, and constructed a high-confidence database of translated nuORFs across tissues. Peptides from 3,555 unique translated nuORFs were presented on MHC I, based on analysis of an extensive dataset of MHC I-bound peptides detected by mass spectrometry, with >20-fold more nuORF peptides detected in the MHC I immunopeptidomes compared to whole proteomes. We further detected somatic mutations in nuORFs of cancer samples and identified nuORFs with tumor-specific translation in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs thus expand the pool of MHC I-presented, tumor-specific peptides targetable by immunotherapies.
