Disrupting upstream translation in mRNAs is associated with human disease

Abstract: Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.
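
One of the variant classes named in this abstract, a uORF variant that introduces a new stop codon, can be illustrated with a minimal Python sketch. The function names, the 0-based coordinate convention, and the toy sequence are assumptions made for illustration; this is not the authors' annotation pipeline, and it does not attempt to model the "strengthening existing stop codons" class, which the abstract does not define.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def introduces_stop(uorf, pos, alt):
    """Return True if substituting `alt` at 0-based `pos` creates a new stop codon.

    `uorf` is assumed to be a DNA sequence that starts at the uORF start
    codon and is read in that frame (hypothetical convention).
    """
    uorf = uorf.upper()
    alt = alt.upper()
    usable = len(uorf) - len(uorf) % 3  # ignore a trailing partial codon
    if not 0 <= pos < usable:
        raise ValueError("pos must fall inside a complete codon of the uORF")
    if alt not in "ACGT" or len(alt) != 1:
        raise ValueError("alt must be a single DNA base")

    start = 3 * (pos // 3)  # first base of the affected codon
    ref_codon = uorf[start:start + 3]
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]

    # A *new* stop: the reference codon was a sense codon, the variant is a stop.
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


# Toy uORF: ATG TGG CAA TAA
toy_uorf = "ATGTGGCAATAA"
print(introduces_stop(toy_uorf, 5, "A"))  # TGG -> TGA
print(introduces_stop(toy_uorf, 3, "C"))  # TGG -> CGG
```

A real analysis would also need transcript coordinates, strand handling, and a definition of which uORFs count as translated, none of which is specified here.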

Disrupting upstream translation in mRNAs leads to loss-of-function associated with human disease

10.1101/2020.09.09.287912 ◽

2020 ◽

Author(s):

David S.M. Lee ◽

Joseph Park ◽

Andrew Kromer ◽

Daniel J. Rader ◽

Marylyn D. Ritchie ◽

...

Keyword(s):

Biological Significance ◽

Regulatory Elements ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Loss Of Function ◽

Protein Coding ◽

Stop Codons ◽

Human Genes ◽

Strong Negative Selection ◽

Disease Associations

ABSTRACT: Ribosome profiling has uncovered pervasive translation in 5’UTRs; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate new gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and demonstrate their impact on gene expression in human cells. Our results establish new mechanisms relating uORF variation to loss-of-function of downstream genes, and demonstrate that translated uORFs are genetically constrained regulatory elements in 40% of human genes.

Long antiparallel open reading frames are unlikely to be encoding essential proteins in prokaryotic genomes

10.1101/724807 ◽

2019 ◽

Author(s):

Denis Moshensky ◽

Andrei Alexeevski

Keyword(s):

Negative Selection ◽

Stop Codon ◽

Biological Significance ◽

Open Reading Frames ◽

Overlapping Genes ◽

Base Pairs ◽

Protein Coding ◽

Essential Proteins ◽

Prokaryotic Genomes ◽

Reading Frames

Abstract: The origin and evolution of genes that have common base pairs (overlapping genes) are of particular interest because such genes influence each other. Especially intriguing are gene pairs with long overlaps. In prokaryotes, co-directional overlaps longer than 60 bp were shown to be nonexistent except for some instances. A few antiparallel prokaryotic genes with long overlaps were described in the literature. We have analyzed putative long antiparallel overlapping genes to determine whether open reading frames (ORFs) located opposite to genes (antiparallel ORFs) can be protein-coding genes. We have confirmed that long antiparallel ORFs (AORFs) are reliably observed to be more frequent than expected. There are 10 472 000 AORFs in 929 analyzed genomes with overlap length more than 180 bp. Stop codons on the strand opposite the coding strand are avoided in 2 898 cases at a Benjamini-Hochberg threshold of 0.01. Using Ka/Ks ratio calculations, we have revealed that long AORFs do not affect the type of selection acting on genes in the vast majority of cases. This observation indicates that translations of long AORFs are commonly not under negative selection. An illustrative example is the 282 AORFs longer than 1 800 bp found opposite extremely conserved dnaK genes. Translations of these AORFs were annotated as “glutamate dehydrogenases” and were included in the Pfam database as a third protein family of glutamate dehydrogenases, PF10712. Ka/Ks analysis has demonstrated that if these translations correspond to proteins, they are not subject to negative selection, while dnaK genes are under strong stabilizing selection. Moreover, we have found other arguments against the hypothesis that these AORFs encode essential proteins, that is, proteins indispensable for cellular machinery. However, some AORFs, in particular those related to dnaK, have been found to slightly resist synonymous changes in genes, which indicates the possibility of their translation. We speculate that translations of certain AORFs might have a functional role other than encoding essential proteins. Essential genes are unlikely to be encoded by AORFs in prokaryotic genomes. Nevertheless, some AORFs might have biological significance associated with their translations.

Author summary: Genes that have common base pairs are called overlapping genes. We have examined the most intriguing case: whether gene pairs encoded on opposite DNA strands exist in prokaryotes. An intersection length threshold of 180 bp has been used. A few such pairs of genes were experimentally confirmed. We have detected all long antiparallel ORFs in 929 prokaryotic genomes and have found that the number of open reading frames located opposite to annotated genes is much larger than expected under a statistical model. We have developed a measure of stop codon avoidance on the opposite strand. The lengths of the antiparallel ORFs found to show stop codon avoidance are typical for prokaryotic genes. Comparative genomics analysis shows that long antiparallel ORFs (AORFs) are unlikely to be essential protein-coding genes. We have analyzed the distributions of features typical for essential proteins among formal translations of all long AORFs: prevalence of negative selection, non-uniformity of the distribution of conserved positions in a multiple alignment of homologous proteins, and the character of the distribution of homologs in the phylogenetic tree of prokaryotes. None of these features has been observed for the majority of long AORFs. In particular, the same results have been obtained for some experimentally confirmed AOGs. Thus, pairs of antiparallel overlapping essential genes are unlikely to exist. On the other hand, some antiparallel ORFs affect the evolution of the genes opposite which they are located. Consequently, translations of some antiparallel ORFs might have as yet unknown biological significance.
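
The abstract above reports stop codon avoidance cases at a Benjamini-Hochberg threshold of 0.01. As a reminder of how that multiple-testing correction works, here is a minimal Python sketch of the standard step-up procedure. The function name and the example p-values are invented for illustration; how the authors computed their per-ORF p-values is not stated in the abstract.

```python
def benjamini_hochberg(pvalues, alpha=0.01):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Standard step-up procedure: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p

    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank  # keep the largest rank that passes

    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four candidate ORFs.
example = [0.001, 0.5, 0.004, 0.02]
print(benjamini_hochberg(example, alpha=0.01))
```

Because the procedure is step-up, a p-value that fails its own rank threshold can still be rejected if a larger p-value passes at a higher rank.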

uORF-Tools – Workflow for the determination of translation-regulatory upstream open reading frames

10.1101/415018 ◽

2018 ◽

Cited By ~ 1

Author(s):

Anica Scholz ◽

Florian Eggenhofer ◽

Rick Gelhausen ◽

Björn Grüning ◽

Kathi Zarnack ◽

...

Keyword(s):

Ribosome Profiling ◽

Open Reading Frames ◽

Annotation File ◽

Inhibitory Effects ◽

Protein Coding ◽

Reading Frame ◽

Upstream Open Reading Frames ◽

Induced Changes ◽

Reading Frames

Abstract: Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS) (aka 5’ untranslated region) preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools identifies uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs that have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing times and resources.
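
The quantity described in this abstract, a stimulus-induced change in the ratio of main-ORF RPFs to uORF RPFs, can be sketched in a few lines of Python. This is only an illustration of the idea: the function names, the pseudocount, and the log2 scale are assumptions, and the sketch does not reproduce the normalization or statistics used by the uORF-Tools workflow itself.

```python
import math


def orf_ratio(main_orf_reads, uorf_reads, pseudocount=1.0):
    """Ratio of main-ORF RPF counts to uORF RPF counts.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for uORFs without reads.
    """
    return (main_orf_reads + pseudocount) / (uorf_reads + pseudocount)


def log2_ratio_change(control, stimulus, pseudocount=1.0):
    """log2 change of the main-ORF/uORF ratio between two conditions.

    `control` and `stimulus` are (main_orf_reads, uorf_reads) tuples.
    A positive value means relatively more main-ORF translation under
    the stimulus; a negative value means relatively less.
    """
    ratio_control = orf_ratio(*control, pseudocount=pseudocount)
    ratio_stimulus = orf_ratio(*stimulus, pseudocount=pseudocount)
    return math.log2(ratio_stimulus / ratio_control)


# Hypothetical read counts for one transcript.
print(log2_ratio_change(control=(99, 9), stimulus=(399, 9)))
```

In practice the raw counts would first need library-size normalization and replicate handling before such ratios are compared across conditions.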

RNA G-quadruplexes mark repressive upstream open reading frames in human mRNAs

10.1101/223073 ◽

2017 ◽

Cited By ~ 1

Author(s):

Pierre Murat ◽

Giovanni Marsico ◽

Barbara Herdy ◽

Avazeh Ghanbarian ◽

Guillem Portella ◽

...

Keyword(s):

Secondary Structures ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Untranslated Regions ◽

Translation Regulation ◽

Physical Interaction ◽

Protein Coding ◽

Upstream Open Reading Frames ◽

Nucleotide Resolution ◽

Reading Frames

ABSTRACT: RNA secondary structures in the 5’ untranslated regions (UTRs) of mRNAs have been characterised as key determinants of translation initiation. However, the role of non-canonical secondary structures, such as RNA G-quadruplexes (rG4s), in modulating translation of human mRNAs and the associated mechanisms remain largely unappreciated. Here we use a ribosome profiling strategy to investigate the translational landscape of human mRNAs with structured 5’ untranslated regions (5’-UTRs). We found that inefficiently translated mRNAs, containing rG4-forming sequences in their 5’-UTRs, have an accumulation of ribosome footprints in their 5’-UTRs. We show that rG4-forming sequences are determinants of 5’-UTR translation, suggesting that the folding of rG4 structures thwarts the translation of protein coding sequences (CDS) by stimulating the translation of repressive upstream open reading frames (uORFs). To support our model, we demonstrate that depletion of two rG4s-specialised DEAH-box helicases, DHX36 and DHX9, shifts translation towards rG4-containing uORFs, reducing the translation of selected transcripts comprising proto-oncogenes, transcription factors and epigenetic regulators. Transcriptome-wide identification of DHX9 binding sites using individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) demonstrates that translation regulation is mediated through direct physical interaction between the helicase and its rG4 substrate. Our findings unveil a previously unknown role for non-canonical structures in governing 5’-UTR translation and suggest that the interaction of helicases with rG4s could be considered as a target for future therapeutic intervention.

Thousands of novel unannotated proteins expand the MHC I immunopeptidome in cancer

10.1101/2020.02.12.945840 ◽

2020 ◽

Cited By ~ 6

Author(s):

Tamara Ouspenskaia ◽

Travis Law ◽

Karl R. Clauser ◽

Susan Klaeger ◽

Siranush Sarkizova ◽

...

Keyword(s):

Somatic Mutations ◽

Tumor Antigens ◽

Ribosome Profiling ◽

Lymphocytic Leukemia ◽

Open Reading Frames ◽

Specific Expression ◽

Protein Coding ◽

Mhc I ◽

Coding Regions ◽

Reading Frames

Abstract: Tumor epitopes, peptides that are presented on surface-bound MHC I proteins, provide targets for cancer immunotherapy and have been identified extensively in the annotated protein-coding regions of the genome. Motivated by the recent discovery of translated novel unannotated open reading frames (nuORFs) using ribosome profiling (Ribo-seq), we hypothesized that cancer-associated processes could generate nuORFs that can serve as a new source of tumor antigens that harbor somatic mutations or show tumor-specific expression. To identify cancer-specific nuORFs, we generated Ribo-seq profiles for 29 malignant and healthy samples, developed a sensitive analytic approach for hierarchical ORF prediction, and constructed a high-confidence database of translated nuORFs across tissues. Peptides from 3,555 unique translated nuORFs were presented on MHC I, based on analysis of an extensive dataset of MHC I-bound peptides detected by mass spectrometry, with >20-fold more nuORF peptides detected in the MHC I immunopeptidomes compared to whole proteomes. We further detected somatic mutations in nuORFs of cancer samples and identified nuORFs with tumor-specific translation in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs thus expand the pool of MHC I-presented, tumor-specific peptides, targetable by immunotherapies.

A spectral analysis approach to detect actively translated open reading frames in high-resolution ribosome profiling data

10.1101/031625 ◽

2015 ◽

Author(s):

Lorenzo Calviello ◽

Neelanjan Mukherjee ◽

Emanuel Wyler ◽

Henrik Zauber ◽

Antje Hirsekorn ◽

...

Keyword(s):

Spectral Analysis ◽

Gene Expression Regulation ◽

De Novo ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Mass Spectrometry Data ◽

Hek293 Cells ◽

Protein Coding ◽

Reading Frame ◽

Reading Frames

RNA sequencing protocols allow for quantifying gene expression regulation at each individual step, from transcription to protein synthesis. Ribosome Profiling (Ribo-seq) maps the positions of translating ribosomes over the entire transcriptome. Despite its great potential, a rigorous statistical approach to identify translated regions by means of the characteristic three-nucleotide periodicity of Ribo-seq data is not yet available. To fill this gap, we developed RiboTaper, which quantifies the significance of periodic Ribo-seq reads via spectral analysis methods. We applied RiboTaper on newly generated, deep Ribo-seq data in HEK293 cells, to derive an extensive map of translation that covers Open Reading Frame (ORF) annotations for more than 11,000 protein-coding genes. We also find distinct ribosomal signatures for several hundred detected upstream ORFs and ORFs in annotated non-coding genes (ncORFs). Mass spectrometry data confirms that RiboTaper achieves excellent coverage of the cellular proteome and validates dozens of novel peptide products. Collectively, RiboTaper (available at https://ohlerlab.mdc-berlin.de/software/) is a powerful method for comprehensive de novo identification of actively used ORFs in the human genome.
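
The three-nucleotide periodicity mentioned in this abstract can be made concrete with a small Python sketch that measures how much of a read-count profile's variance sits at a period of three nucleotides. It uses a plain discrete Fourier transform at a single frequency, which is a simplification chosen for illustration and not RiboTaper's actual statistical test; the function name and the toy profiles are hypothetical.

```python
import cmath


def periodicity_score(counts):
    """Fraction of the profile's variance carried by the 3-nt periodic component.

    `counts` is a per-nucleotide list of read counts (for example P-site
    positions) along a candidate ORF. Returns a value between 0 and 1.
    """
    n = len(counts) - len(counts) % 3  # trim so the length is a multiple of 3
    if n < 3:
        raise ValueError("need at least three positions")
    profile = counts[:n]

    mean = sum(profile) / n
    centered = [c - mean for c in profile]  # remove the zero-frequency term
    total_power = sum(c * c for c in centered)
    if total_power == 0:
        return 0.0  # flat profile: no periodic signal at all

    k = n // 3  # DFT bin corresponding to one cycle per 3 nucleotides
    coeff = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, c in enumerate(centered))

    # For a real signal the bins k and n - k mirror each other (factor 2);
    # dividing by n * total_power applies Parseval's identity.
    return 2 * abs(coeff) ** 2 / (n * total_power)


in_frame = [10, 0, 0] * 20   # all reads on the first codon position
flat = [5] * 60              # no frame preference
print(periodicity_score(in_frame), periodicity_score(flat))
```

A score near 1 means almost all variation follows the codon frame; deciding whether a given score is statistically significant requires a proper test, which this sketch does not provide.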

Comprehensive Annotations of Human Herpesvirus 6A and 6B Genomes Reveal Novel and Conserved Genomic Features

10.1101/730028 ◽

2019 ◽

Author(s):

Yaara Finkel ◽

Dominik Schmiedel ◽

Julie Tai-Schmiedel ◽

Aharon Nachshon ◽

Michal Schwartz ◽

...

Keyword(s):

Human Herpesvirus ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Temporal Expression ◽

Protein Coding ◽

Functional Studies ◽

Viral Genes ◽

Non Coding Rnas ◽

Coding Potential ◽

Reading Frames

Abstract: Human herpesvirus 6 (HHV-6) A and B are highly ubiquitous betaherpesviruses, infecting the majority of the human population. Like other herpesviruses, they encompass large genomes and our understanding of their protein coding potential is far from complete. Here we employ ribosome profiling and systematic transcript analysis to experimentally define the HHV-6 translation products and to follow their temporal expression. We identify hundreds of new open reading frames (ORFs), including many upstream ORFs (uORFs) and internal ORFs (iORFs), generating a complete unbiased atlas of the HHV-6 proteome. Furthermore, by integrating systematic data from the prototypic betaherpesvirus, human cytomegalovirus, we uncover numerous uORFs and iORFs that are conserved across betaherpesviruses and we show that uORFs are specifically enriched in late viral genes. Using our transcriptome measurements, we identified three highly abundant HHV-6 encoded long non-coding RNAs (lncRNAs), one of which generates a non-polyadenylated stable intron that appears to be a conserved feature of betaherpesviruses. Overall, our work reveals the complexity of HHV-6 genomes and highlights novel features that are conserved between betaherpesviruses, providing a rich resource for future functional studies.

A community-driven roadmap to advance research on translated open reading frames detected by Ribo-seq

10.1101/2021.06.10.447896 ◽

2021 ◽

Author(s):

Jonathan M Mudge ◽

Jorge Ruiz-Orera ◽

John R Prensner ◽

Marie A Brunet ◽

Jose Manuel Gonzalez ◽

...

Keyword(s):

Gene Annotation ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Untranslated Regions ◽

Biological Databases ◽

Protein Coding ◽

Circular Problem ◽

Advance Research ◽

Non Coding Rnas ◽

Reading Frames

Ribosome profiling (Ribo-seq) has catalyzed a paradigm shift in our understanding of the translational vocabulary of the human genome, discovering thousands of translated open reading frames (ORFs) within long non-coding RNAs and presumed untranslated regions of protein-coding genes. However, reference gene annotation projects have been circumspect in their incorporation of these ORFs due to uncertainties about their experimental reproducibility and physiological roles. Yet, it is indisputable that certain Ribo-seq ORFs make stable proteins, others mediate gene regulation, and many have medical implications. Ultimately, the absence of standardized ORF annotation has created a circular problem: while Ribo-seq ORFs remain unannotated by reference biological databases, this lack of characterisation will thwart research efforts examining their roles. Here, we outline the initial stages of a community-led effort supported by GENCODE / Ensembl, HGNC and UniProt to produce a consolidated catalog of human Ribo-seq ORFs.

Frequent translation of small open reading frames in evolutionary conserved lncRNA regions

10.1101/348326 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jorge Ruiz-Orera ◽

M.Mar Albà

Keyword(s):

Ribosomal Protein ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Protein Coding ◽

Conserved Regions ◽

Biochemical Measurements ◽

Rna Interaction ◽

Human And Mouse ◽

Reading Frames ◽

Small Open Reading Frames

SUMMARY: The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes. Although many of these transcripts show homology between human and mouse, only a small proportion of them have been functionally characterized. Here we use ribosome profiling data to identify translated open reading frames, as well as non-ribosomal protein-RNA interactions, in evolutionarily conserved and non-conserved transcripts. We find that conserved regions are subject to significant evolutionary constraints and are enriched in translated open reading frames, as well as non-ribosomal protein-RNA interaction signatures, when compared to non-conserved regions. Translated ORFs can be divided into two classes: those encoding functional micropeptides and those that show no evidence of protein functionality. This study underscores the importance of combining evolutionary and biochemical measurements to advance toward a more complete understanding of the transcriptome.

Systematic analysis of the PTEN 5′ leader identifies a major AUU initiated proteoform

Open Biology ◽

10.1098/rsob.150203 ◽

2016 ◽

Vol 6 (5) ◽

pp. 150203 ◽

Cited By ~ 20

Author(s):

Ioanna Tzani ◽

Ivaylo P. Ivanov ◽

Dmitri E. Andreev ◽

Ruslan I. Dmitriev ◽

Kellie A. Dean ◽

...

Keyword(s):

Pi3k Pathway ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Sequencing Data ◽

Human Tumour ◽

Systematic Analysis ◽

Human Genes ◽

Abundant Evidence ◽

Upstream Open Reading Frames ◽

Reading Frames

Abundant evidence for translation within the 5′ leaders of many human genes is rapidly emerging, especially because of the advent of ribosome profiling. In most cases, it is believed that the act of translation rather than the encoded peptide is important. However, the wealth of sequencing data made available in recent years allows phylogenetic detection of sequences within 5′ leaders that have emerged under coding constraint and therefore allows for the prediction of functional 5′ leader translation. Using this approach, we previously predicted a CUG-initiated, 173 amino acid N-terminal extension to the human tumour suppressor PTEN. Here, a systematic experimental analysis of translation events in the PTEN 5′ leader identifies at least two additional non-AUG-initiated PTEN proteoforms that are expressed in most human cell lines tested. The most abundant extended PTEN proteoform initiates at a conserved AUU codon and extends the canonical AUG-initiated PTEN by 146 amino acids. All N-terminally extended PTEN proteoforms tested retain the ability to downregulate the PI3K pathway. We also provide evidence for the translation of two conserved AUG-initiated upstream open reading frames within the PTEN 5′ leader that control the ratio of PTEN proteoforms.
