Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins

eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins.

2017 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V. Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Abstract. Recent studies in eukaryotes have demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by altORFs. The putative alternative proteins translated from altORFs have orthologs in many species, and evolutionary patterns indicate that altORFs are particularly constrained in CDSs that evolve slowly. Thousands of predicted alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. Protein domains and co-conservation analyses suggest a potential functional relationship between small and large proteins encoded in the same genes. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many coding genes code for more than one protein and that these proteins are often functionally related.


2015 ◽  
Author(s):  
Robin C Friedman ◽  
Stefan Kalkhof ◽  
Olivia Doppelt-Azeroual ◽  
Stephan Mueller ◽  
Martina Chovancova ◽  
...  

While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding small proteins, whether or not they also have a regulatory role at the RNA level. Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in phylogenetically diverse bacteria. A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 188 ± 25.5 unannotated sRNA ORFs are under selection to maintain coding, an average of 13 per species considered here. This implies that overall at least 7.5 ± 0.3% of sRNAs have a coding ORF, and in some species at least 20% do. 84 ± 9.8 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated according to ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and two S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are novel components of type I toxin/antitoxin systems. Our predictions for sRNA coding ORFs, including novel type I toxins, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr.
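The size window mentioned above (ORFs between 10 and 50 amino acids long) can be illustrated with a minimal sketch. This is not the authors' method, which relies on machine learning and comparative genomics; the function name `find_small_orfs`, the ATG-only start rule, and the forward-strand-only scan are simplifying assumptions made here for illustration.

```python
# Hypothetical helper for illustration only; not taken from the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, min_aa=10, max_aa=50):
    """Return (start, end, frame) tuples for small ORFs on the forward strand.

    An ORF here runs from the first ATG in a frame to the next in-frame stop.
    Coordinates are 0-based and end-exclusive, and the end includes the stop
    codon. Length is counted in residues, including the initiator Met and
    excluding the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon; the range bound keeps codons complete.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs
```

A candidate list produced this way would only be a starting point: the abstract's central claim concerns which of these ORFs are under selection to maintain coding, which a length filter alone cannot establish.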


Author(s):  
Rick Gelhausen ◽  
Florian Heyl ◽  
Sarah L. Svensson ◽  
Kathrin Froschauer ◽  
Lydia Hadjeras ◽  
...  

Abstract. Motivation: Ribosome profiling (Ribo-seq) is a powerful approach based on ribosome-protected RNA fragments to explore the translatome of a cell, and is especially useful for the detection of small proteins (≤70 amino acids) that are recalcitrant to biochemical and in silico approaches. While pipelines are available to analyze Ribo-seq data, none are designed explicitly for the analysis of Ribo-seq data from prokaryotes, nor are they focused on the discovery of unannotated open reading frames (ORFs) in bacteria. Results: We present HRIBO (High-throughput annotation by Ribo-seq), a workflow to enable reproducible and high-throughput analysis of bacterial Ribo-seq data. The workflow performs all required pre-processing and quality control steps. Importantly, HRIBO outputs annotation-independent ORF predictions based on two complementary bacteria-focused tools, and integrates them with additional features. This facilitates the rapid discovery of novel ORFs and their prioritization for functional characterization. Availability: HRIBO is a free and open source project available under the GPL-3 license at: https://github.com/RickGelhausen/HRIBO


Author(s):  
Rick Gelhausen ◽  
Sarah L Svensson ◽  
Kathrin Froschauer ◽  
Florian Heyl ◽  
Lydia Hadjeras ◽  
...  

Abstract. Motivation: Ribosome profiling (Ribo-seq) is a powerful approach based on deep sequencing of cDNA libraries generated from ribosome-protected RNA fragments to explore the translatome of a cell, and is especially useful for the detection of small proteins (50–100 amino acids) that are recalcitrant to many standard biochemical and in silico approaches. While pipelines are available to analyze Ribo-seq data, none are designed explicitly for the automatic processing and analysis of data from bacteria, nor are they focused on the discovery of unannotated open reading frames (ORFs). Results: We present HRIBO (High-throughput annotation by Ribo-seq), a workflow to enable reproducible and high-throughput analysis of bacterial Ribo-seq data. The workflow performs all required pre-processing and quality control steps. Importantly, HRIBO outputs annotation-independent ORF predictions based on two complementary bacteria-focused tools, and integrates them with additional feature information and expression values. This facilitates the rapid and high-confidence discovery of novel ORFs and their prioritization for functional characterization. Availability and implementation: HRIBO is a free and open source project available under the GPL-3 license at: https://github.com/RickGelhausen/HRIBO.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David S. M. Lee ◽  
Joseph Park ◽  
Andrew Kromer ◽  
Aris Baras ◽  
Daniel J. Rader ◽  
...  

Abstract. Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.
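The variant classes named above (variants that introduce a new stop codon into a uORF, and variants that alter an existing one) can be sketched as a small classifier. This is an illustrative assumption, not the authors' annotation pipeline: the function name `classify_uorf_snv` and the category labels are hypothetical, and the sketch does not rank the relative strength of the three stop codons because the abstract does not specify a ranking.

```python
# Hypothetical helper for illustration only; not taken from the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_uorf_snv(uorf_seq, pos, alt):
    """Classify a single-nucleotide variant inside a uORF.

    uorf_seq: in-frame DNA sequence from the start codon through the stop codon.
    pos: 0-based position of the variant within uorf_seq.
    alt: the alternate base (one of A, C, G, T).
    Returns 'stop_gained', 'stop_lost', 'stop_changed', or 'other'.
    """
    seq = uorf_seq.upper()
    alt = alt.upper()
    if len(seq) % 3 != 0:
        raise ValueError("uORF length must be a multiple of 3")
    if not 0 <= pos < len(seq):
        raise ValueError("position outside the uORF")
    if len(alt) != 1 or alt not in "ACGT" or alt == seq[pos]:
        raise ValueError("alt must be a single base differing from the reference")

    # Rebuild the affected codon with the alternate base substituted in.
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = seq[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]

    ref_is_stop = ref_codon in STOP_CODONS
    alt_is_stop = alt_codon in STOP_CODONS
    if not ref_is_stop and alt_is_stop:
        return "stop_gained"
    if ref_is_stop and not alt_is_stop:
        return "stop_lost"
    if ref_is_stop and alt_is_stop:
        return "stop_changed"  # one stop codon replaced by another
    return "other"
```

For example, with the toy uORF `ATGTGGCAATGA`, changing position 5 from G to A turns the codon TGG into TGA and is reported as `stop_gained`, while changing position 10 from G to A turns the terminal TGA into TAA and is reported as `stop_changed`.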


Gene ◽  
2006 ◽  
Vol 376 (1) ◽  
pp. 59-67 ◽  
Author(s):  
Sandra Morales-Arrieta ◽  
Maria Elena Rodríguez ◽  
Lorenzo Segovia ◽  
Agustín López-Munguía ◽  
Clarita Olvera-Carranza

Science ◽  
2020 ◽  
Vol 367 (6482) ◽  
pp. 1140-1146 ◽  
Author(s):  
Jin Chen ◽  
Andreas-David Brunner ◽  
J. Zachery Cogan ◽  
James K. Nuñez ◽  
Alexander P. Fields ◽  
...  

Ribosome profiling has revealed pervasive but largely uncharacterized translation outside of canonical coding sequences (CDSs). In this work, we exploit a systematic CRISPR-based screening strategy to identify hundreds of noncanonical CDSs that are essential for cellular growth and whose disruption elicits specific, robust transcriptomic and phenotypic changes in human cells. Functional characterization of the encoded microproteins reveals distinct cellular localizations, specific protein binding partners, and hundreds of microproteins that are presented by the human leukocyte antigen system. We find multiple microproteins encoded in upstream open reading frames, which form stable complexes with the main, canonical protein encoded on the same messenger RNA, thereby revealing the use of functional bicistronic operons in mammals. Together, our results point to a family of functional human microproteins that play critical and diverse cellular roles.


2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Corrine Corrina R. Hartford ◽  
Ashish Lal

ABSTRACT Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.


1998 ◽  
Vol 44 (1) ◽  
pp. 91-94
Author(s):  
G Scott Jenkins ◽  
Mark S Chandler ◽  
Pamela S Fink

The putative 4.5S RNA of Haemophilus influenzae was identified in the genome by computer analysis, amplified by the polymerase chain reaction, and cloned. We have determined that this putative 4.5S RNA will complement an Escherichia coli strain conditionally defective in 4.5S RNA production. The predicted secondary structures of the molecules were quite similar, but Northern analysis showed that the H. influenzae RNA was slightly larger than the E. coli RNA. The H. influenzae gene encoding this RNA is the functional homolog of the ffs gene in E. coli. Key words: ffs gene, complementation studies, small RNA, prokaryotic genetics.

