Identification of novel translated small ORFs in Escherichia coli using complementary ribosome profiling approaches

2021
Author(s): Anne M Stringer, Carol Smith, Kyle Mangano, Joseph Thomas Wade

Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked, because their small size makes them difficult to predict computationally and because they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine analysis of ribosome profiling data from Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. In this way, we identify ribosome-occupied start and stop codons for ~400 novel putative ORFs with high sensitivity. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting that they do not affect cell fitness, although a small subset has the hallmarks of functional ORFs.
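To make the idea of combining start-codon and stop-codon stalling concrete, here is a minimal Python sketch. It assumes that peak positions have already been called from the two drug-treated datasets (forward strand only, 0-based coordinates of the first codon nucleotide); the function names and the `max_codons` cap are hypothetical, and this is not the authors' pipeline.

```python
from typing import List, Set, Tuple

# Stop codons of the standard bacterial genetic code (forward strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def pair_start_stop_peaks(
    genome: str,
    start_peaks: Set[int],
    stop_peaks: Set[int],
    max_codons: int = 100,
) -> List[Tuple[int, int]]:
    """Return (start, stop) pairs supported by both kinds of stalled ribosome.

    For each start-codon peak, walk codon by codon to the first in-frame
    stop codon; keep the ORF only if that stop codon also carries a
    stop-codon peak. `max_codons` is a hypothetical cap on ORF length.
    """
    genome = genome.upper()
    orfs = []
    for start in sorted(start_peaks):
        last = min(len(genome) - 3, start + 3 * max_codons)
        for pos in range(start + 3, last + 1, 3):
            if genome[pos:pos + 3] in STOP_CODONS:
                if pos in stop_peaks:
                    orfs.append((start, pos))
                break  # only the first in-frame stop can end this ORF
    return orfs


def protein_length(start: int, stop: int) -> int:
    """Amino acids encoded (start codon counted, stop codon not)."""
    return (stop - start) // 3


if __name__ == "__main__":
    genome = "ATGAAATAGCCATGTGA"
    # The second start peak (11) has no matching stop-codon peak at 14.
    print(pair_start_stop_peaks(genome, {0, 11}, {6}))  # [(0, 6)]
```

Requiring agreement between the two independent signals is what lets a short ORF be called with both ends defined; a start peak whose first in-frame stop lacks a stop-codon peak is simply discarded in this sketch.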

2021
Author(s): Anne Stringer, Carol Smith, Kyle Mangano, Joseph T. Wade

Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked, because their small size makes them difficult to predict computationally and because they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine analysis of ribosome profiling data from Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. In this way, we identify ribosome-occupied start and stop codons with high sensitivity for ∼400 novel putative ORFs. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting that they do not affect cell fitness, although a small subset has the hallmarks of functional ORFs.
IMPORTANCE Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked, because their small size makes them difficult to predict computationally and because they are refractory to standard proteomic approaches. Recent studies have discovered small proteins by mapping the locations of translating ribosomes on RNA, using a technique known as ribosome profiling. Discovery of translated short ORFs (sORFs) by ribosome profiling can be improved by treating cells with drugs that trap initiating ribosomes. Here, we show that combining these data with equivalent data for cells treated with a drug that stalls terminating ribosomes facilitates the discovery of small proteins. We use this approach to discover 365 putative genes that encode small proteins in Escherichia coli.


2019
Author(s): Jeremy Weaver, Fuad Mohammad, Allen R. Buskirk, Gisela Storz

Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the true prevalence of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. The corresponding genes are not only intergenic but are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs); some of them affect the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context.
IMPORTANCE Proteins composed of 50 or fewer amino acids have been shown to interact with and modulate the function of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown, as the small size of the proteins places serious limitations on their identification, purification, and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.
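Sorting newly found smORFs by genomic context (intergenic, antisense, overlapping another ORF) comes down to interval overlap tests. The Python sketch below is a hypothetical illustration, not the authors' classification: `Feature`, `classify_context`, and the three labels are assumptions, and calling an ORF "in an operon" would additionally need operon annotations, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two intervals share at least one nucleotide."""
    return a.start <= b.end and b.start <= a.end


def classify_context(orf: Feature, genes: List[Feature]) -> str:
    """Coarse genomic context of a new smORF relative to annotated genes."""
    hits = [g for g in genes if overlaps(orf, g)]
    if any(g.strand == orf.strand for g in hits):
        return "overlapping"  # shares sequence with a same-strand gene
    if hits:
        return "antisense"    # overlaps only opposite-strand genes
    return "intergenic"       # no overlap with any annotated gene


if __name__ == "__main__":
    genes = [Feature(100, 400, "+"), Feature(600, 900, "-")]
    for orf in (Feature(50, 90, "+"), Feature(350, 420, "+"), Feature(650, 700, "+")):
        print(orf, classify_context(orf, genes))
```

Same-strand overlap is checked first here, so an ORF overlapping genes on both strands is labeled "overlapping"; that precedence is a design choice of the sketch.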


2021
Author(s): Rick Gelhausen, Teresa Müller, Sarah Svensson, Omer S. Alkhnbashi, Cynthia M. Sharma, ...

Small proteins, those encoded by open reading frames (ORFs) of 50 codons or fewer, are emerging as an important class of cellular macromolecules in all kingdoms of life. However, they are recalcitrant to detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation of small ORFs (sORFs) in diverse species, and this has driven the development of ORF detection tools that use Ribo-seq read signals. However, only a handful of tools have been designed for bacterial data, and these have not yet been systematically compared. Here, we have performed a comprehensive benchmark of ORF prediction tools that handle bacterial Ribo-seq data. For this, we created a novel Ribo-seq dataset for E. coli and, based on this plus three publicly available datasets for different bacteria, we created a benchmark set by manually labeling translated ORFs using their Ribo-seq expression profiles. This was then used to investigate the predictive performance of four Ribo-seq-based ORF detection tools that we found to be compatible with bacterial data (REPARATION_blast, DeepRibo, Ribo-TISH and SPECtre). The tool IRSOM was also included for comparison, as a tool that uses only coding potential and RNA-seq coverage. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference between ORFs inside and outside of operons. However, none of the tools was able to predict a set of recently identified, novel, experimentally verified sORFs with high sensitivity. Overall, we find there is potential for improving the performance, applicability, usability, and reproducibility of prokaryotic ORF prediction tools that use Ribo-seq as input.
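Benchmarking a predictor against a manually labeled set reduces to counting true positives, false positives, and false negatives. This Python sketch assumes ORFs are compared by exact (start, stop, strand) identity, which is a simplification; the benchmark described above may match predictions differently, and `benchmark` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple

# An ORF is identified here by (start, stop, strand); exact matching is an assumption.
Orf = Tuple[int, int, str]


def benchmark(predicted: Iterable[Orf], labeled: Iterable[Orf]) -> Dict[str, float]:
    """Sensitivity, precision and F1 of predicted ORFs against a labeled set."""
    pred, truth = set(predicted), set(labeled)
    tp = len(pred & truth)  # labeled ORFs the tool recovered
    fp = len(pred - truth)  # predictions absent from the benchmark set
    fn = len(truth - pred)  # labeled ORFs the tool missed
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if pred else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


if __name__ == "__main__":
    labeled = {(1, 30, "+"), (100, 160, "-"), (200, 260, "+"), (300, 330, "+")}
    predicted = {(1, 30, "+"), (100, 160, "-"), (500, 560, "+")}
    print(benchmark(predicted, labeled))
```

Restricting `labeled` to ORFs of 50 codons or fewer would give the sORF-specific sensitivity that the comparison above highlights.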


mBio
2019
Vol 10 (2)
Author(s): Jeremy Weaver, Fuad Mohammad, Allen R. Buskirk, Gisela Storz

Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the total number of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. Not only are the corresponding genes intergenic, but they are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs); some of them affect the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context.
IMPORTANCE Proteins composed of 50 or fewer amino acids have been shown to interact with and modulate the functions of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown, as the small size of the proteins places serious limitations on their identification, purification, and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.


2021
Vol 12 (1)
Author(s): David S. M. Lee, Joseph Park, Andrew Kromer, Aris Baras, Daniel J. Rader, ...

Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5′ UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection, comparable to that acting on protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing data from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.
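What it means for a uORF variant to "introduce a new stop codon" can be illustrated with a small sequence check. This Python sketch rests on several assumptions: it considers only ATG-initiated uORFs on an uppercase DNA-alphabet mRNA sequence with a known CDS start, handles single-nucleotide substitutions only, and does not model variants that strengthen an existing stop codon. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, cds_start: int) -> List[Tuple[int, Optional[int]]]:
    """ATG-initiated ORFs whose start codon lies in the 5' UTR.

    Returns (start, stop) as 0-based codon positions; stop is None when no
    in-frame stop codon occurs before the end of the sequence.
    """
    uorfs = []
    for i in range(cds_start):
        if mrna[i:i + 3] != "ATG":
            continue
        stop = None
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOPS:
                stop = j
                break
        uorfs.append((i, stop))
    return uorfs


def introduces_uorf_stop(mrna: str, cds_start: int, pos: int, alt: str) -> bool:
    """True if substituting `alt` at `pos` creates a premature in-frame stop
    codon inside an existing uORF (after its start, before its stop)."""
    mutated = mrna[:pos] + alt + mrna[pos + 1:]
    for start, stop in find_uorfs(mrna, cds_start):
        stop_bound = stop if stop is not None else len(mrna)
        codon_start = start + 3 * ((pos - start) // 3)  # codon containing pos
        if start + 3 <= codon_start < stop_bound and \
                mutated[codon_start:codon_start + 3] in STOPS:
            return True
    return False


if __name__ == "__main__":
    # 5' UTR "CC ATG AAA CCC TAG GG", then the CDS starting at index 16.
    mrna = "CCATGAAACCCTAGGGATGCCC"
    print(find_uorfs(mrna, 16))
    print(introduces_uorf_stop(mrna, 16, 5, "T"))  # AAA -> TAA inside the uORF
```

Because only the codon containing the variant is re-examined, the check is cheap enough to run over every candidate substitution in a 5′ UTR.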


2020
Vol 36 (6-7)
pp. 675-677
Author(s): Bertrand Jordan

A systematic search for non-conventional open reading frames in human DNA reveals a large number of small ORFs encoding peptides, generally shorter than 100 amino acids. These ORFs are transcribed and translated into small proteins, whose functional significance is demonstrated by bulk CRISPR inactivation. Evidence is also found for bicistronic mRNAs in which such a small ORF lies upstream of a canonical coding sequence. These findings add a new facet to our understanding of biological processes.


2019
Author(s): Jill G. Canestrari, Erica Lasek-Nesselquist, Ashutosh Upadhyay, Martina Rofaeil, Matthew M. Champion, ...

Genome-wide transcriptomic analyses have revealed many expressed short open reading frames (ORFs) in bacteria. Whether these short ORFs, or the small proteins they encode, are functional remains an open question. One quarter of mycobacterial mRNAs are leaderless, beginning with a 5′-AUG or GUG initiation codon. Leaderless mRNAs often encode unannotated short ORFs as the first gene of a polycistronic transcript. Here we show that polycysteine-encoding leaderless short ORFs function as cysteine-responsive attenuators of operonic gene expression. Detailed mutational analysis shows that one polycysteine short ORF controls expression of the downstream genes. Our data indicate that ribosomes stalled in the polycysteine tract block the formation of mRNA structures that would otherwise sequester the ribosome-binding site of the 3′ gene. We assessed endogenous proteomic responses to cysteine limitation in Mycobacterium smegmatis using mass spectrometry. Six cysteine metabolic loci with unannotated polycysteine-encoding leaderless short ORF architectures responded to cysteine limitation, revealing widespread cysteine-responsive attenuation in mycobacteria. Individual leaderless short ORFs confer independent operon-level control, while their shared dependence on cysteine ensures a collective response mediated by ribosome pausing. We propose the term ribulon to classify ribosome-directed regulons. Regulon-level coordination by ribosomes on sensory short ORFs illustrates one utility of the many unannotated short ORFs expressed in bacterial genomes.
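A leaderless polycysteine short ORF can be flagged with a simple scan of the transcript's 5′ end. This Python sketch assumes the input sequence starts exactly at the transcription start site; the `min_run` threshold and the function names are hypothetical and not taken from the study.

```python
from typing import List, Optional

STOPS = {"UAA", "UAG", "UGA"}
CYS_CODONS = {"UGU", "UGC"}
LEADERLESS_STARTS = {"AUG", "GUG"}


def leaderless_orf(transcript: str) -> Optional[List[str]]:
    """Codons of the ORF that begins at the transcript's first nucleotide.

    Returns None if the transcript does not start with AUG or GUG (i.e. it
    is not leaderless) or if no in-frame stop codon is found.
    """
    seq = transcript.upper().replace("T", "U")  # accept DNA or RNA input
    if seq[:3] not in LEADERLESS_STARTS:
        return None
    codons = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOPS:
            return codons
        codons.append(codon)
    return None


def longest_cys_run(codons: List[str]) -> int:
    """Length of the longest run of consecutive cysteine codons."""
    best = run = 0
    for codon in codons:
        run = run + 1 if codon in CYS_CODONS else 0
        best = max(best, run)
    return best


def is_polycys_candidate(transcript: str, min_run: int = 2) -> bool:
    """Flag leaderless ORFs carrying a Cys tract of at least `min_run` codons."""
    codons = leaderless_orf(transcript)
    return codons is not None and longest_cys_run(codons) >= min_run


if __name__ == "__main__":
    print(leaderless_orf("AUGUGUUGCUGUUAAGG"))
    print(is_polycys_candidate("AUGUGUUGCUGUUAAGG"))
```

Consecutive Cys codons are what matter for the proposed mechanism, since the ribosome is thought to pause within the tract when cysteine is limiting; the sketch therefore scores the longest run rather than the total Cys count.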


eLife
2017
Vol 6
Author(s): Sondos Samandi, Annie V Roy, Vivian Delcourt, Jean-François Lucier, Jules Gagnon, ...

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein-coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70-amino-acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein that promotes mitochondrial fission. Our results suggest that many genes are multicoding genes that code for a large protein and one or several small proteins.
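AltORFs that overlap an annotated CDS can be located by scanning the two reading frames that differ from the CDS frame. This minimal Python sketch considers only ATG starts and only altORFs overlapping the CDS (altORFs confined to UTRs are ignored); the `min_codons` default of 30 is a hypothetical threshold, not necessarily the study's criterion. Nested ATGs in the same frame are reported as separate altORFs sharing one stop codon.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def find_alt_orfs(mrna: str, cds_start: int, cds_end: int,
                  min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """ATG-initiated ORFs in a frame other than the CDS frame that overlap
    the CDS. Returns (start, end, n_codons): 0-based start, exclusive end
    including the stop codon, and codon count excluding the stop."""
    cds_frame = cds_start % 3
    hits = []
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] != "ATG" or i % 3 == cds_frame:
            continue  # not a start codon, or in the annotated reading frame
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOPS:
                n_codons = (j - i) // 3
                if n_codons >= min_codons and i < cds_end and j + 3 > cds_start:
                    hits.append((i, j + 3, n_codons))
                break  # first in-frame stop ends this candidate
    return hits


if __name__ == "__main__":
    # CDS: ATG CAT GGC TAA (indices 0-11); an ATG at index 4 lies in frame +1.
    seq = "ATGCATGGCTAAGTAG"
    print(find_alt_orfs(seq, 0, 12, min_codons=2))
```

In real transcripts the length threshold matters a great deal, because short ATG-to-stop stretches in alternative frames occur frequently by chance.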


2021
Author(s): Yanyan Li, Honghong Zhou, Xiaomin Chen, Yu Zheng, Quan Kang, ...

Small proteins specifically refer to proteins of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotations. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was developed specifically to provide valuable information on small proteins for the scientific community. Here we present an update of SmProt, which emphasizes the reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and a significantly increased data volume. Additional components, such as non-AUG translation initiation, function, and new sources, are also included. SmProt incorporates 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets or collected from the literature and other sources, originating from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were collected. All datasets in SmProt are freely accessible and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.


2015
Author(s): Robin C Friedman, Stefan Kalkhof, Olivia Doppelt-Azeroual, Stephan Mueller, Martina Chovancova, ...

While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, owing to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding small proteins, whether or not they also have a regulatory role at the RNA level. Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in phylogenetically diverse bacteria. A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 188 ± 25.5 unannotated sRNA ORFs are under selection to maintain coding, an average of 13 per species considered here. This implies that overall at least 7.5 ± 0.3% of sRNAs have a coding ORF, and in some species at least 20% do. Of these novel coding ORFs, 84 ± 9.8 have some antisense overlap with annotated ORFs. As experimental validation, many of our predictions are translated according to ribosome profiling data and are identified by shotgun mass spectrometry proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and two S. pneumoniae sRNAs with coding ORFs are involved in virulence. Coding ORFs in sRNAs are enriched for transmembrane domains, and many are novel components of type I toxin/antitoxin systems. Our predictions for sRNA coding ORFs, including novel type I toxins, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr.
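Applying the 10 to 50 amino acid ORF window to an sRNA sequence takes only a three-frame scan. In this Python sketch, the start-codon set (ATG, GTG, TTG) and the rule of keeping only the most upstream start per stop codon are assumptions. The sketch scans the sRNA's own strand only and says nothing about selection or coding function, which the study assesses with machine learning and comparative genomics.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons


def sorfs_in_srna(seq: str, min_aa: int = 10,
                  max_aa: int = 50) -> List[Tuple[int, int, int]]:
    """Sense-strand ORFs of min_aa..max_aa amino acids in an sRNA sequence.

    For each stop codon only the most upstream in-frame start is kept, so
    nested ORFs sharing a stop are reported once. Returns (start, end, aa):
    0-based start, exclusive end (stop codon included), and amino acids
    (start codon counted, stop codon not).
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                aa = (i - start) // 3
                if min_aa <= aa <= max_aa:
                    found.append((start, i + 3, aa))
                start = None  # look for the next ORF in this frame
    return sorted(found)


if __name__ == "__main__":
    srna = "ATG" + "AAA" * 11 + "TAA"  # a toy 12-amino-acid ORF
    print(sorfs_in_srna(srna))
```

Counting how many sRNAs yield a non-empty result from such a scan corresponds to the "at least one ORF between 10 and 50 amino acids" statistic, not to the smaller set of ORFs predicted to be under selection.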

