Identifying Small Proteins by Ribosome Profiling with Stalled Initiation Complexes

mBio ◽ 10.1128/mbio.02819-18 ◽ 2019 ◽ Vol 10 (2) ◽ Cited By ~ 45

Author(s): Jeremy Weaver ◽ Fuad Mohammad ◽ Allen R. Buskirk ◽ Gisela Storz

Keyword(s): Amino Acids ◽ Model Organism ◽ Ribosome Profiling ◽ Open Reading Frames ◽ Genomic Context ◽ Content Type ◽ New Genes ◽ Small Proteins ◽ Intergenic Regions ◽ Reading Frames

ABSTRACT Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the total number of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. Not only are the corresponding genes intergenic but they are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs), some impacting the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context.

IMPORTANCE Proteins comprised of 50 or fewer amino acids have been shown to interact with and modulate the functions of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown as the small size of the proteins places serious limitations on their identification, purification, and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.
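
Once an initiation site has been located, the candidate smORF can be delimited by reading in frame to the first stop codon and checking the 50-amino-acid cutoff. The sketch below illustrates that step only; it is not the authors' pipeline. The function name `orf_from_start`, the restriction to the forward strand, and the use of the three standard stop codons are assumptions made for illustration.

```python
# Illustrative sketch (not the published method): delimit a candidate smORF
# from a known initiation site on the forward strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed)


def orf_from_start(seq, start, max_codons=50):
    """Read codons in frame from `start` (0-based index of the first base
    of the start codon) to the first in-frame stop codon.

    Returns (start, end_exclusive, n_codons), where n_codons excludes the
    stop codon, or None if no stop is found or the ORF exceeds max_codons.
    """
    seq = seq.upper()
    n_codons = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if 0 < n_codons <= max_codons:
                return (start, i + 3, n_codons)
            return None  # empty ORF or longer than the cutoff
        n_codons += 1
    return None  # ran off the sequence without reaching a stop codon


if __name__ == "__main__":
    toy = "GG" + "ATG" + "AAA" * 3 + "TAA" + "CC"  # hypothetical sequence
    print(orf_from_start(toy, 2))
```

A real analysis would also scan the reverse strand and allow non-AUG start codons; both are omitted here to keep the example short.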

Chroniques génomiques

médecine/sciences ◽ 10.1051/medsci/2020108 ◽ 2020 ◽ Vol 36 (6-7) ◽ pp. 675-677

Author(s): Bertrand Jordan

Keyword(s): Amino Acids ◽ Functional Significance ◽ Open Reading Frames ◽ Systematic Search ◽ Biological Processes ◽ Coding Sequence ◽ Small Proteins ◽ Human DNA ◽ Small ORFs ◽ Reading Frames

A systematic search for non-conventional open reading frames in human DNA reveals a large number of small ORFs encoding peptides generally smaller than 100 amino acids. These ORFs are transcribed and translated into small proteins, which are shown to have functional significance by bulk CRISPR inactivation. Evidence is also found for bicistronic mRNAs that include such a small ORF upstream of a canonical coding sequence. These findings add a new facet to our understanding of biological processes.

SmProt: a reliable repository with comprehensive annotation of small proteins identified from ribosome profiling

10.1101/2021.04.29.441405 ◽ 2021

Author(s): Yanyan Li ◽ Honghong Zhou ◽ Xiaomin Chen ◽ Yu Zheng ◽ Quan Kang ◽ ...

Keyword(s): Genetic Variants ◽ Rattus Norvegicus ◽ Homo Sapiens ◽ Ribosome Profiling ◽ Open Reading Frames ◽ Small Proteins ◽ Data Volume ◽ Reading Frames ◽ Disease Specific ◽ Small Open Reading Frames

Small proteins are proteins of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in earlier genome annotations. The significance of small proteins has become apparent in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was developed specifically to provide valuable information on small proteins to the scientific community. Here we present the update of SmProt, which emphasizes the reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and a significantly increased data volume. More components, such as non-AUG translation initiation, function, and new sources, are also included. SmProt incorporates 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets and collected from the literature and other sources originating from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were collected. All datasets in SmProt are free to access and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.

Identification of novel translated small ORFs in Escherichia coli using complementary ribosome profiling approaches

10.1101/2021.07.02.450978 ◽ 2021

Author(s): Anne M Stringer ◽ Carol Smith ◽ Kyle Mangano ◽ Joseph Thomas Wade

Keyword(s): Escherichia Coli ◽ Amino Acids ◽ High Sensitivity ◽ Purifying Selection ◽ Ribosome Profiling ◽ Open Reading Frames ◽ Small Subset ◽ Stop Codons ◽ Small Proteins ◽ Short ORFs

Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine the analysis of ribosome profiling data for Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. Thus, we identify ribosome-occupied start and stop codons for ~400 novel putative ORFs with high sensitivity. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs, and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting they do not impact cell fitness, although a small subset have the hallmarks of functional ORFs.

RiboReport - Benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria

10.1101/2021.06.08.447495 ◽ 2021

Author(s): Rick Gelhausen ◽ Teresa Müller ◽ Sarah Svensson ◽ Omer S. Alkhnbashi ◽ Cynthia M. Sharma ◽ ...

Keyword(s): High Sensitivity ◽ Predictive Performance ◽ Ribosome Profiling ◽ Open Reading Frames ◽ RNA-Seq ◽ E. coli ◽ Prediction Tools ◽ Small Proteins ◽ Significant Difference ◽ Reading Frames

Small proteins, those encoded by open reading frames of 50 or fewer codons, are emerging as an important class of cellular macromolecules in all kingdoms of life. However, they are recalcitrant to detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation of sORFs in diverse species, and this has driven the development of ORF detection tools that use Ribo-seq read signals. However, only a handful of tools have been designed for bacterial data, and these have not yet been systematically compared. Here, we have performed a comprehensive benchmark of ORF prediction tools that handle bacterial Ribo-seq data. For this, we created a novel Ribo-seq dataset for E. coli and, based on this plus three publicly available datasets for different bacteria, created a benchmark set by manually labeling translated ORFs using their Ribo-seq expression profiles. This set was then used to investigate the predictive performance of the four Ribo-seq-based ORF detection tools that we found to be compatible with bacterial data (REPARATION_blast, DeepRibo, Ribo-TISH, and SPECtre). The tool IRSOM was also included as a comparison for tools that use coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference between those inside and outside of operons. However, none of the tools was able to predict a set of recently identified, novel, experimentally verified sORFs with high sensitivity. Overall, we find there is potential for improving the performance, applicability, usability, and reproducibility of prokaryotic ORF prediction tools that use Ribo-seq as input.
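
To make "sensitivity" in such a benchmark concrete, the following minimal sketch scores a tool's predictions against a manually labeled set. It assumes each ORF is a `(strand, start, stop)` tuple and that only exact coordinate matches count; the abstract does not state the benchmark's actual matching criteria, and the function name `score_predictions` is hypothetical.

```python
# Illustrative sketch: sensitivity and precision of predicted ORFs against
# a labeled set. Exact-coordinate matching is an assumption, not the
# benchmark's documented rule.
def score_predictions(predicted, labeled):
    """Return (sensitivity, precision) for two collections of ORF tuples."""
    predicted, labeled = set(predicted), set(labeled)
    true_positives = len(predicted & labeled)
    # Sensitivity: fraction of labeled translated ORFs that were recovered.
    sensitivity = true_positives / len(labeled) if labeled else 0.0
    # Precision: fraction of predictions that are in the labeled set.
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    labeled = [("+", 100, 160), ("+", 400, 520), ("-", 900, 990), ("-", 50, 95)]
    predicted = [("+", 100, 160), ("-", 900, 990), ("+", 700, 760)]
    print(score_predictions(predicted, labeled))
```

Exact matching is strict for sORFs, where a start codon misplaced by one codon still yields a miss; a more lenient scheme could match on the stop codon alone.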

smORFer: a modular algorithm to detect small ORFs in prokaryotes

10.1101/2020.05.21.109181 ◽ 2020

Author(s): Alexander Bartholomäus ◽ Baban Kolte ◽ Ayten Mustafayeva ◽ Ingrid Goebel ◽ Stephan Fuchs ◽ ...

Keyword(s): Integrated Approach ◽ Structural Features ◽ Ribosome Profiling ◽ Open Reading Frames ◽ Data Sets ◽ Physiological Processes ◽ Small Proteins ◽ Prokaryotic Genomes ◽ Modular Algorithm ◽ Reading Frames

ABSTRACT Emerging evidence places small proteins (≤ 50 amino acids) more centrally in physiological processes. Yet the identification of functional small proteins and the systematic genome annotation of their cognate small open reading frames (smORFs) remain challenging both experimentally and computationally. Ribosome profiling, or Ribo-Seq (deep sequencing of ribosome-protected fragments), enables the detection of actively translated open reading frames (ORFs) and the empirical annotation of coding sequences (CDSs) using the in-register translation pattern that is characteristic of genuinely translating ribosomes. Multiple ORF identifiers that use 3-nt periodicity in Ribo-Seq data sets have been successful in eukaryotic smORF annotation. Yet they have difficulty evaluating prokaryotic genomes because of their unique architecture (e.g., polycistronic messages, overlapping ORFs, leaderless translation, and non-canonical initiation). Here, we present our new algorithm, smORFer, which detects smORFs in prokaryotic organisms with high accuracy. The unique feature of smORFer is its integrated approach: it considers structural features of the genetic sequence along with in-register translation and uses a Fourier transform to convert these parameters into a measurable score to faithfully select smORFs. The algorithm is modular, so different modules can be used for the smORF search depending on the data available for a particular organism.
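
The idea of turning 3-nt periodicity into a score with a Fourier transform can be sketched as follows. This is not smORFer's scoring function, which the abstract does not specify; it is a minimal illustration that evaluates the discrete Fourier transform of a per-nucleotide read-count vector at a frequency of one cycle per three nucleotides and normalizes by the total read count. The function name `periodicity_score` and the normalization are assumptions.

```python
# Illustrative sketch (not smORFer's actual score): strength of the 3-nt
# periodic component in per-nucleotide ribosome footprint counts.
import cmath
import math


def periodicity_score(counts):
    """Return a value in [0, 1]: magnitude of the DFT component at a
    frequency of 1/3 cycles per nucleotide, divided by the total counts.

    A score near 1 means reads fall on one codon position only; a score
    near 0 means reads are spread evenly over the three positions.
    """
    total = sum(counts)
    if len(counts) < 3 or total == 0:
        return 0.0
    coeff = sum(
        c * cmath.exp(-2j * math.pi * k / 3) for k, c in enumerate(counts)
    )
    return abs(coeff) / total


if __name__ == "__main__":
    in_frame = [10, 0, 0] * 5  # hypothetical: all reads at codon position 1
    uniform = [5, 5, 5] * 5    # hypothetical: no frame preference
    print(periodicity_score(in_frame), periodicity_score(uniform))
```

Evaluating a single frequency directly avoids a full FFT and works for ORFs of any length, which matters for smORFs only a few dozen nucleotides long.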

ORFLine: a bioinformatic pipeline to prioritise small open reading frames identifies candidate secreted small proteins from lymphocytes

10.1101/2021.01.21.426789 ◽ 2021

Author(s): Fengyuan Hu ◽ Jia Lu ◽ Manuel D. Munoz ◽ Alexander Saveliev ◽ Martin Turner

Keyword(s): T Lymphocytes ◽ Ribosome Profiling ◽ Open Reading Frames ◽ Computational Pipeline ◽ Bioinformatic Pipeline ◽ Small Proteins ◽ Show Evidence ◽ Reading Frames ◽ B And T Lymphocytes ◽ Small Open Reading Frames

Abstract The annotation of small open reading frames (smORFs) of fewer than 100 codons (<300 nucleotides) is challenging because of the large number of such sequences in the genome. The recent development of next-generation sequencing and ribosome profiling enables the identification of actively translated smORFs. In this study, we developed a computational pipeline, which we have named ORFLine, that stringently identifies smORFs and classifies them according to their position within transcripts. We identified a total of 5744 unique smORFs in datasets from mouse B and T lymphocytes and systematically characterized them using ORFLine. We further searched the smORFs for the presence of a signal peptide, which predicted known secreted chemokines as well as novel micropeptides. Five novel micropeptides show evidence of secretion and are therefore candidate mediators of immunoregulatory functions.

Faculty Opinions recommendation of A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.1089012.542156 ◽ 2007

Author(s): José Luis Riechmann

Keyword(s): Arabidopsis Thaliana ◽ Purifying Selection ◽ Open Reading Frames ◽ Intergenic Regions ◽ Arabidopsis Thaliana Genome ◽ Reading Frames ◽ Small Open Reading Frames

Faculty Opinions recommendation of Detecting actively translated open reading frames in ribosome profiling data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.726008611.793524933 ◽ 2016

Author(s): Auinash Kalsotra

Keyword(s): Ribosome Profiling ◽ Open Reading Frames ◽ Reading Frames

OpenProt 2021: deeper functional annotation of the coding potential of eukaryotic genomes

Nucleic Acids Research ◽ 10.1093/nar/gkaa1036 ◽ 2020 ◽ Vol 49 (D1) ◽ pp. D380-D388 ◽ Cited By ~ 1

Author(s): Marie A Brunet ◽ Jean-François Lucier ◽ Maxime Levesque ◽ Sébastien Leblanc ◽ Jean-Francois Jacques ◽ ...

Keyword(s): Confidence Score ◽ Ribosome Profiling ◽ Open Reading Frames ◽ Supporting Evidence ◽ Initial Release ◽ NCBI RefSeq ◽ Computational Resources ◽ Analysis Platform ◽ Eukaryotic Genomes ◽ Reading Frames

Abstract OpenProt (www.openprot.org) is the first proteogenomic resource supporting a polycistronic annotation model for eukaryotic genomes. It provides a deeper annotation of open reading frames (ORFs) while mining experimental data for supporting evidence using cutting-edge algorithms. This update presents the major improvements since the initial release of OpenProt. All species support recent NCBI RefSeq and Ensembl annotations, with changes in annotations being reported in OpenProt. Using the 131 ribosome profiling datasets re-analysed by OpenProt to date, non-AUG initiation starts are reported alongside a confidence score for the initiating codon. From the 177 mass spectrometry datasets re-analysed by OpenProt to date, the unicity of the detected peptides is controlled at each implementation. Furthermore, to guide users, detectability statistics and protein relationships (isoforms) are now reported for each protein. Finally, to foster access to deeper ORF annotation independently of one's bioinformatics skills or computational resources, OpenProt now offers a data analysis platform. Users can submit their datasets for analysis and receive the results from OpenProt. All data on OpenProt are freely available and downloadable for each species, and the release-based format ensures continuous access to the data. Thus, OpenProt enables a more comprehensive annotation of eukaryotic genomes and fosters functional proteomic discoveries.