Small Protein Enrichment Improves Proteomics Detection of sORF Encoded Polypeptides

2021 ◽  
Vol 12 ◽  
Author(s):  
Igor Fijalkowski ◽  
Marlies K. R. Peeters ◽  
Petra Van Damme

With the rapid growth in the number of sequenced genomes, genome annotation efforts have become almost exclusively reliant on automated pipelines. Despite their unquestionable utility, these methods have been shown to underestimate the true complexity of the studied genomes, with small open reading frames (sORFs; ORFs typically considered shorter than 300 nucleotides) and, in consequence, their protein products (sORF encoded polypeptides or SEPs) being the primary example of a poorly annotated and highly underexplored class of genomic elements. With the advent of advanced translatomics such as ribosome profiling, reannotation efforts have progressed a great deal in providing translation evidence for numerous, previously unannotated sORFs. However, proteomics validation of these riboproteogenomics discoveries remains challenging due to the short length and often highly variable physicochemical properties of SEPs. In this work we evaluate and compare tailored, yet easily adaptable, protein extraction methodologies for their efficacy in the extraction and, concomitantly, the proteomics detection of SEPs expressed in the prokaryotic model pathogen Salmonella typhimurium (S. typhimurium). Further, an optimized protocol for the enrichment and efficient detection of SEPs, making use of the amphipathic polymer amphipol A8-35 and relying on differential peptide vs. protein solubility, was developed and compared with global extraction methods making use of chaotropic agents. Given the versatile biological functions SEPs have been shown to exert, this work provides an accessible protocol for proteomics exploration of this fascinating class of small proteins.

2021 ◽  
Author(s):  
Yanyan Li ◽  
Honghong Zhou ◽  
Xiaomin Chen ◽  
Yu Zheng ◽  
Quan Kang ◽  
...  

Small proteins specifically refer to proteins consisting of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotation. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was specially developed to provide valuable information on small proteins for the scientific community. Here we present the update of SmProt, which emphasizes reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and significantly increased data volume. More components such as non-AUG translation initiation, function, and new sources are also included. SmProt incorporated 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets and collected from the literature and other sources originating from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were collected. All datasets in SmProt are free to access and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.


2021 ◽  
Author(s):  
Fengyuan Hu ◽  
Jia Lu ◽  
Manuel D. Munoz ◽  
Alexander Saveliev ◽  
Martin Turner

The annotation of small open reading frames (smORFs) of less than 100 codons (<300 nucleotides) is challenging due to the large number of such sequences in the genome. The recent development of next-generation sequencing and ribosome profiling enables identification of actively translated smORFs. In this study, we developed a computational pipeline, which we have named ORFLine, that stringently identifies smORFs and classifies them according to their position within transcripts. We identified a total of 5744 unique smORFs in datasets from mouse B and T lymphocytes and systematically characterized them using ORFLine. We further searched smORFs for the presence of a signal peptide, which predicted known secreted chemokines as well as novel micropeptides. Five novel micropeptides show evidence of secretion and are therefore candidate mediators of immunoregulatory functions.
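To make the size definition concrete, the following is a minimal sketch of a naive smORF scan. It is not ORFLine and does not use ribosome profiling evidence; it only enumerates forward-strand ATG-to-stop ORFs shorter than 100 codons. The function name `find_smorfs`, its parameters, and the choice to count the start codon but not the stop codon are assumptions made for illustration.

```python
# Hypothetical helper, not part of ORFLine: a naive forward-strand ORF scan.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_smorfs(seq, max_codons=100, min_codons=2):
    """Return (start, end, n_codons) for ATG..stop ORFs with fewer than max_codons codons.

    start is 0-based, end is exclusive and includes the stop codon.
    n_codons counts the start codon but not the stop codon (an assumption).
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons < max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None  # close the ORF and look for the next ATG
    return hits


if __name__ == "__main__":
    print(find_smorfs("CCATGAAATTTTAGGG"))
```

Even a scan like this returns very large numbers of candidates on a real genome, which is why pipelines of the kind described above add translation evidence and stringent filtering on top of sequence-level detection.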


2019 ◽  
Vol 412 (2) ◽  
pp. 449-462 ◽  
Author(s):  
Dania Awad ◽  
Thomas Brueck

In the last decades, microbial oils have been extensively investigated as a renewable platform for biofuel and oleochemical production. Offering a potent alternative to plant-based oils, oleaginous microorganisms have been the target of ongoing metabolic engineering aimed at increasing growth and lipid yields, in addition to specialty fatty acids. Discovery proteomics is an attractive tool for elucidating lipogenesis and identifying metabolic bottlenecks, feedback regulation, and competing biosynthetic pathways. One prominent microbial oil producer is Cutaneotrichosporon oleaginosus, due to its broad feedstock catabolism and high lipid yield. However, this yeast has a recalcitrant cell wall and high cell lipid content, which complicate efficient and unbiased protein extraction for downstream proteomic analysis. Optimization efforts of protein sample preparation from C. oleaginosus in the present study encompass the comparison of 8 lysis methods, 13 extraction buffers, and 17 purification methods with respect to protein abundance, proteome coverage, applicability, and physicochemical properties (pI, MW, hydrophobicity in addition to COG, and GO analysis). The optimized protocol presented in this work entails a one-step extraction method utilizing an optimal lysis method (liquid homogenization), which is augmented with a superior extraction buffer (50 mM Tris, 8/2 M Urea/Thiourea, and 1% C7BzO), followed by either of 2 advantageous purification methods (hexane/ethanol or TCA/acetone), depending on subsequent applications and target studies. This work presents a significant step forward towards implementation of efficient C. oleaginosus proteome mining for the identification of potential targets for genetic optimization of this yeast to improve lipogenesis and production of specialty lipids.


2019 ◽  
Author(s):  
Jeremy Weaver ◽  
Fuad Mohammad ◽  
Allen R. Buskirk ◽  
Gisela Storz

Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the true prevalence of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly-initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. The corresponding genes are not only intergenic, but are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs), some impacting the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context. Importance: Proteins comprised of 50 or fewer amino acids have been shown to interact with and modulate the function of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown as the small size of the proteins places serious limitations on their identification, purification and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.


eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins.


Life ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 701
Author(s):  
Bo Song ◽  
Mengyun Jiang ◽  
Lei Gao

Ribo-seq, also known as ribosome profiling, refers to the sequencing of ribosome-protected mRNA fragments (RPFs). This technique has greatly advanced our understanding of translation and facilitated the identification of novel open reading frames (ORFs) within untranslated regions or non-coding sequences as well as the identification of non-canonical start codons. However, the widespread application of Ribo-seq has been hindered because obtaining periodic RPFs requires a highly optimized protocol, which may be difficult to achieve, particularly in non-model organisms. Furthermore, the periodic RPFs are too short (28 nt) for accurate mapping to polyploid genomes, but longer RPFs are usually produced with a compromise in periodicity. Here we present RiboNT, a noise-tolerant ORF predictor that can utilize RPFs with poor periodicity. It evaluates RPF periodicity and automatically weighs the support from RPFs and codon usage before combining their contributions to identify translated ORFs. The results demonstrate the utility of RiboNT for identifying both long and small ORFs using RPFs with either good or poor periodicity. We implemented the pipeline on a dataset of RPFs with poor periodicity derived from membrane-bound polysomes of Arabidopsis thaliana seedlings and identified several small ORFs (sORFs) evolutionarily conserved in diverse plant species. RiboNT should greatly broaden the application of Ribo-seq by minimizing the requirement of RPF quality and allowing the use of longer RPFs, which is critical for organisms with complex genomes because these RPFs can be more accurately mapped to the position from which they were derived.
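The notion of RPF periodicity can be illustrated with a short sketch. This is not RiboNT's scoring scheme, which the abstract does not specify; it is a simple, assumed measure that bins P-site positions by reading frame and rescales the dominant-frame fraction to a 0 to 1 range. The function names and the input convention (P-site offsets in nucleotides relative to the first base of the start codon) are hypothetical.

```python
from collections import Counter


def frame_fractions(psite_offsets):
    """Fraction of reads whose P-site falls in frame 0, 1 and 2.

    psite_offsets: iterable of integer P-site positions relative to the
    first nucleotide of the start codon (an assumed input convention).
    """
    counts = Counter(offset % 3 for offset in psite_offsets)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts.get(frame, 0) / total for frame in range(3))


def periodicity_score(psite_offsets):
    """Illustrative score: 0 if reads are spread evenly across frames, 1 if all share one frame."""
    fractions = frame_fractions(psite_offsets)
    if sum(fractions) == 0:
        return 0.0  # no reads, no evidence of periodicity
    # Rescale the dominant-frame fraction from the range [1/3, 1] to [0, 1].
    return (3 * max(fractions) - 1) / 2


if __name__ == "__main__":
    print(periodicity_score([0, 3, 6, 9]))        # all reads in frame 0
    print(periodicity_score([0, 1, 2, 3, 4, 5]))  # reads spread evenly
```

A predictor that tolerates noisy data could use a score of this kind to decide how much weight to give read-based evidence relative to sequence features such as codon usage, which is the general idea the abstract describes.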


2021 ◽  
Author(s):  
Nikolaos Vakirlis ◽  
Kate M. Duggan ◽  
Aoife McLysaght

We now have a growing understanding that functional short proteins can be translated out of small Open Reading Frames (sORFs). Such "microproteins" can perform crucial biological tasks and can have considerable phenotypic consequences. However, their size makes them less amenable to genomic analysis, and their evolutionary origins and conservation are poorly understood. Given their short length it is plausible that some of these functional microproteins have recently originated entirely de novo from non-coding sequence. Here we test the possibility that de novo gene birth can produce microproteins that are functional "out-of-the-box". We reconstructed the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the appearance of each ORF and its transcriptional activation, we were able to show that, indeed, novel small proteins with significant phenotypic effects have emerged de novo throughout animal evolution, including many after the human-chimpanzee split. We show that traditional methods for assessing the coding potential of such sequences often fall short, due to the high variability present in the alignments and the absence of telltale evolutionary signatures that are not yet measurable. Thus we provide evidence that the functional potential intrinsic to sORFs can be rapidly and frequently realised through de novo gene birth.


2015 ◽  
Author(s):  
Robin C Friedman ◽  
Stefan Kalkhof ◽  
Olivia Doppelt-Azeroual ◽  
Stephan Mueller ◽  
Martina Chovancova ◽  
...  

While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding small proteins, whether or not they also have a regulatory role at the RNA level. Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in phylogenetically diverse bacteria. A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 188 ± 25.5 unannotated sRNA ORFs are under selection to maintain coding, an average of 13 per species considered here. This implies that overall at least 7.5 ± 0.3% of sRNAs have a coding ORF, and in some species at least 20% do. 84 ± 9.8 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated according to ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and two S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are novel components of type I toxin/antitoxin systems. Our predictions for sRNA coding ORFs, including novel type I toxins, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr.


2021 ◽  
Author(s):  
Anne M Stringer ◽  
Carol Smith ◽  
Kyle Mangano ◽  
Joseph Thomas Wade

Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine the analysis of ribosome profiling data for Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. Thus, we identify ribosome-occupied start and stop codons for ~400 novel putative ORFs with high sensitivity. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs, and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting they do not impact cell fitness, although a small subset have the hallmarks of functional ORFs.


2021 ◽  
Author(s):  
Darius Sargautis ◽  
Tatjana Kince ◽  
Vanda Sargautiene ◽  

Oat protein has been studied extensively, yielding information on its nutritional value, some of its functional properties, and its possible applicability in food systems. The chosen protein isolation method and its technological aspects define the final composition of the obtained oat protein product, its concentration, its nutritional value, and its functionality in the food industry. Scientific data on oat protein recovery methods, which typically rely on protein solubility or dry fractionation, provide insufficient knowledge about the commercial success of oat protein recovery technologies and the oat protein products derived from them. The aim of the study was to analyse and summarize the research findings on oat protein extraction methods and the functional properties of oat protein. Semi-systematic and monographic methods were used to analyse oat protein isolation techniques and the functional properties of oat protein in aqueous food systems, covering the latest information on oat protein extraction methods. Wet and dry isolation were identified as the main methods of oat protein extraction. Functional properties of oat protein, such as thermal stability, solubility, emulsification, water hydration capacity, and foaming, were reviewed and evaluated, identifying limitations and the protein alterations that occur during the extraction process. The study presents recent trends in oat protein recovery technologies, along with an overview of current and potential oat protein utilization in food systems.

