A human ESC-based screen identifies a role for the translated lncRNA LINC00261 in pancreatic endocrine differentiation

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Bjoern Gaertner ◽  
Sebastiaan van Heesch ◽  
Valentin Schneider-Lunitz ◽  
Jana Felicitas Schulz ◽  
Franziska Witte ◽  
...  

Long noncoding RNAs (lncRNAs) are a heterogeneous group of RNAs, which can encode small proteins. The extent to which developmentally regulated lncRNAs are translated and whether the produced microproteins are relevant for human development is unknown. Using a human embryonic stem cell (hESC)-based pancreatic differentiation system, we show that many lncRNAs in the direct vicinity of lineage-determining transcription factors (TFs) are dynamically regulated, predominantly cytosolic, and highly translated. We genetically ablated ten such lncRNAs, most of them translated, and found that nine are dispensable for pancreatic endocrine cell development. However, deletion of LINC00261 diminishes insulin+ cells, in a manner independent of the nearby TF FOXA2. One-by-one disruption of each of LINC00261's open reading frames suggests that the RNA, rather than the produced microproteins, is required for endocrine development. Our work highlights extensive translation of lncRNAs during hESC pancreatic differentiation and provides a blueprint for dissection of their coding and noncoding roles.

2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Corrine Corrina R. Hartford ◽  
Ashish Lal

ABSTRACT Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.


2020 ◽  
Vol 36 (6-7) ◽  
pp. 675-677
Author(s):  
Bertrand Jordan

A systematic search for non-conventional open reading frames in human DNA reveals a large number of small ORFs encoding peptides generally smaller than 100 amino acids. These ORFs are transcribed and translated into small proteins, which are demonstrated to have functional significance by bulk CRISPR inactivation. Evidence is also found for bicistronic mRNAs that include such a small ORF upstream of a canonical coding sequence. These findings add a new facet to our understanding of biological processes.


1998 ◽  
Vol 180 (17) ◽  
pp. 4693-4703 ◽  
Author(s):  
Tendai Mhlanga-Mutangadura ◽  
Gregory Morlin ◽  
Arnold L. Smith ◽  
Abraham Eisenstark ◽  
Miriam Golomb

ABSTRACT Haemophilus influenzae is a ubiquitous colonizer of the human respiratory tract and causes diseases ranging from otitis media to meningitis. Many H. influenzae isolates express pili (fimbriae), which mediate adherence to epithelial cells and facilitate colonization. The pilus gene (hif) cluster of H. influenzae type b maps between purE and pepN and resembles a pathogenicity island: it is present in invasive strains, absent from the nonpathogenic Rd strain, and flanked by direct repeats of sequence at the insertion site. To investigate the evolution and role in pathogenesis of the hif cluster, we compared the purE-pepN regions of various H. influenzae laboratory strains and clinical isolates. Unlike Rd, most strains had an insert at this site, which usually was the only chromosomal locus of hif DNA. The inserts are diverse in length and organization: among 20 strains, nine different arrangements were found. Several nontypeable isolates lack hif genes but have two conserved open reading frames (hicA and hicB) upstream of purE; their inferred products are small proteins with no data bank homologs. Other isolates have hif genes but lack hic DNA or have combinations of hif and hic genes. By comparing these arrangements, we have reconstructed a hypothetical ancestral genotype, the extended hif cluster. The hif region of INT1, an invasive nontypeable isolate, resembles the hypothetical ancestor. We propose that a progenitor strain acquired the extended cluster by horizontal transfer and that other variants arose as deletions. The structure of the hif cluster may correlate with colonization site or pathogenicity.


2019 ◽  
Author(s):  
Jeremy Weaver ◽  
Fuad Mohammad ◽  
Allen R. Buskirk ◽  
Gisela Storz

ABSTRACT Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the true prevalence of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly-initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. The corresponding genes are not only intergenic, but are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs), some impacting the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context.
IMPORTANCE Proteins comprised of 50 or fewer amino acids have been shown to interact with and modulate the function of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown as the small size of the proteins places serious limitations on their identification, purification and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.


2015 ◽  
Author(s):  
Juna Carlevaro-Fita ◽  
Anisa Rahim ◽  
Roderic Guigo ◽  
Leah Vardy ◽  
Rory Johnson

The function of long noncoding RNAs (lncRNAs) depends on their location within the cell. While most studies to date have concentrated on their nuclear roles in transcriptional regulation, evidence is mounting that lncRNAs also have cytoplasmic roles. Here we comprehensively map the cytoplasmic and ribosomal lncRNA population in a human cell. Three-quarters (74%) of lncRNAs are detected in the cytoplasm, the majority of which (62%) preferentially cofractionate with polyribosomes. Ribosomal lncRNAs are highly expressed across tissues, under purifying evolutionary selection, and have cytoplasmic-to-nuclear ratios comparable to mRNAs and consistent across cell types. LncRNAs may be classified into three groups by their ribosomal interaction: non-ribosomal cytoplasmic lncRNAs, and those associated with either heavy or light polysomes. A number of mRNA-like features destine lncRNAs for light polysomes, including capping and 5′UTR length, but not cryptic open reading frames or polyadenylation. Surprisingly, exonic retroviral sequences antagonise recruitment. In contrast, it appears that lncRNAs are recruited to heavy polysomes through basepairing to mRNAs. Finally, we show that the translation machinery actively degrades lncRNAs. We propose that light polysomal lncRNAs are translationally engaged, while heavy polysomal lncRNAs are recruited indirectly. These findings point to extensive and reciprocal regulatory interactions between lncRNAs and the translation machinery.


2017 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V. Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Abstract Recent studies in eukaryotes have demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by altORFs. The putative alternative proteins translated from altORFs have orthologs in many species, and evolutionary patterns indicate that altORFs are particularly constrained in CDSs that evolve slowly. Thousands of predicted alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. Protein domains and co-conservation analyses suggest a potential functional relationship between small and large proteins encoded in the same genes. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many coding genes code for more than one protein and that these proteins are often functionally related.


2021 ◽  
Author(s):  
John Anders ◽  
Hannes Petruschke ◽  
Nico Jehmlich ◽  
Sven-Bastiaan Haange ◽  
Martin von Bergen ◽  
...  

Abstract Background: Small proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions. Results: We observe that the number and quality of the Peptide-to-Spectra-Matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated, ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that previously have been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic levels. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
Conclusions: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular also capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration with transcriptomics data and other sources of genome-level information.


2021 ◽  
Author(s):  
Yanyan Li ◽  
Honghong Zhou ◽  
Xiaomin Chen ◽  
Yu Zheng ◽  
Quan Kang ◽  
...  

Small proteins are proteins of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotation. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was specially developed to provide valuable information on small proteins for the scientific community. Here we present the update of SmProt, which emphasizes reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and significantly increased data volume. More components such as non-AUG translation initiation, function, and new sources are also included. SmProt incorporated 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets and collected from the literature and other sources originating from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were collected. All datasets in SmProt are free to access and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.


2020 ◽  
Author(s):  
Bjoern Gaertner ◽  
Sebastiaan van Heesch ◽  
Valentin Schneider-Lunitz ◽  
Jana Felicitas Schulz ◽  
Franziska Witte ◽  
...  

Abstract Long noncoding RNAs (lncRNAs) are a heterogeneous group of RNAs, which can encode small proteins. The extent to which developmentally regulated lncRNAs are translated and whether the produced microproteins are relevant for human development is unknown. Here, we show that many lncRNAs in the direct vicinity of lineage-determining transcription factors (TFs) are dynamically regulated, predominantly cytosolic, and highly translated during pancreas development. We genetically ablated ten such lncRNAs, most of them translated, and found that nine are dispensable for endocrine cell differentiation. However, deletion of LINC00261 diminishes generation of insulin+ endocrine cells, in a manner independent of the nearby TF FOXA2. Systematic deletion of each of LINC00261’s seven poorly conserved microproteins shows that the RNA, rather than the microproteins, is required for endocrine development. Our work highlights extensive translation of lncRNAs into recently evolved microproteins during human pancreas development and provides a blueprint for dissection of their coding and noncoding roles.
Highlights:
Extensive lncRNA translation and microprotein production during human pancreas development
A small-scale loss-of-function screen shows most translated lncRNAs are dispensable
LINC00261 is highly translated and regulates endocrine cell differentiation
Deleting LINC00261’s evolutionarily young microproteins reveals no essential roles


2021 ◽  
Author(s):  
Rick Gelhausen ◽  
Teresa Müller ◽  
Sarah Svensson ◽  
Omer S. Alkhnbashi ◽  
Cynthia M. Sharma ◽  
...  

Small proteins, those encoded by open reading frames of 50 or fewer codons, are emerging as an important class of cellular macromolecules in all kingdoms of life. However, they are recalcitrant to detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation of sORFs in diverse species, and this has driven the development of ORF detection tools using Ribo-seq read signals. However, only a handful of tools have been designed for bacterial data, and these have not yet been systematically compared. Here, we have performed a comprehensive benchmark of ORF prediction tools which handle bacterial Ribo-seq data. For this, we created a novel Ribo-seq dataset for E. coli, and based on this plus three publicly available datasets for different bacteria, we created a benchmark set by manual labeling of translated ORFs using their Ribo-seq expression profile. This was then used to investigate the predictive performance of the four Ribo-seq-based ORF detection tools that we found to be compatible with bacterial data (REPARATION_blast, DeepRibo, Ribo-TISH and SPECtre). The tool IRSOM was also included as a comparison for tools using coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for those inside or outside of operons. However, none of the tools was able to predict a set of recently identified, novel, experimentally-verified sORFs with high sensitivity. Overall, we find there is potential for improving the performance, applicability, usability, and reproducibility of prokaryotic ORF prediction tools that use Ribo-seq as input.
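
As a rough illustration of the size cutoff used in the abstract above (ORFs of at most 50 codons), the following Python sketch scans the three forward reading frames of a DNA string for ATG-to-stop ORFs and keeps only the short ones. It is not any of the benchmarked tools, which rely on Ribo-seq signal rather than sequence alone. The function name find_small_orfs, the ATG-only start rule, the forward-strand-only scan, and counting codons without the stop codon are all assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, max_codons=50, min_codons=2):
    """Return (start, end, frame) tuples for short forward-strand ORFs.

    Hypothetical helper for illustration only. An ORF here runs from the
    first ATG in a frame to the next in-frame stop codon. Its length is
    counted in codons, start included and stop excluded (an assumption).
    Coordinates are 0-based, and end is exclusive and includes the stop.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return hits


if __name__ == "__main__":
    # Toy sequence, not real genomic data.
    print(find_small_orfs("CCATGAAATGA"))
```

A sequence-only scan like this over-predicts heavily, which is why the studies listed here combine ORF candidates with ribosome profiling or proteomic evidence.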

