When Long Noncoding Becomes Protein Coding

2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Corrine Corrina R. Hartford ◽  
Ashish Lal

ABSTRACT Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.
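
As an illustration of the kind of sORF that can be overlooked because of its size, the sketch below scans the three forward reading frames of a transcript for ATG-initiated ORFs within a codon-length window. It is not the authors' method. The function name `find_sorfs` and the 10 to 100 codon window are assumptions chosen for illustration, and a sequence-only scan like this cannot establish that an ORF is actually translated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_sorfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, frame) tuples for ATG-initiated forward-strand ORFs.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Length is counted in codons from the ATG up to, but excluding, the stop.
    The default length window is an illustrative assumption.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3, frame))
                start = None  # resume searching after the stop codon
    return hits

# Toy transcript: ATG followed by 12 alanine codons and a stop codon.
toy = "CC" + "ATG" + "GCT" * 12 + "TAA" + "GG"
print(find_sorfs(toy))
```

Candidates returned by a scan of this kind would still need experimental or proteomic support before a transcript is treated as coding.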

2021 ◽  
Vol 72 (1) ◽  
Author(s):  
Andrzej T. Wierzbicki ◽  
Todd Blevins ◽  
Szymon Swiezewski

Plants have an extraordinary diversity of transcription machineries, including five nuclear DNA-dependent RNA polymerases. Four of these enzymes are dedicated to the production of long noncoding RNAs (lncRNAs), which are ribonucleic acids with functions independent of their protein-coding potential. lncRNAs display a broad range of lengths and structures, but they are distinct from the small RNA guides of RNA interference (RNAi) pathways. lncRNAs frequently serve as structural, catalytic, or regulatory molecules for gene expression. They can affect all elements of genes, including promoters, untranslated regions, exons, introns, and terminators, controlling gene expression at various levels, including modifying chromatin accessibility, transcription, splicing, and translation. Certain lncRNAs protect genome integrity, while others respond to environmental cues like temperature, drought, nutrients, and pathogens. In this review, we explain the challenge of defining lncRNAs, introduce the machineries responsible for their production, and organize this knowledge by viewing the functions of lncRNAs throughout the structure of a typical plant gene. Expected final online publication date for the Annual Review of Plant Biology, Volume 72 is May 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2021 ◽  
Author(s):  
Yanyi Jiang ◽  
Xiaofan Chen ◽  
Wei Zhang

Abstract In the RNA field, the demarcation between coding and non-coding has been renegotiated by the recent discovery of occasionally translated circular RNAs (circRNAs). Although they lack a 5′ cap structure, circRNAs can be translated cap-independently. Complementary intron-mediated overexpression is one of the most widely used methodologies for circRNA research, but it has drawn persistent skepticism because of its poorly defined mechanism and potential coexisting side products. In this study, leveraging this circRNA overexpression system, we have interrogated the protein-coding potential of 30 human circRNAs containing infinite open reading frames in HEK293T cells. Surprisingly, pervasive translation signals are detected by immunoblotting. However, intensive mutagenesis reveals that numerous translation signals are generated independently of circRNA synthesis. We have developed a dual tag strategy to isolate translation noise and directly demonstrate that the fallacious translation signals originate from cryptically spliced linear transcripts. The concomitant linear RNA byproducts, presumably concatemers, can be translated to produce pseudo rolling circle translation signals, and can contain the backsplicing junction (BSJ), which disqualifies BSJ-based evidence for circRNA translation. We also find that non-AUG start codons may engage in the translation initiation of circRNAs. Taken together, our systematic evaluation sheds light on the heterogeneous translational outputs of circRNA overexpression vectors and comes with the caveat that the ectopic overexpression technique necessitates an extremely rigorous control setup in circRNA translation and functional investigation.
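
To make the notion of an infinite open reading frame concrete, the sketch below checks whether reading from an ATG around a circular sequence ever meets an in-frame stop codon. It rests on an assumption about the term: here an ORF is called infinite if no stop codon is encountered over three full laps, after which the reading frame returns to where it began. The function name `infinite_orf_starts` is hypothetical, and the sketch considers AUG starts only, although the abstract notes that non-AUG start codons may also be used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def infinite_orf_starts(circ_seq):
    """Return 0-based ATG positions whose frame never meets a stop on the circle.

    The sequence is treated as circular, with its last base joined to its
    first at the backsplice junction. Reading n codons (3n nt, three laps)
    from any position returns to the same position and frame, so a frame
    free of stops over that span is stop-free indefinitely.
    """
    seq = circ_seq.upper().replace("U", "T")
    n = len(seq)
    # Four copies: a start anywhere in the first copy plus three full laps.
    unrolled = seq * 4
    starts = []
    for pos in range(n):
        if unrolled[pos:pos + 3] != "ATG":
            continue
        codons = (unrolled[i:i + 3] for i in range(pos, pos + 3 * n, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            starts.append(pos)
    return starts

print(infinite_orf_starts("ATGGCTGCA"))  # length divisible by 3, no in-frame stop
print(infinite_orf_starts("ATGGCTGA"))   # a stop appears once the frame shifts on later laps
```

A check like this works on sequence alone. It says nothing about whether a translation signal comes from the circRNA or from a linear byproduct, which is the distinction the study examines.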


2004 ◽  
Vol 78 (20) ◽  
pp. 11187-11197 ◽  
Author(s):  
Lisa M. Kattenhorn ◽  
Ryan Mills ◽  
Markus Wagner ◽  
Alexandre Lomsadze ◽  
Vsevolod Makeev ◽  
...  

ABSTRACT Proteins associated with the murine cytomegalovirus (MCMV) viral particle were identified by a combined approach of proteomic and genomic methods. Purified MCMV virions were dissociated by complete denaturation and subjected to either separation by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and in-gel digestion or treated directly by in-solution tryptic digestion. Peptides were separated by nanoflow liquid chromatography and analyzed by tandem mass spectrometry (LC-MS/MS). The MS/MS spectra obtained were searched against a database of MCMV open reading frames (ORFs) predicted to be protein coding by an MCMV-specific version of the gene prediction algorithm GeneMarkS. We identified 38 proteins from the capsid, tegument, glycoprotein, replication, and immunomodulatory protein families, as well as 20 genes of unknown function. Observed irregularities in coding potential suggested possible sequence errors in the 3′-proximal ends of m20 and M31. These errors were experimentally confirmed by sequencing analysis. The MS data further indicated the presence of peptides derived from the unannotated ORFs ORFc225441-226898 (m166.5) and ORF105932-106072. Immunoblot experiments confirmed expression of m166.5 during viral infection.


2015 ◽  
Author(s):  
Juna Carlevaro-Fita ◽  
Anisa Rahim ◽  
Roderic Guigo ◽  
Leah Vardy ◽  
Rory Johnson

The function of long noncoding RNAs (lncRNAs) depends on their location within the cell. While most studies to date have concentrated on their nuclear roles in transcriptional regulation, evidence is mounting that lncRNAs also have cytoplasmic roles. Here we comprehensively map the cytoplasmic and ribosomal lncRNA population in a human cell. Three-quarters (74%) of lncRNAs are detected in the cytoplasm, the majority of which (62%) preferentially cofractionate with polyribosomes. Ribosomal lncRNAs are highly expressed across tissues, under purifying evolutionary selection, and have cytoplasmic-to-nuclear ratios comparable to mRNAs and consistent across cell types. LncRNAs may be classified into three groups by their ribosomal interaction: non-ribosomal cytoplasmic lncRNAs, and those associated with either heavy or light polysomes. A number of mRNA-like features destine lncRNAs for light polysomes, including capping and 5′UTR length, but not cryptic open reading frames or polyadenylation. Surprisingly, exonic retroviral sequences antagonise recruitment. In contrast, it appears that lncRNAs are recruited to heavy polysomes through basepairing to mRNAs. Finally, we show that the translation machinery actively degrades lncRNAs. We propose that light polysomal lncRNAs are translationally engaged, while heavy polysomal lncRNAs are recruited indirectly. These findings point to extensive and reciprocal regulatory interactions between lncRNAs and the translation machinery.


2017 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V. Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Abstract Recent studies in eukaryotes have demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by altORFs. The putative alternative proteins translated from altORFs have orthologs in many species, and evolutionary patterns indicate that altORFs are particularly constrained in CDSs that evolve slowly. Thousands of predicted alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. Protein domains and co-conservation analyses suggest a potential functional relationship between small and large proteins encoded in the same genes. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many coding genes code for more than one protein, and that these proteins are often functionally related.


2019 ◽  
Author(s):  
Yaara Finkel ◽  
Dominik Schmiedel ◽  
Julie Tai-Schmiedel ◽  
Aharon Nachshon ◽  
Michal Schwartz ◽  
...  

Abstract Human herpesvirus 6 (HHV-6) A and B are highly ubiquitous betaherpesviruses, infecting the majority of the human population. Like other herpesviruses, they encompass large genomes and our understanding of their protein coding potential is far from complete. Here we employ ribosome profiling and systematic transcript analysis to experimentally define the HHV-6 translation products and to follow their temporal expression. We identify hundreds of new open reading frames (ORFs), including many upstream ORFs (uORFs) and internal ORFs (iORFs), generating a complete unbiased atlas of the HHV-6 proteome. Furthermore, by integrating systematic data from the prototypic betaherpesvirus, human cytomegalovirus, we uncover numerous uORFs and iORFs that are conserved across betaherpesviruses and we show that uORFs are specifically enriched in late viral genes. Using our transcriptome measurements, we identified three highly abundant HHV-6 encoded long non-coding RNAs (lncRNAs), one of which generates a non-polyadenylated stable intron that appears to be a conserved feature of betaherpesviruses. Overall, our work reveals the complexity of HHV-6 genomes and highlights novel features that are conserved between betaherpesviruses, providing a rich resource for future functional studies.


2017 ◽  
Vol 42 (4) ◽  
pp. 1407-1419 ◽  
Author(s):  
Zhihong Li ◽  
Pengcheng Dou ◽  
Tang Liu ◽  
Shasha He

Osteosarcoma is the most common primary bone malignancy in children and adolescents. Although therapeutic strategies have improved, the outcome remains poor for most patients with metastatic or recurrent osteosarcoma. Therefore, it is imperative to identify novel and effective prognostic biomarkers and therapeutic targets for the disease. Long noncoding RNAs (lncRNAs) are a novel class of RNA molecules defined as transcripts >200 nucleotides that lack protein coding potential. Many lncRNAs are deregulated in cancer and are important regulators of malignancies. Nine lncRNAs (91H, BCAR4, FGFR3-AS1, HIF2PUT, HOTTIP, HULC, MALAT-1, TUG1, UCA1) are upregulated and considered oncogenic in osteosarcoma. Loc285194 and MEG3 are two lncRNAs that are downregulated and act as tumor suppressors in the disease. Moreover, the expression of LINC00161 and ODRUL is associated with chemo-resistance of osteosarcoma. The mechanisms by which these lncRNAs regulate the development of osteosarcoma are diverse, e.g. ceRNA, Wnt/β-catenin pathway, etc. The lncRNAs identified may serve as potential biomarkers or therapeutic targets for osteosarcoma.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1489-D1495 ◽  
Author(s):  
Jingjing Jin ◽  
Peng Lu ◽  
Yalong Xu ◽  
Zefeng Li ◽  
Shizhou Yu ◽  
...  

Abstract Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides with little or no protein coding potential. The expanding list of lncRNAs and accumulating evidence of their functions in plants have necessitated the creation of a comprehensive database for lncRNA research. However, currently available plant lncRNA databases have some deficiencies, including the lack of lncRNA data from some model plants, uneven annotation standards, a lack of visualization for expression patterns, and the absence of epigenetic information. To overcome these problems, we upgraded our Plant Long noncoding RNA Database (PLncDB, http://plncdb.tobaccodb.org/), which was based on a uniform annotation pipeline. PLncDB V2.0 currently contains 1 246 372 lncRNAs for 80 plant species based on 13 834 RNA-Seq datasets, integrating lncRNA information from four other resources, including EVLncRNAs, RNAcentral, etc. Expression patterns and epigenetic signals can be visualized using multiple tools (JBrowse, eFP Browser and EPexplorer). Targets and regulatory networks for lncRNAs are also provided for function exploration. In addition, PLncDB V2.0 is hierarchical and user-friendly and has five built-in search engines. We believe PLncDB V2.0 is useful for the plant lncRNA community and data mining studies and provides a comprehensive resource for data-driven lncRNA research in plants.


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Bjoern Gaertner ◽  
Sebastiaan van Heesch ◽  
Valentin Schneider-Lunitz ◽  
Jana Felicitas Schulz ◽  
Franziska Witte ◽  
...  

Long noncoding RNAs (lncRNAs) are a heterogenous group of RNAs, which can encode small proteins. The extent to which developmentally regulated lncRNAs are translated and whether the produced microproteins are relevant for human development is unknown. Using a human embryonic stem cell (hESC)-based pancreatic differentiation system, we show that many lncRNAs in direct vicinity of lineage-determining transcription factors (TFs) are dynamically regulated, predominantly cytosolic, and highly translated. We genetically ablated ten such lncRNAs, most of them translated, and found that nine are dispensable for pancreatic endocrine cell development. However, deletion of LINC00261 diminishes insulin+ cells, in a manner independent of the nearby TF FOXA2. One-by-one disruption of each of LINC00261's open reading frames suggests that the RNA, rather than the produced microproteins, is required for endocrine development. Our work highlights extensive translation of lncRNAs during hESC pancreatic differentiation and provides a blueprint for dissection of their coding and noncoding roles.


NAR Cancer ◽  
2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Ghofran Othoum ◽  
Emily Coonrod ◽  
Sidi Zhao ◽  
Ha X Dang ◽  
Christopher A Maher

Abstract Recent studies show that annotated long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) encode stable, functional peptides that contribute to human development and disease. To systematically discover lncRNAs and circRNAs encoding peptides, we performed a comprehensive integrative analysis of mass spectrometry-based proteomic and transcriptomic sequencing data from >900 patients across nine cancer types. This enabled us to identify 19,871 novel peptides derived from 8,903 lncRNAs. Further, we exploited open reading frames overlapping the backspliced region of circRNAs to identify 3,238 peptides that are uniquely derived from 2,834 circRNAs and not their corresponding linear RNAs. Collectively, our pan-cancer proteogenomic analysis will serve as a resource for evaluating the coding potential of lncRNAs and circRNAs that could aid future mechanistic studies exploring their function in cancer.

