Accurate annotation of human protein-coding small open reading frames

2019 ◽  
Vol 16 (4) ◽  
pp. 458-468 ◽  
Author(s):  
Thomas F. Martinez ◽  
Qian Chu ◽  
Cynthia Donaldson ◽  
Dan Tan ◽  
Maxim N. Shokhirev ◽  
...  
2020 ◽  
Vol 34 (S1) ◽  
pp. 1-1
Author(s):  
Thomas F. Martinez ◽  
Qian Chu ◽  
Cynthia Donaldson ◽  
Dan Tan ◽  
Maxim N. Shokhirev ◽  
...  

2018 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
M.Mar Albà

SUMMARY: The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes. Although many of these transcripts show homology between human and mouse, only a small proportion of them have been functionally characterized. Here we use ribosome profiling data to identify translated open reading frames, as well as non-ribosomal protein-RNA interactions, in evolutionarily conserved and non-conserved transcripts. We find that conserved regions are subject to significant evolutionary constraints and are enriched in translated open reading frames, as well as non-ribosomal protein-RNA interaction signatures, when compared to non-conserved regions. Translated ORFs can be divided into two classes: those encoding functional micropeptides and those that show no evidence of protein functionality. This study underscores the importance of combining evolutionary and biochemical measurements to advance toward a more complete understanding of the transcriptome.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Robin-Lee Troskie ◽  
Yohaann Jafrani ◽  
Tim R. Mercer ◽  
Adam D. Ewing ◽  
Geoffrey J. Faulkner ◽  
...  

Abstract: Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we CRISPR-Cas9 delete the nucleus-enriched pseudogene PDCL3P4 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David S. M. Lee ◽  
Joseph Park ◽  
Andrew Kromer ◽  
Aris Baras ◽  
Daniel J. Rader ◽  
...  

Abstract: Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.


2019 ◽  
Author(s):  
Thomas F. Martinez ◽  
Qian Chu ◽  
Cynthia Donaldson ◽  
Dan Tan ◽  
Maxim N. Shokhirev ◽  
...  

Protein-coding small open reading frames (smORFs) are emerging as an important class of genes; however, the coding capacity of smORFs in the human genome is unclear. By integrating de novo transcriptome assembly and Ribo-Seq, we confidently annotate thousands of novel translated smORFs in three human cell lines. We find that smORF translation prediction is noisier than for annotated coding sequences, underscoring the importance of analyzing multiple experiments and footprinting conditions. These smORFs are located within non-coding and antisense transcripts, the UTRs of mRNAs, and unannotated transcripts. Analysis of RNA levels and translation efficiency during cellular stress identifies regulated smORFs, providing an approach to select smORFs for further investigation. Sequence conservation and signatures of positive selection indicate that encoded microproteins are likely functional. Additionally, proteomics data from enriched human leukocyte antigen complexes validates the translation of hundreds of smORFs and positions them as a source of novel antigens. Thus, smORFs represent a significant number of important, yet unexplored human genes.


2020 ◽  
Author(s):  
Justin A. Bosch ◽  
Berrak Ugur ◽  
Israel Pichardo-Casas ◽  
Jorden Rabasco ◽  
Felipe Escobedo ◽  
...  

Summary: Naturally produced peptides (<100 amino acids) are important regulators of physiology, development, and metabolism. Recent studies have predicted that thousands of peptides may be translated from transcripts containing small open reading frames (smORFs). Here, we describe two previously uncharacterized peptides in Drosophila encoded by conserved smORFs, Sloth1 and Sloth2. These peptides are translated from the same bicistronic transcript and share sequence similarities, suggesting that they encode paralogs. We provide evidence that Sloth1/2 are highly expressed in neurons, localize to mitochondria, and form a complex. Double mutant analysis in animals and cell culture revealed that sloth1 and sloth2 are not functionally redundant, and their loss causes animal lethality, reduced neuronal function, impaired mitochondrial function, and neurodegeneration. These results suggest that phenotypic analysis of smORF genes in Drosophila can provide a wealth of information on the biological functions of this poorly characterized class of genes.


2021 ◽  
Author(s):  
Yanyi Jiang ◽  
Xiaofan Chen ◽  
Wei Zhang

Abstract: In the RNA field, the demarcation between coding and non-coding has been negotiated by the recent discovery of occasionally translated circular RNAs (circRNAs). Although they lack a 5’ cap structure, circRNAs can be translated cap-independently. Complementary intron-mediated overexpression is one of the most widely used methods in circRNA research, but it has drawn persistent skepticism for its poorly defined mechanism and potential coexisting side products. In this study, leveraging such a circRNA overexpression system, we have interrogated the protein-coding potential of 30 human circRNAs containing infinite open reading frames in HEK293T cells. Surprisingly, pervasive translation signals are detected by immunoblotting. However, intensive mutagenesis reveals that numerous translation signals are generated independently of circRNA synthesis. We have developed a dual tag strategy to isolate translation noise and directly demonstrate that the fallacious translation signals originate from cryptically spliced linear transcripts. The concomitant linear RNA byproducts, presumably concatemers, can be translated to produce pseudo rolling circle translation signals, and can contain the backsplicing junction (BSJ), which disqualifies BSJ-based evidence for circRNA translation. We also find that non-AUG start codons may engage in the translation initiation of circRNAs. Taken together, our systematic evaluation sheds light on the heterogeneous translational outputs from circRNA overexpression vectors and comes with a caveat that ectopic overexpression techniques require extremely rigorous controls in circRNA translation and functional investigation.

