Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling

2015 ◽  
Author(s):  
Anil Raj ◽  
Sidney H. Wang ◽  
Heejung Shim ◽  
Arbel Harpak ◽  
Yang I. Li ◽  
...  

Abstract: Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into biological functions. Here we describe riboHMM, a new method that uses ribosome footprint data along with gene expression and sequence information to accurately infer translated sequences. We applied our method to human lymphoblastoid cell lines and identified 7,273 previously unannotated coding sequences, including 2,442 translated upstream open reading frames. We observed an enrichment of harringtonine-treated ribosome footprints at the inferred initiation sites, validating many of the novel coding sequences. The novel sequences exhibit significant signatures of selective constraint in the reading frames of the inferred proteins, suggesting that many of these are functional. Nearly 40% of bicistronic transcripts showed significant negative correlation in the levels of translation of their two coding sequences, suggesting a key regulatory role for these novel translated sequences. Our work significantly expands the set of known coding regions in humans.
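An upstream open reading frame (uORF) is an ORF whose start codon lies in the 5′ UTR, upstream of the annotated coding sequence. The sketch below only shows how candidate ATG-initiated uORFs can be enumerated from a transcript sequence. It is a plain sequence scan, not riboHMM, which additionally uses ribosome footprint data to decide which candidates are actually translated. The function name `find_uorfs` and the `cds_start` parameter (0-based index of the annotated start codon) are hypothetical.

```python
# Minimal sketch (hypothetical helper, NOT riboHMM): enumerate candidate
# ATG-initiated uORFs whose start codon lies upstream of the annotated CDS.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript: str, cds_start: int) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive (stop codon included).

    Assumption: cds_start is the 0-based index of the annotated CDS start codon.
    A uORF may end beyond cds_start (overlapping uORF); ORFs with no in-frame
    stop codon before the end of the transcript are skipped.
    """
    seq = transcript.upper()
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame of this ATG until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs
```

For example, `find_uorfs("GGATGAAATAGCCATGCCCTAA", 13)` returns `[(2, 11)]`: a three-codon uORF (ATG AAA TAG) upstream of the ATG at index 13.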

eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Anil Raj ◽  
Sidney H Wang ◽  
Heejung Shim ◽  
Arbel Harpak ◽  
Yang I Li ◽  
...  

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite the known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.
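The ~40% figure refers to bicistronic transcripts whose two coding sequences show negatively correlated translation levels across samples. Below is a minimal sketch of how such a fraction could be tallied. It assumes a hypothetical input that maps each transcript to per-sample translation levels of its upstream and downstream coding sequences. It counts only the sign of the Pearson correlation and does no significance testing, so it is an illustration of the idea rather than the authors' analysis.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def fraction_negatively_correlated(levels: dict[str, tuple[list[float], list[float]]]) -> float:
    """Fraction of transcripts whose two CDSs have negatively correlated levels.

    Hypothetical input: transcript id -> (upstream CDS levels, downstream CDS
    levels), one value per sample. No significance test is applied.
    """
    negative = sum(1 for up, down in levels.values() if pearson(up, down) < 0)
    return negative / len(levels)


# Toy data (invented for illustration, not from the study).
toy = {
    "tx1": ([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]),  # perfectly anti-correlated
    "tx2": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),  # perfectly correlated
}
print(fraction_negatively_correlated(toy))  # 0.5
```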


2017 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V. Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Abstract: Recent studies in eukaryotes have demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein-coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by altORFs. The putative alternative proteins translated from altORFs have orthologs in many species, and evolutionary patterns indicate that altORFs are particularly constrained in CDSs that evolve slowly. Thousands of predicted alternative proteins are detected when proteomic datasets are reanalyzed using a database containing predicted alternative proteins. Protein domain and co-conservation analyses suggest a potential functional relationship between small and large proteins encoded in the same genes. This is illustrated with specific examples, including altMiD51, a 70-amino-acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein that promotes mitochondrial fission. Our results suggest that many coding genes code for more than one protein, and that these proteins are often functionally related.


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 982
Author(s):  
Maksim Makarenko ◽  
Alexander Usatov ◽  
Tatiana Tatarinova ◽  
Kirill Azarin ◽  
Alexey Kovalevich ◽  
...  

The genus Helianthus is a diverse taxonomic group with approximately 50 species. Most sunflower genomic investigations are devoted to economically valuable species, e.g., H. annuus, while other Helianthus species, especially perennial ones, remain largely a blind spot. In the current study, we have assembled the complete mitogenomes of two perennial species: H. grosseserratus (273,543 bp) and H. strumosus (281,055 bp). We analyzed their sequences and gene profiles in comparison to the available complete mitogenomes of H. annuus. Except for sdh4 and trnA-UGC, both perennial sunflower species had the same gene content and almost identical protein-coding sequences when compared with each other and with annual sunflowers (H. annuus). Common mitochondrial open reading frames (ORFs) in sunflowers (orf117, orf139, and orf334) and unique ORFs for H. grosseserratus (orf633) and H. strumosus (orf126, orf184, orf207) were identified. The maintenance of plastid-derived coding sequences in the mitogenomes of both annual and perennial sunflowers and the low frequency of nonsynonymous mutations point to an extremely low variability of mitochondrial DNA (mtDNA) coding sequences in the Helianthus genus.


2013 ◽  
Vol 79 (13) ◽  
pp. 4115-4128 ◽  
Author(s):  
Dustin Brisson ◽  
Wei Zhou ◽  
Brandon L. Jutras ◽  
Sherwood Casjens ◽  
Brian Stevenson

Abstract: Lyme disease spirochetes possess complex genomes, consisting of a main chromosome and 20 or more smaller replicons. Among those small DNAs are the cp32 elements, a family of prophages that replicate as circular episomes. All complete cp32s contain an erp locus, which encodes surface-exposed proteins. Sequences were compared for all 193 erp alleles carried by 22 different strains of Lyme disease-causing spirochete to investigate their natural diversity and evolutionary histories. These included multiple isolates from a focus where Lyme disease is endemic in the northeastern United States and isolates from across North America and Europe. Bacteria were derived from diseased humans and from vector ticks and included members of 5 different Borrelia genospecies. All erp operon 5′-noncoding regions were found to be highly conserved, as were the initial 70 to 80 bp of all erp open reading frames, traits indicative of a common evolutionary origin. However, the majority of the protein-coding regions are highly diverse, due to numerous intra- and intergenic recombination events. Most erp alleles are chimeras derived from sequences of closely related and distantly related erp sequences and from unknown origins. Since known functions of Erp surface proteins involve interactions with various host tissue components, this diversity may reflect both their multiple functions and the abilities of Lyme disease-causing spirochetes to successfully infect a wide variety of vertebrate host species.


Author(s):  
Tamara Ouspenskaia ◽  
Travis Law ◽  
Karl R. Clauser ◽  
Susan Klaeger ◽  
Siranush Sarkizova ◽  
...  

Abstract: Tumor epitopes, peptides that are presented on surface-bound MHC I proteins, provide targets for cancer immunotherapy and have been identified extensively in the annotated protein-coding regions of the genome. Motivated by the recent discovery of translated novel unannotated open reading frames (nuORFs) using ribosome profiling (Ribo-seq), we hypothesized that cancer-associated processes could generate nuORFs that can serve as a new source of tumor antigens that harbor somatic mutations or show tumor-specific expression. To identify cancer-specific nuORFs, we generated Ribo-seq profiles for 29 malignant and healthy samples, developed a sensitive analytic approach for hierarchical ORF prediction, and constructed a high-confidence database of translated nuORFs across tissues. Based on analysis of an extensive dataset of MHC I-bound peptides detected by mass spectrometry, peptides from 3,555 unique translated nuORFs were presented on MHC I, with >20-fold more nuORF peptides detected in the MHC I immunopeptidomes than in whole proteomes. We further detected somatic mutations in nuORFs of cancer samples and identified nuORFs with tumor-specific translation in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs thus expand the pool of MHC I-presented, tumor-specific peptides, targetable by immunotherapies.


2019 ◽  
Author(s):  
N. Suhas Jagannathan ◽  
Narendra Meena ◽  
Kethaki Prathivadi Bhayankaram ◽  
Sudhakaran Prabakaran

Abstract: Recent evidence has suggested that protein or protein-like products can be encoded by previously uncharacterized open reading frames (ORFs), which we define as novel open reading frames (nORFs) [1,2]. These nORFs are present in both coding and non-coding regions of the human genome, and the novel proteins they encode have increased the number and complexity of cellular proteomes from bacteria to humans. Whether these protein or protein-like products play any significant biological role remains a conundrum. Nevertheless, hopes have been raised of targeting them in anticancer and antimicrobial therapy [3,4]. To infer whether these novel proteins can perform biological functions, we used computational predictions to systematically investigate whether their amino acid sequences can form ordered or disordered structures. Our results indicated that these novel proteins have significantly higher predicted disorder than all known proteins, yet we did not find any correlation between the pathogenicity of mutations and whether they lie in the ordered or disordered regions of these novel proteins. This study reveals that these novel proteins should be investigated more systematically, as they may be important for understanding complex diseases.


2016 ◽  
Vol 4 (6) ◽  
Author(s):  
Xuehua Wan ◽  
James M. Miller ◽  
Sonia J. Rowley ◽  
Shaobin Hou ◽  
Stuart P. Donachie

Luteimonas sp. strain JM171 was cultivated from mucus collected around the coral Porites lobata. The JM171 draft genome of 2,992,353 bp contains 2,672 protein-coding open reading frames and 45 tRNA coding regions, and encodes a putative globin-coupled diguanylate cyclase, JmGReg.


2021 ◽  
Vol 9 (2) ◽  
pp. 400
Author(s):  
Taiyeebah Nuidate ◽  
Aphiwat Kuaphiriyakul ◽  
Komwit Surachat ◽  
Pimonsri Mittraparp-arthorn

Vibrio campbellii is an emerging aquaculture pathogen that causes luminous vibriosis in farmed shrimp. Although prophages in various aquaculture pathogens have been widely reported, there is still limited knowledge regarding prophages in the genome of pathogenic V. campbellii. Here, we describe the full-genome sequence of a prophage named HY01, induced from the emerging shrimp pathogen V. campbellii HY01. The phage HY01 was induced by mitomycin C and was morphologically characterized as a long-tailed phage. V. campbellii phage HY01 is composed of 41,772 bp of dsDNA with a G+C content of 47.45%. A total of 60 open reading frames (ORFs) were identified, of which 31 could be assigned predicted biological functions. Twenty-seven of these 31 predicted protein-coding regions matched proteins encoded by various Enterobacteriaceae, Pseudomonadaceae, Vibrionaceae, and other phages of Gram-negative bacteria. Interestingly, the comparative genome analysis revealed that the phage HY01 was only distantly related to Vibrio phage Va_PF430-3_p42 of the fish pathogen V. anguillarum but differed in genomic size and gene organization. The phylogenetic tree placed the phage within the Siphoviridae family. Additionally, a survey of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) spacers revealed two matching sequences between the phage HY01 genome and viral spacer sequences of Vibrio spp. The spacer results combined with the synteny results suggest that the evolution of V. campbellii phage HY01 is driven by horizontal genetic exchange between bacterial families belonging to the class Gammaproteobacteria.
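Two of the summary statistics above, G+C content and the number of ORFs, can be illustrated with a short sketch. It is a naive six-frame scan: each ORF runs from the first ATG in a frame to the next in-frame stop codon, and ORFs shorter than a hypothetical `min_codons` cutoff are discarded. Real phage annotation pipelines typically rely on dedicated gene-prediction tools, so counts from this sketch would not be expected to match the 60 ORFs reported. All function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement; non-ACGT characters (e.g. N) are left unchanged."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int]]:
    """Naive six-frame ORF scan (hypothetical helper, not a gene predictor).

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on
    that strand's own sequence (the reverse complement for "-"). Length is
    counted in codons from the ATG up to, but excluding, the stop codon.
    Only the first ATG before each stop is used, so nested starts are ignored.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3))
                    start = None
    return orfs
```

For instance, `gc_content("GGCCAATT")` gives `50.0`, and `find_orfs("ATGAAATAA", min_codons=2)` returns `[("+", 0, 9)]`. The minimum-length cutoff is the main tuning knob: lowering it reports many short, likely spurious ORFs.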


eLife ◽  
2014 ◽  
Vol 3 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
Xavier Messeguer ◽  
Juan Antonio Subirana ◽  
M Mar Alba

Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show similar coding potential and sequence constraints as evolutionarily young protein-coding sequences, indicating that they play an important role in de novo protein evolution.

