Translational products encoded by novel ORFs may form protein-like structures and have biological functions

2019 ◽  
Author(s):  
Chaitanya Erady ◽  
David Chong ◽  
Narendra Meena ◽  
Shraddha Puntambekar ◽  
Ruchi Chauhan ◽  
...  

Abstract: Translation products encoded by non-canonical or novel open reading frame (ORF) genomic regions are generally considered too small to play any significant biological role and are dismissed as inconsequential. In this study, we show that mutations mapping to novel ORFs have significantly higher pathogenicity scores than mutations in protein-coding regions. Importantly, novel ORFs can translate into protein-like structures with putative independent biological functions that can be of relevance in disease states, including cancer. We thus provide strong evidence to support the systematic study of novel ORFs to gain new insights into normal biological and disease processes.

One Sentence Summary: Non-coding regions may encode protein-like products that are important for understanding diseases.

2016 ◽  
Vol 2 (1) ◽  
pp. 5
Author(s):  
Yu Cuiyun ◽  
Qian Ning ◽  
Zhi-Ping Li ◽  
Wen Huang ◽  
Jia Yu ◽  
...  

Non-coding RNAs (ncRNAs) are RNA molecules without protein-coding functions owing to the lack of an open reading frame (ORF). Based on length, ncRNAs can be divided into long and short ncRNAs; short ncRNAs include miRNAs and piRNAs. Hepatocellular carcinoma (HCC) is among the most frequent forms of cancer worldwide and its incidence is increasing rapidly. Studies have found that ncRNAs are likely to play a crucial role in a variety of biological processes, including the pathogenesis and progression of HCC. In this review, we summarize the regulatory mechanisms and biological functions of ncRNAs in HCC with respect to their application in HCC diagnosis, therapy and prognosis.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
T. M. Porter ◽  
M. Hajibabaei

Abstract: Background: Pseudogenes are non-functional copies of protein coding genes that typically follow a different molecular evolutionary path from that of functional genes. The inclusion of pseudogene sequences in DNA barcoding and metabarcoding analysis can lead to misleading results. None of the most widely used bioinformatic pipelines for processing marker gene (metabarcode) high throughput sequencing data specifically accounts for the presence of pseudogenes in protein-coding marker genes. The purpose of this study is to develop a method to screen for nuclear mitochondrial DNA segments (nuMTs) in large COI datasets. We do this by: (1) describing gene and nuMT characteristics from an artificial COI barcode dataset, (2) showing the impact of two different pseudogene removal methods on perturbed community datasets with simulated nuMTs, and (3) incorporating a pseudogene filtering step in a bioinformatic pipeline that can be used to process Illumina paired-end COI metabarcode sequences. Open reading frame length and sequence bit scores from hidden Markov model (HMM) profile analysis were used to detect pseudogenes.
Results: Our simulations showed that it was more difficult to identify nuMTs from shorter amplicon sequences, such as those typically used in metabarcoding, than from the full length DNA barcodes used in the construction of barcode libraries. It was also more difficult to identify nuMTs in datasets with a high percentage of nuMTs. Existing bioinformatic pipelines used to process metabarcode sequences already remove some nuMTs, especially in the rare sequence removal step, but the addition of a pseudogene filtering step can remove up to 5% of sequences even when other filtering steps are in place.
Conclusions: Open reading frame length filtering alone or combined with hidden Markov model profile analysis can be used to effectively screen out apparent pseudogenes from large datasets. There is more to learn from COI nuMTs, such as their frequency in DNA barcoding and metabarcoding studies, their taxonomic distribution, and their evolution. Thus, we encourage the submission of verified COI nuMTs to public databases to facilitate future studies.
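The open reading frame length filter mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the choice of the invertebrate mitochondrial stop codons (TAA and TAG), the restriction to the three forward frames, and the interquartile-range cutoff are all assumptions made here for illustration.

```python
from statistics import quantiles

# Assumption: invertebrate mitochondrial genetic code, where TAA and TAG are stops.
STOP_CODONS = {"TAA", "TAG"}


def longest_orf_length(seq, stops=STOP_CODONS):
    """Length in nt of the longest stop-free codon run over the three forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in stops:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 3
                best = max(best, run)
    return best


def filter_by_orf_length(seqs, k=1.5):
    """Split sequences into kept and removed by longest-ORF length.

    Hypothetical cutoff: a sequence is removed when its longest ORF lies
    outside [Q1 - k*IQR, Q3 + k*IQR] of the dataset. Needs >= 2 sequences.
    """
    lengths = {name: longest_orf_length(s) for name, s in seqs.items()}
    q1, _, q3 = quantiles(list(lengths.values()), n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = {n: s for n, s in seqs.items() if lower <= lengths[n] <= upper}
    removed = [n for n in seqs if n not in kept]
    return kept, removed
```

A length-outlier rule of this kind only catches pseudogenes whose reading frame is visibly disrupted, which is consistent with the abstract's point that short amplicons are harder to screen and that HMM profile scores can be combined with the length filter.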


Zootaxa ◽  
2020 ◽  
Vol 4748 (1) ◽  
pp. 182-194 ◽  
Author(s):  
JING ZHANG ◽  
ERNST BROCKMANN ◽  
QIAN CONG ◽  
JINHUI SHEN ◽  
NICK V. GRISHIN

We obtained whole genome shotgun sequences and phylogenetically analyzed protein-coding regions of representative skipper butterflies from the genus Carcharodus Hübner, [1819] and its close relatives. Type species of all available genus-group names were sequenced. We find that species attributed to four exclusively Old World genera (Spialia Swinhoe, 1912, Gomalia Moore, 1879, Carcharodus Hübner, [1819] and Muschampia Tutt, 1906) form a monophyletic group that we call a subtribe Carcharodina Verity, 1940. In the phylogenetic trees built from various genomic regions, these species form 7 (not 4) groups that we treat as genera. We find that Muschampia Tutt, 1906 is not monophyletic, and the 5th group is formed by currently monotypic genus Favria Tutt, 1906 new status (type species Hesperia cribrellum Eversmann, 1841), which is sister to Gomalia. The 6th and 7th groups are composed of mostly African species presently placed in Spialia. These groups do not have names and are described here as Ernsta Grishin, gen. n. (type species Pyrgus colotes Druce, 1875) and Agyllia Grishin, gen. n. (type species Pyrgus agylla Trimen, 1889). Two subgroups are recognized in Ernsta: the nominal subgenus and a new one: Delaga Grishin, subgen. n. (type species Pyrgus delagoae Trimen, 1898). Next, we observe that Carcharodus is not monophyletic, and species formerly placed in subgenera Reverdinus Ragusa, 1919 and Lavatheria Verity, 1940 are here transferred to Muschampia. Furthermore, due to differences in male genitalia or DNA sequences, we reinstate Gomalia albofasciata Moore, 1879 and Gomalia jeanneli (Picard, 1949) as species, not subspecies or synonyms of Gomalia elma (Trimen, 1862), and Spialia bifida (Higgins, 1924) as a species, not subspecies of Spialia zebra (Butler, 1888). 
Sequencing of the type specimens reveals a 2.2-3.2% difference in COI barcodes; combined with wing pattern differences, this evidence suggests a new status as species for Spialia lugens (Staudinger, 1886) and Spialia carnea (Reverdin, 1927), formerly subspecies of Spialia orbifer (Hübner, [1823]).


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1324
Author(s):  
Garin Newcomb ◽  
Khalid Sayood

One of the important steps in the annotation of genomes is the identification of regions in the genome that code for proteins. Most annotation approaches rely on signals extracted from a genomic region to decide whether that region is protein coding. Motivated by the fact that these regions are information-bearing structures, we propose signals based on measures motivated by the average mutual information for use in this task. We show that these signals can be used to identify coding and noncoding sequences with high accuracy. We also show that these signals are robust across species, phyla, and kingdoms and can, therefore, be used in species-agnostic genome annotation algorithms for identifying protein coding regions. These in turn could be used for gene identification.
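For readers unfamiliar with the quantity behind these signals, the following Python sketch computes the average mutual information between bases separated by a lag of k positions. The abstract does not give the authors' exact signal definitions, so this is only the textbook estimate from empirical frequencies; the function name and interface are assumptions.

```python
from collections import Counter
from math import log2


def average_mutual_information(seq, k):
    """Empirical mutual information (bits) between seq[i] and seq[i + k]."""
    seq = seq.upper()
    n = len(seq) - k  # number of (x, y) pairs at lag k
    if k < 1 or n <= 0:
        raise ValueError("lag k must be >= 1 and shorter than the sequence")
    pairs = Counter(zip(seq, seq[k:]))  # joint counts of (x, y)
    left = Counter(seq[:n])             # marginal counts of the first base
    right = Counter(seq[k:])            # marginal counts of the second base
    ami = 0.0
    for (x, y), count in pairs.items():
        p_xy = count / n
        ami += p_xy * log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return ami


def ami_profile(seq, max_lag):
    """AMI values for lags 1..max_lag, usable as a feature vector."""
    return [average_mutual_information(seq, k) for k in range(1, max_lag + 1)]
```

A profile over several lags, rather than a single value, is the natural input to a classifier, since codon structure tends to show up as a dependence on the lag modulo 3.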


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Janaina de Freitas Nascimento ◽  
Steven Kelly ◽  
Jack Sunter ◽  
Mark Carrington

Selective transcription of individual protein coding genes does not occur in trypanosomes and the cellular copy number of each mRNA must be determined post-transcriptionally. Here, we provide evidence that codon choice directs the levels of constitutively expressed mRNAs. First, a novel codon usage metric, the gene expression codon adaptation index (geCAI), was developed that maximised the relationship between codon choice and the measured abundance for a transcriptome. Second, geCAI predictions of mRNA levels were tested using differently coded GFP transgenes and were successful over a 25-fold range, similar to the variation in endogenous mRNAs. Third, translation was necessary for the accelerated mRNA turnover resulting from codon choice. Thus, in trypanosomes, the information determining the levels of most mRNAs resides in the open reading frame and translation is required to access this information.
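The geCAI itself is not defined in this abstract, so it is not reproduced here. As background, the Python sketch below computes the classic codon adaptation index (CAI), in which each codon is weighted by its frequency relative to the most used synonymous codon in a reference gene set and a gene's score is the geometric mean of those weights. The function names, the use of the standard genetic code, and the 0.5 pseudocount for unseen codons are assumptions of this sketch.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(reference_cds):
    """Per-codon weights w = count / max count among synonymous codons."""
    counts = Counter()
    for cds in reference_cds:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    weights = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in family)
        if len(family) == 1 or top == 0:
            continue  # skip Met/Trp and amino acids absent from the reference
        for c in family:
            # Assumption: unseen codons get a 0.5 pseudocount to avoid log(0).
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the scorable codons in cds."""
    cds = cds.upper()
    logs = [log(weights[cds[i:i + 3]])
            for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(logs) / len(logs))
```

In the classic index the weights come from a fixed reference set of highly expressed genes; the abstract describes geCAI as instead being fitted to measured mRNA abundance, which this sketch does not attempt.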


Viruses ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1592
Author(s):  
Enikő Fehér ◽  
Szilvia Jakab ◽  
Krisztina Bali ◽  
Eszter Kaszab ◽  
Borbála Nagy ◽  
...  

Duck hepatitis A virus (DHAV), an avian picornavirus, causes high-mortality acute disease in ducklings. Among the three serotypes, DHAV-1 is globally distributed, whereas DHAV-2 and DHAV-3 serotypes are chiefly restricted to Southeast Asia. In this study, we analyzed the genomic evolution of DHAV-1 strains using extant GenBank records and genomic sequences of 10 DHAV-1 strains originating from a large disease outbreak in 2004–2005, in Hungary. Recombination analysis revealed intragenotype recombination within DHAV-1 as well as intergenotype recombination events involving DHAV-1 and DHAV-3 strains. The intergenotype recombination occurred in the VP0 region. Diversifying selection seems to act at sites of certain genomic regions. Calculations estimated slightly lower rates of evolution of DHAV-1 (mean rates for individual protein coding regions, 5.6286 × 10⁻⁴ to 1.1147 × 10⁻³ substitutions per site per year) compared to other picornaviruses. The observed evolutionary mechanisms indicate that whole-genome-based analysis of DHAV strains is needed to better understand the emergence of novel strains and their geographical dispersal.


Author(s):  
Matteo Chiara ◽  
David S. Horner ◽  
Carmela Gissi ◽  
Graziano Pesole

Abstract: Phylogenomic analysis of SARS-CoV-2 genomes available from public repositories suggests the presence of 3 prevalent groups of viral episomes (super-clades), which are mostly associated with outbreaks in distinct geographic locations (China, USA and Europe). While levels of genomic variability between SARS-CoV-2 isolates are limited, to our knowledge, it is not clear whether the observed patterns of variability in viral super-clades reflect ongoing adaptation of SARS-CoV-2, or merely genetic drift and founder effects. Here, we analyze more than 1100 complete, high quality SARS-CoV-2 genome sequences, and provide evidence for the absence of distinct evolutionary patterns/signatures in the genomes of the currently known major clades of SARS-CoV-2. Our analyses suggest that the presence of distinct viral episomes at different geographic locations is consistent with founder effects, coupled with the rapid spread of this novel virus. We observe that while cross-species adaptation of the virus is associated with hypervariability of specific protein coding regions (including the receptor-binding domain (RBD) of the spike protein), the more variable genomic regions between extant SARS-CoV-2 episomes correspond to the 3’ and 5’ UTRs, suggesting that at present viral protein coding genes do not appear to be subject to different adaptive evolutionary pressures in different viral strains. Although this study cannot be conclusive, we believe that the evidence presented here is strongly consistent with the notion that the biased geographic distribution of SARS-CoV-2 isolates should not be associated with adaptive evolution of this novel pathogen.


2021 ◽  
Vol 33 (2) ◽  
pp. 157-165
Author(s):  
Xuanzong Guo ◽  
Uwe Ohler ◽  
Ferah Yildirim

Abstract: Genetic variants associated with human diseases are often located outside the protein coding regions of the genome. Identification and functional characterization of the regulatory elements in the non-coding genome is therefore of crucial importance for understanding the consequences of genetic variation and the mechanisms of disease. The past decade has seen rapid progress in high-throughput analysis and mapping of chromatin accessibility, looping, structure, and occupancy by transcription factors, as well as epigenetic modifications, all of which contribute to the proper execution of regulatory functions in the non-coding genome. Here, we review the current technologies for the definition and functional validation of non-coding regulatory regions in the genome.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 160 ◽  
Author(s):  
Johannes M. Dijkstra ◽  
Keith T. Ballingall

In a recent publication in Science, Wang et al. found a long noncoding RNA (lncRNA) expressed in human dendritic cells (DC), which they designated lnc-DC. Based on lentivirus-mediated RNA interference (RNAi) experiments in human and murine systems, they concluded that lnc-DC is important in differentiation of monocytes into DC. However, Wang et al. did not mention that their so-called “mouse lnc-DC ortholog” gene was already designated “Wdnm1-like” and is known to encode a small secreted protein. We found that incapacitation of the Wdnm1-like open reading frame (ORF) is very rare among mammals, with all investigated primates except for hominids having an intact ORF. The null hypothesis by Wang et al. therefore should have been that the human lnc-DC transcript might only represent a non-functional, relatively young evolutionary remnant of a protein coding locus. Whether this null hypothesis can be rejected by the experimental data presented by Wang et al. depends in part on the possible off-target (immunogenic or otherwise) effects of their RNAi procedures, which were not exhaustive in regard to the number of analyzed RNAi sequences and control sequences. If, however, the conclusions by Wang et al. on their human model are correct, and they may be, current knowledge regarding the Wdnm1-like locus suggests an intriguing combination of different functions mediated by transcript and protein in the maturation of several cell types at some point in evolution. We feel that the article by Wang et al. tends to be misleading without the discussion presented here.

