Long antiparallel open reading frames are unlikely to be encoding essential proteins in prokaryotic genomes

2019 ◽  
Author(s):  
Denis Moshensky ◽  
Andrei Alexeevski

Abstract The origin and evolution of genes that have common base pairs (overlapping genes) are of particular interest because such genes influence each other. Especially intriguing are gene pairs with long overlaps. In prokaryotes, co-directional overlaps longer than 60 bp were shown to be nonexistent apart from a few instances. A few antiparallel prokaryotic genes with long overlaps have been described in the literature. We have analyzed putative long antiparallel overlapping genes to determine whether open reading frames (ORFs) located opposite to genes (antiparallel ORFs) can be protein-coding genes.

We have confirmed that long antiparallel ORFs (AORFs) are reliably observed more frequently than expected. There are 10 472 000 AORFs with an overlap length of more than 180 bp in the 929 analyzed genomes. Stop codons on the strand opposite to the coding strand are avoided in 2 898 cases at a Benjamini-Hochberg threshold of 0.01.

Using Ka/Ks ratio calculations, we have found that long AORFs do not affect the type of selection acting on genes in the vast majority of cases. This observation indicates that translations of long AORFs are commonly not under negative selection.

A demonstrative example is the 282 AORFs longer than 1 800 bp found opposite to extremely conserved dnaK genes. Translations of these AORFs were annotated as “glutamate dehydrogenases” and were included in the Pfam database as a third protein family of glutamate dehydrogenases, PF10712. Ka/Ks analysis has demonstrated that, if these translations correspond to proteins, they are not subject to negative selection, while dnaK genes are under strong stabilizing selection. Moreover, we have found other arguments against the hypothesis that these AORFs encode essential proteins, that is, proteins indispensable for cellular machinery.

However, some AORFs, in particular those related to dnaK, have been found to slightly resist synonymous changes in genes. This indicates the possibility of their translation. We speculate that translations of certain AORFs might have a functional role other than encoding essential proteins.

Essential genes are unlikely to be encoded by AORFs in prokaryotic genomes. Nevertheless, some AORFs might have biological significance associated with their translations.

Author summary Genes that have common base pairs are called overlapping genes. We have examined the most intriguing case: whether gene pairs encoded on opposite DNA strands exist in prokaryotes. An overlap length threshold of 180 bp has been used. A few such pairs of genes have been experimentally confirmed.

We have detected all long antiparallel ORFs in 929 prokaryotic genomes and have found that the number of open reading frames located opposite to annotated genes is much higher than expected under a statistical model. We have developed a measure of stop codon avoidance on the opposite strand. The lengths of the antiparallel ORFs found to show stop codon avoidance are typical for prokaryotic genes.

Comparative genomics analysis shows that long antiparallel ORFs (AORFs) are unlikely to be essential protein-coding genes. We have analyzed the distributions of features typical for essential proteins among formal translations of all long AORFs: prevalence of negative selection, non-uniformity of the distribution of conserved positions in a multiple alignment of homologous proteins, and the character of the distribution of homologs in the phylogenetic tree of prokaryotes. None of these features has been observed for the majority of long AORFs. In particular, the same results have been obtained for some experimentally confirmed antiparallel overlapping genes (AOGs).

Thus, pairs of antiparallel overlapping essential genes are unlikely to exist. On the other hand, some antiparallel ORFs affect the evolution of the genes opposite which they are located. Consequently, translations of some antiparallel ORFs might have a yet unknown biological significance.
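The idea of a long ORF lying opposite a gene can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that an ORF is simply a maximal run of codons without a stop codon in one reading frame of the reverse-complement strand, and it reuses the 180 bp threshold quoted in the abstract. The function names (`reverse_complement`, `stop_free_runs`, `long_antiparallel_orfs`) are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_free_runs(seq, frame):
    """Yield (start, end) of maximal stop-free codon runs in one frame of seq."""
    run_start = frame
    pos = frame
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOPS:
            if pos > run_start:
                yield run_start, pos
            run_start = pos + 3
        pos += 3
    if pos > run_start:
        yield run_start, pos


def long_antiparallel_orfs(gene_seq, min_len=180):
    """Return (frame, start, end) for stop-free runs longer than min_len bp
    on the strand opposite to gene_seq. Coordinates refer to the
    reverse-complement strand. Assumption: no start codon is required."""
    opposite = reverse_complement(gene_seq)
    hits = []
    for frame in range(3):
        for start, end in stop_free_runs(opposite, frame):
            if end - start > min_len:
                hits.append((frame, start, end))
    return hits
```

A real analysis would also need gene coordinates from the annotation, a start-codon rule, and a null model for the expected number of such ORFs; none of these is specified in the abstract, so they are left out here.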

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David S. M. Lee ◽  
Joseph Park ◽  
Andrew Kromer ◽  
Aris Baras ◽  
Daniel J. Rader ◽  
...  

Abstract Ribosome-profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Robin-Lee Troskie ◽  
Yohaann Jafrani ◽  
Tim R. Mercer ◽  
Adam D. Ewing ◽  
Geoffrey J. Faulkner ◽  
...  

Abstract Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we CRISPR-Cas9 delete the nucleus-enriched pseudogene PDCL3P4 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.


Genetics ◽  
2001 ◽  
Vol 159 (3) ◽  
pp. 1089-1102
Author(s):  
James C Badciong ◽  
Jeffery M Otto ◽  
Gail L Waring

Abstract The Drosophila dec-1 gene encodes multiple proteins that are required for female fertility and proper eggshell morphogenesis. Genetic and immunolocalization data suggest that the different DEC-1 proteins are functionally distinct. To identify regions within the proteins with potential biological significance, we cloned and sequenced the D. yakuba and D. virilis dec-1 homologs. Interspecies comparisons of the predicted translation products revealed rapidly evolving sequences punctuated by blocks of conserved amino acids. Despite extensive amino acid variability, the proteins produced by the different dec-1 homologs were functionally interchangeable. The introduction of transgenes containing either the D. yakuba or the D. virilis dec-1 open reading frames into a D. melanogaster DEC-1 protein null mutant was sufficient to restore female fertility and wild-type eggshell morphology. Normal expression and extracellular processing of the DEC-1 proteins was correlated with the phenotypic rescue. The nature of the conserved features highlighted by the evolutionary comparison and the molecular resemblance of some of these features to those found in other extracellular proteins suggests functional correlates for some of the multiple DEC-1 derivatives.


2009 ◽  
Vol 2009 ◽  
pp. 1-10 ◽  
Author(s):  
Daniela Lepka ◽  
Tobias Kerrinnes ◽  
Evelyn Skiebe ◽  
Birgitt Hahn ◽  
Angelika Fruth ◽  
...  

We report the nucleotide sequence of two novel cryptic plasmids (4357 and 14 662 base pairs) carried by a Yersinia enterocolitica biotype 1A strain isolated from pork. As distinguished from most biotype 1A strains, this isolate, designated 07-04449, exhibited adherence to eukaryotic cells. The smaller plasmid pYe4449-1 carries five attributable open reading frames (ORFs) encoding the first CcdA/CcdB-like antitoxin/toxin system described for a Yersinia plasmid, a RepA-like replication initiation protein, and mobilizing factors MobA and MobC. The deduced amino acid sequences showed highest similarity to proteins described in Salmonella (CcdA/B), Klebsiella (RepA), and Plesiomonas (MobA/C), indicating genomic fluidity among members of the Enterobacteriaceae. One additional ORF with unknown function, termed ORF5, was identified with an ancestry distinct from the rest of the plasmid. While the C+G content of ORF5 is 38.3%, the rest of pYe4449-1 shows a C+G content of 55.7%. The C+G content of the larger plasmid pYe4449-2 (54.9%) was similar to that of pYe4449-1 (53.7%) and differed from that of the Y. enterocolitica genome (47.3%). Of the 14 ORFs identified on pYe4449-2, only six ORFs showed significant similarity to database entries. For three of these ORFs likely functions could be ascribed: a TnpR-like resolvase and a phage replication protein, each localized on a low C+G island, and DNA primase TraC. Two ORFs of pYe4449-2, ORF3 and ORF7, seem to encode secretable proteins. Epitope-tagging of ORF3 revealed protein expression at 4°C but not at or above 27°C, suggesting adaptation to a habitat outside swine. The hypothetical protein encoded by ORF7 is a member of a novel repeat protein family sharing the DxxGN(x)nDxxGN motif. Our findings illustrate the exceptional gene pool diversity within the species Y. enterocolitica driven by horizontal gene transfer events.
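The C+G comparison used above to flag ORF5 as foreign can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' method: `gc_content` and `implied_fraction` are hypothetical helper names, and the second function assumes that the reported percentages are length-weighted averages over the plasmid.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / total


def implied_fraction(whole, part, rest):
    """Length fraction f of `part` such that f*part + (1-f)*rest == whole.

    Assumption: all three values are length-weighted C+G percentages.
    """
    return (rest - whole) / (rest - part)


# Values quoted in the abstract for pYe4449-1 (whole), ORF5 (part), and
# the remainder of the plasmid (rest).
fraction_orf5 = implied_fraction(whole=53.7, part=38.3, rest=55.7)
```

Under that assumption the three reported percentages are mutually consistent if ORF5 makes up a little over a tenth of pYe4449-1. This is an inference from the quoted numbers, not a figure stated in the report.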


2021 ◽  
Author(s):  
Yanyi Jiang ◽  
Xiaofan Chen ◽  
Wei Zhang

Abstract In the RNA field, the demarcation between coding and non-coding has been negotiated by the recent discovery of occasionally translated circular RNAs (circRNAs). Although they lack a 5’ cap structure, circRNAs can be translated cap-independently. Complementary intron-mediated overexpression is one of the most utilized methodologies for circRNA research, but it has met persistent skepticism for its poorly defined mechanism and latent coexistent side products. In this study, leveraging such a circRNA overexpression system, we have interrogated the protein-coding potential of 30 human circRNAs containing infinite open reading frames in HEK293T cells. Surprisingly, pervasive translation signals are detected by immunoblotting. However, intensive mutagenesis reveals that numerous translation signals are generated independently of circRNA synthesis. We have developed a dual tag strategy to isolate translation noise and directly demonstrate that the fallacious translation signals originate from cryptically spliced linear transcripts. The concomitant linear RNA byproducts, presumably concatemers, can be translated to allow pseudo rolling circle translation signals, and can involve the backsplicing junction (BSJ), which disqualifies the BSJ-based evidence for circRNA translation. We also find that non-AUG start codons may engage in the translation initiation of circRNAs. Taken together, our systematic evaluation sheds light on heterogeneous translational outputs from circRNA overexpression vectors and comes with a caveat that the ectopic overexpression technique necessitates an extremely rigorous control setup in circRNA translation and functional investigation.
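What "infinite open reading frame" means for a circular sequence can be made concrete with a small Python sketch. It rests on stated assumptions rather than on the authors' definition: an ORF is treated as infinite if reading codon by codon from an AUG around the circle never meets a stop codon before the reading frame returns to its starting phase, and only AUG starts are considered (the abstract notes that non-AUG starts may also occur). The names `codon_at` and `infinite_orf_starts` are hypothetical.

```python
STOPS = {"UAA", "UAG", "UGA"}


def codon_at(seq, pos):
    """Codon starting at pos on a circular sequence (indices wrap around)."""
    n = len(seq)
    return seq[pos % n] + seq[(pos + 1) % n] + seq[(pos + 2) % n]


def infinite_orf_starts(circ):
    """Return positions of AUG codons that open a stop-free circular frame.

    Reading n codons (3n nucleotides) from a start returns to the same
    position in the same phase, so a frame with no stop in those n codons
    never terminates. DNA input is accepted and converted to RNA.
    """
    seq = circ.upper().replace("T", "U")
    n = len(seq)
    starts = []
    if n < 3:
        return starts
    for i in range(n):
        if codon_at(seq, i) != "AUG":
            continue
        if all(codon_at(seq, i + 3 * k) not in STOPS for k in range(n)):
            starts.append(i)
    return starts
```

Because indices wrap, a start codon that spans the backsplicing junction is found as well, which is the situation BSJ-based evidence for circRNA translation relies on.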


1990 ◽  
Vol 10 (9) ◽  
pp. 4795-4806
Author(s):  
J W Xuan ◽  
P Fournier ◽  
N Declerck ◽  
M Chasles ◽  
C Gaillardin

Mutants affected at the LYS5 locus of Yarrowia lipolytica lack detectable saccharopine dehydrogenase (SDH) activity. The LYS5 gene has previously been cloned, and we present here the sequence of the 2.5-kilobase-pair (kb) DNA fragment complementing the lys5 mutation. Two large antiparallel open reading frames (ORF1 and ORF2) were observed, flanked by potential transcription signals. Both ORFs appear to be transcribed, but several lines of evidence suggest that only ORF2 is translated and encodes SDH. (i) The global amino acid compositions of Saccharomyces cerevisiae SDH and of the putative ORF2 product are similar and that of ORF1 is dissimilar. (ii) An in-frame translational fusion of ORF2 with the Escherichia coli lacZ gene was introduced into yeast cells and resulted in a beta-galactosidase activity regulated similarly to SDH; no beta-galactosidase activity was obtained with an in-frame fusion of ORF1 with lacZ. (iii) The introduction of a stop codon at the beginning of ORF2 prevented SDH expression in yeast cells, whereas no phenotypic effect was observed when ORF1 translation was blocked.


2020 ◽  
Vol 6 (21) ◽  
pp. eaaz2059 ◽  
Author(s):  
Liman Niu ◽  
Fangzhou Lou ◽  
Yang Sun ◽  
Libo Sun ◽  
Xiaojie Cai ◽  
...  

Many annotated long noncoding RNAs (lncRNAs) harbor predicted short open reading frames (sORFs), but the coding capacities of these sORFs and the functions of the resulting micropeptides remain elusive. Here, we report that human lncRNA MIR155HG encodes a 17–amino acid micropeptide, which we termed miPEP155 (P155). MIR155HG is highly expressed by inflamed antigen-presenting cells, leading to the discovery that P155 interacts with the adenosine 5′-triphosphate binding domain of heat shock cognate protein 70 (HSC70), a chaperone required for antigen trafficking and presentation in dendritic cells (DCs). P155 modulates major histocompatibility complex class II–mediated antigen presentation and T cell priming by disrupting the HSC70-HSP90 machinery. Exogenously injected P155 improves two classical mouse models of DC-driven auto inflammation. Collectively, we demonstrate the endogenous existence of a micropeptide encoded by a transcript annotated as “non-protein coding” and characterize a micropeptide as a regulator of antigen presentation and a suppressor of inflammatory diseases.


2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Corrine Corrina R. Hartford ◽  
Ashish Lal

ABSTRACT Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.


1994 ◽  
Vol 14 (7) ◽  
pp. 4485-4492 ◽  
Author(s):  
B A Dombroski ◽  
Q Feng ◽  
S L Mathias ◽  
D M Sassaman ◽  
A F Scott ◽  
...  

L1 elements constitute a highly repetitive human DNA family (50,000 to 100,000 copies) lacking long terminal repeats and ending in a poly(A) tail. Some L1 elements are capable of retrotransposition in the human genome (Kazazian, H. H., Jr., C. Wong, H. Youssoufian, A. F. Scott, D. G. Phillips, and S.E. Antonarakis, Nature (London) 332:164-166, 1988). Although most are 5' truncated, a consensus sequence of complete L1 elements is 6 kb long and contains two open reading frames (ORFs) (Scott, A. F., B. J. Schmeckpeper, M. Abdelrazik, C. T. Comey, B. O'Hara, J. P. Rossiter, T. Cooley, P. Health, K. D. Smith, and L. Margolet, Genomics 1:113-125, 1987). The protein encoded by ORF2 has reverse transcriptase (RT) activity in vitro (Mathias, S. L., A. F. Scott, H. H. Kazazian, Jr., J. D. Boeke, and A. Gabriel, Science 254:1808-1810, 1991). Because L1 elements are so numerous, efficient methods for identifying active copies are required. We have developed a simple in vivo assay for the activity of L1 RT based on the system developed by Derr et al. (Derr, L. K., J. N. Strathern, and D. J. Garfinkel, Cell 67:355-364, 1991) for yeast HIS3 pseudogene formation. L1 ORF2 displays an in vivo RT activity similar to that of yeast Ty1 RT in this system and generates pseudogenes with unusual structures. Like the HIS3 pseudogenes whose formation depends on Ty1 RT, the HIS3 pseudogenes generated by L1 RT are joined to Ty1 sequences and often are part of complex arrays of Ty1 elements, multiple HIS3 pseudogenes, and hybrid Ty1/L1 elements. These pseudogenes differ from those previously described in that there are base pairs of unknown origin inserted at several of the junctions. In two of three HIS3 pseudogenes studied, the L1 RT appears to have jumped from the 5' end of a Ty1/L1 transcript to the poly(A) tract of the HIS3 RNA.


2004 ◽  
Vol 186 (20) ◽  
pp. 6714-6720 ◽  
Author(s):  
Christopher D. Herring ◽  
Frederick R. Blattner

ABSTRACT Expression of an amber suppressor tRNA should result in read-through of the 326 open reading frames (ORFs) that terminate with amber stop codons in the Escherichia coli genome, including six pseudogenes. Abnormal extension of an ORF might alter the activities of the protein and have effects on cellular physiology, while suppression of a pseudogene could lead to a gain of function. We used oligonucleotide microarrays to determine if any effects were apparent at the level of transcription in glucose minimal medium. Surprisingly, only eight genes had significantly different expression in the presence of the suppressor. Among these were the genes yaiN, adhC, and yaiM, forming a single putative operon whose likely function is the degradation of formaldehyde. Expression of wild-type yaiN was shown to result in repression of the operon, while a suppression-mimicking allele lacking the amber stop codon and extended 7 amino acids did not. The operon was shown to be induced by formaldehyde, and the genes have been renamed frmR, frmA, and frmB, respectively.
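The two quantities this abstract turns on, ORFs that terminate with an amber (TAG) codon and the length of the extension produced by read-through, can be sketched in Python as follows. This is a hedged illustration only: `ends_with_amber` and `readthrough_extension` are hypothetical names, and the extension is counted here as the sense codons between the amber codon and the next in-frame stop, which may not match the counting convention behind the "7 amino acids" quoted above.

```python
STOPS = {"TAA", "TAG", "TGA"}


def ends_with_amber(orf):
    """True if the ORF length is a multiple of 3 and its last codon is TAG."""
    return len(orf) % 3 == 0 and orf.upper().endswith("TAG")


def readthrough_extension(downstream):
    """Number of sense codons in `downstream` before the next in-frame stop.

    `downstream` is the sequence immediately after the amber codon, in the
    same reading frame. Returns None if no in-frame stop is present.
    The residue inserted at the amber position itself is not counted.
    """
    downstream = downstream.upper()
    for k in range(len(downstream) // 3):
        if downstream[3 * k:3 * k + 3] in STOPS:
            return k
    return None
```

Applied to every annotated ORF of a genome, the first function would give the kind of count reported above (326 amber-terminated ORFs in E. coli), provided the annotation and sequence are supplied by the user.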

