Biological factors in the synthetic construction of overlapping genes

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Stefan Wichmann ◽  
Siegfried Scherer ◽  
Zachary Ardern

Abstract Background: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life’s ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from cellular organisms. In this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps. Results: After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models, we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids. Conclusions: Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered here, such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and open up exciting possibilities for synthetic biology.
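
The role of genetic-code redundancy mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' construction pipeline; the function name `translate` and the toy sequences are assumptions made for illustration. The sketch translates one DNA string in two reading frames and shows that a change which is synonymous in one frame can alter the protein encoded in the overlapping frame.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO
    )
}

def translate(seq, frame=0):
    """Translate seq starting at offset `frame`, ignoring a trailing partial codon."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

# Number of codons per amino acid: a rough measure of redundancy.
DEGENERACY = Counter(AMINO)

original = "ATGGCTAAAGGTCTG"   # toy sequence (hypothetical)
variant = "ATGGCCAAAGGTCTG"    # GCT -> GCC, synonymous in frame 0

print(translate(original, 0), translate(original, 1))
print(translate(variant, 0), translate(variant, 1))
```

In this toy case the frame-0 protein is unchanged while the frame +1 protein differs at one residue, which is the kind of trade-off any overlap design has to balance.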

2021 ◽  
Author(s):  
Qi Wang ◽  
Na Liu

Abstract In response to Enterococcus faecalis infection of chicken origin, a multi-host lytic phage, EFC1, was isolated and characterized. Its double-stranded circular DNA genome is 56099 bp in size and contains 89 predicted protein-coding genes as well as 2 tRNAs, involved in intron, structure, transcription, packaging, DNA replication, modification, and lysis functions. Observation of the structure by electron microscopy and comparative phylogenetic analysis of the terminase large subunit showed that phage EFC1 is a new member of the Siphoviridae and is relatively distantly related to the phages with which it shares high similarity. Phage EFC1 carries no relevant virulence genes or antibiotic resistance genes.


2021 ◽  
Author(s):  
Lucas Felipe Moreira Silva ◽  
Renato Bulcão-Neto

Environmental Health (EH) refers to aspects of human health affected by factors in the environment, e.g., biological factors, and it is an essential part of any comprehensive public health system. Similar to other health-related fields, one observes an increasing movement in the adoption of IoT technologies into the EH domain. Regarding the data life cycle in IoT systems, data modeling and interpretation are crucial tasks in which ontologies are a feasible solution because of their expressiveness and reasoning support. In this paper, we structure the ontology-supported EH research theme through a systematic literature mapping. The identification and selection strategies of primary studies include the automatic search for studies published from 2010 to 2019 on five sources and the application of inclusion and exclusion criteria on an eight-hundred-eleven-distinct-paper group. The results of this original work provide an overview of the research theme with multiple classifications of thirty-four relevant studies remaining as well as the finding of trends and gaps for future work.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Chao-Hsin Chen ◽  
Chao-Yu Pan ◽  
Wen-chang Lin

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in the human genome. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
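
Identifying overlapping genes from annotation coordinates, as described above, amounts to an interval-overlap search. The following Python sketch shows one simple way to do it. The `Gene` record, the function name `overlapping_pairs`, and the example coordinates are hypothetical, and the sketch assumes that all genes lie on the same chromosome. It does not reproduce the study's subtype classification.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"

def overlapping_pairs(genes):
    """Return (name_a, name_b, overlap_length) for every pair of overlapping genes."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                break  # later genes start even further right, so stop early
            overlap = min(a.end, b.end) - b.start + 1
            pairs.append((a.name, b.name, overlap))
    return pairs

# Hypothetical annotations on a single chromosome.
genes = [
    Gene("A", 100, 500, "+"),
    Gene("B", 450, 900, "-"),
    Gene("C", 1000, 1200, "+"),
    Gene("D", 1100, 1180, "+"),
]
print(overlapping_pairs(genes))
```

Sorting by start coordinate lets the inner loop stop at the first gene that begins beyond the current gene's end, which keeps the search fast for typical annotations.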


2005 ◽  
Vol 79 (12) ◽  
pp. 7570-7596 ◽  
Author(s):  
Luciano Brocchieri ◽  
Thomas N. Kledal ◽  
Samuel Karlin ◽  
Edward S. Mocarski

ABSTRACT Prediction of protein-coding regions and other features of primary DNA sequence has greatly contributed to experimental biology. Significant challenges remain in genome annotation methods, including the identification of small or overlapping genes and the assessment of mRNA splicing or unconventional translation signals in expression. We have employed a combined analysis of compositional biases and conservation together with frame-specific G+C representation to reevaluate and annotate the genome sequences of mouse and rat cytomegaloviruses. Our analysis predicts that there are at least 34 protein-coding regions in these genomes that were not apparent in earlier annotation efforts. These include 17 single-exon genes, three new exons of previously identified genes, a newly identified four-exon gene for a lectin-like protein (in rat cytomegalovirus), and 10 probable frameshift extensions of previously annotated genes. This expanded set of candidate genes provides an additional basis for investigation in cytomegalovirus biology and pathogenesis.
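
Frame-specific G+C representation can be pictured as the G+C fraction computed separately at each of the three codon positions. The Python sketch below illustrates that idea only. It is not the authors' annotation method, and the function name `gc_by_codon_position` and the example sequence are assumptions.

```python
def gc_by_codon_position(seq):
    """Return the G+C fraction at codon positions 1, 2 and 3 of an in-frame sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]     # every third base, starting at this position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)

# Toy in-frame sequence (hypothetical): ATG GCC GCG TAA
print(gc_by_codon_position("ATGGCCGCGTAA"))
```

In coding regions the three positions often show different G+C levels, so comparing the profile across the possible frames can hint at which frame, if any, is coding.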


Plants ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1605
Author(s):  
Xiaofeng Chi ◽  
Faqi Zhang ◽  
Qi Dong ◽  
Shilong Chen

Biebersteiniaceae and Nitrariaceae, two small families, were classified in Sapindales recently. Taxonomic and phylogenetic relationships within Sapindales are still poorly resolved and controversial. In the current study, we compared the chloroplast genomes of five species (Biebersteinia heterostemon, Peganum harmala, Nitraria roborowskii, Nitraria sibirica, and Nitraria tangutorum) from Biebersteiniaceae and Nitrariaceae. High similarity was detected in the gene order, content and orientation of the five chloroplast genomes; 13 highly variable regions were identified among the five species. An accelerated substitution rate was found in the protein-coding genes, especially clpP. The effective number of codons (ENC), parity rule 2 (PR2), and neutrality plots together revealed that the codon usage bias is affected by mutation and selection. The phylogenetic analysis strongly supported (Nitrariaceae (Biebersteiniaceae + The Rest)) relationships in Sapindales. Our findings can provide useful information for analyzing phylogeny and molecular evolution within Biebersteiniaceae and Nitrariaceae.
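
A parity rule 2 (PR2) plot places each gene at the coordinates G3/(G3+C3) and A3/(A3+T3), where the subscript 3 denotes the third codon position. A point at (0.5, 0.5) indicates no bias between complementary bases. The Python sketch below computes these coordinates under a simplifying assumption: it uses the third position of every codon, whereas PR2 analyses are commonly restricted to four-fold degenerate codon families. The function name `pr2_coordinates` and the example sequences are hypothetical.

```python
def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame coding sequence.

    A coordinate is None when its denominator is zero.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third = cds[2:usable:3]  # third base of every complete codon
    a, t, g, c = (third.count(base) for base in "ATGC")
    gc_bias = g / (g + c) if g + c else None
    at_bias = a / (a + t) if a + t else None
    return gc_bias, at_bias

# Toy coding sequence (hypothetical): six alanine codons.
print(pr2_coordinates("GCAGCTGCGGCCGCAGCG"))
```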


2019 ◽  
Author(s):  
Denis Moshensky ◽  
Andrei Alexeevski

Abstract The origin and evolution of genes that have common base pairs (overlapping genes) are of particular interest because such genes influence each other. Especially intriguing are gene pairs with long overlaps. In prokaryotes, co-directional overlaps longer than 60 bp were shown to be nonexistent except for some instances. A few antiparallel prokaryotic genes with long overlaps were described in the literature. We have analyzed putative long antiparallel overlapping genes to determine whether open reading frames (ORFs) located opposite to genes (antiparallel ORFs) can be protein-coding genes. We have confirmed that long antiparallel ORFs (AORFs) are reliably observed to be more frequent than expected. There are 10 472 000 AORFs with overlap length more than 180 bp in the 929 analyzed genomes. Stop codons on the strand opposite to the coding strand are avoided in 2 898 cases at a Benjamini-Hochberg threshold of 0.01. Using Ka/Ks ratio calculations, we have revealed that long AORFs do not affect the type of selection acting on genes in the vast majority of cases. This observation indicates that translations of long AORFs are commonly not under negative selection. An illustrative example is the 282 AORFs longer than 1 800 bp found opposite to extremely conserved dnaK genes. Translations of these AORFs were annotated as “glutamate dehydrogenases” and were included in the Pfam database as a third protein family of glutamate dehydrogenases, PF10712. Ka/Ks analysis has demonstrated that if these translations correspond to proteins, they are not subject to negative selection, while dnaK genes are under strong stabilizing selection. Moreover, we have found other arguments against the hypothesis that these AORFs encode essential proteins, that is, proteins indispensable for cellular machinery. However, some AORFs, in particular those related to dnaK, have been found to slightly resist synonymous changes in genes. This indicates the possibility of their translation. We speculate that translations of certain AORFs might have a functional role other than encoding essential proteins. Essential genes are unlikely to be encoded by AORFs in prokaryotic genomes. Nevertheless, some AORFs might have biological significance associated with their translations.
Author summary Genes that have common base pairs are called overlapping genes. We have examined the most intriguing case: whether gene pairs encoded on opposite DNA strands exist in prokaryotes. An overlap length threshold of 180 bp has been used. A few such pairs of genes were experimentally confirmed. We have detected all long antiparallel ORFs in 929 prokaryotic genomes and have found that the number of open reading frames located opposite to annotated genes is much higher than expected according to a statistical model. We have developed a measure of stop codon avoidance on the opposite strand. The lengths of the antiparallel ORFs found with stop codon avoidance are typical for prokaryotic genes. Comparative genomics analysis shows that long antiparallel ORFs (AORFs) are unlikely to be essential protein-coding genes. We have analyzed distributions of features typical for essential proteins among formal translations of all long AORFs: prevalence of negative selection, non-uniformity of the distribution of conserved positions in a multiple alignment of homologous proteins, and the character of the distribution of homologs in the phylogenetic tree of prokaryotes. None of them has been observed for the majority of long AORFs. In particular, the same results have been obtained for some experimentally confirmed antiparallel overlapping genes (AOGs). Thus, pairs of antiparallel overlapping essential genes are unlikely to exist. On the other hand, some antiparallel ORFs affect the evolution of the genes opposite to which they are located. Consequently, translations of some antiparallel ORFs might have yet unknown biological significance.
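
Detecting long antiparallel ORFs can be sketched as a scan of the reverse complement of a gene for long stretches without stop codons. The Python sketch below is a minimal illustration and not the authors' pipeline. It assumes, for simplicity, that an ORF is any stop-free run in one frame with no start codon required, and the function names and toy sequence are hypothetical. The default `min_len=180` mirrors the 180 bp threshold stated in the text.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def stop_free_stretches(seq, min_len):
    """Return (frame, start, end) for stop-free runs of at least min_len nt.

    Coordinates are 0-based with end exclusive. Assumption: no start codon
    is required, so a run is simply the span between two stop codons.
    """
    results = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if i - run_start >= min_len:
                    results.append((frame, run_start, i))
                run_start = i + 3
        end = frame + (len(seq) - frame) // 3 * 3  # last complete codon boundary
        if end - run_start >= min_len:
            results.append((frame, run_start, end))
    return results

def antiparallel_orfs(gene_seq, min_len=180):
    """Find stop-free stretches on the strand opposite to gene_seq."""
    return stop_free_stretches(reverse_complement(gene_seq), min_len)

# Toy example with a short threshold so the result is easy to follow.
print(antiparallel_orfs("GGGCCCTTATTT", min_len=6))
```

Coordinates in the result refer to the reverse-complemented sequence. Mapping them back to the gene's own coordinates, and testing whether stop avoidance is statistically significant, would require further steps not shown here.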


Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4195
Author(s):  
Calvin Janitra Halim ◽  
Kazuhiko Kawamoto

Recent approaches to time series forecasting, especially forecasting spatiotemporal sequences, have leveraged the approximation power of deep neural networks to model the complexity of such sequences, specifically approaches that are based on recurrent neural networks. Still, as spatiotemporal sequences that arise in the real world are noisy and chaotic, modeling approaches that utilize probabilistic temporal models, such as deep Markov models (DMMs), are favorable because of their ability to model uncertainty, increasing their robustness to noise. However, approaches based on DMMs do not maintain the spatial characteristics of spatiotemporal sequences, with most of the approaches converting the observed input into 1D data halfway through the model. To solve this, we propose a model that retains the spatial aspect of the target sequence with a DMM that consists of 2D convolutional neural networks. We then show the robustness of our method to data with large variance compared with naive forecast, vanilla DMM, and convolutional long short-term memory (LSTM) using synthetic data, even outperforming the DNN models over a longer forecast period. We also point out the limitations of our model when forecasting real-world precipitation data and the possible future work that can be done to address these limitations, along with additional future research potential.


Plants ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 979
Author(s):  
Millicent Akinyi Oulo ◽  
Jia-Xin Yang ◽  
Xiang Dong ◽  
Vincent Okelo Wanga ◽  
Elijah Mbandi Mkala ◽  
...  

Rhipsalis baccifera is the only cactus that naturally occurs in both the New World and the Old World, and has thus drawn the attention of most researchers. The complete chloroplast (cp) genome of R. baccifera is reported here for the first time. The cp genome of R. baccifera has 122,333 base pairs (bp), with a large single-copy (LSC) region (81,459 bp), a small single-copy (SSC) region (23,531 bp) and two inverted repeat (IR) regions of 8530 bp each. The genome contains 110 genes, with 73 protein-coding genes, 31 tRNAs, 4 rRNAs and 2 pseudogenes. Twelve genes have introns, with loss of introns being observed in the rpoc1, clpP and rps12 genes. 49 repeat sequences and 62 simple sequence repeats (SSRs) were found in the genome. Comparative analysis with eight species of the ACPT (Anacampserotaceae, Cactaceae, Portulacaceae, and Talinaceae) clade of the suborder Portulacineae showed that the R. baccifera genome has a higher number of rearrangements, with a 19-gene inversion in its LSC region representing the most significant structural change in terms of its size. Inversion of the SSC region seems common in subfamily Cactoideae, and another 6 kb gene inversion between rbcL-trnM was observed in R. baccifera and Carnegiea gigantea. The IRs of R. baccifera are contracted. The phylogenetic analysis among 36 complete chloroplast genomes of Caryophyllales species and two outgroup species supported monophyly of the families of the ACPT clade. R. baccifera occupied a basal position of the family Cactaceae clade in the tree. A high number of rearrangements in this cp genome suggests a larger number of mutation events in the evolutionary history of R. baccifera. These results provide important tools for future work on R. baccifera and in the evolutionary studies of the suborder Portulacineae.


Genetics ◽  
1997 ◽  
Vol 145 (3) ◽  
pp. 749-758 ◽  
Author(s):  
Nika Yamazaki ◽  
Rei Ueshima ◽  
Jonathan A Terrett ◽  
Shin-ichi Yokobori ◽  
Masayuki Kaifu ◽  
...  

Complete gene organizations of the mitochondrial genomes of three pulmonate gastropods, Euhadra herklotsi, Cepaea nemoralis and Albinaria coerulea, permit comparisons of their gene organizations. Euhadra and Cepaea are classified in the same superfamily, Helicoidea, yet they show several differences in the order of tRNA and protein coding genes. Albinaria is distantly related to the other two genera but shares the same gene order in one part of its mitochondrial genome with Euhadra and in another part with Cepaea. Despite their small size (14.1 – 14.5 kbp), these snail mtDNAs encode 13 protein genes, two rRNA genes and at least 22 tRNA genes. These genomes exhibit several unusual or unique features compared to other published metazoan mitochondrial genomes, including those of other molluscs. Several tRNAs predicted from the DNA sequences possess bizarre structures lacking either the T stem or the D stem, similar to the situation seen in nematode mt-tRNAs. The acceptor stems of many tRNAs show a considerable number of mismatched basepairs, indicating that the RNA editing process recently demonstrated in Euhadra is widespread in the pulmonate gastropods. Strong selection acting on mitochondrial genomes of these animals would have resulted in frequent occurrence of the mismatched basepairs in regions of overlapping genes.


2019 ◽  
Author(s):  
Barbara Zehentner ◽  
Zachary Ardern ◽  
Michaela Kreitmeier ◽  
Siegfried Scherer ◽  
Klaus Neuhaus

Abstract Antisense transcription is well known in bacteria. However, translation of antisense RNAs is typically not considered, as the implied overlapping coding at a DNA locus is assumed to be highly improbable. Therefore, such overlapping genes are systematically excluded in prokaryotic genome annotation. Here we report an exceptional 603 bp long open reading frame completely embedded in antisense to the gene of the outer membrane protein ompA. Ribosomal profiling revealed translation of the mRNA and the protein was detected in Western blots. A σ70 promoter, transcription start site, Shine-Dalgarno motif and rho-independent terminator were experimentally validated. A pH-dependent phenotype conferred by the protein was shown in competitive overexpression growth experiments of a translationally arrested mutant versus wild type. We designate this novel gene pop (pH-regulated overlapping protein-coding gene). Increasing evidence based on ribosome-profiling indicates translation of antisense RNA, suggesting that more overlapping genes of unknown function may exist in bacteria.

