Prokaryotic Genomes: Latest Research Papers

StartLink and StartLink+: Prediction of Gene Starts in Prokaryotic Genomes

Frontiers in Bioinformatics, 2021, Vol 1. DOI: 10.3389/fbinf.2021.704157

Author(s): Karl Gemayel, Alexandre Lomsadze, Mark Borodovsky

Keyword(s): Ab Initio, State Of The Art, Gene Prediction, Nucleotide Sequences, Genomic Databases, Large Sets, Multiple Alignments, A Genome, Prokaryotic Genomes, Conservation Patterns

State-of-the-art algorithms of ab initio gene prediction for prokaryotic genomes have been shown to be sufficiently accurate. Any two such algorithms generally agree on predicted gene 3′ ends. Nonetheless, their predictions of gene starts do not match for 15–25% of genes in a genome. This discrepancy is a serious issue that is difficult to resolve because sufficiently large sets of genes with experimentally verified starts are not available. We introduce StartLink, which infers gene starts from conservation patterns revealed by multiple alignments of homologous nucleotide sequences. We also introduce StartLink+, which combines the ab initio and alignment-based methods. StartLink can predict the start of a given gene only if homologs of that gene are available in a database; on average, StartLink made predictions for 85% of genes per genome. The accuracy of StartLink+ was 98–99% on sets of genes with experimentally verified starts. Compared with database annotations, annotated gene starts deviated from StartLink+ predictions for ∼5% of genes in AT-rich genomes and for 10–15% of genes in GC-rich genomes on average. StartLink+ has the potential to significantly improve gene start annotation in genomic databases.
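
The abstract notes that ab initio tools agree on gene 3′ ends but disagree on starts, and that StartLink+ combines ab initio and alignment-based predictions. A minimal Python sketch of that bookkeeping follows, assuming each predictor's output is a dictionary mapping a gene (keyed by its 3′ end) to a predicted start coordinate. Keeping only the starts on which both predictors agree is one simple combination rule chosen here for illustration; the function names are hypothetical and this is not the published StartLink+ code.

```python
# Hypothetical sketch: genes are keyed by their 3' end (stop-codon coordinate),
# on which ab initio tools are said to agree; values are predicted start coordinates.


def start_disagreement_rate(pred_a: dict[str, int], pred_b: dict[str, int]) -> float:
    """Fraction of genes predicted by both tools whose start coordinates differ."""
    shared = pred_a.keys() & pred_b.keys()
    if not shared:
        return 0.0
    return sum(pred_a[g] != pred_b[g] for g in shared) / len(shared)


def consensus_starts(ab_initio: dict[str, int], alignment_based: dict[str, int]) -> dict[str, int]:
    """Keep a start only where both predictors report the same coordinate."""
    return {g: s for g, s in alignment_based.items() if ab_initio.get(g) == s}


def coverage(predicted: dict[str, int], all_genes: set[str]) -> float:
    """Share of genes for which a predictor produced any start (e.g. homologs were found)."""
    return len(predicted.keys() & all_genes) / len(all_genes) if all_genes else 0.0


if __name__ == "__main__":
    ab = {"g1": 100, "g2": 250, "g3": 400}
    al = {"g1": 100, "g2": 262}              # no homologs found for g3
    print(consensus_starts(ab, al))          # {'g1': 100}
    print(start_disagreement_rate(ab, al))   # 0.5
```

Requiring agreement trades coverage for precision: genes without homologs, or with conflicting calls, simply receive no consensus start.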

Consistent Clustering Pattern of Prokaryotic Genes Based on Base Frequency at the Second Codon Position and its Association with Functional Category Preference

Interdisciplinary Sciences: Computational Life Sciences, 2021. DOI: 10.1007/s12539-021-00493-w

Author(s): Yan-Ting Jin, Cong Ma, Xin Wang, Shu-Xuan Wang, Kai-Yue Zhang, ...

Keyword(s): Protein Function, Codon Position, Systematic Investigation, Functional Category, Gene Clustering, Base Frequency, Significant Difference, Prokaryotic Genomes, Functional Classes, Clustering Pattern

In 2002, our research group observed a gene clustering pattern based on the base frequency of A versus T at the second codon position in the genome of Vibrio cholerae and found that the distribution of functional categories differed between the genes of the two clusters. With a large number of sequenced genomes now available, we performed a systematic investigation of the A2–T2 distribution and found that 2694 of 2764 prokaryotic genomes have an optimal clustering number of two, indicating a consistent pattern. Analysis of the functional categories of the coding genes in each cluster in 1483 prokaryotic genomes indicated that 99.33% of the genomes exhibited a significant difference (p < 0.01) in functional category distribution between the two clusters. Specifically, functional category P was overrepresented in the smaller cluster in 98.65% of genomes, whereas categories J, K, and L were overrepresented in the larger cluster in more than 98.52% of genomes. Lineage analysis showed that these preferences appear consistently across all phyla. Overall, our work revealed an almost universal clustering pattern based on the relative frequency of A2 versus T2 and its role in functional category preference. These findings should deepen understanding of why the functional classes of genes can be predicted theoretically from their nucleotide sequences, and of how protein function is determined by DNA sequence.
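
To make the A2–T2 clustering concrete, here is a hedged Python sketch: it computes the frequencies of A and T at the second position of every codon in a coding sequence, then splits genes into two groups with plain k-means. The choice of k-means, the fixed k = 2, and the initialization are assumptions for illustration; the abstract reports that two is the optimal cluster number but does not state which clustering method or selection criterion the authors used. All function names are hypothetical.

```python
# Hypothetical sketch: A2/T2 frequencies per gene, then a 2-cluster k-means.


def second_position_freqs(cds: str) -> tuple[float, float]:
    """Return (A2, T2): frequencies of A and T at codon position 2 of an in-frame CDS."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("CDS shorter than one codon")
    second = [cds[3 * i + 1] for i in range(n_codons)]
    return second.count("A") / n_codons, second.count("T") / n_codons


def _dist2(p: tuple[float, float], q: tuple[float, float]) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_two(points: list[tuple[float, float]], max_iter: int = 100) -> list[int]:
    """Plain k-means with k = 2 on (A2, T2) points; returns a cluster label per point."""
    # Deterministic initialisation: the most T2-biased and the most A2-biased genes.
    centers = [min(points, key=lambda p: p[0] - p[1]),
               max(points, key=lambda p: p[0] - p[1])]
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels


if __name__ == "__main__":
    genes = ["ATGAAAAAAAAATAA", "ATGTTTTTTTTTTAA", "ATGAAAGAAAAATAA", "ATGTTTCTTTTTTAA"]
    points = [second_position_freqs(g) for g in genes]
    print(kmeans_two(points))
```

In a real analysis each genome would contribute thousands of genes, and the per-cluster gene lists would then be compared by functional category.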

The origin and impeded dissemination of the DNA phosphorothioation system in prokaryotes

Nature Communications, 2021, Vol 12 (1). DOI: 10.1038/s41467-021-26636-7

Author(s): Huahua Jian, Guanpeng Xu, Yi Yi, Yali Hao, Yinzhao Wang, ...

Keyword(s): Negative Effects, Metabolic Genes, Transcriptomic Sequencing, Prokaryotic Genomes, DNA Backbone, Backbone Modification, Encoding Gene, PT Modification, Restriction Modification, Competition Assays

Phosphorothioate (PT) modification by the dnd gene cluster is the first identified DNA backbone modification and constitutes an epigenetic system with multiple functions, including antioxidant ability, restriction modification, and virus resistance. Despite these advantages to their hosts, dnd systems are surprisingly sporadically distributed among contemporary prokaryotic genomes. To address this ecological paradox, we systematically investigate the occurrence and phylogeny of dnd systems, and our results suggest that they originated in ancient Cyanobacteria after the Great Oxygenation Event. Interestingly, the occurrence of dnd systems is significantly negatively correlated with that of prophages. Further, we experimentally confirm that PT modification activates the filamentous phage SW1 by altering the binding affinity of its repressor and the transcription level of the repressor-encoding gene. Competition assays and concurrent epigenomic and transcriptomic sequencing subsequently show that PT modification affects the expression of a variety of metabolic genes, which reduces the competitive fitness of the marine bacterium Shewanella piezotolerans WP3. Our findings strongly suggest that a series of negative effects that dnd systems exert on microorganisms limits the horizontal transfer of these systems, thus leading to their sporadic distribution. Overall, our study reveals a putative evolutionary scenario for the dnd system and provides novel insights into the physiological and ecological influences of PT modification.

Operon formation by insertion sequence IS3 in Escherichia coli

2021. DOI: 10.1101/2021.11.02.466885

Author(s): Yuki Kanai, Saburo Tsuru, Chikara Furusawa

Keyword(s): Escherichia coli, Insertion Sequence, Bacterial Species, Mutation Rates, Insertion Sequences, Proof Of Concept, Regulatory Architecture, Expression Of Genes, Prokaryotic Genomes, Rapid Formation

Operons are a hallmark of the genomic and regulatory architecture of prokaryotes. However, the mechanism by which two genes located far apart gradually come close together and form an operon remains to be elucidated. Here, we propose a new model for the origin of operons: mobile genetic elements called insertion sequences can facilitate operon formation through consecutive insertion, deletion, and excision reactions. This mechanism leaves few traces of the insertion sequences involved and is therefore difficult to detect in natural evolution. As a proof of concept, we performed what is, to the best of our knowledge, the first experimental demonstration of operon formation. The insertion sequence IS3 and the insertion sequence excision enhancer gene are both found in a broad range of bacterial species. We introduced them into an insertion sequence-free Escherichia coli strain and found that, in support of our hypothesis, the activity of the two genes altered the expression of genes surrounding IS3, closed a 2.7-kilobase-pair gap between a pair of genes, and formed new operons. This study shows how insertion sequences can facilitate the rapid formation of operons by locally increasing structural mutation rates, and highlights how coevolution with mobile elements may shape the organization of prokaryotic genomes and gene regulation.

Geptop 2.0: Accurately Select Essential Genes from the List of Protein-Coding Genes in Prokaryotic Genomes

2021, pp. 423-430. DOI: 10.1007/978-1-0716-1720-5_23

Author(s): Qing-Feng Wen, Wen Wei, Feng-Biao Guo

Keyword(s): Essential Genes, Protein Coding, Protein Coding Genes, Prokaryotic Genomes

Pseudofinder: detection of pseudogenes in prokaryotic genomes

2021. DOI: 10.1101/2021.10.07.463580

Author(s): Mitch J Syberg-Olsen, Arkadiy I Garber, Patrick J Keeling, John McCutcheon, Filip Husnik

Keyword(s): Open Source Software, Evolutionary Dynamics, Population Bottlenecks, Relaxed Selection, Substantial Fraction, Evolutionary Forces, Functional Potential, Prokaryotic Genomes, Ecological Shifts, Inactivating Mutations

Prokaryotic genomes are generally gene dense and encode relatively few pseudogenes, that is, nonfunctional or inactivated remnants of genes. However, in certain contexts, for example recent ecological shifts or extreme population bottlenecks (such as those experienced by symbionts and pathogens), pseudogenes can accumulate quickly and form a substantial fraction of the genome. Identification of pseudogenes is thus a critical step in understanding both the evolutionary forces acting on prokaryotic genomes and the functional potential encoded within them. Here, we present Pseudofinder, an open-source software tool dedicated to pseudogene identification and analysis. We demonstrate that Pseudofinder's multi-pronged, reference-based approach can detect a wide variety of pseudogenes, including highly degraded ones that are typically missed by gene-calling pipelines, as well as newly formed pseudogenes, which may carry only one or a few inactivating mutations. Additionally, Pseudofinder can detect intact genes undergoing relaxed selection, which may indicate incipient pseudogene formation. Implementing Pseudofinder in annotation pipelines will not only clarify the functional potential of sequenced microbes but also generate novel insights and hypotheses regarding the evolutionary dynamics of bacterial and archaeal genomes.
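
Pseudofinder's actual checks are more extensive than anything shown here; the following Python sketch only illustrates the general idea of reference-based pseudogene screening under stated assumptions. It flags a candidate gene whose length is not a multiple of three (a possible frameshift), that contains an in-frame stop codon before its end, or whose translated length falls below a fraction of its closest reference homolog. The `min_fraction` default of 0.65 and all function names are hypothetical.

```python
# Hypothetical sketch of simple reference-based pseudogene checks (not Pseudofinder's code).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard bacterial code (table 11)


def codons(cds: str) -> list[str]:
    """Split a CDS into complete in-frame codons (a trailing partial codon is dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def translated_length(cds: str) -> int:
    """Number of codons before the first in-frame stop codon."""
    n = 0
    for codon in codons(cds):
        if codon in STOP_CODONS:
            break
        n += 1
    return n


def pseudogene_signals(cds: str, ref_protein_len: int, min_fraction: float = 0.65) -> list[str]:
    """Return reasons to suspect pseudogenization (an empty list means it looks intact).

    ref_protein_len: length in amino acids of the closest reference homolog.
    min_fraction: hypothetical cutoff below which the gene is called truncated.
    """
    reasons = []
    if len(cds) % 3 != 0:
        reasons.append("length_not_multiple_of_3")  # possible frameshift
    if any(c in STOP_CODONS for c in codons(cds)[:-1]):
        reasons.append("internal_stop")  # premature in-frame stop
    if translated_length(cds) < min_fraction * ref_protein_len:
        reasons.append("truncated")
    return reasons


if __name__ == "__main__":
    print(pseudogene_signals("ATGAAAAAAAAATAA", ref_protein_len=4))
    print(pseudogene_signals("ATGAAATAGAAATAA", ref_protein_len=4))
```

Detecting relaxed selection in intact genes, which the abstract also mentions, would require comparing substitution rates against homologs and is not attempted here.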

Tensor Rules in the Stochastic Organization of Genomes and Genetic Stochastic Resonance in Algebraic Biology

2021. DOI: 10.20944/preprints202110.0093.v1

Author(s): Sergey V. Petoukhov

Keyword(s): Quantum Mechanics, DNA Sequences, Genomic DNA, Analytical Tool, Prokaryotic Genomes, Algebraic Biology, DNA Nucleotide Sequence, A Chain, Product Of Matrices

This article presents the author's new results, which extend his previously published work on hidden rules and symmetries in the structure of long single-stranded DNA sequences in eukaryotic and prokaryotic genomes. The author uses the existence of different alphabets of n-plets in DNA: the alphabet of 4 nucleotides, the alphabet of 16 doublets, the alphabet of 64 triplets, and so on. Each such alphabet of n-plets can be used to construct a text as a chain of these n-plets. Using this possibility, the author represents any long DNA nucleotide sequence as a set of so-called n-texts, each written in one of these alphabets of n-plets. Each n-text has its own percentages of the different n-plets in the genomic DNA. It turns out, however, that in this multi-alphabet (multilayer) representation of each of the many genomic DNA sequences the author analyzed, universal rules of probability and symmetry exist in the interrelations among its different n-texts with respect to their n-plet percentages. In this study, the tensor product of matrices and vectors, borrowed from the arsenal of quantum mechanics, is used as an effective analytical tool. Some additions to the topic of algebra-holographic principles in genetics are also presented. Taking into account the described genomic rules of probability, the author also puts forward the concept of an important role for stochastic resonance in genetic informatics.
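
The n-text representation described above can be sketched in a few lines of Python, assuming non-overlapping n-plets read from the start of the sequence and ignoring n-plets that contain characters other than A, C, G, T. The `kron` helper forms the tensor (Kronecker) product of two frequency vectors, so that the product of the mononucleotide vector with itself can be set beside the observed doublet shares. This comparison illustrates the tool only; it does not reproduce the author's specific rules, and all names are hypothetical.

```python
from collections import Counter
from itertools import product


def n_text_frequencies(seq: str, n: int) -> dict[str, float]:
    """Read seq as a chain of non-overlapping n-plets; return the share of each n-plet."""
    seq = seq.upper()
    plets = [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]
    plets = [p for p in plets if set(p) <= set("ACGT")]  # drop n-plets containing N, etc.
    if not plets:
        raise ValueError("sequence too short (or no valid n-plets) for this n")
    counts = Counter(plets)
    return {"".join(p): counts["".join(p)] / len(plets) for p in product("ACGT", repeat=n)}


def kron(u: list[float], v: list[float]) -> list[float]:
    """Tensor (Kronecker) product of two vectors."""
    return [a * b for a in u for b in v]


# Usage on a toy sequence.
seq = "ACGTACGT"
mono = n_text_frequencies(seq, 1)   # 1-text over the 4-letter alphabet
di = n_text_frequencies(seq, 2)     # 2-text over the 16 doublets
# kron iterates in the same AA, AC, AG, AT, CA, ... order as the keys of `di`.
expected = kron(list(mono.values()), list(mono.values()))
for doublet, observed, exp in zip(di, di.values(), expected):
    print(doublet, observed, exp)
```

Because both `itertools.product` and `kron` vary the last letter fastest, the observed and tensor-product vectors line up element by element without any extra indexing.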

Distinct Expansion of Group II Introns Depends on the Type of Intron-encoded Protein and Genomic Signatures in Prokaryotes

2021. DOI: 10.1101/2021.09.28.462093

Author(s): Masahiro C. Miura, Shohei Nagata, Satoshi Tamaki, Masaru Tomita, Akio Kanai

Keyword(s): Arthrospira platensis, Group II Introns, Bioinformatic Pipeline, Genomic Signatures, Representative Species, Prokaryotic Genomes, Group II, Self Splicing, The Relationship, Bacterial C

Group II introns (G2Is) are self-splicing ribozymes that have retroelement characteristics in prokaryotes. Although G2Is are considered an important factor in the evolution of prokaryotes, comprehensive analyses of these introns across the tens of thousands of prokaryotic genomes currently available are still limited. Here, we developed a bioinformatic pipeline that systematically collects G2Is and applied it to prokaryotic genomes. We found that in bacteria, 25% (447 of 1,790) of the representative species had an average of 5.3 G2Is, and in archaea, 9% (28 of 296) of the representative species had an average of 3.0 G2Is. The greatest number of G2Is per species was 101, in Arthrospira platensis (phylum Cyanobacteriota). A comprehensive sequence analysis of the intron-encoded protein (IEP) in each G2I sequence added three new IEP classes (U1–U3) to the previous classification and suggested that about 30% of all IEPs are noncanonical. The number of G2Is per species was largely determined at the phylum level, and the type of IEP was associated with G2I expansion; in particular, G2Is with bacterial C-type IEPs increased explosively in the phylum Firmicutes, as did G2Is with CL-type IEPs in the phylum Cyanobacteriota. We also systematically analyzed the relationship between genomic signatures and the mechanism of these G2I expansions. This is the first study to systematically characterize G2Is across prokaryotic phylogeny.
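
As a quick arithmetic check of the reported proportions, the fractions below reproduce the rounded percentages, and multiplying the species counts by the reported averages gives approximate total G2I counts. These totals are derived here rather than reported in the abstract, and they are only approximate because the averages are rounded to one decimal place.

```latex
\begin{aligned}
\text{Bacteria: }\; \frac{447}{1790} &\approx 0.2497 \approx 25\%, & 447 \times 5.3 &\approx 2369\ \text{G2Is}\\
\text{Archaea: }\; \frac{28}{296} &\approx 0.0946 \approx 9\%, & 28 \times 3.0 &= 84\ \text{G2Is}
\end{aligned}
```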

Prophage Tracer: precisely tracing prophages in prokaryotic genomes using overlapping split-read alignment

Nucleic Acids Research, 2021. DOI: 10.1093/nar/gkab824

Author(s): Kaihao Tang, Weiquan Wang, Yamin Sun, Yiqing Zhou, Pengxia Wang, ...

Keyword(s): Performance Testing, Sequencing Data, Associated Bacteria, Read Alignment, Phage Gene, Short Read Sequencing, Split Read, Prokaryotic Genomes, Mining Tool, Gene Similarity

The life cycle of temperate phages includes a lysogenic stage, in which the phage integrates into the host genome and becomes a prophage. However, identifying prophages that are highly divergent from known phages remains challenging. In this study, taking advantage of the lysis–lysogeny switch of temperate phages, we designed Prophage Tracer, a tool that recognizes active prophages in prokaryotic genomes from short-read sequencing data, independently of phage gene similarity searches. Prophage Tracer uses overlapping split-read alignment as a criterion to recognize discriminative reads that contain the bacterial (attB) and phage (attP) att sites, which represent prophage excision signals. Performance testing showed that Prophage Tracer could predict known prophages with precise boundaries, as well as novel prophages. Two novel prophages, one dsDNA and one ssDNA, both encoding highly divergent major capsid proteins, were identified in coral-associated bacteria. Prophage Tracer is a reliable data mining tool for identifying novel temperate phages and mobile genetic elements. The code for Prophage Tracer is publicly available at https://github.com/WangLab-SCSIO/Prophage_Tracer.
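
The overlapping split-read criterion can be illustrated with a simplified Python sketch. It assumes a read split into two forward-strand local alignments to the lysogen reference, where the two segments share a few bases on the read (the putative att core). If the read's 3′ segment maps far downstream of its 5′ segment, the read skips the prophage and is treated as attB evidence; if it maps far upstream, the prophage's right end is joined to its left end, as in the excised circular phage, and it is treated as attP evidence. Strand handling, mapping quality, and checking that the overlapping bases are identical are omitted, and the names and thresholds (`min_overlap`, `min_span`) are hypothetical; this is not Prophage Tracer's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One local alignment of a split read (0-based, half-open, forward strand)."""
    read_start: int
    read_end: int
    ref_start: int
    ref_end: int


@dataclass
class AttJunction:
    kind: str                    # "attB" (excised host) or "attP" (circular phage)
    core_len: int                # overlap length on the read = putative att core
    left_core: tuple[int, int]   # core position at the prophage's left end
    right_core: tuple[int, int]  # core position at the prophage's right end


def classify_split_read(first: Segment, second: Segment,
                        min_overlap: int = 5, min_span: int = 5_000) -> Optional[AttJunction]:
    """`first` covers the read's 5' part and `second` its 3' part."""
    overlap = first.read_end - second.read_start  # bases aligned by both segments
    if overlap < min_overlap:
        return None
    first_core = (first.ref_end - overlap, first.ref_end)
    second_core = (second.ref_start, second.ref_start + overlap)
    if second.ref_start - first.ref_end >= min_span:
        # Read jumps forward over the prophage: host sequence rejoined after excision.
        return AttJunction("attB", overlap, first_core, second_core)
    if first.ref_start - second.ref_end >= min_span:
        # Read jumps backward: prophage right end joined to its left end (circular form).
        return AttJunction("attP", overlap, second_core, first_core)
    return None


if __name__ == "__main__":
    print(classify_split_read(Segment(0, 80, 1000, 1080), Segment(70, 150, 20000, 20080)))
```

Both junction types point to the same pair of core positions, which is what lets split reads delimit prophage boundaries precisely.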

GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy

Nucleic Acids Research, 2021. DOI: 10.1093/nar/gkab776

Author(s): Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, ...

Keyword(s): Archaeal Diversity, Policy Changes, Pragmatic Approach, Assembly Quality, Genomic Representation, Prokaryotic Diversity, As Species, Archaeal Species, Prokaryotic Genomes, Taxonomic Changes

The Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a phylogenetically consistent and rank normalized genome-based taxonomy for prokaryotic genomes sourced from the NCBI Assembly database. GTDB R06-RS202 spans 254 090 bacterial and 4316 archaeal genomes, a 270% increase since the introduction of the GTDB in November 2017. These genomes are organized into 45 555 bacterial and 2339 archaeal species clusters, a 200% increase since species clusters were integrated into the GTDB in June 2019. Here, we explore prokaryotic diversity from the perspective of the GTDB and highlight the importance of metagenome-assembled genomes in expanding available genomic representation. We also discuss improvements to the GTDB website that allow users to track taxonomic changes, easily assess genome assembly quality, and identify genomes assembled from type material or used as species representatives. We then describe the methodological updates and policy changes made since the inception of the GTDB, along with the procedure used to update species clusters. We conclude with a discussion of the use of average nucleotide identity as a pragmatic approach for delineating prokaryotic species.
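
The closing point about average nucleotide identity (ANI) can be illustrated with a greedy clustering sketch in Python. It assumes pairwise ANI and alignment fraction (AF) values have already been computed, processes genomes in order of preference, and assigns each genome to the closest existing representative that passes both thresholds, otherwise making it a new representative. The 95% ANI and 0.65 AF defaults are assumptions chosen for illustration, the function name is hypothetical, and the real GTDB species-cluster update procedure described in the paper is more involved.

```python
def assign_species(genomes: list[str],
                   ani: dict[frozenset, float],
                   af: dict[frozenset, float],
                   ani_threshold: float = 95.0,
                   af_threshold: float = 0.65) -> dict[str, str]:
    """Greedy ANI-based clustering: map each genome to a species representative.

    genomes: ordered by preference (e.g. assembly quality), best first.
    ani / af: pairwise ANI (%) and alignment fraction keyed by frozenset({g1, g2}).
    Missing pairs are treated as unrelated.
    """
    representatives: list[str] = []
    assignment: dict[str, str] = {}
    for g in genomes:
        best, best_ani = None, -1.0
        for r in representatives:
            pair = frozenset((g, r))
            a = ani.get(pair, 0.0)
            if a >= ani_threshold and af.get(pair, 0.0) >= af_threshold and a > best_ani:
                best, best_ani = r, a
        if best is None:
            representatives.append(g)  # founds a new species cluster
            assignment[g] = g
        else:
            assignment[g] = best       # joins the closest qualifying representative
    return assignment


if __name__ == "__main__":
    ani = {frozenset(("G1", "G2")): 97.5, frozenset(("G1", "G3")): 80.0,
           frozenset(("G3", "G4")): 96.0, frozenset(("G1", "G4")): 79.0}
    af = {pair: 0.9 for pair in ani}
    print(assign_species(["G1", "G2", "G3", "G4"], ani, af))
```

Processing genomes best-first means higher-quality assemblies tend to become representatives, which mirrors the motivation for tracking assembly quality and type material mentioned in the abstract.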
