Pangenomics of the Symbiotic Rhizobiales. Core and Accessory Functions Across a Group Endowed with High Levels of Genomic Plasticity

Pangenome analyses reveal major clues on evolutionary instances and critical genome core conservation. The order Rhizobiales encompasses several families with rather disparate ecological attitudes. Among them, Rhizobiaceae, Bradyrhizobiaceae, Phyllobacteriaceae and Xanthobacteriaceae include members proficient in mutualistic symbioses with plants based on the bacterial conversion of N2 into ammonia (nitrogen fixation). The pangenome of 12 nitrogen-fixing plant symbionts of the Rhizobiales was analyzed, yielding a total of 37,364 loci, with a core genome comprising 700 genes. The percentage of core genes averaged 10.2% over single genomes, and between 5% and 7% were found to be plasmid-associated. The comparison between a representative reference genome and the core genome subset showed the core genome to be highly enriched in genes for macromolecule metabolism, ribosomal constituents and overall translation machinery, while membrane/periplasm-associated genes and transport domains were under-represented. The analysis of protein functions revealed that between 1.7% and 4.9% of core proteins could putatively have different functions.

A Genomic Survey of Signalling in the Myxococcaceae

Microorganisms ◽

10.3390/microorganisms8111739 ◽

2020 ◽

Vol 8 (11) ◽

pp. 1739

Author(s):

David E. Whitworth ◽

Allison Zwarycz

Keyword(s):

Core Genome ◽

Gene Gain ◽

Fruiting Body Formation ◽

Two Component System ◽

Genome Sequences ◽

The Core ◽

Accessory Genes ◽

Two Component ◽

Gain Loss ◽

Core Genes

As prokaryotes diverge by evolution, essential ‘core’ genes required for conserved phenotypes are preferentially retained, while inessential ‘accessory’ genes are lost or diversify. We used the recently expanded number of myxobacterial genome sequences to investigate the conservation of their signalling proteins, focusing on two sister genera (Myxococcus and Corallococcus), and on a species within each genus (Myxococcus xanthus and Corallococcus exiguus). Four new C. exiguus genome sequences are also described here. Despite accessory genes accounting for substantial proportions of each myxobacterial genome, signalling proteins were found to be enriched in the core genome, with two-component system genes almost exclusively so. We also investigated the conservation of signalling proteins in three myxobacterial behaviours. The linear carotenogenesis pathway was entirely conserved, with no gene gain/loss observed. However, the modular fruiting body formation network was found to be evolutionarily plastic, with dispensable components in all modules (including components required for fruiting in the model myxobacterium M. xanthus DK1622). Quorum signalling (QS) is thought to be absent from most myxobacteria; however, they generally appear to be able to produce CAI-1 (cholerae autoinducer-1), to sense other QS molecules, and to disrupt the QS of other organisms, all potentially important abilities during predation of other prokaryotes.

Comparative analysis of probiotic bacteria based on a new definition of core genome

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018400127 ◽

2018 ◽

Vol 16 (03) ◽

pp. 1840012 ◽

Cited By ~ 2

Author(s):

Maria Satti ◽

Yasuhiro Tanizawa ◽

Akihito Endo ◽

Masanori Arita

Keyword(s):

Stress Responses ◽

Core Genome ◽

Public Library ◽

Orthologous Gene ◽

Orthologous Group ◽

Probiotic Properties ◽

The Core ◽

Significant Difference ◽

Definition Of ◽

Core Genes

The commensal genus Bifidobacterium has probiotic properties. We prepared a public library of the gene functions of the genus Bifidobacterium for its online annotation. Orthologous gene cluster analysis showed that the pan-genomes of Bifidobacterium and Lactobacillus exhibit striking similarities when mapped to the Clusters of Orthologous Groups (COG) database of proteins. When the core genes in each genus were selected based on our statistical definition of “core genome”, core genes were present in at least 92% of 52 Bifidobacterium genomes and in at least 97% of 178 Lactobacillus genomes. Functional comparison of the core genes of the two genera revealed a significant difference in the category “amino acid transport and metabolism”, reflecting their difference in niche specificity. Over-represented Bifidobacterium protein families were primarily involved in host interactions, complex compound metabolism, and stress responses. These findings agree with published information and validate our bias-resilient definition of the core genome.
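
The abstract above does not spell out the authors' statistical definition of the core genome, so the following is only a minimal sketch of a simpler stand-in: a fixed percentage threshold on how many genomes carry an orthologous group. The function names (`min_genomes`, `threshold_core`) and the toy presence table are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def min_genomes(percent: int, n_genomes: int) -> int:
    """Smallest genome count that is at least `percent` % of `n_genomes`.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-percent * n_genomes // 100)


def threshold_core(presence: Dict[str, Set[str]], percent: int, n_genomes: int) -> Set[str]:
    """Return orthologous groups present in at least `percent` % of genomes.

    `presence` maps each orthologous group to the set of genomes carrying it.
    This fixed threshold is an assumption, not the paper's definition.
    """
    need = min_genomes(percent, n_genomes)
    return {group for group, genomes in presence.items() if len(genomes) >= need}


# Hypothetical toy data: three orthologous groups across four genomes.
toy_presence = {
    "groupA": {"g1", "g2", "g3", "g4"},
    "groupB": {"g1", "g2", "g3"},
    "groupC": {"g1"},
}
toy_core = threshold_core(toy_presence, percent=75, n_genomes=4)
```

Under this reading, the thresholds quoted in the abstract correspond to at least 48 of 52 Bifidobacterium genomes (`min_genomes(92, 52)`) and at least 173 of 178 Lactobacillus genomes (`min_genomes(97, 178)`).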

Architecture of the superintegron in Vibrio cholerae: identification of core and unique genes

F1000Research ◽

10.12688/f1000research.2-63.v1 ◽

2013 ◽

Vol 2 ◽

pp. 63 ◽

Cited By ~ 5

Author(s):

Michel A Marin ◽

Ana Carolina P Vicente

Keyword(s):

Core Genome ◽

Position Effect ◽

Genome Structure ◽

Enrichment Analysis ◽

Gene Clusters ◽

Gene Cassette ◽

Etiologic Agent ◽

Functional Genes ◽

The Core ◽

Core Genes

Background: Vibrio cholerae, the etiologic agent of cholera, is indigenous to aquatic environments. The V. cholerae genome consists of two chromosomes; the smaller of these harbors a large gene capture and excision system called the superintegron (SI), of ~120 kbp. The flexible nature of the SI that results from gene cassette capture, deletion and rearrangement is thought to make it a hotspot of V. cholerae diversity, but beyond the basic structure it is not clear whether there is a core genome in the SI and, if so, how it is structured. The aim of this study was to explore the core genome structure and the differences in gene content among strains of V. cholerae. Methods: From the complete genomes of seven V. cholerae and one Vibrio mimicus representative strains, we recovered the SI sequences based on the locations of the structural gene IntI4 and the V. cholerae repeats. Analysis of the pangenome, including cluster analysis of functional genes, pangenome profile analysis, genetic variation analysis of functional genes, strain evolution analysis and function enrichment analysis of gene clusters, was performed using a pangenome analysis pipeline in addition to R scripts, SplitsTree4 and genoPlotR. Results and conclusions: Here, we reveal the genetic architecture of the V. cholerae SI. It contains eight core genes when V. mimicus is included and 21 core genes when only V. cholerae strains are considered; many of them are present in several copies. The V. cholerae SI has an open pangenome, which means that V. cholerae may be able to import new gene cassettes into the SI. The set of dispensable SI genes is influenced by the niche and type species. The core genes are distributed along the SI, apparently without a position effect.

Heterogeneity among estimates of the core genome and pan-genome in different pneumococcal populations

10.1101/133991 ◽

2017 ◽

Cited By ~ 5

Author(s):

Andries J van Tonder ◽

James E Bray ◽

Keith A Jolley ◽

Sigríður J Quirk ◽

Gunnsteinn Haraldsson ◽

...

Keyword(s):

Bacterial Population ◽

Core Genome ◽

Bacterial Species ◽

Essential Point ◽

Genetic Lineages ◽

The Core ◽

Pan Genome ◽

Single Dataset ◽

Genomic Regions ◽

Core Genes

Background: Understanding the structure of a bacterial population is essential in order to understand bacterial evolution, or which genetic lineages cause disease, or the consequences of perturbations to the bacterial population. Estimating the core genome, the genes common to all or nearly all strains of a species, is an essential component of such analyses. The size and composition of the core genome varies by dataset, but our hypothesis was that variation between different collections of the same bacterial species should be minimal. To test this, the genome sequences of 3,121 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (USA) and Maela (Thailand) were analysed. Results: The analyses revealed a ‘supercore’ genome (genes shared by all 3,121 pneumococci) of only 303 genes, although 461 additional core genes were shared by pneumococci from Reykjavik, Southampton and Boston. Overall, the size and composition of the core genomes and pan-genomes among pneumococci recovered in Reykjavik, Southampton and Boston were very similar, but pneumococci from Maela were distinctly different. Inspection of the pan-genome of Maela pneumococci revealed several >25 Kb sequence regions that were homologous to genomic regions found in other bacterial species. Conclusions: Some subsets of the global pneumococcal population are highly heterogeneous and thus our hypothesis was rejected. This is an essential point of consideration before generalising the findings from a single dataset to the wider pneumococcal population.
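
The relationship between per-collection core genomes and the ‘supercore’ can be illustrated with plain set intersections. This is a minimal sketch under stated assumptions: it uses the strict "present in every genome" reading of core (the abstract also allows "nearly all"), and the site labels, gene names, and helper functions are hypothetical toy values, not data or code from the study.

```python
from typing import Dict, List, Set


def core_genome(genomes: List[Set[str]]) -> Set[str]:
    """Genes present in every genome of one collection (strict definition)."""
    return set.intersection(*genomes) if genomes else set()


def supercore(collections: Dict[str, List[Set[str]]]) -> Set[str]:
    """Genes that are core in every collection, i.e. shared by all genomes."""
    cores = [core_genome(genomes) for genomes in collections.values()]
    return set.intersection(*cores) if cores else set()


# Hypothetical toy collections; each genome is a set of gene names.
toy_collections = {
    "siteA": [{"a", "b", "c"}, {"a", "b", "c", "d"}],
    "siteB": [{"a", "b", "c"}, {"a", "b", "e"}],
    "siteC": [{"a", "c", "f"}],
}

toy_supercore = supercore(toy_collections)

# Genes core to siteA and siteB together but missing from the supercore,
# analogous to the additional core genes shared by only some collections.
subset = {k: toy_collections[k] for k in ("siteA", "siteB")}
toy_additional = supercore(subset) - toy_supercore
```

In the toy data a single divergent collection (`siteC`) shrinks the supercore to one gene, which mirrors how one distinct population can reduce the set of genes shared by every isolate.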

Global genomic similarity and core genome sequence diversity of the Streptococcus genus as a toolkit to identify closely related bacterial species in complex environments.

10.7287/peerj.preprints.26665v2 ◽

2018 ◽

Author(s):

Hugo R Barajas de la Torre ◽

Miguel Romero ◽

Shamayim Martínez-Sánchez ◽

Luis D Alcaraz

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Core Genome ◽

Comparative Genomic ◽

Rrna Gene ◽

Gene Phylogeny ◽

The Core ◽

Sequence Identity ◽

Core Proteins ◽

Genomic Similarity

Background. Comparative genomics between closely related bacterial strains can distinguish important features determining pathogenesis, antibiotic resistance, and phylogenetic structure. The Streptococcus genus is relevant to public health and food safety and it is well-represented (>100 genomes) in publicly available databases. Streptococci are cosmopolitan, with multiple sources of isolation, from humans to dairy products. The Streptococcus genus has been classified by morphology, serotypes, 16S rRNA gene, and Multi Locus Sequence Types (MLST). The Genomic Similarity Score (GSS) is proposed as a tool to quantify genome level relatedness between species of Streptococcus. The Streptococcus core genome can be used to assess strain specific abundances in metagenomic sequences. Methods. A 16S rRNA gene phylogeny was calculated for 108 strains, belonging to 16 Streptococcus species and compared to a dendrogram using GSS pairwise distances for the same genomes. The core and pan-genome were calculated for these 108 genomes. The core genome sequences were analyzed and used as a resource to discriminate homologous fragment reads from closely related strains in metagenomic samples. Results. A total of 404 proteins are shared by all 108 Streptococcus genomes, which is the core genome. The pairwise amino acid identity values of the core proteins for all the compared strains and outgroups are reported. Lower sequence identity variation (90-100%) is predominantly found in core clusters containing ribosomal and translation-related proteins. For 48 core proteins (11.8%) no functional assignment could be made and those proteins have larger sequence identity variations than other core proteins. The sequence identity of the core genome diminishes as GSS score between species decreases. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while distinguishing between 16S polytomies (unresolved nodes). Finally, the core genome was used to distinguish between closely related species within human oral metagenomes. Discussion. The Streptococcus genus provides a benchmark dataset for comparative genomic studies due to the breadth and depth of genomic coverage. Comparing metagenomic shotgun fragment reads to the core genome using rapid alignment tools allows species-specific abundance estimates in metagenomic samples. Understanding of genomic variability and strain relatedness is the goal of tools like GSS, which makes use of both pairwise shared core and pan-genomic homologous shared sequences for its calculation.

Pan-genome of Novel Pantoea stewartii subsp. indologenes Reveal Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer

10.20944/preprints202107.0400.v1 ◽

2021 ◽

Author(s):

Gaurav Agarwal ◽

Ronald D. Gitaitis ◽

Bhabesh Dutta

Keyword(s):

Gene Transfer ◽

Core Genome ◽

Foxtail Millet ◽

Evaluation Study ◽

Full Spectrum ◽

The Core ◽

Pan Genome ◽

Pantoea Stewartii ◽

Comparative Phylogenetic Analysis ◽

Core Genes

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot of foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onion. Our recent host range evaluation study identified two pathovars, P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., that are pathogenic on onion and millets or on millets only, respectively. In the current study we developed a pan-genome using the whole genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n=4) and pv. setariae (n=13)]. The full spectrum of the pan-genome contained 7,030 genes. Among these, 3,546 (present in genomes of all 17 strains) were the core genes that were a subset of 3,682 soft-core genes (present in ≥16 strains). The accessory genome included 1,308 shell genes and 2,040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6,000 genes, suggesting the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using PAVs/wgMLST approach in comparison to core genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study including four other Pantoea species, namely P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfers observed between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven clusters of genes, associated with the pathogenicity phenotype on onion. One of the clusters contains 11 genes with known functions that are found to be chromosomally located.
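
The pan-genome categories quoted above can be read as bins on the number of strains carrying a gene. The sketch below is an illustration only: the cut-offs follow the thresholds stated in the abstract (17 strains, soft-core at ≥16, cloud at ≤2), the shell bin is assumed to be everything in between, and the function name `classify` is hypothetical rather than part of any pan-genome tool.

```python
def classify(n_present: int, n_strains: int = 17) -> str:
    """Assign a pan-genome category from the number of strains carrying a gene.

    Genes in all strains are labeled "core"; they also belong to the
    soft-core set, which the abstract defines as presence in >= 16 strains.
    The shell range (3 to 15 strains here) is an assumption.
    """
    if n_present == n_strains:
        return "core"
    if n_present >= n_strains - 1:
        return "soft-core"
    if n_present <= 2:
        return "cloud"
    return "shell"


# Counts reported in the abstract.
core, soft_core, shell, cloud = 3546, 3682, 1308, 2040

# Soft-core already includes the core genes, so the pan-genome total is
# soft-core + shell + cloud.
pan_total = soft_core + shell + cloud

# Genes present in exactly 16 of the 17 strains.
soft_core_only = soft_core - core
```

With the reported counts, `pan_total` reproduces the 7,030 genes stated for the full pan-genome, and `soft_core_only` gives 136 genes that are missing from exactly one strain.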

Global genomic similarity and core genome sequence diversity of the Streptococcus genus as a toolkit to identify closely related bacterial species in complex environments

PeerJ ◽

10.7717/peerj.6233 ◽

2019 ◽

Vol 6 ◽

pp. e6233 ◽

Cited By ~ 4

Author(s):

Hugo R. Barajas ◽

Miguel F. Romero ◽

Shamayim Martínez-Sánchez ◽

Luis D. Alcaraz

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Core Genome ◽

Bacterial Species ◽

Genomic Diversity ◽

Comparative Genomic ◽

Rrna Gene ◽

Gene Phylogeny ◽

The Core ◽

Core Proteins

Background The Streptococcus genus is relevant to both public health and food safety because of its ability to cause pathogenic infections. It is well-represented (>100 genomes) in publicly available databases. Streptococci are ubiquitous, with multiple sources of isolation, from human pathogens to dairy products. The Streptococcus genus has traditionally been classified by morphology, serum types, the 16S ribosomal RNA (rRNA) gene, and multi-locus sequence types subject to in-depth comparative genomic analysis. Methods Core and pan-genomes described the genomic diversity of 108 strains belonging to 16 Streptococcus species. The core genome nucleotide diversity was calculated and compared to phylogenomic distances within the genus Streptococcus. The core genome was also used as a resource to recruit metagenomic fragment reads from streptococci-dominated environments. A conventional 16S rRNA gene phylogeny reconstruction was used as a reference to compare the resulting average nucleotide identity (ANI) and genome similarity score (GSS) dendrograms. Results The core genome, in this work, consists of 404 proteins that are shared by all 108 Streptococcus genomes. The average identity of the pairwise compared core proteins decreases in proportion to lower GSS scores across species. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while distinguishing between 16S polytomies (unresolved nodes). The GSS is a distance metric that can reflect evolutionary history by comparing orthologous proteins. Additionally, GSS proved to be the most useful metric for genus and species comparisons, where ANI metrics failed due to false positives when comparing different species. Discussion Understanding of genomic variability and species relatedness is the goal of tools like GSS, which makes use of the maximum pairwise shared orthologous sequences for its calculation. It allows for long evolutionary distances (above species) to be included because of the use of amino acid alignment scores, rather than nucleotides, and normalizing by positive matches. Newly sequenced species and strains could be easily placed into GSS dendrograms to infer overall genomic relatedness. The GSS is not restricted to ubiquitous conservancy of gene features; thus, it reflects the mosaic-structure and dynamism of gene acquisition and loss in bacterial genomes.

How rhizobia adapt to the nodule environment

Journal of Bacteriology ◽

10.1128/jb.00539-20 ◽

2021 ◽

Author(s):

Raphael Ledermann ◽

Carolin C. M. Schulte ◽

Philip S. Poole

Keyword(s):

Nitrogen Fixation ◽

Soil Bacteria ◽

Root Nodule ◽

Computational Modelling ◽

Diverse Group ◽

Nitrogen Fixing ◽

The Core ◽

Physiological Processes ◽

Mutualistic Relationship ◽

The Common

Rhizobia are a phylogenetically diverse group of soil bacteria that engage in mutualistic interactions with legume plants. Although specifics of the symbioses differ between strains and plants, all symbioses ultimately result in the formation of specialized root nodule organs which host the nitrogen-fixing microsymbionts called bacteroids. Inside nodules, bacteroids encounter unique conditions that necessitate global reprogramming of physiological processes and rerouting of their metabolism. Decades of research have addressed these questions using genetics, omics approaches, and more recently computational modelling. Here we discuss the common adaptations of rhizobia to the nodule environment that define the core principles of bacteroid functioning. All bacteroids are growth-arrested and perform energy-intensive nitrogen fixation fueled by plant-provided C4-dicarboxylates at nanomolar oxygen levels. At the same time, bacteroids are subject to host control and sanctioning that ultimately determine their fitness and have fundamental importance for the evolution of a stable mutualistic relationship.

Adding context to the pneumococcal core genes – a bioinformatic analysis of the intergenic pangenome of Streptococcus pneumoniae

10.1101/2021.08.29.458057 ◽

2021 ◽

Author(s):

Flemming Damgaard Nielsen ◽

Jakob Møller-Jensen ◽

Mikkel Girke Jørgensen

Keyword(s):

Streptococcus Pneumoniae ◽

Core Genome ◽

Biological Function ◽

Bioinformatic Analysis ◽

Bacterial Pathogenicity ◽

The Core ◽

Intergenic Regions ◽

Crucial Information ◽

Genetic Context ◽

Core Genes

Whole genome sequencing offers great opportunities for linking genotypes to phenotypes, aiding in our understanding of human disease and bacterial pathogenicity. However, these analyses often overlook non-coding intergenic regions (IGRs). By disregarding the IGRs, crucial information is lost, as genes have little biological function without expression. In this study, we present the first complete pangenome of the important human pathogen Streptococcus pneumoniae (pneumococcus), spanning both the genes and IGRs. We show that the pneumococcus species retains a small core genome of IGRs that are present across all isolates. Gene expression is highly dependent on these core IGRs, and often several copies of these core IGRs are found across each genome. Core genes and core IGRs show a clear linkage as 81% of core genes are associated with core IGRs. Additionally, we identify a single IGR within the core genome that is always occupied by one of two highly distinct sequences, scattered across the phylogenetic tree. Their distribution indicates that this IGR is transferred between isolates through horizontal regulatory transfer independent of the flanking genes and that each type likely serves different regulatory roles depending on their genetic context.

Identification of Multiple Replication Stages and Origins in the Nucleopolyhedrovirus of Anticarsia gemmatalis

Viruses ◽

10.3390/v11070648 ◽

2019 ◽

Vol 11 (7) ◽

pp. 648

Author(s):

Solange A.B. Miele ◽

Carolina S. Cerrudo ◽

Cintia N. Parsza ◽

María Victoria Nugnes ◽

Diego L. Mengual Gómez ◽

...

Keyword(s):

Core Genome ◽

Anticarsia Gemmatalis ◽

Replication Machinery ◽

Host Proteins ◽

Viral Dna ◽

The Core ◽

Plasmid Library ◽

Close Proximity ◽

Multiple Replication ◽

Core Genes

To understand the mechanism of replication used by baculoviruses, it is essential to describe all the factors involved, including virus and host proteins and the sequences where DNA synthesis starts. Much work has been done on this topic, but there is still confusion about which sequences serve these functions, and the mechanism of replication is not well understood. In this work, we performed an AgMNPV replication kinetics assay in susceptible UFL-Ag-286 cells to estimate viral genome synthesis rates. We found that the viral DNA increases exponentially in two different phases that are temporally separated by an interval of 5 h, probably suggesting the occurrence of two different mechanisms of replication. Then, we prepared a plasmid library containing virus fragments (0.5–2 kbp), which were transfected into UFL-Ag-286 cells that were also infected with AgMNPV. We identified 12 virus fragments which acted as origins of replication (ORI). Those fragments are in close proximity to core genes. This association with the core genome would ensure vertical transmission of ORIs. We also predict the presence of common structures on those fragments that probably recruit the replication machinery, a structure also present in previously reported ORIs in baculoviruses.
