Pan-Genome of Novel Pantoea stewartii subsp. indologenes Reveals Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer

Microorganisms ◽ 10.3390/microorganisms9081761 ◽ 2021 ◽ Vol 9 (8) ◽ pp. 1761

Author(s): Gaurav Agarwal ◽ Ronald D. Gitaitis ◽ Bhabesh Dutta

Keyword(s): Gene Transfer ◽ Core Genome ◽ Foxtail Millet ◽ Gene Clusters ◽ Evaluation Study ◽ Full Spectrum ◽ Pan Genome ◽ Pantoea Stewartii ◽ Comparative Phylogenetic Analysis ◽ Core Genes

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot on foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onions. Our recent host range evaluation study identified two pathovars, P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., which are pathogenic on onions and millets or on millets only, respectively. In the current study, we developed a pan-genome using whole-genome sequences of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7030 genes. Among these, 3546 (present in genomes of all 17 strains) were the core genes that were a subset of 3682 soft-core genes (present in ≥16 strains). The accessory genome included 1308 shell genes and 2040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using the PAVs/wgMLST approach in comparison with core-genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study using Psi strains from both pathovars along with strains from other Pantoea species, namely, P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfer events occurring between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven gene clusters, that were associated with the pathogenicity phenotype (necrosis on seedlings) on onions. One of the gene clusters contained 11 genes with known functions and was found to be chromosomally located.
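
The gene categories in this abstract can be illustrated with a minimal Python sketch that bins genes by the number of strains carrying them. The thresholds for core (all 17 strains), soft-core (≥16) and cloud (≤2) come from the abstract; treating shell genes as everything in between is an assumption, although it is consistent with the reported totals (3682 + 1308 + 2040 = 7030). The gene names, strain names and the `classify_genes` helper are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter


def classify_genes(presence, n_strains, soft_core_min, cloud_max):
    """Label each gene by the number of strains that carry it."""
    labels = {}
    for gene, strains in presence.items():
        count = len(strains)
        if count == n_strains:
            labels[gene] = "core"            # present in every strain
        elif count >= soft_core_min:
            labels[gene] = "soft-core only"  # soft-core but not strictly core
        elif count <= cloud_max:
            labels[gene] = "cloud"
        else:
            labels[gene] = "shell"           # assumed: everything in between
    return labels


# Hypothetical toy data: 17 strains (s1..s17) and four made-up genes.
strains = [f"s{i}" for i in range(1, 18)]
presence = {
    "geneA": set(strains),       # carried by all 17 strains
    "geneB": set(strains[:16]),  # carried by 16 strains
    "geneC": set(strains[:8]),   # carried by 8 strains
    "geneD": set(strains[:2]),   # carried by 2 strains
}

labels = classify_genes(presence, n_strains=17, soft_core_min=16, cloud_max=2)
counts = Counter(labels.values())

# Core genes are a subset of soft-core genes, so the soft-core total adds both.
soft_core_total = counts["core"] + counts["soft-core only"]

# Consistency check on the figures reported in the abstract.
reported_total = 3682 + 1308 + 2040   # soft-core + shell + cloud
genes_in_exactly_16 = 3682 - 3546     # soft-core genes that are not core

print(labels)
print(soft_core_total, reported_total, genes_in_exactly_16)
```

Counting core genes inside the soft-core total mirrors the abstract's statement that the 3546 core genes are a subset of the 3682 soft-core genes.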

Heterogeneity among estimates of the core genome and pan-genome in different pneumococcal populations

10.1101/133991 ◽ 2017 ◽ Cited By ~ 5

Author(s): Andries J van Tonder ◽ James E Bray ◽ Keith A Jolley ◽ Sigríður J Quirk ◽ Gunnsteinn Haraldsson ◽ ...

Keyword(s): Bacterial Population ◽ Core Genome ◽ Bacterial Species ◽ Essential Point ◽ Genetic Lineages ◽ The Core ◽ Pan Genome ◽ Single Dataset ◽ Genomic Regions ◽ Core Genes

Background: Understanding the structure of a bacterial population is essential in order to understand bacterial evolution, or which genetic lineages cause disease, or the consequences of perturbations to the bacterial population. Estimating the core genome, the genes common to all or nearly all strains of a species, is an essential component of such analyses. The size and composition of the core genome vary by dataset, but our hypothesis was that variation between different collections of the same bacterial species should be minimal. To test this, the genome sequences of 3,121 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (USA) and Maela (Thailand) were analysed.

Results: The analyses revealed a ‘supercore’ genome (genes shared by all 3,121 pneumococci) of only 303 genes, although 461 additional core genes were shared by pneumococci from Reykjavik, Southampton and Boston. Overall, the size and composition of the core genomes and pan-genomes among pneumococci recovered in Reykjavik, Southampton and Boston were very similar, but pneumococci from Maela were distinctly different. Inspection of the pan-genome of Maela pneumococci revealed several >25 Kb sequence regions that were homologous to genomic regions found in other bacterial species.

Conclusions: Some subsets of the global pneumococcal population are highly heterogeneous and thus our hypothesis was rejected. This is an essential point of consideration before generalising the findings from a single dataset to the wider pneumococcal population.
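
The relationship between per-collection core genomes and the ‘supercore’ can be sketched as set intersections. The following Python example uses made-up gene and isolate identifiers and a strict definition of core (present in every genome); the study's own definition allows "nearly all" strains, so this is a simplified illustration, not the authors' method.

```python
def core_genes(genomes):
    """Return the genes present in every genome of one collection."""
    return set.intersection(*genomes.values())


# Hypothetical toy collections: isolate name -> set of gene identifiers.
collections = {
    "Reykjavik": {"r1": {"g1", "g2", "g3", "g4"}, "r2": {"g1", "g2", "g3"}},
    "Southampton": {"s1": {"g1", "g2", "g3", "g5"}, "s2": {"g1", "g2", "g3"}},
    "Boston": {"b1": {"g1", "g2", "g3"}, "b2": {"g1", "g2", "g3", "g6"}},
    "Maela": {"m1": {"g1", "g2", "g7"}, "m2": {"g1", "g2", "g3"}},
}

# Core genome of each collection on its own.
per_site_core = {site: core_genes(genomes) for site, genomes in collections.items()}

# Supercore: genes shared by every isolate, i.e. by every per-site core.
supercore = set.intersection(*per_site_core.values())

# Core genes shared by the three similar collections, with Maela left out.
three_site_core = set.intersection(
    *(per_site_core[site] for site in ("Reykjavik", "Southampton", "Boston"))
)

# "Additional" core genes: shared by the three sites but not in the supercore.
additional_core = three_site_core - supercore

print(sorted(supercore), sorted(additional_core))
```

In this toy data a single divergent collection shrinks the supercore, which is the effect the abstract reports for the Maela isolates.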

Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

Molecular Genetics and Genomics ◽ 10.1007/s00438-015-1154-z ◽ 2016 ◽ Vol 291 (2) ◽ pp. 905-912 ◽ Cited By ~ 12

Author(s): Xiaowen Yang ◽ Yajie Li ◽ Juan Zang ◽ Yexia Li ◽ Pengfei Bie ◽ ...

Keyword(s): Essential Genes ◽ The Core ◽ Pan Genome ◽ Core Genes

First Steps in the Analysis of Prokaryotic Pan-Genomes

Bioinformatics and Biology Insights ◽ 10.1177/1177932220938064 ◽ 2020 ◽ Vol 14 ◽ pp. 117793222093806

Author(s): Sávio Souza Costa ◽ Luís Carlos Guimarães ◽ Artur Silva ◽ Siomar Castro Soares ◽ Rafael Azevedo Baraúna

Keyword(s): Genome Analysis ◽ Core Genome ◽ Bacterial Species ◽ Genomic Analysis ◽ Gene Families ◽ Specific Group ◽ The Core ◽ Pan Genome ◽ Research Areas ◽ Key Concepts

A pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. The pan-genome is composed of the core genome, accessory genome, and species- or strain-specific genes. The pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families continues to increase as new genomes are added to the analysis, while in a closed pan-genome, the number of gene families does not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: all genomes should be annotated with the same software, such as GeneMark or RAST. Subsequently, one of several software tools, such as BPGA, GET_HOMOLOGUES, or PGAP, among others, is used to calculate the pan-genome. This review presents all these initial steps for those who want to perform a pan-genome analysis, explaining key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species. These are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we cite the perspectives of several research areas where pan-genome analysis can be used to answer important issues.
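
The open/closed criterion can be made concrete with a small fit. The abstract does not state the functional form, so the sketch below assumes the commonly used power law in which the number of new genes found when the N-th genome is added follows n(N) = κ·N^(−α), with α ≤ 1 read as an open pan-genome and α > 1 as a closed one. The data are synthetic and the function names are hypothetical; the tools named in the abstract may fit the curve differently (for example by nonlinear regression over many genome orderings).

```python
import math


def fit_power_law(points):
    """Least-squares fit of n = kappa * N**(-alpha) on log-log values.

    points: iterable of (number_of_genomes, new_genes_found) pairs.
    Returns (kappa, alpha).
    """
    xs = [math.log(n_genomes) for n_genomes, _ in points]
    ys = [math.log(new_genes) for _, new_genes in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log n = log kappa - alpha * log N, so alpha is minus the slope.
    return math.exp(intercept), -slope


def is_open(alpha):
    """Assumed convention: alpha <= 1 means the pan-genome is open."""
    return alpha <= 1.0


# Synthetic example generated from a known power law (kappa=500, alpha=0.6).
points = [(n, 500.0 * n ** -0.6) for n in range(2, 18)]
kappa, alpha = fit_power_law(points)

print(round(kappa, 3), round(alpha, 3), is_open(alpha))
```

Fitting in log-log space keeps the example dependency-free; real new-gene counts are noisy, so the estimate of α should be reported with its uncertainty.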

A Genomic Survey of Signalling in the Myxococcaceae

Microorganisms ◽ 10.3390/microorganisms8111739 ◽ 2020 ◽ Vol 8 (11) ◽ pp. 1739

Author(s): David E. Whitworth ◽ Allison Zwarycz

Keyword(s): Core Genome ◽ Gene Gain ◽ Fruiting Body Formation ◽ Two Component System ◽ Genome Sequences ◽ The Core ◽ Accessory Genes ◽ Two Component ◽ Gain Loss ◽ Core Genes

As prokaryotes diverge by evolution, essential ‘core’ genes required for conserved phenotypes are preferentially retained, while inessential ‘accessory’ genes are lost or diversify. We used the recently expanded number of myxobacterial genome sequences to investigate the conservation of their signalling proteins, focusing on two sister genera (Myxococcus and Corallococcus), and on a species within each genus (Myxococcus xanthus and Corallococcus exiguus). Four new C. exiguus genome sequences are also described here. Despite accessory genes accounting for substantial proportions of each myxobacterial genome, signalling proteins were found to be enriched in the core genome, with two-component system genes almost exclusively so. We also investigated the conservation of signalling proteins in three myxobacterial behaviours. The linear carotenogenesis pathway was entirely conserved, with no gene gain/loss observed. However, the modular fruiting body formation network was found to be evolutionarily plastic, with dispensable components in all modules (including components required for fruiting in the model myxobacterium M. xanthus DK1622). Quorum signalling (QS) is thought to be absent from most myxobacteria; however, they generally appear to be able to produce CAI-1 (cholerae autoinducer-1), to sense other QS molecules, and to disrupt the QS of other organisms, potentially important abilities during predation of other prokaryotes.

Comparative analysis of probiotic bacteria based on a new definition of core genome

Journal of Bioinformatics and Computational Biology ◽ 10.1142/s0219720018400127 ◽ 2018 ◽ Vol 16 (03) ◽ pp. 1840012 ◽ Cited By ~ 2

Author(s): Maria Satti ◽ Yasuhiro Tanizawa ◽ Akihito Endo ◽ Masanori Arita

Keyword(s): Stress Responses ◽ Core Genome ◽ Public Library ◽ Orthologous Gene ◽ Orthologous Group ◽ Probiotic Properties ◽ The Core ◽ Significant Difference ◽ Definition Of ◽ Core Genes

The commensal genus Bifidobacterium has probiotic properties. We prepared a public library of the gene functions of the genus Bifidobacterium for its online annotation. Orthologous gene cluster analysis showed that the pan-genomes of Bifidobacterium and Lactobacillus exhibit striking similarities when mapped to the Clusters of Orthologous Group (COG) database of proteins. When the core genes in each genus were selected based on our statistical definition of “core genome”, core genes were present in at least 92% of 52 Bifidobacterium and in 97% of 178 Lactobacillus genomes. Functional comparison of the core genes of the two genera revealed a significant difference in the category “amino acid transport and metabolism”, reflecting their difference in niche specificity. Over-represented Bifidobacterium protein families were primarily involved in host interactions, complex compound metabolism, and stress responses. These findings coincide with the published information and validate our bias-resilient definition of the core genome.

Architecture of the superintegron in Vibrio cholerae: identification of core and unique genes

F1000Research ◽ 10.12688/f1000research.2-63.v1 ◽ 2013 ◽ Vol 2 ◽ pp. 63 ◽ Cited By ~ 5

Author(s): Michel A Marin ◽ Ana Carolina P Vicente

Keyword(s): Core Genome ◽ Position Effect ◽ Genome Structure ◽ Enrichment Analysis ◽ Gene Clusters ◽ Gene Cassette ◽ Etiologic Agent ◽ Functional Genes ◽ The Core ◽ Core Genes

Background: Vibrio cholerae, the etiologic agent of cholera, is indigenous to aquatic environments. The V. cholerae genome consists of two chromosomes; the smaller of these harbors a large gene capture and excision system called the superintegron (SI), of ~120 kbp. The flexible nature of the SI that results from gene cassette capture, deletion and rearrangement is thought to make it a hotspot of V. cholerae diversity, but beyond the basic structure it is not clear if there is a core genome in the SI and, if so, how it is structured. The aim of this study was to explore the core genome structure and the differences in gene content among strains of V. cholerae.

Methods: From the complete genomes of seven V. cholerae and one Vibrio mimicus representative strains, we recovered the SI sequences based on the locations of the structural gene IntI4 and the V. cholerae repeats. Analysis of the pangenome, including cluster analysis of functional genes, pangenome profile analysis, genetic variation analysis of functional genes, strain evolution analysis and function enrichment analysis of gene clusters, was performed using a pangenome analysis pipeline in addition to the R scripts, splitsTree4 and genoPlotR.

Results and conclusions: Here, we reveal the genetic architecture of the V. cholerae SI. It contains eight core genes when V. mimicus is included and 21 core genes when only V. cholerae strains are considered; many of them are present in several copies. The V. cholerae SI has an open pangenome, which means that V. cholerae may be able to import new gene cassettes to the SI. The set of dispensable SI genes is influenced by the niche and type species. The core genes are distributed along the SI, apparently without a position effect.

Virulence and antibiotic resistance plasticity of Arcobacter butzleri: insights on the genomic diversity of an emerging human pathogen

10.1101/775932 ◽ 2019

Author(s): Joana Isidro ◽ Susana Ferreira ◽ Miguel Pinto ◽ Fernanda Domingues ◽ Mónica Oleastro ◽ ...

Keyword(s): Antibiotic Resistance ◽ Comparative Genomics ◽ Core Genome ◽ Human Pathogen ◽ Genome Diversity ◽ Pathogenic Potential ◽ The Core ◽ Pan Genome ◽ Arcobacter Butzleri ◽ Genome Scale

Arcobacter butzleri is a food and waterborne bacterium and an emerging human pathogen, frequently displaying a multidrug-resistant character. Still, no comprehensive genome-scale comparative analysis has been performed so far, which has limited our knowledge on A. butzleri diversification and pathogenicity. Here, we performed a deep genome analysis of A. butzleri focused on decoding its core- and pan-genome diversity and specific genetic traits underlying its pathogenic potential and diverse ecology. In total, 49 A. butzleri strains (collected from human, animal, food and environmental sources) were screened.

A. butzleri (genome size 2.07-2.58 Mbp) revealed a large open pan-genome with 7474 genes (about 50% being singletons) and a small core-genome with 1165 genes. The core-genome is highly diverse (≥55% of the core genes presenting at least 40/49 alleles), being enriched with genes associated with housekeeping functions. In contrast, the accessory genome presented a high proportion of loci with an unknown function, also being particularly overrepresented by genes associated with defence mechanisms. A. butzleri revealed a plastic virulome (including newly identified determinants), marked by the differential presence of multiple adaptation-related virulence factors, such as the urease cluster ureD(AB)CEFG (phenotypically confirmed), the hypervariable hemagglutinin-encoding hecA, a putative type I secretion system (T1SS) harboring another agglutinin potentially related to adherence and a novel VirB/D4 T4SS likely linked to interbacterial competition and cytotoxicity. In addition, A. butzleri harbors a large repertoire of efflux pumps (EPs) (ten “core” and nine differentially present) and other antibiotic resistance determinants. We provide the first description of a genetic determinant of macrolide resistance in A. butzleri, by associating the inactivation of a TetR repressor (likely regulating an EP) with erythromycin resistance. Fluoroquinolone resistance correlated with the Thr-85-Ile substitution in GyrA, and ampicillin resistance was linked to an OXA-15-like β-lactamase. Remarkably, by decoding the polymorphism pattern of the porin- and adhesin-encoding main antigen PorA, this study strongly supports that this pathogen is able to exchange porA as a whole and/or hypervariable epitope-encoding regions separately, leading to a multitude of chimeric PorA presentations that can impact pathogen-host interaction during infection. Ultimately, our unprecedented screening of short sequence repeats detected potential phase-variable genes related to adaptation and host/environment interaction, such as lipopolysaccharide modification and motility/chemotaxis, suggesting that phase variation likely modulates A. butzleri key adaptive functions.

In summary, this study constitutes a turning point on A. butzleri comparative genomics revealing that this human gastrointestinal pathogen is equipped with vast virulence and antibiotic resistance arsenals, which, coupled with its remarkable core- and pan-genome diversity, opens a multitude of phenotypic fingerprints for environmental/host adaptation and pathogenicity.

Impact statement: Diarrhoeal diseases are the most common cause of human illness caused by foodborne hazards, but the surveillance of diarrhoeal diseases is biased towards the most commonly searched infectious agents (namely Campylobacter jejuni and C. coli). In fact, other less studied pathogens are frequently found as the etiological agent when refined non-selective culture conditions are applied. A hallmark example is the diarrhoea-causing Arcobacter butzleri which, despite being also associated with extra-intestinal diseases, such as bacteremia in humans and mastitis in animals, and displaying high rates of antibiotic resistance, has not yet been profoundly investigated regarding its epidemiology, diversity and pathogenicity. To overcome the general lack of knowledge on A. butzleri comparative genomics, we provide the first comprehensive genome-scale analysis of A. butzleri focused on exploring the intraspecies virulome content and diversity, resistance determinants, as well as how this pathogen shapes its genome towards ecological adaptation and host invasion. The unveiled scenario of A. butzleri rampant diversity and plasticity reinforces the pathogenic potential of this food and waterborne hazard, while opening multiple research lines that will certainly contribute to the future development of more robust species-oriented diagnostics and molecular surveillance of A. butzleri.

Data summary: A. butzleri raw sequence reads generated in the present study were deposited in the European Nucleotide Archive (ENA) (BioProject PRJEB34441). The assembled contigs (.fasta and .gbk files), the nucleotide sequences of the predicted transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) (.ffn files) and the respective amino acid sequences of the translated CDS sequences (.faa files) are available at http://doi.org/10.5281/zenodo.3434222. Detailed ENA accession numbers, as well as the draft genome statistics are described in Table S1.

Adding context to the pneumococcal core genes – a bioinformatic analysis of the intergenic pangenome of Streptococcus pneumoniae

10.1101/2021.08.29.458057 ◽ 2021

Author(s): Flemming Damgaard Nielsen ◽ Jakob Møller-Jensen ◽ Mikkel Girke Jørgensen

Keyword(s): Streptococcus Pneumoniae ◽ Core Genome ◽ Biological Function ◽ Bioinformatic Analysis ◽ Bacterial Pathogenicity ◽ The Core ◽ Intergenic Regions ◽ Crucial Information ◽ Genetic Context ◽ Core Genes

Whole genome sequencing offers great opportunities for linking genotypes to phenotypes, aiding in our understanding of human disease and bacterial pathogenicity. However, these analyses often overlook non-coding intergenic regions (IGRs). By disregarding the IGRs, crucial information is lost, as genes have little biological function without expression. In this study, we present the first complete pangenome of the important human pathogen Streptococcus pneumoniae (pneumococcus), spanning both the genes and IGRs. We show that the pneumococcus species retains a small core genome of IGRs that are present across all isolates. Gene expression is highly dependent on these core IGRs, and often several copies of these core IGRs are found across each genome. Core genes and core IGRs show a clear linkage as 81% of core genes are associated with core IGRs. Additionally, we identify a single IGR within the core genome that is always occupied by one of two highly distinct sequences, scattered across the phylogenetic tree. Their distribution indicates that this IGR is transferred between isolates through horizontal regulatory transfer independent of the flanking genes and that each type likely serves different regulatory roles depending on their genetic context.

Pan-Genome-Wide Analysis of Pantoea ananatis Identified Genes Linked to Pathogenicity in Onion

Frontiers in Microbiology ◽ 10.3389/fmicb.2021.684756 ◽ 2021 ◽ Vol 12

Author(s): Gaurav Agarwal ◽ Divya Choudhary ◽ Shaun P. Stice ◽ Brendon K. Myers ◽ Ronald D. Gitaitis ◽ ...

Keyword(s): Gene Transfer ◽ Gene Cluster ◽ Core Genome ◽ Bacterial Colonization ◽ Secretion Systems ◽ Pantoea Ananatis ◽ Novel Genes ◽ Pan Genome ◽ Niche Adaptation ◽ Genome Wide

Pantoea ananatis, a Gram-negative and facultative anaerobic bacterium, is a member of a Pantoea spp. complex that causes center rot of onion, which significantly affects onion yield and quality. This pathogen does not have typical virulence factors like type II or type III secretion systems but appears to require a biosynthetic gene cluster, HiVir/PASVIL (located chromosomally and comprised of 14 genes), for a phosphonate secondary metabolite, and the ‘alt’ gene cluster (located on a plasmid and comprised of 11 genes) that aids in bacterial colonization in onion bulbs by imparting tolerance to thiosulfinates. We conducted a deep pan-genome-wide association study (pan-GWAS) to predict additional genes associated with pathogenicity in P. ananatis using a panel of diverse strains (n = 81). We utilized a red-onion scale necrosis assay as an indicator of pathogenicity. Based on this assay, we differentiated pathogenic (n = 51) from non-pathogenic (n = 30) strains phenotypically. Pan-genome analysis revealed a large core genome of 3,153 genes and a flexible accessory genome. Pan-GWAS using the presence and absence variants (PAVs) predicted 42 genes, including 14 from the previously identified HiVir/PASVIL cluster associated with pathogenicity, and 28 novel genes that were not previously associated with pathogenicity in onion. Of the 28 novel genes identified, eight have annotated functions of site-specific tyrosine kinase, N-acetylmuramoyl-L-alanine amidase, conjugal transfer, and HTH-type transcriptional regulator. The remaining 20 genes are currently hypothetical. Further, a core-genome SNP-based phylogeny and horizontal gene transfer (HGT) studies were also conducted to assess the extent of lateral gene transfer among diverse P. ananatis strains. Phylogenetic analysis based on PAVs and whole genome multi locus sequence typing (wgMLST), rather than core-genome SNPs, distinguished red-scale necrosis inducing (pathogenic) strains from non-scale necrosis inducing (non-pathogenic) strains of P. ananatis. A total of 1182 HGT events, including the HiVir/PASVIL and alt cluster genes, were identified. These events could be regarded as a major contributing factor to the diversification, niche-adaptation and potential acquisition of pathogenicity/virulence genes in P. ananatis.
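
The idea behind a presence/absence association test can be sketched with a per-gene Fisher's exact test on a 2x2 table (gene present or absent versus pathogenic or non-pathogenic). The abstract does not say which statistical procedure was used, so this is an illustrative assumption rather than the authors' method, and it omits the multiple-testing and population-structure corrections a real pan-GWAS needs. The group sizes (51 and 30) come from the abstract; the gene counts and the function name are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                    pathogenic   non-pathogenic
    gene present        a              b
    gene absent         c              d

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1 = a + b            # strains carrying the gene
    col1 = a + c            # pathogenic strains
    total = a + b + c + d
    denom = comb(total, col1)

    def prob(x):
        # Probability that x of the pathogenic strains carry the gene.
        return comb(row1, x) * comb(total - row1, col1 - x) / denom

    lowest = max(0, col1 - (total - row1))
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Hypothetical gene: present in 49 of 51 pathogenic and 1 of 30
# non-pathogenic strains.
p_associated = fisher_exact_two_sided(49, 1, 2, 29)

# Hypothetical gene with no association: present in about a third of
# the strains in each group.
p_unassociated = fisher_exact_two_sided(17, 10, 34, 20)

print(p_associated, p_unassociated)
```

Genes with a small p-value after correction would be candidates for follow-up; the test by itself does not show that a gene causes the phenotype.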
