A Genomic Distance Based on MUM Indicates Discontinuity between Most Bacterial Species and Genera

2008 ◽  
Vol 191 (1) ◽  
pp. 91-99 ◽  
Author(s):  
Marc Deloger ◽  
Meriem El Karoui ◽  
Marie-Agnès Petit

ABSTRACT The fundamental unit of biological diversity is the species. However, a remarkable extent of intraspecies diversity in bacteria was discovered by genome sequencing, and it reveals the need to develop clear criteria to group strains within a species. Two main types of analyses used to quantify intraspecies variation at the genome level are the average nucleotide identity (ANI), which detects the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates are based on BLAST alignments for the definition of DNA sequences common to the genome pair. Interestingly, however, results using these methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index taking into account both criteria of diversity, which are based on DNA maximal unique matches (MUM) shared by two genomes. The values, called MUMi, for MUM index, correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. As the MUMi is fast to calculate, it offers the possibility of measuring genome distances on the whole database of available genomes.
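The abstract does not spell out how the index is computed. As a minimal sketch, assume a distance of the form 1 - L_mum / L_av, where L_mum is the number of bases covered by MUMs and L_av is the mean length of the two genomes. The function name `mumi`, the interval representation, and the one-sided overlap merging are illustrative assumptions, not the authors' implementation.

```python
def mumi(mums, len_a, len_b):
    """Return 1 - L_mum / L_av for MUM intervals given on genome A.

    mums: iterable of (start, end) half-open intervals on genome A
          (hypothetical input format; a MUM finder would supply these).
    len_a, len_b: lengths of the two genomes in bases.
    """
    covered = 0
    cur_start = cur_end = None
    # Merge overlapping intervals so that each shared base is counted once.
    for start, end in sorted(mums):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    l_av = (len_a + len_b) / 2
    return 1 - covered / l_av


if __name__ == "__main__":
    # Two overlapping MUMs covering 100 bases of two 200-base genomes.
    print(mumi([(0, 50), (40, 100)], 200, 200))
```

Under these assumptions, identical genomes give a value of 0 and genomes sharing no MUM give 1, so the value behaves as a distance rather than a similarity.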

2020 ◽  
Vol 14 ◽  
pp. 117793222093806
Author(s):  
Sávio Souza Costa ◽  
Luís Carlos Guimarães ◽  
Artur Silva ◽  
Siomar Castro Soares ◽  
Rafael Azevedo Baraúna

The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. The pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families will continuously increase with the addition of new genomes to the analysis, while in a closed pan-genome, the number of gene families will not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: the same software, such as GeneMark or RAST, should be used to annotate all genomes. Subsequently, software tools such as BPGA, GET_HOMOLOGUES, and PGAP, among others, are used to calculate the pan-genome. This review presents all these initial steps for those who want to perform a pan-genome analysis, explaining key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species. These are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we cite the perspectives of several research areas where pan-genome analysis can be used to answer important issues.
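The open/closed distinction can be illustrated with a small fit. The sketch below assumes one common parametrization, in which the number of new gene families found when the N-th genome is added follows n(N) = k * N^(-alpha), with alpha > 1 read as closed and alpha <= 1 as open. Both the parametrization and the function names are assumptions; individual pan-genome tools may define the fit differently.

```python
import math


def fit_heaps_alpha(new_genes):
    """Fit n(N) = k * N**(-alpha) by least squares in log-log space.

    new_genes[i] is the number of new gene families observed when genome
    i + 1 is added (N is 1-based); all values must be positive and at
    least two points are required.
    """
    xs = [math.log(n) for n in range(1, len(new_genes) + 1)]
    ys = [math.log(v) for v in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope  # the power-law exponent is the negated slope
    k = math.exp(mean_y - slope * mean_x)
    return alpha, k


def classify(alpha):
    """Assumed convention: alpha > 1 is closed, otherwise open."""
    return "closed" if alpha > 1 else "open"


if __name__ == "__main__":
    # Synthetic counts generated with alpha = 0.5 (an open pan-genome).
    counts = [1000 * n ** -0.5 for n in range(1, 11)]
    alpha, k = fit_heaps_alpha(counts)
    print(round(alpha, 3), classify(alpha))
```

In practice the counts would be averaged over many random orderings of the genomes before fitting, since a single ordering is noisy.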


2018 ◽  
Vol 16 (03) ◽  
pp. 1840012 ◽  
Author(s):  
Maria Satti ◽  
Yasuhiro Tanizawa ◽  
Akihito Endo ◽  
Masanori Arita

The commensal genus Bifidobacterium has probiotic properties. We prepared a public library of the gene functions of the genus Bifidobacterium for its online annotation. Orthologous gene cluster analysis showed that the pan genomes of Bifidobacterium and Lactobacillus exhibit striking similarities when mapped to the Clusters of Orthologous Groups (COG) database of proteins. When the core genes in each genus were selected based on our statistical definition of “core genome”, core genes were present in at least 92% of 52 Bifidobacterium and in 97% of 178 Lactobacillus genomes. Functional comparison of the core genes of the two genera revealed a significant difference in the category “amino acid transport and metabolism”, reflecting their difference in niche specificity. Over-represented Bifidobacterium protein families were primarily involved in host interactions, complex compound metabolism, and stress responses. These findings coincide with the published information and validate our bias-resilient definition of the core genome.
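The abstract does not give the statistical definition of the core genome. As a rough illustration only, the sketch below uses a plain prevalence cutoff: a gene counts as core when it occurs in at least a chosen fraction of genomes. The function name, input format, and cutoff are hypothetical and are not the authors' method.

```python
def core_genes(presence, n_genomes, min_fraction):
    """Return the genes present in at least min_fraction of the genomes.

    presence: dict mapping gene name -> set of genome ids carrying it
              (hypothetical format, e.g. built from ortholog clusters).
    n_genomes: total number of genomes in the comparison.
    min_fraction: prevalence cutoff between 0 and 1.
    """
    needed = min_fraction * n_genomes
    return {gene for gene, genomes in presence.items() if len(genomes) >= needed}


if __name__ == "__main__":
    presence = {
        "dnaA": {1, 2, 3, 4},
        "geneX": {1, 2},
        "geneY": {1, 2, 3},
    }
    print(sorted(core_genes(presence, 4, 0.75)))
```

A cutoff below 100% tolerates genes missed in draft assemblies, which is one reason a strict intersection across all genomes tends to underestimate the core.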


2017 ◽  
Author(s):  
Andries J van Tonder ◽  
James E Bray ◽  
Keith A Jolley ◽  
Sigríður J Quirk ◽  
Gunnsteinn Haraldsson ◽  
...  

Abstract Background. Understanding the structure of a bacterial population is essential in order to understand bacterial evolution, or which genetic lineages cause disease, or the consequences of perturbations to the bacterial population. Estimating the core genome, the genes common to all or nearly all strains of a species, is an essential component of such analyses. The size and composition of the core genome varies by dataset, but our hypothesis was that variation between different collections of the same bacterial species should be minimal. To test this, the genome sequences of 3,121 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (USA) and Maela (Thailand) were analysed. Results. The analyses revealed a ‘supercore’ genome (genes shared by all 3,121 pneumococci) of only 303 genes, although 461 additional core genes were shared by pneumococci from Reykjavik, Southampton and Boston. Overall, the size and composition of the core genomes and pan-genomes among pneumococci recovered in Reykjavik, Southampton and Boston were very similar, but pneumococci from Maela were distinctly different. Inspection of the pan-genome of Maela pneumococci revealed several >25 Kb sequence regions that were homologous to genomic regions found in other bacterial species. Conclusions. Some subsets of the global pneumococcal population are highly heterogeneous and thus our hypothesis was rejected. This is an essential point of consideration before generalising the findings from a single dataset to the wider pneumococcal population.


PeerJ ◽  
2019 ◽  
Vol 6 ◽  
pp. e6233 ◽  
Author(s):  
Hugo R. Barajas ◽  
Miguel F. Romero ◽  
Shamayim Martínez-Sánchez ◽  
Luis D. Alcaraz

Background The Streptococcus genus is relevant to both public health and food safety because of its ability to cause pathogenic infections. It is well-represented (>100 genomes) in publicly available databases. Streptococci are ubiquitous, with multiple sources of isolation, from human pathogens to dairy products. The Streptococcus genus has traditionally been classified by morphology, serum types, the 16S ribosomal RNA (rRNA) gene, and multi-locus sequence types subject to in-depth comparative genomic analysis. Methods Core and pan-genomes described the genomic diversity of 108 strains belonging to 16 Streptococcus species. The core genome nucleotide diversity was calculated and compared to phylogenomic distances within the genus Streptococcus. The core genome was also used as a resource to recruit metagenomic fragment reads from streptococci-dominated environments. A conventional 16S rRNA gene phylogeny reconstruction was used as a reference to compare the resulting average nucleotide identity (ANI) and genome similarity score (GSS) dendrograms. Results The core genome, in this work, consists of 404 proteins that are shared by all 108 Streptococcus strains. The average identity of the pairwise compared core proteins decreases in proportion to lower GSS scores, across species. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while distinguishing between 16S polytomies (unresolved nodes). The GSS is a distance metric that can reflect evolutionary history comparing orthologous proteins. Additionally, GSS proved to be the most useful metric for genus and species comparisons, where ANI metrics failed due to false positives when comparing different species. Discussion Understanding of genomic variability and species relatedness is the goal of tools like GSS, which makes use of the maximum pairwise shared orthologous sequences for its calculation. 
It allows for long evolutionary distances (above species) to be included because of the use of amino acid alignment scores, rather than nucleotides, and normalizing by positive matches. Newly sequenced species and strains could be easily placed into GSS dendrograms to infer overall genomic relatedness. The GSS is not restricted to ubiquitous conservancy of gene features; thus, it reflects the mosaic-structure and dynamism of gene acquisition and loss in bacterial genomes.


2022 ◽  
Author(s):  
Mark Achtman ◽  
Zhemin Zhou ◽  
Jane Charlesworth ◽  
Laura A. Baxter

The definition of bacterial species is traditionally a taxonomic issue while defining bacterial populations is done with population genetics. These assignments are species specific, and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST alleles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate the ability of HierCC to correctly assign 100,000s of genomes to the species/subspecies and population levels for Salmonella, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI were uniformly consistent with classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and revealed differences in vertical inheritance of O serogroups. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
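The abstract does not describe the clustering algorithm itself. To illustrate the general idea of grouping cgMLST allele profiles at several levels of resolution, the sketch below applies single-linkage clustering to pairwise allelic distances at a chosen threshold. The choice of single linkage, the handling of missing loci, and all names are assumptions made for illustration and should not be read as the EnteroBase implementation.

```python
def allelic_distance(a, b):
    """Count loci whose allele numbers differ between two profiles.

    Loci missing (None) in either profile are skipped (an assumption).
    """
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage(profiles, threshold):
    """Group profile indices linked by chains of distances <= threshold."""
    parent = list(range(len(profiles)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if allelic_distance(profiles[i], profiles[j]) <= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(profiles)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    profiles = [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 2, 2], [5, 6, 7, 8]]
    for threshold in (0, 1, 4):
        print(threshold, single_linkage(profiles, threshold))
```

Running the same function at increasing thresholds yields nested groupings, which is the sense in which one set of profiles can span fine-grained clusters up to species-level clusters. The all-pairs loop is quadratic and would not scale to hundreds of thousands of genomes without further engineering.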


2018 ◽  
Vol 151 (3) ◽  
pp. 327-351 ◽  
Author(s):  
Dries Van den Broeck ◽  
Andreas Frisch ◽  
Tahina Razafindrahaja ◽  
Bart Van de Vijver ◽  
Damien Ertz

Background and aims – The Arthoniaceae form a species-rich family of lichenized, lichenicolous and saprophytic fungi in the order Arthoniales. As part of taxonomic revisions of the African Arthoniaceae, a number of species assignable to the genus Synarthonia were collected and sequenced. The present study aims at placing the genus in a phylogeny for the first time and at clarifying its circumscription. Methods – Nuclear (RPB2) and mitochondrial (mtSSU) DNA sequences from freshly collected specimens were obtained and analysed with phylogenetic Bayesian and maximum likelihood (ML) methods. Key results – Synarthonia is closely related to the genera Reichlingia and Coniocarpon in the Arthoniaceae. Six Synarthonia species are described as new to science and ten new combinations into this genus are made. A worldwide identification key to the genus Synarthonia is provided. Lectotypes are chosen for Arthonia elegans, A. inconspicua, A. lopingensis, A. ochracea, A. subcaesia and A. translucens. Arthonia thamnocarpa is synonymized with Sclerophyton elegans, and Arthonia elegans with Coniocarpon fallax. Synarthonia ochracea is shown to be a misunderstood species in the past and recent literature, since it was erroneously synonymized with Coniocarpon elegans. Synarthonia ochracea appears to start its life cycle as a non-lichenized lichenicolous fungus on Graphis before developing a lichenized thallus or it might be a facultatively lichenicolous fungus. It belongs to a complex of closely related species whose biology and circumscription are still in need of further studies. Conclusions – Synarthonia forms a monophyletic but somewhat heterogeneous lineage closely related to Coniocarpon and Reichlingia. As delimited here, Synarthonia includes corticolous lichens with a trentepohlioid photobiont as well as non-lichenized lichenicolous fungi. The core group is characterized by white pruinose ascomata, but species producing orange pruinose or non-pruinose ascomata are also included. 
Ascospores are transversely septate with an enlarged apical cell or are muriform. Future molecular and morphological studies are needed for a better circumscription and definition of the genus.


2019 ◽  
Author(s):  
Thomas Sakoparnig ◽  
Chris Field ◽  
Erik van Nimwegen

Abstract Although homologous recombination is accepted to be common in bacteria, so far it has been challenging to accurately quantify its impact on genome evolution within bacterial species. We here introduce methods that use the statistics of single-nucleotide polymorphism (SNP) splits in the core genome alignment of a set of strains to show that, for many bacterial species, recombination dominates genome evolution. Each genomic locus has been overwritten so many times by recombination that it is impossible to reconstruct the clonal phylogeny and, instead of a consensus phylogeny, the phylogeny typically changes many thousands of times along the core genome alignment. We also show how SNP splits can be used to quantify the relative rates with which different subsets of strains have recombined in the past. We find that virtually every strain has a unique pattern of frequencies with which its lineages have recombined with those of other strains, and that the relative rates with which different subsets of strains share SNPs follow long-tailed distributions. Our findings show that bacterial populations are neither clonal nor freely recombining, but structured such that recombination rates between different lineages vary along a continuum spanning several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect these long-tailed distributions of recombination rates.


2018 ◽  
Author(s):  
Mark A O’Dea ◽  
Tanya Laird ◽  
Rebecca Abraham ◽  
David Jordan ◽  
Kittitat Lugsomya ◽  
...  

Abstract Streptococcus suis is a major zoonotic pathogen that causes severe disease in both humans and pigs. In this study, we investigated S. suis from 148 cases of clinical disease in pigs from 46 pig herds over a period of seven years. These isolates underwent whole genome sequencing, genome analysis and antimicrobial susceptibility testing. Genome sequence data of Australian isolates was compared at the core genome level to clinical isolates from overseas. Results demonstrated eight predominant multi-locus sequence types and two major cps gene types (cps2 and 3). At the core genome level Australian isolates clustered predominantly within one large clade consisting of isolates from the UK, Canada and North America. In particular, serotype 2 MLST25 strains were very closely associated with Canadian and North American strains. A very small proportion of Australian swine isolates (5%) were phylogenetically associated with south-east Asian and UK isolates, many of which were classified as causing systemic disease, and derived from cases of human and swine disease. In addition, we show that ST1 clones carry a constellation of putative virulence genes not present in other Australian STs, and that this is mirrored in overseas ST1 clones. Based on this dataset we provide a comprehensive outline of the current S. suis clones associated with disease in Australian pigs and their global context, and discuss the implications this has on antimicrobial therapy, potential vaccine candidates and public health. Importance In this study, we examine in detail the genomic characteristics of 148 Streptococcus suis isolates from clinically diseased Australian pigs. We report the antimicrobial susceptibility profiles, virulence gene analysis and relationship to isolates from other regions of the world. We also demonstrate that ST1 clones, regardless of serotype, carry a large array of putative virulence genes while maintaining a small total gene content. 
This compilation of data has major ramifications for vaccine development, and refines the understanding of the distribution of various strains of this potentially-fatal zoonotic agent in the global pig industry.


2016 ◽  
Vol 3 (3) ◽  
Author(s):  
James R. Johnson ◽  
Gregg Davis ◽  
Connie Clabots ◽  
Brian D. Johnston ◽  
Stephen Porter ◽  
...  

Abstract Background.  Within-household sharing of strains from the resistance-associated H30R1 and H30Rx subclones of Escherichia coli sequence type 131 (ST131) has been inferred based on conventional typing data, but it has been assessed minimally using whole genome sequence (WGS) analysis. Methods.  Thirty-three clinical and fecal isolates of ST131-H30R1 and ST131-H30Rx, from 20 humans and pets in 6 households, underwent WGS analysis for comparison with 52 published ST131 genomes. Phylogenetic relationships were inferred using a bootstrapped maximum likelihood tree based on core genome sequence polymorphisms. Accessory traits were compared between phylogenetically similar isolates. Results.  In the WGS-based phylogeny, isolates clustered strictly by household, in clades that were distributed widely across the phylogeny, interspersed between H30R1 and H30Rx comparison genomes. For only 1 household did the core genome phylogeny place epidemiologically unlinked isolates together with household isolates, but even there multiple differences in accessory genome content clearly differentiated these 2 groups. The core genome phylogeny supported within-household strain sharing, fecal-urethral urinary tract infection pathogenesis (with the entire household potentially providing the fecal reservoir), and instances of host-specific microevolution. In 1 instance, the household's index strain persisted for 6 years before causing a new infection in a different household member. Conclusions.  Within-household sharing of E coli ST131 strains was confirmed extensively at the genome level, as was long-term colonization and repeated infections due to an ST131-H30Rx strain. Future efforts toward surveillance and decolonization may need to address not just the affected patient but also other human and animal household members.

