Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile

ABSTRACT Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 (<1%) of the pairs, respectively. Using hash-cgMLST, 218 (80%) of replicate pairs assembled with SPAdes had zero gene differences, and 31 (11%), 5 (2%), and 18 (7%) pairs had 1, 2, and >2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
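
The hashing idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of MD5, the upper-casing of sequences, the rule of comparing only genes present in both profiles, and the names `allele_hash`, `hash_profile`, and `gene_differences` are all assumptions made for illustration.

```python
import hashlib


def allele_hash(sequence):
    """Return a reproducible hash for one allele sequence.

    MD5 is an assumption here; any deterministic hash gives the same
    property: identical sequences map to identical strings, with no
    central database of numbered alleles.
    """
    normalized = sequence.strip().upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def hash_profile(alleles):
    """Convert {gene name: allele sequence} into {gene name: hash}."""
    return {gene: allele_hash(seq) for gene, seq in alleles.items()}


def gene_differences(profile_a, profile_b):
    """Count genes whose hashes differ, over genes found in both profiles.

    Ignoring genes missing from either profile is an assumption of this
    sketch, not a rule stated in the abstract.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for gene in shared if profile_a[gene] != profile_b[gene])


# Toy sequences (hypothetical, far shorter than real genes).
isolate_1 = hash_profile({"gene1": "ATGC", "gene2": "ATGA", "gene3": "TTT"})
isolate_2 = hash_profile({"gene1": "atgc", "gene2": "ATGG", "gene4": "CCC"})
print(gene_differences(isolate_1, isolate_2))
```

Because each laboratory can compute the hashes locally, two profiles produced independently remain directly comparable, which is the property the abstract highlights.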

Hash-based core genome multi-locus sequencing typing for Clostridium difficile

10.1101/686212 ◽

2019 ◽

Author(s):

David W Eyre ◽

Tim EA Peto ◽

Derrick W Crook ◽

A Sarah Walker ◽

Mark H Wilcox

Keyword(s):

Clostridium Difficile ◽

Genome Sequencing ◽

Core Genome ◽

Genetic Relationships ◽

Variant Calling ◽

Whole Genome ◽

Infection Transmission ◽

Recent Transmission ◽

A Genome ◽

Performance Penalty

Abstract. Background: Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely-related genomes among a background of thousands of other genomes is challenging. Methods: We describe a refinement to core-genome multi-locus sequence typing (cgMLST) where alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralised database of sequentially-numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to mapping-based approaches in Clostridium difficile using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Results: Hash-cgMLST provided the same results as standard cgMLST with minimal performance penalty. Comparing 272 pairs of replicate sequences, using reference-based mapping there were 0, 1 or 2 SNPs between 262 (96%), 5 (2%) and 1 (<1%) pairs respectively. Using hash-cgMLST or standard cgMLST, 197 (72%) replicate pairs had zero gene differences, 37 (14%), 8 (3%) and 30 (11%) pairs had 1, 2 and >2 differences respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies. Considering 413 pairs of infections within ≤2 SNPs, i.e. consistent with recent transmission, 266 (64%) had ≤2 gene differences and 50 (12%) ≥5 differences. Comparing a genome to 100,000 others took <1 minute using hash-cgMLST. Conclusion: Hash-cgMLST is an effective surveillance tool that can rapidly identify clusters of related genomes. However, cgMLST/hash-cgMLST generates potentially more false variants than mapping-based analysis. Refined mapping-based variant calling is likely required to precisely define close genetic relationships.

Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Genome-Wide Typing of Clostridium difficile

Journal of Clinical Microbiology ◽

10.1128/jcm.01987-17 ◽

2018 ◽

Vol 56 (6) ◽

Cited By ~ 22

Author(s):

Stefan Bletz ◽

Sandra Janezic ◽

Dag Harmsen ◽

Maja Rupnik ◽

Alexander Mellmann

Keyword(s):

Clostridium Difficile ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Target Genes ◽

Population Based ◽

Whole Genome Sequencing Data ◽

The Novel ◽

Sequencing Data ◽

Gastrointestinal Infections ◽

Content Type

ABSTRACT Clostridium difficile, recently renamed Clostridioides difficile, is the most common cause of antibiotic-associated nosocomial gastrointestinal infections worldwide. To differentiate endogenous infections and transmission events, highly discriminatory subtyping is necessary. Today, methods based on whole-genome sequencing data are increasingly used to subtype bacterial pathogens; however, frequently a standardized methodology and typing nomenclature are missing. Here we report a core genome multilocus sequence typing (cgMLST) approach developed for C. difficile. Initially, we determined the breadth of the C. difficile population based on all available MLST sequence types with Bayesian inference (BAPS). The resulting BAPS partitions were used in combination with C. difficile clade information to select representative isolates that were subsequently used to define cgMLST target genes. Finally, we evaluated the novel cgMLST scheme with genomes from 3,025 isolates. BAPS grouping (n = 6 groups) together with the clade information led to a total of 11 representative isolates that were included for cgMLST definition and resulted in 2,270 cgMLST genes that were present in all isolates. Overall, 2,184 to 2,268 cgMLST targets were detected in the genome sequences of 70 outbreak-associated and reference strains, and on average 99.3% cgMLST targets (1,116 to 2,270 targets) were present in 2,954 genomes downloaded from the NCBI database, underlining the representativeness of the cgMLST scheme. Moreover, reanalyzing different cluster scenarios with cgMLST were concordant to published single nucleotide variant analyses. In conclusion, the novel cgMLST is representative for the whole C. difficile population, is highly discriminatory in outbreak situations, and provides a unique nomenclature facilitating interlaboratory exchange.

Establishment and evaluation of a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Pseudomonas aeruginosa

Journal of Clinical Microbiology ◽

10.1128/jcm.01987-20 ◽

2020 ◽

pp. JCM.01987-20

Author(s):

Hauke Tönnies ◽

Karola Prior ◽

Dag Harmsen ◽

Alexander Mellmann

Keyword(s):

Pseudomonas Aeruginosa ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Target Genes ◽

Multidrug Resistant ◽

Population Based ◽

Whole Genome Sequence ◽

Nucleotide Polymorphisms ◽

Environmental Bacterium ◽

Random Dataset

The environmental bacterium Pseudomonas aeruginosa, in particular multidrug resistant clones, is often associated with nosocomial infections and outbreaks. Today, core genome multilocus sequence typing (cgMLST) is frequently applied to delineate sporadic cases from nosocomial transmissions. However, until recently, no cgMLST scheme for a standardized typing of P. aeruginosa was available. To establish a novel cgMLST scheme for P. aeruginosa, we initially determined the breadth of the P. aeruginosa population based on MLST data with a Bayesian approach (BAPS). Using genomic data of representative isolates for the whole population and for all 12 serogroups, we extracted target genes and further refined them using a random dataset of 1,000 P. aeruginosa genomes. Subsequently, we investigated reproducibility and discriminatory ability with repeatedly sequenced isolates and isolates from well-defined outbreak scenarios, respectively, and compared clustering applying two recently published cgMLST schemes. BAPS generated seven P. aeruginosa groups. To cover these and all serogroups, 15 reference strains were used to determine genes common in all strains. After refinement with the dataset of 1,000 genomes, the cgMLST scheme consisted of 3,867 target genes, which are representative for the P. aeruginosa population and highly reproducible using biological replicates. We finally evaluated the scheme by reanalyzing two published outbreaks, where the authors used single nucleotide polymorphisms (SNPs) typing. In both cases cgMLST was concordant to the previous SNP results and to the results of the two other cgMLST schemes. In conclusion, the highly-reproducible novel P. aeruginosa cgMLST scheme facilitates outbreak investigations due to the publicly available cgMLST nomenclature.

A Vibrio cholerae Core Genome Multilocus Sequence Typing Scheme To Facilitate the Epidemiological Study of Cholera

Journal of Bacteriology ◽

10.1128/jb.00086-20 ◽

2020 ◽

Vol 202 (24) ◽

Cited By ~ 2

Author(s):

Kevin Y. H. Liang ◽

Fabini D. Orata ◽

Mohammad Tarequl Islam ◽

Tania Nasreen ◽

Munirul Alam ◽

...

Keyword(s):

Vibrio Cholerae ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Geographic Origin ◽

Data Set ◽

Epidemiological Research ◽

Content Type ◽

Typing Scheme ◽

Subspecies Level ◽

Allelic Differences

ABSTRACT Core genome multilocus sequence typing (cgMLST) has gained popularity in recent years in epidemiological research and subspecies-level classification. cgMLST retains the intuitive nature of traditional MLST but offers much greater resolution by utilizing significantly larger portions of the genome. Here, we introduce a cgMLST scheme for Vibrio cholerae, a bacterium abundant in marine and freshwater environments and the etiologic agent of cholera. A set of 2,443 core genes ubiquitous in V. cholerae were used to analyze a comprehensive data set of 1,262 clinical and environmental strains collected from 52 countries, including 65 newly sequenced genomes in this study. We established a sublineage threshold based on 133 allelic differences that creates clusters nearly identical to traditional MLST types, providing backwards compatibility to new cgMLST classifications. We also defined an outbreak threshold based on seven allelic differences that is capable of identifying strains from the same outbreak and closely related isolates that could give clues on outbreak origin. Using cgMLST, we confirmed the South Asian origin of modern epidemics and identified clustering affinity among sublineages of environmental isolates from the same geographic origin. Advantages of this method are highlighted by direct comparison with existing classification methods, such as MLST and single-nucleotide polymorphism-based methods. cgMLST outperforms all existing methods in terms of resolution, standardization, and ease of use. We anticipate this scheme will serve as a basis for a universally applicable and standardized classification system for V. cholerae research and epidemiological surveillance in the future. This cgMLST scheme is publicly available on PubMLST (https://pubmlst.org/vcholerae/). 
IMPORTANCE Toxigenic Vibrio cholerae isolates of the O1 and O139 serogroups are the causative agents of cholera, an acute diarrheal disease that plagued the world for centuries, if not millennia. Here, we introduce a core genome multilocus sequence typing scheme for V. cholerae. Using this scheme, we have standardized the definition for subspecies-level classification, facilitating global collaboration in the surveillance of V. cholerae. In addition, this typing scheme allows for quick identification of outbreak-related isolates that can guide subsequent analyses, serving as an important first step in epidemiological research. This scheme is also easily scalable to analyze thousands of isolates at various levels of resolution, making it an invaluable tool for large-scale ecological and evolutionary analyses.
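
The threshold-based grouping described in this abstract can be sketched as single-linkage clustering on allelic distance. This is an illustration only: the abstract does not state the clustering algorithm, whether the threshold is inclusive, or how missing loci are handled, so single linkage, the `<=` comparison, and the skipping of unshared loci are assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles, over loci present in both profiles."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by chains of pairs within `threshold` differences.

    `profiles` maps isolate name -> {locus: allele identifier}.
    Returns a sorted list of sorted lists of isolate names.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster representative (union-find).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy profiles over 10 hypothetical loci.
loci = [f"locus{i}" for i in range(10)]
toy_profiles = {
    "A": {locus: 1 for locus in loci},
    "B": {**{locus: 1 for locus in loci}, "locus0": 2, "locus1": 2},
    "C": {locus: 3 for locus in loci},
}
print(single_linkage_clusters(toy_profiles, threshold=2))
```

With a scheme of this kind, the same function could be run at a tight threshold for outbreak detection and at a looser one for sublineage assignment, mirroring the two thresholds the abstract reports.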

Emergence of Mobile Colistin Resistance (mcr-8) in a Highly Successful Klebsiella pneumoniae Sequence Type 15 Clone from Clinical Infections in Bangladesh

mSphere ◽

10.1128/msphere.00023-20 ◽

2020 ◽

Vol 5 (2) ◽

Cited By ~ 7

Author(s):

Refath Farzana ◽

Lim S. Jones ◽

Andrew Barratt ◽

Muhammad Anisur Rahman ◽

Kirsty Sands ◽

...

Keyword(s):

Klebsiella Pneumoniae ◽

South Asia ◽

Core Genome ◽

Treatment Options ◽

Conjugative Plasmid ◽

Poultry Feed ◽

Feed Additive ◽

Nucleotide Polymorphisms ◽

Colistin Resistance ◽

Content Type

ABSTRACT The emergence of mobilized colistin resistance genes (mcr) has become a serious concern in clinical practice, compromising treatment options for life-threatening infections. In this study, colistin-resistant Klebsiella pneumoniae harboring mcr-8.1 was recovered from infected patients in the largest public hospital of Bangladesh, with a prevalence of 0.3% (3/1,097). We found mcr-8.1 in an identical highly stable multidrug-resistant IncFIB(pQil) plasmid of ∼113 kb, which belonged to an epidemiologically successful K. pneumoniae clone, ST15. The resistance mechanism was proven to be horizontally transferable, which incurred a fitness cost to the host. The core genome phylogeny suggested the clonal spread of mcr-8.1 in a Bangladeshi hospital. Core genome single-nucleotide polymorphisms among the mcr-8.1-positive K. pneumoniae isolates ranged from 23 to 110. It has been hypothesized that mcr-8.1 was inserted into IncFIB(pQil) with preexisting resistance loci, blaTEM-1b and blaCTX-M-15, by IS903B. Coincidentally, all resistance determinants in the plasmid [mcr-8.1, ampC, sul2, 1d-APH(6), APH(3′′)-Ib, blaTEM-1b, blaCTX-M-15] were bracketed by IS903B, demonstrating the possibility of intra- and interspecies and intra- and intergenus transposition of entire resistance loci. This is the first report of an mcr-like mechanism from human infections in Bangladesh. However, given the acquisition of mcr-8.1 by a stable conjugative plasmid in a successful high-risk clone of K. pneumoniae ST15, there is a serious risk of dissemination of mcr-8.1 in Bangladesh from 2017 onwards. IMPORTANCE There is a marked paucity in our understanding of the epidemiology of colistin-resistant bacterial pathogens in South Asia. 
A report by Davies and Walsh (Lancet Infect Dis 18:256–257, https://doi.org/10.1016/S1473-3099(18)30072-0, 2018) suggests the export of colistin from China to India, Vietnam, and South Korea in 2016 was approximately 1,000 tons and mainly used as a poultry feed additive. A few reports forecast that the prevalence of mcr in humans and livestock will increase in South Asia. Given the high prevalence of blaCTX-M-15 and blaNDM in India, Bangladesh, and Pakistan, colistin has become the invariable option for the management of serious infections, leading to the emergence of mcr-like mechanisms in South Asia. Systematic scrutiny of the prevalence and transmission of mcr variants in South Asia is vital to understanding the drivers of mcr genes and to initiate interventions to overcome colistin resistance.

Comparison of Multilocus Sequence Typing and the Xpert C. difficile/Epi Assay for Identification of Clostridium difficile 027/NAP1/BI

Journal of Clinical Microbiology ◽

10.1128/jcm.03075-15 ◽

2015 ◽

Vol 54 (3) ◽

pp. 775-778 ◽

Cited By ~ 5

Author(s):

Tracy McMillen ◽

Mini Kamboj ◽

N. Esther Babady

Keyword(s):

United States ◽

Clostridium Difficile ◽

Confidence Interval ◽

Multilocus Sequence Typing ◽

The United States ◽

Content Type ◽

Presumptive Identification ◽

Good Agreement

Clostridium difficile 027/NAP1/BI is the most common C. difficile strain in the United States. The Xpert C. difficile/Epi assay allows rapid, presumptive identification of C. difficile NAP1. We compared Xpert C. difficile/Epi to multilocus sequence typing for identification of C. difficile NAP1 and found “very good” agreement at 97.9% (κ = 0.86; 95% confidence interval, 0.80 to 0.91).
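
The agreement statistic quoted in this abstract is presumably Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below shows the calculation for a 2x2 table of two methods; the function name and the counts are hypothetical, since the abstract does not give the underlying table.

```python
def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two binary classifiers from a 2x2 table.

    both_pos: samples called positive by method A and method B
    a_only:   positive by A only
    b_only:   positive by B only
    both_neg: negative by both
    Raises ZeroDivisionError if chance agreement is exactly 1
    (for example, when every sample is negative by both methods).
    """
    total = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / total
    a_pos_rate = (both_pos + a_only) / total
    b_pos_rate = (both_pos + b_only) / total
    # Chance agreement: both positive by chance plus both negative by chance.
    expected = a_pos_rate * b_pos_rate + (1 - a_pos_rate) * (1 - b_pos_rate)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not taken from the study.
print(round(cohens_kappa(20, 5, 10, 15), 3))
```

Kappa is lower than raw percent agreement whenever one class dominates, which is why a study can report both figures side by side.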

Genomic Epidemiology of a Protracted Hospital Outbreak Caused by a Toxin A-Negative Clostridium difficile Sublineage PCR Ribotype 017 Strain in London, England

Journal of Clinical Microbiology ◽

10.1128/jcm.00648-15 ◽

2015 ◽

Vol 53 (10) ◽

pp. 3141-3147 ◽

Cited By ~ 28

Author(s):

M. D. Cairns ◽

M. D. Preston ◽

T. D. Lawley ◽

T. G. Clark ◽

R. A. Stabler ◽

...

Keyword(s):

Clostridium Difficile ◽

De Novo ◽

University Hospital ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Conjugative Transposon ◽

Content Type ◽

Genomic Epidemiology ◽

Hospital Outbreak ◽

Nosocomial Diarrhea

Clostridium difficile remains the leading cause of nosocomial diarrhea worldwide, which is largely considered to be due to the production of two potent toxins: TcdA and TcdB. However, PCR ribotype (RT) 017, one of five clonal lineages of human virulent C. difficile, lacks TcdA expression but causes widespread disease. Whole-genome sequencing was applied to 35 isolates from hospitalized patients with C. difficile infection (CDI) and two environmental ward isolates in London, England. The phylogenetic analysis of single nucleotide polymorphisms (SNPs) revealed a clonal cluster of temporally variable isolates from a single hospital ward at University Hospital Lewisham (UHL) that were distinct from other London hospital isolates. De novo assembled genomes revealed a 49-kbp putative conjugative transposon exclusive to this hospital clonal cluster which would not be revealed by current typing methodologies. This study identified three sublineages of C. difficile RT017 that are circulating in London. Similar to the notorious RT027 lineage, which has caused global outbreaks of CDI since 2001, the lineage of toxin-defective RT017 strains appears to be continually evolving. By utilization of WGS technologies to identify SNPs and the evolution of clonal strains, the transmission of outbreaks caused by near-identical isolates can be retraced and identified.

Core Genome Multilocus Sequence Typing and Single Nucleotide Polymorphism Analysis in the Epidemiology of Brucella melitensis Infections

Journal of Clinical Microbiology ◽

10.1128/jcm.00517-18 ◽

2018 ◽

Vol 56 (9) ◽

Cited By ~ 15

Author(s):

Anna Janowicz ◽

Fabrizio De Massis ◽

Massimo Ancora ◽

Cesare Cammà ◽

Claudio Patavino ◽

...

Keyword(s):

Single Nucleotide Polymorphism ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Brucella Melitensis ◽

Reference Sequence ◽

Snp Analysis ◽

Phylogenetic Distance ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Content Type

ABSTRACT The use of whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology has become a widely accepted method for microbiology laboratories in the application of molecular typing for outbreak tracing and genomic epidemiology. Several studies demonstrated the usefulness of WGS data analysis through single-nucleotide polymorphism (SNP) calling from a reference sequence analysis for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) has not been explored so far. The current study developed an allele-based cgMLST method and compared its performance to that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set was comprised of 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates with unknown epidemiological status, composed of human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established the potential criteria necessary for inclusion of an isolate into a brucellosis outbreak cluster to be ≤6 loci in the cgMLST and ≤7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, thereby allowing diverse and unrelated genotypes to be identified with greater confidence. The application of a cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and it is a candidate to be a benchmark tool for outbreak investigations in human and animal brucellosis.

Evolutionary and Genomic Insights into Clostridioides difficile Sequence Type 11: a Diverse Zoonotic and Antimicrobial-Resistant Lineage of Global One Health Importance

mBio ◽

10.1128/mbio.00446-19 ◽

2019 ◽

Vol 10 (2) ◽

Cited By ~ 22

Author(s):

Daniel R. Knight ◽

Brian Kullin ◽

Grace O. Androga ◽

Frederic Barbut ◽

Catherine Eckert ◽

...

Keyword(s):

Clostridium Difficile ◽

Antimicrobial Resistance ◽

Long Range ◽

One Health ◽

Core Genome ◽

Sequence Type ◽

Content Type ◽

Pan Genome ◽

Clostridioides Difficile ◽

Animal Populations

ABSTRACT Clostridioides difficile (Clostridium difficile) sequence type 11 (ST11) is well established in production animal populations worldwide and contributes considerably to the global burden of C. difficile infection (CDI) in humans. Increasing evidence of shared ancestry and genetic overlap of PCR ribotype 078 (RT078), the most common ST11 sublineage, between human and animal populations suggests that CDI may be a zoonosis. We performed whole-genome sequencing (WGS) on a collection of 207 ST11 and closely related ST258 isolates of human and veterinary/environmental origin, comprising 16 RTs collected from Australia, Asia, Europe, and North America. Core genome single nucleotide variant (SNV) analysis identified multiple intraspecies and interspecies clonal groups (isolates separated by ≤2 core genome SNVs) in all the major RT sublineages: 078, 126, 127, 033, and 288. Clonal groups comprised isolates spread across different states, countries, and continents, indicative of reciprocal long-range dissemination and possible zoonotic/anthroponotic transmission. Antimicrobial resistance genotypes and phenotypes varied across host species, geographic regions, and RTs and included macrolide/lincosamide resistance (Tn6194 [ermB]), tetracycline resistance (Tn6190 [tetM] and Tn6164 [tet44]), and fluoroquinolone resistance (gyrA/B mutations), as well as numerous aminoglycoside resistance cassettes. The population was defined by a large “open” pan-genome (10,378 genes), a remarkably small core genome of 2,058 genes (only 19.8% of the gene pool), and an accessory genome containing a large and diverse collection of important prophages of the Siphoviridae and Myoviridae. This study provides novel insights into strain relatedness and genetic variability of C. difficile ST11, a lineage of global One Health importance.
IMPORTANCE Historically, Clostridioides difficile (Clostridium difficile) has been associated with life-threatening diarrhea in hospitalized patients. Increasing rates of C. difficile infection (CDI) in the community suggest exposure to C. difficile reservoirs outside the hospital, including animals, the environment, or food. C. difficile sequence type 11 (ST11) is known to infect/colonize livestock worldwide and comprises multiple ribotypes, many of which cause disease in humans, suggesting CDI may be a zoonosis. Using high-resolution genomics, we investigated the evolution and zoonotic potential of ST11 and a new closely related ST258 lineage sourced from diverse origins. We found multiple intra- and interspecies clonal transmission events in all ribotype sublineages. Clones were spread across multiple continents, often without any health care association, indicative of zoonotic/anthroponotic long-range dissemination in the community. ST11 possesses a massive pan-genome and numerous clinically important antimicrobial resistance elements and prophages, which likely contribute to the success of this globally disseminated lineage of One Health importance.

Investigation of the Evolutionary Development of the Genus Bifidobacterium by Comparative Genomics

Applied and Environmental Microbiology ◽

10.1128/aem.02004-14 ◽

2014 ◽

Vol 80 (20) ◽

pp. 6383-6394 ◽

Cited By ~ 83

Author(s):

Gabriele Andrea Lugli ◽

Christian Milani ◽

Francesca Turroni ◽

Sabrina Duranti ◽

Chiara Ferrario ◽

...

Keyword(s):

Sequence Comparison ◽

Core Genome ◽

Genetic Relatedness ◽

Single Gene ◽

Evolutionary Development ◽

Genome Sequences ◽

Separate Species ◽

Content Type ◽

Relative Paucity ◽

A Genome

ABSTRACT The Bifidobacterium genus currently encompasses 48 recognized taxa, which have been isolated from different ecosystems. However, the current phylogeny of bifidobacteria is hampered by the relative paucity of genotypic data. Here, we reassessed the taxonomy of this bacterial genus using genome-based approaches, which demonstrated that the previous taxonomic view of bifidobacteria contained several inconsistencies. In particular, high levels of genetic relatedness were shown to exist between particular Bifidobacterium taxa which would not justify their status as separate species. The results presented are here based on average nucleotide identity analysis involving the genome sequences for each type strain of the 48 bifidobacterial taxa, as well as phylogenetic comparative analysis of the predicted core genome of the Bifidobacterium genus. The results of this study demonstrate that the availability of complete genome sequences allows the reconstruction of a more robust bifidobacterial phylogeny than that obtained from a single gene-based sequence comparison, thus discouraging the assignment of a new or separate bifidobacterial taxon without such a genome-based validation.
