Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Genome-Wide Typing of Clostridium difficile

2018 ◽  
Vol 56 (6) ◽  
Author(s):  
Stefan Bletz ◽  
Sandra Janezic ◽  
Dag Harmsen ◽  
Maja Rupnik ◽  
Alexander Mellmann

ABSTRACT Clostridium difficile, recently renamed Clostridioides difficile, is the most common cause of antibiotic-associated nosocomial gastrointestinal infections worldwide. To differentiate endogenous infections and transmission events, highly discriminatory subtyping is necessary. Today, methods based on whole-genome sequencing data are increasingly used to subtype bacterial pathogens; however, a standardized methodology and typing nomenclature are frequently missing. Here we report a core genome multilocus sequence typing (cgMLST) approach developed for C. difficile. Initially, we determined the breadth of the C. difficile population based on all available MLST sequence types with Bayesian inference (BAPS). The resulting BAPS partitions were used in combination with C. difficile clade information to select representative isolates that were subsequently used to define cgMLST target genes. Finally, we evaluated the novel cgMLST scheme with genomes from 3,025 isolates. BAPS grouping (n = 6 groups) together with the clade information led to a total of 11 representative isolates that were included for cgMLST definition and resulted in 2,270 cgMLST genes that were present in all isolates. Overall, 2,184 to 2,268 cgMLST targets were detected in the genome sequences of 70 outbreak-associated and reference strains, and on average 99.3% of cgMLST targets (1,116 to 2,270 targets) were present in 2,954 genomes downloaded from the NCBI database, underlining the representativeness of the cgMLST scheme. Moreover, reanalyses of different cluster scenarios with cgMLST were concordant with published single nucleotide variant analyses. In conclusion, the novel cgMLST is representative of the whole C. difficile population, is highly discriminatory in outbreak situations, and provides a unique nomenclature facilitating interlaboratory exchange.
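
The target-definition step in this abstract (keeping genes that are present in all representative isolates, then checking how many targets are found in other genomes) can be sketched as a set intersection. This is a minimal illustration and not the authors' pipeline: the gene identifiers, the toy data, and the helper names `core_genes` and `percent_targets_found` are all hypothetical, and any further filtering a real scheme definition applies is not shown.

```python
from functools import reduce


def core_genes(gene_sets):
    """Return the genes present in every isolate's gene set."""
    if not gene_sets:
        return set()
    return reduce(set.intersection, (set(genes) for genes in gene_sets))


def percent_targets_found(found_genes, scheme):
    """Percentage of scheme targets that were detected in one genome."""
    return 100.0 * len(set(found_genes) & set(scheme)) / len(scheme)


# Hypothetical toy data: gene content of three representative isolates.
representatives = [
    {"geneA", "geneB", "geneC", "geneD"},
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneC", "geneD"},
]

scheme = core_genes(representatives)          # genes shared by all three
coverage = percent_targets_found({"geneA", "geneX"}, scheme)
print(sorted(scheme), coverage)
```

With real data the same percentage would be computed per downloaded genome and then averaged, which is how a figure such as the mean target coverage reported above would be summarized.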

2020 ◽  
pp. JCM.01987-20
Author(s):  
Hauke Tönnies ◽  
Karola Prior ◽  
Dag Harmsen ◽  
Alexander Mellmann

The environmental bacterium Pseudomonas aeruginosa, in particular multidrug resistant clones, is often associated with nosocomial infections and outbreaks. Today, core genome multilocus sequence typing (cgMLST) is frequently applied to delineate sporadic cases from nosocomial transmissions. However, until recently, no cgMLST scheme for a standardized typing of P. aeruginosa was available. To establish a novel cgMLST scheme for P. aeruginosa, we initially determined the breadth of the P. aeruginosa population based on MLST data with a Bayesian approach (BAPS). Using genomic data of representative isolates for the whole population and for all 12 serogroups, we extracted target genes and further refined them using a random dataset of 1,000 P. aeruginosa genomes. Subsequently, we investigated reproducibility and discriminatory ability with repeatedly sequenced isolates and isolates from well-defined outbreak scenarios, respectively, and compared clustering applying two recently published cgMLST schemes. BAPS generated seven P. aeruginosa groups. To cover these and all serogroups, 15 reference strains were used to determine genes common in all strains. After refinement with the dataset of 1,000 genomes, the cgMLST scheme consisted of 3,867 target genes, which are representative for the P. aeruginosa population and highly reproducible using biological replicates. We finally evaluated the scheme by reanalyzing two published outbreaks, where the authors used single nucleotide polymorphisms (SNPs) typing. In both cases cgMLST was concordant to the previous SNP results and to the results of the two other cgMLST schemes. In conclusion, the highly reproducible novel P. aeruginosa cgMLST scheme facilitates outbreak investigations due to the publicly available cgMLST nomenclature.


2019 ◽  
Vol 58 (1) ◽  
Author(s):  
David W. Eyre ◽  
Tim E. A. Peto ◽  
Derrick W. Crook ◽  
A. Sarah Walker ◽  
Mark H. Wilcox

ABSTRACT Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 (<1%) of the pairs, respectively. Using hash-cgMLST, 218 (80%) of replicate pairs assembled with SPAdes had zero gene differences, and 31 (11%), 5 (2%), and 18 (7%) pairs had 1, 2, and >2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
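
The idea of converting each allele to a reproducible hash and then counting gene differences can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the abstract only says that alleles become "a unique hash, or short string of letters", so the choice of SHA-256, the 8-character truncation, the upper-casing step, and the rule of skipping loci that are missing in either genome are all assumptions made here. The function names and toy sequences are hypothetical.

```python
import hashlib


def allele_hash(sequence, length=8):
    """Deterministically map an allele sequence to a short hex string.

    SHA-256 and the truncation length are assumptions for this sketch.
    """
    normalized = sequence.strip().upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()[:length]


def hash_profile(alleles):
    """Convert {locus: sequence} to {locus: hash}; uncalled loci stay None."""
    return {
        locus: (allele_hash(seq) if seq else None)
        for locus, seq in alleles.items()
    }


def gene_differences(profile_a, profile_b):
    """Count loci called in both profiles whose hashes differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical toy genomes: locus3 was not assembled in genome_1.
genome_1 = {"locus1": "ATGAAA", "locus2": "ATGCCC", "locus3": None}
genome_2 = {"locus1": "atgaaa", "locus2": "ATGCCT", "locus3": "ATGGGG"}

differences = gene_differences(hash_profile(genome_1), hash_profile(genome_2))
print(differences)
```

Because the hash depends only on the sequence, two laboratories obtain the same label for the same allele without consulting a shared allele database, which is the property the abstract highlights.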


2019 ◽  
Vol 57 (6) ◽  
Author(s):  
R. C. Jones ◽  
L. G. Harris ◽  
S. Morgan ◽  
M. C. Ruddy ◽  
M. Perry ◽  
...  

ABSTRACT An inability to standardize the bioinformatic data produced by whole-genome sequencing (WGS) has been a barrier to its widespread use in tuberculosis phylogenetics. The aim of this study was to carry out a phylogenetic analysis of tuberculosis in Wales, United Kingdom, using Ridom SeqSphere software for core genome multilocus sequence typing (cgMLST) analysis of whole-genome sequencing data. The phylogenetics of tuberculosis in Wales have not previously been studied. Sixty-six Mycobacterium tuberculosis isolates (including 42 outbreak-associated isolates) from south Wales were sequenced using an Illumina platform. Isolates were assigned to principal genetic groups, single nucleotide polymorphism (SNP) cluster groups, lineages, and sublineages using SNP-calling protocols. WGS data were submitted to the Ridom SeqSphere software for cgMLST analysis and analyzed alongside 179 previously lineage-defined isolates. The data set was dominated by the Euro-American lineage, with the sublineage composition being dominated by T, X, and Haarlem family strains. The cgMLST analysis successfully assigned 58 isolates to major lineages, and the results were consistent with those obtained by traditional SNP mapping methods. In addition, the cgMLST scheme was used to resolve an outbreak of tuberculosis occurring in the region. This study supports the use of a cgMLST method for standardized phylogenetic assignment of tuberculosis isolates and for outbreak resolution and provides the first insight into Welsh tuberculosis phylogenetics, identifying the presence of the Haarlem sublineage commonly associated with virulent traits.


2015 ◽  
Vol 53 (12) ◽  
pp. 3788-3797 ◽  
Author(s):  
Mark de Been ◽  
Mette Pinholt ◽  
Janetta Top ◽  
Stefan Bletz ◽  
Alexander Mellmann ◽  
...  

Enterococcus faecium, a common inhabitant of the human gut, has emerged in the last 2 decades as an important multidrug-resistant nosocomial pathogen. Since the start of the 21st century, multilocus sequence typing (MLST) has been used to study the molecular epidemiology of E. faecium. However, due to the use of a small number of genes, the resolution of MLST is limited. Whole-genome sequencing (WGS) now allows for high-resolution tracing of outbreaks, but current WGS-based approaches lack standardization, rendering them less suitable for interlaboratory prospective surveillance. To overcome this limitation, we developed a core genome MLST (cgMLST) scheme for E. faecium. cgMLST transfers genome-wide single nucleotide polymorphism (SNP) diversity into a standardized and portable allele numbering system that is far less computationally intensive than SNP-based analysis of WGS data. The E. faecium cgMLST scheme was built using 40 genome sequences that represented the diversity of the species. The scheme consists of 1,423 cgMLST target genes. To test the performance of the scheme, we performed WGS analysis of 103 outbreak isolates from five different hospitals in the Netherlands, Denmark, and Germany. The cgMLST scheme performed well in distinguishing between epidemiologically related and unrelated isolates, even between those that had the same sequence type (ST), which denotes the higher discriminatory power of this cgMLST scheme over that of conventional MLST. We also show that in terms of resolution, the performance of the E. faecium cgMLST scheme is equivalent to that of an SNP-based approach. In conclusion, the cgMLST scheme developed in this study facilitates rapid, standardized, and high-resolution tracing of E. faecium outbreaks.
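
How SNP diversity is transferred into an allele numbering system can be illustrated with a toy nomenclature store. This is a sketch and not the scheme's actual database: the class `AlleleDatabase`, the locus names, and the sequences are hypothetical, and numbering alleles in order of first appearance is an assumption about how such a store could work.

```python
class AlleleDatabase:
    """Toy nomenclature store: each new sequence at a locus gets the next integer."""

    def __init__(self):
        self._alleles = {}  # locus -> {sequence: allele number}

    def allele_number(self, locus, sequence):
        """Return the existing number for this sequence, or assign a new one."""
        known = self._alleles.setdefault(locus, {})
        if sequence not in known:
            known[sequence] = len(known) + 1
        return known[sequence]

    def profile(self, genome):
        """Convert {locus: sequence} into {locus: allele number}."""
        return {
            locus: self.allele_number(locus, seq)
            for locus, seq in genome.items()
        }


db = AlleleDatabase()

# Hypothetical isolates; locus2 differs at two nucleotide positions.
isolate_1 = db.profile({"locus1": "ATGAAA", "locus2": "ATGCCC"})
isolate_2 = db.profile({"locus1": "ATGAAA", "locus2": "ATGTTC"})

allele_differences = sum(
    isolate_1[locus] != isolate_2[locus] for locus in isolate_1
)
print(isolate_1, isolate_2, allele_differences)
```

In this toy case two nucleotide changes inside one gene collapse into a single allele difference, and comparing profiles is a comparison of short integer vectors rather than of aligned genomes.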


2019 ◽  
Vol 57 (3) ◽  
Author(s):  
Bernd Neumann ◽  
Karola Prior ◽  
Jennifer K. Bender ◽  
Dag Harmsen ◽  
Ingo Klare ◽  
...  

ABSTRACT Among enterococci, Enterococcus faecalis occurs ubiquitously, with the highest incidence of human and animal infections. The high genetic plasticity of E. faecalis complicates both molecular investigations and phylogenetic analyses. Whole-genome sequencing (WGS) enables unraveling of epidemiological linkages and putative transmission events between humans, animals, and food. Core genome multilocus sequence typing (cgMLST) aims to combine the discriminatory power of classical multilocus sequence typing (MLST) with the extensive genetic data obtained by WGS. By sequencing a representative collection of 146 E. faecalis strains isolated from hospital outbreaks, food, animals, and colonization of healthy human individuals, we established a novel cgMLST scheme with 1,972 gene targets within the Ridom SeqSphere+ software. To test the E. faecalis cgMLST scheme and assess the typing performance, different collections comprising environmental and bacteremia isolates, as well as all publicly available genome sequences from the NCBI and SRA databases, were analyzed. In more than 98.6% of the tested genomes, >95% good cgMLST target genes were detected (mean, 99.2% target genes). Our genotyping results not only corroborate the known epidemiological background of the isolates but exceed previous typing resolution. In conclusion, we have created a powerful typing scheme, hence providing an international standardized nomenclature that is suitable for surveillance approaches in various sectors, linking public health, veterinary public health, and food safety in a true One Health fashion.
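
The typing-performance summary above (the share of genomes in which more than 95% of targets were called) amounts to a simple per-genome check followed by a pass rate. The sketch below assumes a strict "greater than" cut-off as worded in the abstract; the per-genome counts are hypothetical and are not the study's data, and `passes_qc` and `pass_rate` are illustrative names.

```python
SCHEME_SIZE = 1972  # number of cgMLST targets stated in the abstract


def passes_qc(good_targets, scheme_size=SCHEME_SIZE, min_fraction=0.95):
    """True if strictly more than min_fraction of the targets were called."""
    return good_targets / scheme_size > min_fraction


def pass_rate(good_target_counts):
    """Percentage of genomes that pass the target-coverage cut-off."""
    passed = sum(passes_qc(count) for count in good_target_counts)
    return 100.0 * passed / len(good_target_counts)


# Hypothetical counts of good targets for four genomes.
counts = [1972, 1960, 1900, 1850]
rate = pass_rate(counts)
print(rate)
```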


2012 ◽  
Vol 56 (11) ◽  
pp. 5986-5989 ◽  
Author(s):  
Manoj Kumar ◽  
Tarun Mathur ◽  
Tarani K. Barman ◽  
G. Ramkumar ◽  
Ashish Bhati ◽  
...  

ABSTRACT The MIC90 of RBx 14255, a novel ketolide, against Clostridium difficile was 4 μg/ml (MIC range, 0.125 to 8 μg/ml), and this drug was found to be more potent than comparator drugs. An in vitro time-kill kinetics study of RBx 14255 showed time-dependent bacterial killing for C. difficile. Furthermore, in the hamster model of C. difficile infection, RBx 14255 demonstrated greater efficacy than metronidazole and vancomycin, making it a promising candidate for C. difficile treatment.


2018 ◽  
Vol 62 (8) ◽  
Author(s):  
Alicia G. Beukers ◽  
Henrik Hasman ◽  
Kristin Hegstad ◽  
Sebastiaan J. van Hal

ABSTRACT Mutations associated with linezolid resistance within the V domain of 23S rRNA are annotated using an Escherichia coli numbering system. The 23S rRNA gene varies in length, nucleotide sequence, and copy number among bacterial species. Consequently, this numbering system is not intuitive and can lead to confusion when mutation sites are being located using whole-genome sequencing data. Using the mutation G2576T as an example, we demonstrate the difficulties associated with using the E. coli numbering system.
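
The coordinate problem described here can be made concrete: once a species' 23S rRNA gene is aligned to the E. coli gene, an E. coli position is translated by counting non-gap characters in each row of the alignment. The sketch below assumes the gapped alignment has already been produced by an external aligner; the sequences are short hypothetical strings, not real 23S rRNA, and `map_position` is an illustrative helper rather than a tool from the study.

```python
def map_position(aligned_ref, aligned_target, ref_pos):
    """Map a 1-based ungapped reference position to the 1-based target position.

    Both inputs are rows of one pairwise alignment ('-' marks a gap).
    Returns None if the reference base aligns to a gap in the target.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    target_count = 0
    for ref_base, target_base in zip(aligned_ref, aligned_target):
        if ref_base != "-":
            ref_count += 1
        if target_base != "-":
            target_count += 1
        if ref_base != "-" and ref_count == ref_pos:
            return target_count if target_base != "-" else None
    raise ValueError("ref_pos is beyond the end of the reference")


# Hypothetical alignment: the target lacks reference base 2
# and carries one extra base after reference base 3.
aligned_ref = "ACG-TTGA"
aligned_target = "A-GCTTGA"

mapped = [map_position(aligned_ref, aligned_target, p) for p in (1, 2, 3, 4)]
print(mapped)
```

Even in this toy case the same site carries different numbers in the two sequences, which is the source of confusion the abstract describes when a mutation named in E. coli numbering is located in another species' genome.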


2020 ◽  
Vol 202 (24) ◽  
Author(s):  
Kevin Y. H. Liang ◽  
Fabini D. Orata ◽  
Mohammad Tarequl Islam ◽  
Tania Nasreen ◽  
Munirul Alam ◽  
...  

ABSTRACT Core genome multilocus sequence typing (cgMLST) has gained popularity in recent years in epidemiological research and subspecies-level classification. cgMLST retains the intuitive nature of traditional MLST but offers much greater resolution by utilizing significantly larger portions of the genome. Here, we introduce a cgMLST scheme for Vibrio cholerae, a bacterium abundant in marine and freshwater environments and the etiologic agent of cholera. A set of 2,443 core genes ubiquitous in V. cholerae were used to analyze a comprehensive data set of 1,262 clinical and environmental strains collected from 52 countries, including 65 newly sequenced genomes in this study. We established a sublineage threshold based on 133 allelic differences that creates clusters nearly identical to traditional MLST types, providing backwards compatibility to new cgMLST classifications. We also defined an outbreak threshold based on seven allelic differences that is capable of identifying strains from the same outbreak and closely related isolates that could give clues on outbreak origin. Using cgMLST, we confirmed the South Asian origin of modern epidemics and identified clustering affinity among sublineages of environmental isolates from the same geographic origin. Advantages of this method are highlighted by direct comparison with existing classification methods, such as MLST and single-nucleotide polymorphism-based methods. cgMLST outperforms all existing methods in terms of resolution, standardization, and ease of use. We anticipate this scheme will serve as a basis for a universally applicable and standardized classification system for V. cholerae research and epidemiological surveillance in the future. This cgMLST scheme is publicly available on PubMLST (https://pubmlst.org/vcholerae/). 
IMPORTANCE Toxigenic Vibrio cholerae isolates of the O1 and O139 serogroups are the causative agents of cholera, an acute diarrheal disease that plagued the world for centuries, if not millennia. Here, we introduce a core genome multilocus sequence typing scheme for V. cholerae. Using this scheme, we have standardized the definition for subspecies-level classification, facilitating global collaboration in the surveillance of V. cholerae. In addition, this typing scheme allows for quick identification of outbreak-related isolates that can guide subsequent analyses, serving as an important first step in epidemiological research. This scheme is also easily scalable to analyze thousands of isolates at various levels of resolution, making it an invaluable tool for large-scale ecological and evolutionary analyses.
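
The two thresholds reported for V. cholerae (seven allelic differences for outbreaks, 133 for sublineages) can be applied to a pair of allele profiles as sketched below. Several details are assumptions made for illustration: treating the thresholds as inclusive, encoding a missing locus as 0, ignoring missing loci when counting differences, and the toy profiles themselves. The function names are hypothetical.

```python
OUTBREAK_THRESHOLD = 7      # allelic differences, from the abstract
SUBLINEAGE_THRESHOLD = 133  # allelic differences, from the abstract
MISSING = 0                 # assumed encoding for an uncalled locus


def allelic_distance(profile_a, profile_b):
    """Count positions where two equal-length allele-number lists differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci in the same order")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a != MISSING and b != MISSING and a != b
    )


def relationship(distance):
    """Classify a pair of isolates by allelic distance (inclusive cut-offs assumed)."""
    if distance <= OUTBREAK_THRESHOLD:
        return "same outbreak cluster"
    if distance <= SUBLINEAGE_THRESHOLD:
        return "same sublineage"
    return "different sublineages"


# Hypothetical five-locus profiles (a real profile would have 2,443 loci).
strain_1 = [1, 4, 2, 7, 3]
strain_2 = [1, 5, 2, 0, 3]

distance = allelic_distance(strain_1, strain_2)
print(distance, relationship(distance))
```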


2015 ◽  
Vol 54 (3) ◽  
pp. 775-778 ◽  
Author(s):  
Tracy McMillen ◽  
Mini Kamboj ◽  
N. Esther Babady

Clostridium difficile 027/NAP1/BI is the most common C. difficile strain in the United States. The Xpert C. difficile/Epi assay allows rapid, presumptive identification of C. difficile NAP1. We compared Xpert C. difficile/Epi to multilocus sequence typing for identification of C. difficile NAP1 and found “very good” agreement at 97.9% (κ = 0.86; 95% confidence interval, 0.80 to 0.91).
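
Percent agreement and Cohen's kappa, the two statistics quoted above, are both derived from a 2×2 table of paired calls: kappa is (observed agreement − chance agreement) / (1 − chance agreement). The sketch below shows the arithmetic only. The counts are hypothetical and are not the study's data, so the output does not reproduce the published values; `cohens_kappa` is an illustrative helper.

```python
def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Return (observed agreement, kappa) for two binary classifiers.

    both_pos / both_neg: samples on which the two methods agree.
    a_only / b_only: samples called positive by only one method.
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    # Agreement expected by chance if the two methods were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts for 100 isolates typed by both methods.
observed, kappa = cohens_kappa(both_pos=20, a_only=5, b_only=5, both_neg=70)
print(round(observed, 3), round(kappa, 3))
```

Kappa is lower than raw agreement here because a large share of the agreement on negative calls would be expected by chance alone.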


2018 ◽  
Vol 56 (9) ◽  
Author(s):  
Anna Janowicz ◽  
Fabrizio De Massis ◽  
Massimo Ancora ◽  
Cesare Cammà ◽  
Claudio Patavino ◽  
...  

ABSTRACT The use of whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology has become a widely accepted method for microbiology laboratories in the application of molecular typing for outbreak tracing and genomic epidemiology. Several studies demonstrated the usefulness of WGS data analysis through single-nucleotide polymorphism (SNP) calling from a reference sequence analysis for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) has not been explored so far. The current study developed an allele-based cgMLST method and compared its performance to that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set was comprised of 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates with unknown epidemiological status, composed of human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established the potential criteria necessary for inclusion of an isolate into a brucellosis outbreak cluster to be ≤6 loci in the cgMLST and ≤7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, thereby allowing diverse and unrelated genotypes to be identified with greater confidence. The application of a cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and it is a candidate to be a benchmark tool for outbreak investigations in human and animal brucellosis.
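
The inclusion criterion stated above (≤6 differing loci in cgMLST) can be turned into outbreak clusters by linking every pair of isolates within the threshold and taking connected groups. Single-linkage grouping is an assumption of this sketch, since the abstract does not say which clustering rule was used, and the isolate names and pairwise distances are hypothetical.

```python
from itertools import combinations


def single_linkage_clusters(isolates, distance, threshold=6):
    """Group isolates connected by chains of pairs with distance <= threshold."""
    parent = {name: name for name in isolates}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(isolates, 2):
        if distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in isolates:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise allelic distances between four isolates.
pairwise = {
    ("A", "B"): 2, ("A", "C"): 8, ("A", "D"): 40,
    ("B", "C"): 6, ("B", "D"): 41, ("C", "D"): 39,
}


def toy_distance(a, b):
    """Look up the distance for an unordered pair of isolate names."""
    return pairwise[tuple(sorted((a, b)))]


clusters = single_linkage_clusters(["A", "B", "C", "D"], toy_distance)
print(clusters)
```

In the toy data, A and C differ at 8 loci yet land in the same cluster because each is within 6 loci of B. That chaining is a property of single linkage and is worth keeping in mind when interpreting clusters defined this way.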

