Development and evaluation of a core genome multilocus sequence typing (cgMLST) scheme for Brucella spp.

The environmental bacterium Pseudomonas aeruginosa, in particular multidrug-resistant clones, is often associated with nosocomial infections and outbreaks. Today, core genome multilocus sequence typing (cgMLST) is frequently applied to distinguish sporadic cases from nosocomial transmissions. However, until recently, no cgMLST scheme for standardized typing of P. aeruginosa was available. To establish a novel cgMLST scheme for P. aeruginosa, we initially determined the breadth of the P. aeruginosa population based on MLST data with a Bayesian approach (BAPS). Using genomic data of representative isolates for the whole population and for all 12 serogroups, we extracted target genes and further refined them using a random dataset of 1,000 P. aeruginosa genomes. Subsequently, we investigated reproducibility and discriminatory ability with repeatedly sequenced isolates and isolates from well-defined outbreak scenarios, respectively, and compared clustering applying two recently published cgMLST schemes. BAPS generated seven P. aeruginosa groups. To cover these and all serogroups, 15 reference strains were used to determine genes common to all strains. After refinement with the dataset of 1,000 genomes, the cgMLST scheme consisted of 3,867 target genes, which are representative of the P. aeruginosa population and highly reproducible using biological replicates. We finally evaluated the scheme by reanalyzing two published outbreaks, where the authors used single nucleotide polymorphism (SNP) typing. In both cases cgMLST was concordant with the previous SNP results and with the results of the two other cgMLST schemes. In conclusion, the highly reproducible novel P. aeruginosa cgMLST scheme facilitates outbreak investigations due to the publicly available cgMLST nomenclature.
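The refinement step described above, where candidate target genes are checked against a large genome set, can be illustrated with a minimal sketch. The abstract does not state the refinement criteria, so the presence-fraction cutoff, the function name `refine_targets`, and the toy locus names below are all assumptions made for illustration only.

```python
def refine_targets(presence, min_fraction=0.95):
    """Keep candidate loci detected in at least min_fraction of genomes.

    presence maps a genome name to the set of candidate loci found in it.
    The 0.95 default is a hypothetical cutoff, not a value from the abstract.
    """
    n_genomes = len(presence)
    counts = {}
    for loci in presence.values():
        for locus in loci:
            counts[locus] = counts.get(locus, 0) + 1
    # A locus stays in the scheme only if enough genomes carry it.
    return sorted(
        locus for locus, count in counts.items()
        if count / n_genomes >= min_fraction
    )


# Toy data: four genomes and three hypothetical candidate loci.
presence = {
    "g1": {"locusA", "locusB", "locusC"},
    "g2": {"locusA", "locusB"},
    "g3": {"locusA", "locusB", "locusC"},
    "g4": {"locusA", "locusC"},
}
strict_core = refine_targets(presence, min_fraction=0.9)
relaxed_core = refine_targets(presence, min_fraction=0.75)
print(strict_core, relaxed_core)
```

With the strict cutoff only the locus present in every toy genome survives, while the relaxed cutoff keeps all three; a real scheme would apply further quality filters that this sketch does not attempt to reproduce.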


Core Genome Allelic Profiles of Clinical Klebsiella pneumoniae Strains Using a Random Forest Algorithm Based on Multilocus Sequence Typing Scheme for Hypervirulence Analysis

The Journal of Infectious Diseases ◽

10.1093/infdis/jiz562 ◽

2020 ◽

Vol 221 (Supplement_2) ◽

pp. S263-S271 ◽

Cited By ~ 1

Author(s):

Peng Lan ◽

Qiucheng Shi ◽

Ping Zhang ◽

Yan Chen ◽

Rushuang Yan ◽

...

Keyword(s):

Random Forest ◽

Klebsiella Pneumoniae ◽

Multilocus Sequence Typing ◽

Operating Characteristic ◽

Core Genome ◽

Characteristic Curve ◽

Random Forest Algorithm ◽

The Core ◽

Model Based ◽

Operating Characteristic Curve

Abstract Background Hypervirulent Klebsiella pneumoniae (hvKP) infections can have high morbidity and mortality rates owing to their invasiveness and virulence. However, there are no effective tools or biomarkers to discriminate between hvKP and nonhypervirulent K. pneumoniae (nhvKP) strains. We aimed to use a random forest algorithm to predict hvKP based on core-genome data. Methods In total, 272 K. pneumoniae strains were collected from 20 tertiary hospitals in China and divided into hvKP and nhvKP groups according to clinical criteria. Clinical data comparisons, whole-genome sequencing, virulence profile analysis, and core genome multilocus sequence typing (cgMLST) were performed. We then established a random forest predictive model based on the cgMLST scheme to prospectively identify hvKP. The random forest is an ensemble learning method that generates multiple decision trees during training, each of which outputs its own prediction for a given input. The predictive ability of the model was assessed by means of the area under the receiver operating characteristic curve. Results Patients in the hvKP group were younger than those in the nhvKP group (median age, 58.0 and 68.0 years, respectively; P < .001). More patients in the hvKP group had underlying diabetes mellitus (43.1% vs 20.1%; P < .001). Clinically, carbapenem-resistant K. pneumoniae was less common in the hvKP group (4.1% vs 63.8%; P < .001), whereas the K1/K2 serotype, sequence type (ST) 23, and positive string tests were significantly more frequent in the hvKP group. A cgMLST-based minimal spanning tree revealed that hvKP strains were scattered sporadically within nhvKP clusters. ST23 showed greater genome diversification than did ST11, according to cgMLST-based allelic differences. Primary virulence factors (rmpA, iucA, positive string test result, and the presence of virulence plasmid pLVPK) were poor predictors of the hypervirulence phenotype. The random forest model based on the core genome allelic profile presented excellent predictive power in both the training and validation sets (area under the receiver operating characteristic curve, 0.987 and 0.999, respectively). Conclusions A random forest predictive model based on the core genome allelic profiles of K. pneumoniae accurately identified hypervirulent isolates.
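The evaluation described in this abstract rests on two ideas: each tree in the ensemble produces its own prediction, and the model is scored by the area under the ROC curve. The sketch below illustrates both with toy numbers. It does not train a forest; the per-tree votes are made up, using the fraction of trees voting hvKP as the score is an assumed (though common) convention, and the names `forest_score` and `roc_auc` are hypothetical.

```python
def forest_score(tree_votes):
    """Fraction of trees that voted 1 (hvKP) for one isolate."""
    return sum(tree_votes) / len(tree_votes)


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: votes of five hypothetical trees for five isolates.
votes = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
labels = [1, 1, 0, 0, 1]  # 1 = hvKP, 0 = nhvKP (made-up labels)
scores = [forest_score(v) for v in votes]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise formulation is quadratic in the number of isolates, which is acceptable for a few hundred strains; a rank-based computation would be preferable for much larger sets.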


Epidemiological investigation of an Acinetobacter baumannii outbreak using core genome multilocus sequence typing

Journal of Global Antimicrobial Resistance ◽

10.1016/j.jgar.2018.11.027 ◽

2019 ◽

Vol 17 ◽

pp. 245-249 ◽

Cited By ~ 3

Author(s):

Carolina Venditti ◽

Antonella Vulcano ◽

Silvia D’Arezzo ◽

Cesare Ernesto Maria Gruber ◽

Marina Selleri ◽

...

Keyword(s):

Acinetobacter Baumannii ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Epidemiological Investigation


Genus-wide Leptospira core genome multilocus sequence typing for strain taxonomy and global surveillance

PLoS Neglected Tropical Diseases ◽

10.1371/journal.pntd.0007374 ◽

2019 ◽

Vol 13 (4) ◽

pp. e0007374 ◽

Cited By ~ 21

Author(s):

Julien Guglielmini ◽

Pascale Bourhy ◽

Olivier Schiettekatte ◽

Farida Zinini ◽

Sylvain Brisse ◽

...

Keyword(s):

Multilocus Sequence Typing ◽

Core Genome


A Vibrio cholerae Core Genome Multilocus Sequence Typing Scheme To Facilitate the Epidemiological Study of Cholera

Journal of Bacteriology ◽

10.1128/jb.00086-20 ◽

2020 ◽

Vol 202 (24) ◽

Cited By ~ 2

Author(s):

Kevin Y. H. Liang ◽

Fabini D. Orata ◽

Mohammad Tarequl Islam ◽

Tania Nasreen ◽

Munirul Alam ◽

...

Keyword(s):

Vibrio Cholerae ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Geographic Origin ◽

Data Set ◽

Epidemiological Research ◽

Content Type ◽

Typing Scheme ◽

Subspecies Level ◽

Allelic Differences

ABSTRACT Core genome multilocus sequence typing (cgMLST) has gained popularity in recent years in epidemiological research and subspecies-level classification. cgMLST retains the intuitive nature of traditional MLST but offers much greater resolution by utilizing significantly larger portions of the genome. Here, we introduce a cgMLST scheme for Vibrio cholerae, a bacterium abundant in marine and freshwater environments and the etiologic agent of cholera. A set of 2,443 core genes ubiquitous in V. cholerae were used to analyze a comprehensive data set of 1,262 clinical and environmental strains collected from 52 countries, including 65 newly sequenced genomes in this study. We established a sublineage threshold based on 133 allelic differences that creates clusters nearly identical to traditional MLST types, providing backwards compatibility to new cgMLST classifications. We also defined an outbreak threshold based on seven allelic differences that is capable of identifying strains from the same outbreak and closely related isolates that could give clues on outbreak origin. Using cgMLST, we confirmed the South Asian origin of modern epidemics and identified clustering affinity among sublineages of environmental isolates from the same geographic origin. Advantages of this method are highlighted by direct comparison with existing classification methods, such as MLST and single-nucleotide polymorphism-based methods. cgMLST outperforms all existing methods in terms of resolution, standardization, and ease of use. We anticipate this scheme will serve as a basis for a universally applicable and standardized classification system for V. cholerae research and epidemiological surveillance in the future. This cgMLST scheme is publicly available on PubMLST (https://pubmlst.org/vcholerae/). 
IMPORTANCE Toxigenic Vibrio cholerae isolates of the O1 and O139 serogroups are the causative agents of cholera, an acute diarrheal disease that plagued the world for centuries, if not millennia. Here, we introduce a core genome multilocus sequence typing scheme for V. cholerae. Using this scheme, we have standardized the definition for subspecies-level classification, facilitating global collaboration in the surveillance of V. cholerae. In addition, this typing scheme allows for quick identification of outbreak-related isolates that can guide subsequent analyses, serving as an important first step in epidemiological research. This scheme is also easily scalable to analyze thousands of isolates at various levels of resolution, making it an invaluable tool for large-scale ecological and evolutionary analyses.
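The two thresholds reported in this abstract (seven allelic differences for outbreaks, 133 for sublineages) imply clustering the same isolates at two levels of resolution. A minimal sketch follows, assuming single-linkage grouping and an inclusive threshold; the abstract specifies neither, and the isolate names and pairwise distances are invented.

```python
def single_linkage(names, dist, threshold):
    """Group isolates connected by any chain of pairs at or below threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Toy pairwise allelic distances between four hypothetical isolates.
names = ["iso1", "iso2", "iso3", "iso4"]
dist = {
    ("iso1", "iso2"): 3,
    ("iso1", "iso3"): 90,
    ("iso2", "iso3"): 88,
    ("iso1", "iso4"): 900,
    ("iso2", "iso4"): 905,
    ("iso3", "iso4"): 910,
}
outbreak_clusters = single_linkage(names, dist, threshold=7)
sublineages = single_linkage(names, dist, threshold=133)
print(outbreak_clusters)
print(sublineages)
```

At the outbreak level only iso1 and iso2 group together; at the sublineage level iso3 joins them, while iso4 remains separate at both levels.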


Using Core-genome Multilocus Sequence Typing to Monitor the Changing Epidemiology of Methicillin-resistant Staphylococcus aureus in a Teaching Hospital

Clinical Infectious Diseases ◽

10.1093/cid/ciy644 ◽

2018 ◽

Vol 67 (suppl_2) ◽

pp. S241-S248 ◽

Cited By ~ 4

Author(s):

Yan Chen ◽

Lu Sun ◽

Dandan Wu ◽

Haiping Wang ◽

Shujuan Ji ◽

...

Keyword(s):

Teaching Hospital ◽

Multilocus Sequence Typing ◽

Core Genome


Core Genome Multilocus Sequence Typing and Single Nucleotide Polymorphism Analysis in the Epidemiology of Brucella melitensis Infections

Journal of Clinical Microbiology ◽

10.1128/jcm.00517-18 ◽

2018 ◽

Vol 56 (9) ◽

Cited By ~ 15

Author(s):

Anna Janowicz ◽

Fabrizio De Massis ◽

Massimo Ancora ◽

Cesare Cammà ◽

Claudio Patavino ◽

...

Keyword(s):

Single Nucleotide Polymorphism ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Brucella Melitensis ◽

Reference Sequence ◽

Snp Analysis ◽

Phylogenetic Distance ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Content Type

ABSTRACT Whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology has become a widely accepted method for microbiology laboratories in the application of molecular typing for outbreak tracing and genomic epidemiology. Several studies have demonstrated the usefulness of WGS data analysis through reference-based single-nucleotide polymorphism (SNP) calling for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) has not been explored so far. The current study developed an allele-based cgMLST method and compared its performance to that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set comprised 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates with unknown epidemiological status, composed of human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established the potential criteria necessary for inclusion of an isolate into a brucellosis outbreak cluster to be ≤6 loci in the cgMLST and ≤7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, thereby allowing diverse and unrelated genotypes to be identified with greater confidence. The application of a cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and it is a candidate to be a benchmark tool for outbreak investigations in human and animal brucellosis.
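The ≤6 loci criterion above depends on how an allelic difference is counted between two cgMLST profiles. A minimal sketch follows, assuming profiles are equal-length lists of allele numbers with 0 marking a locus that was not called, and that missing loci are skipped rather than counted; both conventions are assumptions, and the ten-locus toy profiles stand in for the 2,704 targets of the real scheme.

```python
def allelic_distance(profile_a, profile_b, missing=0):
    """Count loci whose allele numbers differ, skipping uncalled loci."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a != missing and b != missing and a != b
    )


def within_outbreak_cluster(profile_a, profile_b, max_diff=6):
    """Apply the <=6 loci inclusion criterion quoted in the abstract."""
    return allelic_distance(profile_a, profile_b) <= max_diff


# Toy profiles over ten hypothetical loci.
index_case = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
linked = [1, 2, 1, 0, 1, 1, 3, 1, 1, 1]      # two differences, one locus missing
unlinked = [2, 2, 2, 2, 2, 2, 2, 0, 1, 1]    # seven differences, one locus missing

print(allelic_distance(index_case, linked), within_outbreak_cluster(index_case, linked))
print(allelic_distance(index_case, unlinked), within_outbreak_cluster(index_case, unlinked))
```

Skipping missing loci avoids inflating distances for lower-quality assemblies, at the cost of comparing different isolate pairs over slightly different locus sets.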


Whole Genome and Core Genome Multilocus Sequence Typing and Single Nucleotide Polymorphism Analyses of Listeria monocytogenes Isolates Associated with an Outbreak Linked to Cheese, United States, 2013

Applied and Environmental Microbiology ◽

10.1128/aem.00633-17 ◽

2017 ◽

Vol 83 (15) ◽

Cited By ~ 36

Author(s):

Yi Chen ◽

Yan Luo ◽

Heather Carleton ◽

Ruth Timme ◽

David Melka ◽

...

Keyword(s):

Single Nucleotide Polymorphism ◽

Clustering Analysis ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Variant Calling ◽

Discriminatory Power ◽

Sufficient Evidence ◽

Whole Genome ◽

Nucleotide Polymorphism ◽

Single Nucleotide

ABSTRACT Epidemiological findings of a listeriosis outbreak in 2013 implicated Hispanic-style cheese produced by company A, and pulsed-field gel electrophoresis (PFGE) and whole genome sequencing (WGS) were performed on clinical isolates and representative isolates collected from company A cheese and environmental samples during the investigation. The results strengthened the evidence for cheese as the vehicle. Surveillance sampling and WGS 3 months later revealed that the equipment purchased by company B from company A yielded an environmental isolate highly similar to all outbreak isolates. The whole genome and core genome multilocus sequence typing and single nucleotide polymorphism (SNP) analyses results were compared to demonstrate the maximum discriminatory power obtained by using multiple analyses, which were needed to differentiate outbreak-associated isolates from a PFGE-indistinguishable isolate collected in a nonimplicated food source in 2012. This unrelated isolate differed from the outbreak isolates by only 7 to 14 SNPs, and as a result, the minimum spanning tree from the whole genome analyses and certain variant calling approach and phylogenetic algorithm for core genome-based analyses could not provide differentiation between unrelated isolates. Our data also suggest that SNP/allele counts should always be combined with WGS clustering analysis generated by phylogenetically meaningful algorithms on a sufficient number of isolates, and the SNP/allele threshold alone does not provide sufficient evidence to delineate an outbreak. The putative prophages were conserved across all the outbreak isolates. All outbreak isolates belonged to clonal complex 5 and serotype 1/2b and had an identical inlA sequence which did not have premature stop codons.
IMPORTANCE In this outbreak, multiple analytical approaches were used for maximum discriminatory power. A PFGE-matched, epidemiologically unrelated isolate had high genetic similarity to the outbreak-associated isolates, with as few as 7 SNP differences. Therefore, the SNP/allele threshold should not be used as the only evidence to define the scope of an outbreak. It is critical that the SNP/allele counts be complemented by WGS clustering analysis generated by phylogenetically meaningful algorithms to distinguish outbreak-associated isolates from epidemiologically unrelated isolates. Careful selection of a variant calling approach and phylogenetic algorithm is critical for core-genome-based analyses. The whole-genome-based analyses were able to construct the highly resolved phylogeny needed to support the findings of the outbreak investigation. Ultimately, epidemiologic evidence and multiple WGS analyses should be combined to increase confidence levels during outbreak investigations.
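The caution about relying on a SNP count alone can be made concrete with a small sketch of pairwise SNP counting over aligned sequences. The sequences, isolate names, and the threshold of 3 below are toy assumptions chosen only to show how an unrelated isolate can fall inside a count-based cutoff; they are not data from the study, and positions with ambiguous bases or gaps are skipped by assumption.

```python
def snp_distance(seq_a, seq_b):
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


# Toy aligned sequences (20 positions) for three hypothetical isolates.
outbreak_1 = "ACGTACGTACGTACGTACGT"
outbreak_2 = "ACGTACGTACGAACGTACGT"   # one SNP relative to outbreak_1
unrelated = "ACGTTCGTACGTACGNACGA"    # two SNPs plus one ambiguous base

toy_threshold = 3  # hypothetical cutoff for illustration only
candidates = {"outbreak_2": outbreak_2, "unrelated": unrelated}
flagged = sorted(
    name for name, seq in candidates.items()
    if snp_distance(outbreak_1, seq) <= toy_threshold
)
print(flagged)
```

Both toy isolates pass the cutoff, which mirrors the abstract's point that the count should be read alongside phylogenetic clustering and epidemiologic evidence.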
