Core Genome Multilocus Sequence Typing: a Standardized Approach for Molecular Typing of Mycoplasma gallisepticum

2017 ◽  
Vol 56 (1) ◽  
Author(s):  
Mostafa Ghanem ◽  
Leyi Wang ◽  
Yan Zhang ◽  
Scott Edwards ◽  
Amanda Lu ◽  
...  

ABSTRACT Mycoplasma gallisepticum is the most virulent and economically important Mycoplasma species for poultry worldwide. Currently, M. gallisepticum strain differentiation based on sequence analysis of 5 loci remains insufficient for accurate outbreak investigation. Recently, whole-genome sequences (WGS) of many human and animal pathogens have been successfully used for microbial outbreak investigations. However, the massive sequence data and the diverse properties of different genes within bacterial genomes result in a lack of standard reproducible methods for comparisons among M. gallisepticum whole genomes. Here, we propose the development of a core genome multilocus sequence typing (cgMLST) scheme for M. gallisepticum strains and field isolates. For development of this scheme, a diverse collection of 37 M. gallisepticum genomes was used to identify cgMLST targets. A total of 425 M. gallisepticum conserved genes (49.85% of M. gallisepticum genome) were selected as core genome targets. A total of 81 M. gallisepticum genomes from 5 countries on 4 continents were typed using M. gallisepticum cgMLST. Analyses of phylogenetic trees generated by cgMLST displayed a high degree of agreement with geographical and temporal information. Moreover, the high discriminatory power of cgMLST allowed differentiation between M. gallisepticum strains of the same outbreak. M. gallisepticum cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation among M. gallisepticum isolates. cgMLST provides stable and expandable nomenclature, allowing for comparison and sharing of typing results among laboratories worldwide. cgMLST offers an opportunity to harness the tremendous power of next-generation sequencing technology in applied avian mycoplasma epidemiology at both local and global levels.

mSphere ◽  
2020 ◽  
Vol 5 (4) ◽  
Author(s):  
Shanshan Liu ◽  
Xiaoliang Li ◽  
Zhenfei Guo ◽  
Hongsheng Liu ◽  
Yu Sun ◽  
...  

ABSTRACT Streptococcus mutans is one of the primary pathogens responsible for the development of dental caries. Recent whole-genome sequencing (WGS)-based core genome multilocus sequence typing (cgMLST) approaches have been employed in epidemiological studies of specific human pathogens. However, this approach has not been reported in studies of S. mutans. Here, we therefore developed a cgMLST scheme for S. mutans. We surveyed 199 available S. mutans genomes as a means of identifying cgMLST targets, developing a scheme that incorporated 594 targets from the S. mutans UA159 reference genome. Sixty-eight sequence types (STs) were identified in this cgMLST scheme (cgSTs) in 80 S. mutans isolates from 40 children that were sequenced in this study, compared to 35 STs identified by multilocus sequence typing (MLST). Fifty-six cgSTs (82.35%) were associated with a single isolate based on our cgMLST scheme, which is significantly higher than in the MLST scheme (11.43%). In addition, 58.06% of all MLST profiles with ≥2 isolates were further differentiated by our cgMLST scheme. Topological analyses of the maximum likelihood phylogenetic trees revealed that our cgMLST scheme was more reliable than the MLST scheme. A minimum spanning tree of 145 S. mutans isolates from 10 countries developed based upon the cgMLST scheme highlighted the diverse population structure of S. mutans. This cgMLST scheme thus offers a new molecular typing method suitable for evaluating the epidemiological distribution of this pathogen and has the potential to serve as a benchmark for future global studies of the epidemiological nature of dental caries. IMPORTANCE Streptococcus mutans is regarded as a major pathogen responsible for the onset of dental caries. S. mutans can transmit among people, especially within families. In this study, we established a new epidemiological approach to S. mutans classification. 
This approach can effectively differentiate among closely related isolates and offers superior reliability relative to that of the traditional MLST molecular typing method. As such, it has the potential to better support effective public health strategies centered around this bacterium that are aimed at preventing and treating dental caries.


2017 ◽  
Vol 55 (6) ◽  
pp. 1682-1697 ◽  
Author(s):  
Narjol Gonzalez-Escalona ◽  
Keith A. Jolley ◽  
Elizabeth Reed ◽  
Jaime Martinez-Urtaza

ABSTRACT Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. 
It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage.


2020 ◽  
Vol 202 (24) ◽  
Author(s):  
Kevin Y. H. Liang ◽  
Fabini D. Orata ◽  
Mohammad Tarequl Islam ◽  
Tania Nasreen ◽  
Munirul Alam ◽  
...  

ABSTRACT Core genome multilocus sequence typing (cgMLST) has gained popularity in recent years in epidemiological research and subspecies-level classification. cgMLST retains the intuitive nature of traditional MLST but offers much greater resolution by utilizing significantly larger portions of the genome. Here, we introduce a cgMLST scheme for Vibrio cholerae, a bacterium abundant in marine and freshwater environments and the etiologic agent of cholera. A set of 2,443 core genes ubiquitous in V. cholerae were used to analyze a comprehensive data set of 1,262 clinical and environmental strains collected from 52 countries, including 65 newly sequenced genomes in this study. We established a sublineage threshold based on 133 allelic differences that creates clusters nearly identical to traditional MLST types, providing backwards compatibility to new cgMLST classifications. We also defined an outbreak threshold based on seven allelic differences that is capable of identifying strains from the same outbreak and closely related isolates that could give clues on outbreak origin. Using cgMLST, we confirmed the South Asian origin of modern epidemics and identified clustering affinity among sublineages of environmental isolates from the same geographic origin. Advantages of this method are highlighted by direct comparison with existing classification methods, such as MLST and single-nucleotide polymorphism-based methods. cgMLST outperforms all existing methods in terms of resolution, standardization, and ease of use. We anticipate this scheme will serve as a basis for a universally applicable and standardized classification system for V. cholerae research and epidemiological surveillance in the future. This cgMLST scheme is publicly available on PubMLST (https://pubmlst.org/vcholerae/). 
IMPORTANCE Toxigenic Vibrio cholerae isolates of the O1 and O139 serogroups are the causative agents of cholera, an acute diarrheal disease that plagued the world for centuries, if not millennia. Here, we introduce a core genome multilocus sequence typing scheme for V. cholerae. Using this scheme, we have standardized the definition for subspecies-level classification, facilitating global collaboration in the surveillance of V. cholerae. In addition, this typing scheme allows for quick identification of outbreak-related isolates that can guide subsequent analyses, serving as an important first step in epidemiological research. This scheme is also easily scalable to analyze thousands of isolates at various levels of resolution, making it an invaluable tool for large-scale ecological and evolutionary analyses.
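
The two thresholds described in this abstract (133 allelic differences for sublineages, seven for outbreaks) can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how clusters are formed, so single-linkage grouping, the "at most N differences" reading of each threshold, the skipping of uncalled loci, and the toy 10-locus profiles are all assumptions made for illustration.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci where both profiles have a called allele and the alleles differ.

    Loci with a missing call (None) in either profile are skipped; that handling
    is an assumption of this sketch.
    """
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by chains of pairs that differ at <= threshold loci."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for p, q in combinations(names, 2):
        if allelic_distance(profiles[p], profiles[q]) <= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical allele-number profiles over 10 loci (a real scheme has thousands).
profiles = {
    "iso_A": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "iso_B": [1, 1, 1, 1, 1, 1, 1, 1, 1, 2],  # 1 difference from iso_A
    "iso_C": [3, 3, 3, 3, 3, 3, 3, 3, 1, 1],  # 8 differences from iso_A
}

OUTBREAK_THRESHOLD = 7  # value taken from the abstract; "<=" is assumed
clusters = single_linkage_clusters(profiles, OUTBREAK_THRESHOLD)
print(clusters)
```

Running the same function with a larger threshold gives coarser groups, which is how one distance matrix can serve both the outbreak and the sublineage level.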


2014 ◽  
Vol 53 (1) ◽  
pp. 35-42 ◽  
Author(s):  
Miquette Hall ◽  
Marie A. Chattaway ◽  
Sandra Reuter ◽  
Cyril Savin ◽  
Eckhard Strauch ◽  
...  

The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level to a level of resolution equal to that for whole-genome sequencing. Our assay is designed to be able to accurately subtype the important human-pathogenic species Yersinia enterocolitica to whole-genome resolution levels. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system to allow accurate and reproducible identification of Yersinia isolates to the species level, a process often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica.


2018 ◽  
Vol 56 (9) ◽  
Author(s):  
Anna Janowicz ◽  
Fabrizio De Massis ◽  
Massimo Ancora ◽  
Cesare Cammà ◽  
Claudio Patavino ◽  
...  

ABSTRACT The use of whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology has become a widely accepted method for microbiology laboratories in the application of molecular typing for outbreak tracing and genomic epidemiology. Several studies demonstrated the usefulness of WGS data analysis through single-nucleotide polymorphism (SNP) calling from a reference sequence analysis for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) has not been explored so far. The current study developed an allele-based cgMLST method and compared its performance to that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set was comprised of 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates with unknown epidemiological status, composed of human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established the potential criteria necessary for inclusion of an isolate into a brucellosis outbreak cluster to be ≤6 loci in the cgMLST and ≤7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, thereby allowing diverse and unrelated genotypes to be identified with greater confidence. The application of a cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and it is a candidate to be a benchmark tool for outbreak investigations in human and animal brucellosis.


2019 ◽  
Vol 58 (1) ◽  
Author(s):  
David W. Eyre ◽  
Tim E. A. Peto ◽  
Derrick W. Crook ◽  
A. Sarah Walker ◽  
Mark H. Wilcox

ABSTRACT Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 (<1%) of the pairs, respectively. Using hash-cgMLST, 218 (80%) of replicate pairs assembled with SPAdes had zero gene differences, and 31 (11%), 5 (2%), and 18 (7%) pairs had 1, 2, and >2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
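
The core idea of hash-cgMLST, replacing centrally assigned allele numbers with a hash computed from the allele sequence itself, can be sketched as follows. This is a minimal illustration, not the published tool: the use of SHA-256, the 8-character truncation, the uppercase normalization, and the locus names and sequences are all assumptions.

```python
import hashlib


def allele_hash(sequence, length=8):
    """Derive a short, reproducible identifier from an allele's DNA sequence.

    The hash function and truncation length are illustrative choices only.
    """
    normalized = sequence.strip().upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()[:length]


def hash_profile(alleles):
    """Map each locus name to the hash of the allele found at that locus."""
    return {locus: allele_hash(seq) for locus, seq in alleles.items()}


def gene_differences(profile_a, profile_b):
    """Count loci present in both profiles whose allele hashes differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


# Two hypothetical laboratories type their own isolates independently.
lab_1 = hash_profile({"geneA": "ATGGCT", "geneB": "ATGAAA"})
lab_2 = hash_profile({"geneA": "atggct", "geneB": "ATGAAC"})

print(gene_differences(lab_1, lab_2))
```

Because the identifier depends only on the sequence, both laboratories arrive at the same label for the same allele without consulting a shared database of sequentially numbered alleles.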


2020 ◽  
Vol 58 (9) ◽  
Author(s):  
Richard A. Stanton ◽  
Gillian McAllister ◽  
Jonathan B. Daniels ◽  
Erin Breaker ◽  
Nicholas Vlachos ◽  
...  

ABSTRACT Pseudomonas aeruginosa is an opportunistic human pathogen that frequently causes health care-associated infections (HAIs). Due to its metabolic diversity and ability to form biofilms, this Gram-negative nonfermenting bacterium can persist in the health care environment, which can lead to prolonged HAI outbreaks. We describe the creation of a core genome multilocus sequence typing (cgMLST) scheme to provide a stable platform for the rapid comparison of P. aeruginosa isolates using whole-genome sequencing (WGS) data. We used a diverse set of 58 complete P. aeruginosa genomes to curate a set of 4,440 core genes found in each isolate, representing ∼64% of the average genome size. We then expanded the alleles for each gene using 1,991 contig-level genome sequences. The scheme was used to analyze genomes from four historical HAI outbreaks to compare the phylogenies generated using cgMLST to those of other means (traditional MLST, pulsed-field gel electrophoresis [PFGE], and single-nucleotide variant [SNV] analysis). The cgMLST scheme provides sufficient resolution for analyzing individual outbreaks, as well as the stability for comparisons across a variety of isolates encountered in surveillance studies, making it a valuable tool for the rapid analysis of P. aeruginosa genomes.
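
The notion of a core gene "found in each isolate" can be illustrated with a set intersection. This sketch assumes gene presence has already been determined for every genome; real scheme construction relies on sequence similarity searches and quality filters that the abstract does not detail. The genome and gene names are hypothetical.

```python
def core_genes(gene_sets):
    """Return the genes present in every genome (the intersection of all sets)."""
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical gene content for three complete genomes.
genomes = {
    "genome_1": {"gene_1", "gene_2", "gene_3", "gene_4"},
    "genome_2": {"gene_1", "gene_2", "gene_3", "gene_5"},
    "genome_3": {"gene_1", "gene_2", "gene_3"},
}

core = core_genes(genomes)
print(sorted(core))
```

Genes that fall outside the intersection, such as gene_4 and gene_5 here, belong to the accessory genome and are excluded from the typing scheme.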


2021 ◽  
Vol 9 (1) ◽  
pp. 191
Author(s):  
Iliana Guardiola-Avila ◽  
Leonor Sánchez-Busó ◽  
Evelia Acedo-Félix ◽  
Bruno Gomez-Gil ◽  
Manuel Zúñiga-Cabrera ◽  
...  

Vibrio mimicus is an emerging pathogen, mainly associated with contaminated seafood consumption. However, little is known about its evolution, biodiversity, and pathogenic potential. This study analyzes the pan-, core, and accessory genomes of nine V. mimicus strains. The core genome yielded 2424 genes in chromosome I (ChI) and 822 genes in chromosome II (ChII), with an accessory genome comprising an average of 10.9% of the whole genome for ChI and 29% for ChII. Core genome phylogenetic trees were obtained, and V. mimicus ATCC-33654 strain was the closest to the outgroup in both chromosomes. Additionally, a phylogenetic study of eight conserved genes (ftsZ, gapA, gyrB, topA, rpoA, recA, mreB, and pyrH), including Vibrio cholerae, Vibrio parilis, Vibrio metoecus, and Vibrio caribbenthicus, clearly showed clade differentiation. The main virulence genes found in ChI corresponded with type I secretion proteins, extracellular components, flagellar proteins, and potential regulators, while, in ChII, the main categories were type-I secretion proteins, chemotaxis proteins, and antibiotic resistance proteins. The accessory genome was characterized by the presence of mobile elements and toxin encoding genes in both chromosomes. Based on the genome atlas, it was possible to characterize differential regions between strains. The pan-genome of V. mimicus encompassed 3539 genes for ChI and 2355 genes for ChII. These results give us an insight into the virulence and gene content of V. mimicus, as well as constitute the first approach to its diversity.


2014 ◽  
Vol 58 (7) ◽  
pp. 3895-3903 ◽  
Author(s):  
Alessandra Carattoli ◽  
Ea Zankari ◽  
Aurora García-Fernández ◽  
Mette Voldby Larsen ◽  
Ole Lund ◽  
...  

ABSTRACT In the work presented here, we designed and developed two easy-to-use Web tools for in silico detection and characterization of whole-genome sequence (WGS) and whole-plasmid sequence data from members of the family Enterobacteriaceae. These tools will facilitate bacterial typing based on draft genomes of multidrug-resistant Enterobacteriaceae species by the rapid detection of known plasmid types. Replicon sequences from 559 fully sequenced plasmids associated with the family Enterobacteriaceae in the NCBI nucleotide database were collected to build a consensus database for integration into a Web tool called PlasmidFinder that can be used for replicon sequence analysis of raw, contig group, or completely assembled and closed plasmid sequencing data. The PlasmidFinder database currently consists of 116 replicon sequences that match, with at least 80% nucleotide identity, all replicon sequences identified in the 559 fully sequenced plasmids. For plasmid multilocus sequence typing (pMLST) analysis, a database that is updated weekly was generated from www.pubmlst.org and integrated into a Web tool called pMLST. Both databases were evaluated using draft genomes from a collection of Salmonella enterica serovar Typhimurium isolates. PlasmidFinder identified a total of 103 replicons and between zero and five different plasmid replicons within each of 49 S. Typhimurium draft genomes tested. The pMLST Web tool was able to subtype genomic sequencing data of plasmids, revealing both known plasmid sequence types (STs) and new alleles and ST variants. In conclusion, testing of the two Web tools using both fully assembled plasmid sequences and WGS-generated draft genomes showed them to be able to detect a broad variety of plasmids that are often associated with antimicrobial resistance in clinically relevant bacterial pathogens.


2015 ◽  
Vol 54 (3) ◽  
pp. 593-612 ◽  
Author(s):  
Margaret A. Fitzpatrick ◽  
Egon A. Ozer ◽  
Alan R. Hauser

Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts.

