Identification of Escherichia coli and Shigella Species from Whole-Genome Sequences
ABSTRACT
Escherichia coli and Shigella species are closely related and genetically constitute the same species. Differentiating between these two pathogens and accurately identifying the four species of Shigella are therefore challenging. The organism-specific bioinformatics whole-genome sequencing (WGS) typing pipelines at Public Health England depend on the initial identification of the bacterial species by a kmer-based approach. Of the 1,982 Escherichia coli and Shigella sp. isolates analyzed in this study, 1,957 (98.7%) had concordant results by both traditional biochemistry and serology (TB&S) and the kmer identification (ID) derived from the WGS data. Of the 25 mismatches identified, 10 were enteroinvasive E. coli isolates that were misidentified as Shigella flexneri or S. boydii by the kmer ID, and 8 were S. flexneri isolates misidentified by TB&S as S. boydii due to nonfunctional S. flexneri O antigen biosynthesis genes. Analysis of the population structure based on multilocus sequence typing (MLST) data derived from the WGS data showed that the remaining 7 discrepant results belonged to clonal complex 288 (CC288), which comprises both S. boydii and S. dysenteriae strains. Mismatches between the TB&S and kmer ID results were explained by the close phylogenetic relationship between the two species and were resolved with reference to the MLST data. Shigella can be differentiated from E. coli and accurately identified to the species level by use of kmer comparisons and MLST. Analysis of the WGS data provided explanations for the discordant results between TB&S and WGS data, revealed the true phylogenetic relationships between different species of Shigella, and identified emerging pathoadapted lineages.
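The abstract does not describe the internals of the kmer ID step, so the following is only a minimal sketch of the general idea behind kmer-based species identification, not the Public Health England pipeline. It assumes that each species is represented by a reference sequence and that a query genome is assigned to the reference sharing the largest fraction of its canonical k-mers (containment). The function names, the choice of k = 21, the scoring rule, and the toy random "references" are all hypothetical.

```python
"""Toy k-mer containment classifier (illustrative sketch, not the PHE pipeline)."""
import random
from typing import Dict, Set, Tuple

# Complement table for A/C/G/T, used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Return canonical k-mers: for each k-mer, the lexicographically smaller of
    itself and its reverse complement, so both DNA strands give the same set.
    k-mers containing non-ACGT characters (e.g. N) are skipped."""
    seq = seq.upper()
    out: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not set(kmer) <= _BASES:
            continue
        rc = kmer.translate(_COMPLEMENT)[::-1]
        out.add(min(kmer, rc))
    return out


def containment(query: Set[str], reference: Set[str]) -> float:
    """Fraction of the query's k-mers that also occur in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


def identify(query_seq: str, references: Dict[str, str], k: int = 21) -> Tuple[str, float]:
    """Assign the query to the reference with the highest k-mer containment.
    Returns (reference name, containment score)."""
    if not references:
        raise ValueError("no reference sequences supplied")
    query = canonical_kmers(query_seq, k)
    scores = {name: containment(query, canonical_kmers(ref, k))
              for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical toy "references": random sequences standing in for real genomes.
    rng = random.Random(0)
    ref_a = "".join(rng.choice("ACGT") for _ in range(2000))
    ref_b = "".join(rng.choice("ACGT") for _ in range(2000))
    refs = {"ref_A": ref_a, "ref_B": ref_b}
    # A 500-bp fragment of ref_A is assigned to ref_A with containment 1.0.
    print(identify(ref_a[100:600], refs))  # prints ('ref_A', 1.0)
```

Containment is used here rather than a symmetric similarity because a query assembly and a reference genome can differ in size. With genuinely close relatives such as enteroinvasive E. coli and Shigella, or S. boydii and S. dysenteriae within CC288, the top scores would be expected to lie close together. That situation is where the abstract reports that MLST data were needed to resolve the call.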