Background
The laboratory surveillance of bacillary dysentery is based on a Shigella typing scheme standardised in the late 1940s. This scheme classifies Shigella strains into four serogroups and more than 50 serotypes on the basis of biochemical tests and lipopolysaccharide O-antigen serotyping. Real-time genomic surveillance of Shigella infections has been implemented in several countries, but without the use of a standardised high-resolution typing scheme.
Methods
We used the current serotyping scheme to characterise more than 4,000 clinical isolates and reference strains of Shigella, covering all serotypes, including provisional serotypes and atypical strains. These strains and isolates were also subjected to whole-genome sequencing and analysed with the EnteroBase Escherichia/Shigella 2,513-locus core-genome multilocus sequence typing (cgMLST) scheme.
Findings
The Shigella genomes were grouped into eight phylogenetically distinct clusters within the species Escherichia coli. Three of these clusters contained strains from different serogroups and serotypes, whereas the remaining five each consisted of a single serotype. The cgMLST hierarchical clustering (HC) analysis at different levels of resolution (HC2000 to HC400) recognised the natural groupings of Shigella. By contrast, the serotyping scheme was affected by horizontal gene transfer, leading to the conflation of genetically unrelated Shigella strains and the separation of some genetically related strains. We also curated the various provisional serotypes reported in the literature and described five new Shigella serotypes for addition to the typing scheme.
Interpretation
The EnteroBase Escherichia/Shigella cgMLST scheme is standardised, robust, portable, and high-resolution, and it will enhance the laboratory surveillance of Shigella infections, particularly those caused by Shigella flexneri. However, cgMLST data should be considered together with in silico serotyping data to maintain backward compatibility with the current Shigella serotyping scheme.