A machine learning-based typing scheme refinement for Listeria monocytogenes core genome multilocus sequence typing with high discriminatory power for common source outbreak tracking

PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0260293
Author(s):  
Yen-Yi Liu ◽  
Chih-Chieh Chen

Background: As whole-genome sequencing of pathogen genomes becomes increasingly popular, gene-by-gene typing methods, such as core genome multilocus sequence typing (cgMLST) and whole-genome multilocus sequence typing (wgMLST), are being routinely implemented in molecular epidemiology. However, some intrinsic problems remain. For example, differences in read depth, read length, and assembler influence the genome assemblies, introducing erroneous or missing alleles into the generated allelic profiles. These erroneous and missing alleles might create “specious discrepancy” among closely related isolates, thus making accurate epidemiological interpretation challenging. In addition, the rapid growth of the cgMLST allelic profile database can cause problems related to storage and maintenance as well as long query search times. Methods: We attempted to resolve these issues by decreasing the scheme size to reduce the occurrence of erroneous and missing alleles, alleviate the storage burden, and improve the query search time. The challenge in this approach is maintaining the typing resolution when using fewer loci. We achieved this by using a popular artificial intelligence technique, XGBoost, coupled with Shapley additive explanations for feature selection. Finally, 370 loci from the original 1701 cgMLST loci of Listeria monocytogenes were selected. Results: Although the final scheme (LmScheme_370) was approximately 80% smaller than the original cgMLST scheme, its discriminatory power, tested for 35 outbreaks, was concordant with that of the original cgMLST scheme. Although we used L. monocytogenes as a demonstration in this study, the approach can be applied to other schemes and pathogens. Our findings might help elucidate gene-by-gene–based epidemiology.
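The effect of missing alleles on a gene-by-gene comparison can be illustrated with a minimal sketch. This is not the authors' XGBoost/SHAP pipeline; it only shows, under assumed toy data, how a missing call can produce a spurious difference between two otherwise identical profiles, and how skipping missing calls or dropping the affected locus from the scheme removes it. The locus names, allele numbers, and function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

# An allelic profile maps locus name -> allele number; None marks a missing call.
Profile = Dict[str, Optional[int]]


def naive_distance(a: Profile, b: Profile, loci: Iterable[str]) -> int:
    """Count every unequal pair, so a missing call is treated as a mismatch."""
    return sum(1 for locus in loci if a.get(locus) != b.get(locus))


def allelic_distance(a: Profile, b: Profile, loci: Iterable[str]) -> int:
    """Count loci where both isolates have a call and the calls differ."""
    diff = 0
    for locus in loci:
        x, y = a.get(locus), b.get(locus)
        if x is None or y is None:
            continue  # skip missing alleles instead of counting them as differences
        if x != y:
            diff += 1
    return diff


# Hypothetical toy profiles: identical except for one missing call at locD.
iso1: Profile = {"locA": 1, "locB": 4, "locC": 2, "locD": None}
iso2: Profile = {"locA": 1, "locB": 4, "locC": 2, "locD": 7}
full_scheme = ["locA", "locB", "locC", "locD"]
reduced_scheme = ["locA", "locB", "locC"]  # assumed subset without the problem locus

print(naive_distance(iso1, iso2, full_scheme))       # 1: spurious difference
print(allelic_distance(iso1, iso2, full_scheme))     # 0: missing call skipped
print(allelic_distance(iso1, iso2, reduced_scheme))  # 0: locus not in scheme
```

Which loci can be dropped without losing resolution is the hard part, and that is what the feature selection step described in the abstract addresses; the sketch only covers the distance calculation.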

2017 ◽  
Vol 83 (15) ◽  
Author(s):  
Yi Chen ◽  
Yan Luo ◽  
Heather Carleton ◽  
Ruth Timme ◽  
David Melka ◽  
...  

ABSTRACT Epidemiological findings of a listeriosis outbreak in 2013 implicated Hispanic-style cheese produced by company A, and pulsed-field gel electrophoresis (PFGE) and whole genome sequencing (WGS) were performed on clinical isolates and representative isolates collected from company A cheese and environmental samples during the investigation. The results strengthened the evidence for cheese as the vehicle. Surveillance sampling and WGS 3 months later revealed that the equipment purchased by company B from company A yielded an environmental isolate highly similar to all outbreak isolates. The results of whole-genome and core genome multilocus sequence typing and single nucleotide polymorphism (SNP) analyses were compared to demonstrate the maximum discriminatory power obtained by using multiple analyses, which were needed to differentiate outbreak-associated isolates from a PFGE-indistinguishable isolate collected from a nonimplicated food source in 2012. This unrelated isolate differed from the outbreak isolates by only 7 to 14 SNPs, and as a result, the minimum spanning tree from the whole-genome analyses and certain variant calling approaches and phylogenetic algorithms for core genome-based analyses could not differentiate between unrelated isolates. Our data also suggest that SNP/allele counts should always be combined with WGS clustering analysis generated by phylogenetically meaningful algorithms on a sufficient number of isolates, and that the SNP/allele threshold alone does not provide sufficient evidence to delineate an outbreak. The putative prophages were conserved across all the outbreak isolates. All outbreak isolates belonged to clonal complex 5 and serotype 1/2b and had an identical inlA sequence which did not have premature stop codons.

IMPORTANCE In this outbreak, multiple analytical approaches were used for maximum discriminatory power. A PFGE-matched, epidemiologically unrelated isolate had high genetic similarity to the outbreak-associated isolates, with as few as 7 SNP differences. Therefore, the SNP/allele threshold should not be used as the only evidence to define the scope of an outbreak. It is critical that the SNP/allele counts be complemented by WGS clustering analysis generated by phylogenetically meaningful algorithms to distinguish outbreak-associated isolates from epidemiologically unrelated isolates. Careful selection of a variant calling approach and phylogenetic algorithm is critical for core-genome-based analyses. The whole-genome-based analyses were able to construct the highly resolved phylogeny needed to support the findings of the outbreak investigation. Ultimately, epidemiologic evidence and multiple WGS analyses should be combined to increase confidence levels during outbreak investigations.
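The caveat about relying on a SNP threshold alone can be sketched with a toy pairwise comparison. The sequences, isolate names, and threshold below are invented for illustration and are far shorter than real alignments; the functions are hypothetical helpers, not part of any tool used in the study.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions in two aligned sequences, skipping N and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"N", "-"}
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x not in ignored and y not in ignored and x != y
    )


def pairs_within_threshold(alignment: dict, threshold: int) -> list:
    """Return isolate pairs whose SNP distance is at or below the threshold."""
    hits = []
    for a, b in combinations(sorted(alignment), 2):
        if snp_distance(alignment[a], alignment[b]) <= threshold:
            hits.append((a, b))
    return hits


# Toy alignment: two outbreak isolates and one epidemiologically unrelated isolate.
alignment = {
    "outbreak_1": "ACGTACGTAC",
    "outbreak_2": "ACGTACGTAT",
    "unrelated": "ACGAACGTAT",
}

# With an assumed threshold of 2 SNPs, every pair passes, including the
# pairs that involve the unrelated isolate.
print(pairs_within_threshold(alignment, threshold=2))
```

In this toy case the count alone cannot separate the unrelated isolate, which is the situation the abstract describes; clustering and epidemiological evidence would have to supply the distinction.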


2015 ◽  
Vol 61 (9) ◽  
pp. 637-646 ◽  
Author(s):  
Swapnil Doijad ◽  
Markus Weigel ◽  
Sukhadeo Barbuddhe ◽  
Jochen Blom ◽  
Alexander Goesmann ◽  
...  

The precise delineation of lineages and clonal groups is a prerequisite to examine within-species genetic variations, particularly with respect to pathogenic potential. A whole-genome-based approach was used to subtype and subgroup isolates of Listeria monocytogenes. Core-genome typing was performed, employing 3 different approaches: total core genes (CG), high-scoring segment pairs (HSPs), and average nucleotide identity (ANI). Examination of 113 L. monocytogenes genomes available in-house and in public domains revealed 33 phylogenomic groups (PGs). Each PG could be differentiated into a number of genomic types (GTs), depending on the approach used: HSPs (n = 57 GTs), CG (n = 71 GTs), and ANI (n = 83 GTs). Demarcation of the PGs was concordant with the 4 known lineages and led to the identification of sublineages in the lineage groups I, II, and III. In addition, PG assignments had discriminatory power similar to multi-virulence-locus sequence typing types and clonal complexes of multilocus sequence typing. Clustering of genomically highly similar isolates from different countries, sources, and isolation dates using whole-genome-based PG suggested that dispersion of phylogenomic clones of L. monocytogenes preceded their subsequent evolution. Classification according to PG may act as a guideline for future epidemiological studies.


2016 ◽  
Vol 82 (20) ◽  
pp. 6258-6272 ◽  
Author(s):  
Yi Chen ◽  
Narjol Gonzalez-Escalona ◽  
Thomas S. Hammack ◽  
Marc W. Allard ◽  
Errol A. Strain ◽  
...  

ABSTRACT Many listeriosis outbreaks are caused by a few globally distributed clonal groups, designated clonal complexes or epidemic clones, of Listeria monocytogenes, several of which have been defined by classic multilocus sequence typing (MLST) schemes targeting 6 to 8 housekeeping or virulence genes. We have developed and evaluated core genome MLST (cgMLST) schemes and applied them to isolates from multiple clonal groups, including those associated with 39 listeriosis outbreaks. The cgMLST clusters were congruent with MLST-defined clonal groups, which had various degrees of diversity at the whole-genome level. Notably, cgMLST could distinguish among outbreak strains and epidemiologically unrelated strains of the same clonal group, which could not be achieved using classic MLST schemes. The precise selection of cgMLST gene targets may not be critical for the general identification of clonal groups and outbreak strains. cgMLST analyses further identified outbreak strains, including those associated with recent outbreaks linked to contaminated French-style cheese, Hispanic-style cheese, stone fruit, caramel apple, ice cream, and packaged leafy green salad, as belonging to major clonal groups. We further developed lineage-specific cgMLST schemes, which can include accessory genes when core genomes do not possess sufficient diversity, and this provided additional resolution over species-specific cgMLST. Analyses of isolates from different common-source listeriosis outbreaks revealed various degrees of diversity, indicating that the numbers of allelic differences should always be combined with cgMLST clustering and epidemiological evidence to define a listeriosis outbreak.

IMPORTANCE Classic multilocus sequence typing (MLST) schemes targeting internal fragments of 6 to 8 genes that define clonal complexes or epidemic clones have been widely employed to study L. monocytogenes biodiversity and its relation to pathogenicity potential and epidemiology. We demonstrated that core genome MLST schemes can be used for the simultaneous identification of clonal groups and the differentiation of individual outbreak strains and epidemiologically unrelated strains of the same clonal group. We further developed lineage-specific cgMLST schemes that targeted more genomic regions than the species-specific cgMLST schemes. Our data revealed the genome-level diversity of clonal groups defined by classic MLST schemes. Our identification of U.S. and international outbreaks caused by major clonal groups can contribute to further understanding of the global epidemiology of L. monocytogenes.
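A rough sketch of clustering on allelic differences, assuming single-linkage grouping at a fixed cutoff (one common choice, not necessarily the one used in the study). The profiles, isolate names, and cutoff are invented for illustration.

```python
from itertools import combinations


def allelic_differences(p, q):
    """Count loci that differ between two profiles, ignoring missing calls (None)."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def single_linkage_clusters(profiles, max_diff):
    """Group isolates so that any pair within max_diff differences shares a cluster."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_differences(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Toy profiles over five loci (allele numbers are made up).
profiles = {
    "iso1": [1, 1, 1, 1, 1],
    "iso2": [1, 1, 1, 1, 2],
    "iso3": [1, 1, 1, 2, 2],
    "iso4": [5, 6, 7, 8, 9],
}

print(single_linkage_clusters(profiles, max_diff=1))
# [['iso1', 'iso2', 'iso3'], ['iso4']]
```

Note that iso1 and iso3 differ at two loci yet land in the same cluster because each is within one difference of iso2. This chaining effect is one reason a raw allele count and a cluster assignment can tell different stories, and why both are read alongside epidemiological evidence.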


Author(s):  
Sabine Lichtenegger ◽  
Trung T. Trinh ◽  
Karoline Assig ◽  
Karola Prior ◽  
Dag Harmsen ◽  
...  

Objectives: Burkholderia pseudomallei causes the severe disease melioidosis. Whole-genome sequencing (WGS)-based typing methods currently offer the highest resolution for molecular investigations of this genetically diverse pathogen. Still, their routine application in diagnostic laboratories is limited by the need for high computing power and bioinformatic skills and by variable bioinformatic approaches, the latter affecting the results. We therefore aimed to establish and validate a WGS-based core genome multilocus sequence typing (cgMLST) scheme applicable in routine diagnostic settings. Methods: A soft defined core genome was obtained by challenging the B. pseudomallei reference genome K96243 with 469 environmental and clinical genomes, resulting in 4,221 core and 1,359 accessory targets. The scheme was validated with 320 WGS datasets. We compared our novel typing scheme with single nucleotide polymorphism-based approaches, investigating closely and distantly related strains. Finally, we applied our scheme for tracking the environmental source of a recent infection. Results: The validation of the scheme detected >95% good cgMLST target genes in 98.4% of the genomes. Comparison with existing typing methods revealed very good concordance. Our scheme proved to be applicable to investigate not only closely related strains, but also the global B. pseudomallei population structure. We successfully utilized our scheme to identify a sugar cane field as the presumable source of a recent melioidosis case. Conclusion: We developed a robust cgMLST typing scheme which integrates high resolution, maximized standardization and fast analysis for the non-bioinformatician. Our typing scheme has the potential to serve as a routinely applicable classification system in B. pseudomallei molecular epidemiology.
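The validation statistic (the share of genomes in which more than 95% of targets were called) can be computed along these lines. This is a minimal sketch with made-up data; what counts as a "good" target is decided by the typing software, so here a failed target is simply assumed to be recorded as None.

```python
def good_target_fraction(profile):
    """Fraction of loci with a usable allele call (None marks a failed target)."""
    if not profile:
        raise ValueError("profile is empty")
    return sum(1 for allele in profile if allele is not None) / len(profile)


def passing_genomes(profiles, min_fraction=0.95):
    """Return (share of genomes passing, names of passing genomes).

    A genome passes when its good-target fraction is strictly above min_fraction.
    """
    passing = sorted(
        name for name, profile in profiles.items()
        if good_target_fraction(profile) > min_fraction
    )
    return len(passing) / len(profiles), passing


# Toy data: four genomes typed on a hypothetical 20-target scheme.
profiles = {
    "g1": [1] * 20,
    "g2": [1] * 18 + [None, None],  # 90% of targets called, below the cutoff
    "g3": [2] * 20,
    "g4": [3] * 20,
}

share, names = passing_genomes(profiles)
print(share, names)  # 0.75 ['g1', 'g3', 'g4']
```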


2017 ◽  
Vol 6 (3) ◽  
Author(s):  
Federica Palma ◽  
Frédérique Pasquali ◽  
Alex Lucchi ◽  
Alessandra De Cesare ◽  
Gerardo Manfreda

Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study, the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and the investigation of a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes comprised 13 to 28 contigs, with N50 values ranging from 456298 to 580604. The coverage ranged from 41 to 187X. The cgMLST showed a significantly superior discriminatory power only in comparison to ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from that of VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA version in all ST14 isolates and of a mutated version including a premature stop codon (PMSC), associated with attenuated virulence, in all ST121 isolates.
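For readers unfamiliar with the N50 assembly statistic quoted above, a minimal sketch of its standard definition follows: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size. The contig lengths are toy values, not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length  # first contig that brings the total to at least half


# Toy assembly: total length 300, so the halfway point is 150.
# 100 alone is short of it; 100 + 80 = 180 reaches it, so N50 is 80.
print(n50([20, 100, 60, 80, 40]))  # 80
```

A higher N50 with fewer contigs generally indicates a more contiguous draft assembly, which is why the two figures are reported together.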


2019 ◽  
Vol 57 (6) ◽  
Author(s):  
R. C. Jones ◽  
L. G. Harris ◽  
S. Morgan ◽  
M. C. Ruddy ◽  
M. Perry ◽  
...  

ABSTRACT An inability to standardize the bioinformatic data produced by whole-genome sequencing (WGS) has been a barrier to its widespread use in tuberculosis phylogenetics. The aim of this study was to carry out a phylogenetic analysis of tuberculosis in Wales, United Kingdom, using Ridom SeqSphere software for core genome multilocus sequence typing (cgMLST) analysis of whole-genome sequencing data. The phylogenetics of tuberculosis in Wales have not previously been studied. Sixty-six Mycobacterium tuberculosis isolates (including 42 outbreak-associated isolates) from south Wales were sequenced using an Illumina platform. Isolates were assigned to principal genetic groups, single nucleotide polymorphism (SNP) cluster groups, lineages, and sublineages using SNP-calling protocols. WGS data were submitted to the Ridom SeqSphere software for cgMLST analysis and analyzed alongside 179 previously lineage-defined isolates. The data set was dominated by the Euro-American lineage, with the sublineage composition being dominated by T, X, and Haarlem family strains. The cgMLST analysis successfully assigned 58 isolates to major lineages, and the results were consistent with those obtained by traditional SNP mapping methods. In addition, the cgMLST scheme was used to resolve an outbreak of tuberculosis occurring in the region. This study supports the use of a cgMLST method for standardized phylogenetic assignment of tuberculosis isolates and for outbreak resolution and provides the first insight into Welsh tuberculosis phylogenetics, identifying the presence of the Haarlem sublineage commonly associated with virulent traits.

