MentaLiST – A fast MLST caller for large MLST schemes

2017 ◽  
Author(s):  
Pedro Feijao ◽  
Hua-Ting Yao ◽  
Dan Fornika ◽  
Jennifer Gardy ◽  
Will Hsiao ◽  
...  

Abstract. MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing (WGS) data, MLST methods have evolved toward larger typing schemes, based on a few hundred genes (core genome MLST, cgMLST) to a few thousand genes (whole genome MLST, wgMLST). Such large-scale MLST schemes have been shown to provide a finer resolution and are increasingly used in various contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller, based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at https://github.com/WGS-TB/MentaLiST.
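The abstract names a k-mer voting algorithm without describing it. As a rough illustration of the general idea only, and not MentaLiST's actual implementation (which is written in Julia), the Python sketch below indexes every k-mer of every known allele and lets each read k-mer cast one vote for each allele that contains it. The function names, the toy scheme, and the plain majority rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(scheme, k):
    """Map each k-mer to the set of (locus, allele_id) pairs that contain it."""
    index = defaultdict(set)
    for locus, alleles in scheme.items():
        for allele_id, seq in alleles.items():
            for km in kmers(seq, k):
                index[km].add((locus, allele_id))
    return index

def call_alleles(reads, index, k):
    """Each read k-mer votes for every allele containing it; keep the top allele per locus."""
    votes = defaultdict(Counter)
    for read in reads:
        for km in kmers(read, k):
            for locus, allele_id in index.get(km, ()):
                votes[locus][allele_id] += 1
    return {locus: counter.most_common(1)[0][0] for locus, counter in votes.items()}

# Hypothetical two-locus scheme; real schemes hold hundreds to thousands of loci.
scheme = {
    "locusA": {1: "ACGTACGTAA", 2: "ACGTTCGTAA"},
    "locusB": {1: "GGGCCCTTTA", 2: "GGGCACTTTA"},
}
# Hypothetical error-free reads drawn from locusA allele 2 and locusB allele 1.
reads = ["ACGTTCGT", "TTCGTAA", "GGGCCCTT", "CCCTTTA"]

index = build_index(scheme, k=4)
calls = call_alleles(reads, index, k=4)
print(calls)
```

Because the reads are never aligned or assembled, the cost is dominated by hash lookups, which is one plausible reason a k-mer approach scales to large schemes. A production caller would also need to handle sequencing errors, ties, and novel alleles, none of which this sketch attempts.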

2014 ◽  
Vol 53 (1) ◽  
pp. 191-200 ◽  
Author(s):  
Walter Demczuk ◽  
Tarah Lynch ◽  
Irene Martin ◽  
Gary Van Domselaar ◽  
Morag Graham ◽  
...  

A large-scale, whole-genome comparison of Canadian Neisseria gonorrhoeae isolates with high-level cephalosporin MICs was used to demonstrate a genomic epidemiology approach to investigate strain relatedness and dynamics. Although current typing methods have been very successful in tracing short-chain transmission of gonorrheal disease, investigating the temporal evolutionary relationships and geographical dissemination of highly clonal lineages requires enhanced resolution only available through whole-genome sequencing (WGS). Phylogenomic cluster analysis grouped 169 Canadian strains into 12 distinct clades. While some N. gonorrhoeae multiantigen sequence types (NG-MAST) agreed with specific phylogenomic clades or subclades, other sequence types (ST) and closely related groups of ST were widely distributed among clades. Decreased susceptibility to extended-spectrum cephalosporins (ESC-DS) emerged among a group of diverse strains in Canada during the 1990s with a variety of nonmosaic penA alleles, followed in 2000/2001 with the penA mosaic X allele and then in 2007 with ST1407 strains with the penA mosaic XXXIV allele. Five genetically distinct ESC-DS lineages were associated with penA mosaic X, XXXV, and XXXIV alleles and nonmosaic XII and XIII alleles. ESC-DS with coresistance to azithromycin was observed in 5 strains with 23S rRNA C2599T or A2143G mutations. As the costs associated with WGS decline and analysis tools are streamlined, WGS can provide a more thorough understanding of strain dynamics, facilitate epidemiological studies to better resolve social networks, and improve surveillance to optimize treatment for gonorrheal infections.


2018 ◽  
Author(s):  
Ignacio Ferrés ◽  
Gregorio Iraola

Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a reduced number of housekeeping genes (typically seven) along the genome. This methodology assigns arbitrary integer identifiers to genetic variations at these loci, allowing bacterial isolates to be compared efficiently using allele-based methods. Now, the increasing availability of whole-genome sequences for hundreds to thousands of strains from the same bacterial species has motivated upgrading the resolution of traditional MLST schemes using larger gene sets or even the core genome (cgMLST). The PubMLST database is the most comprehensive resource of described MLST and cgMLST schemes available for a wide variety of species. Here we present MLSTar as the first R package that allows users to i) connect with the PubMLST database to select a target scheme, ii) screen a desired set of genomes to assign alleles and sequence types and iii) interact with other widely used R packages to analyze and produce graphical representations of the data. We applied MLSTar to analyze a set of 400 Campylobacter coli genomes, showing great accuracy and comparable performance with previously published command-line tools. MLSTar can be freely downloaded from http://github.com/iferres/MLSTar.
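To make the allele-based comparison described above concrete, the Python sketch below looks up an integer allele identifier for each locus, maps the resulting profile to a sequence type, and counts the loci at which two isolates differ. It does not use MLSTar or the PubMLST API; the locus names, sequences, ST numbers, and function names are all hypothetical and chosen only for illustration.

```python
def assign_alleles(genome_loci, allele_db):
    """Return the integer allele ID per locus, or None if the sequence is not in the database."""
    return {locus: allele_db[locus].get(seq) for locus, seq in genome_loci.items()}

def assign_st(profile, loci, st_table):
    """Look up the sequence type for an allelic profile; None means an unknown profile."""
    return st_table.get(tuple(profile[locus] for locus in loci))

def allele_distance(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different allele IDs."""
    return sum(1 for locus in profile_a if profile_a[locus] != profile_b[locus])

# Hypothetical three-locus scheme (classic MLST schemes typically use seven loci).
loci = ["geneA", "geneB", "geneC"]
allele_db = {
    "geneA": {"ATGAAA": 1, "ATGAAG": 2},
    "geneB": {"ATGCCC": 1, "ATGCCT": 2},
    "geneC": {"ATGGGG": 1},
}
st_table = {(1, 1, 1): 10, (2, 1, 1): 11}  # profile -> made-up ST number

isolate_1 = {"geneA": "ATGAAA", "geneB": "ATGCCC", "geneC": "ATGGGG"}
isolate_2 = {"geneA": "ATGAAG", "geneB": "ATGCCC", "geneC": "ATGGGG"}

profile_1 = assign_alleles(isolate_1, allele_db)
profile_2 = assign_alleles(isolate_2, allele_db)
st_1 = assign_st(profile_1, loci, st_table)
st_2 = assign_st(profile_2, loci, st_table)
distance = allele_distance(profile_1, profile_2)
print(st_1, st_2, distance)
```

Since the identifiers are arbitrary integers, the distance only records whether two alleles differ, not by how many nucleotides. The same profile comparison extends unchanged to cgMLST, where the profile simply has many more loci.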


Genes ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 687 ◽  
Author(s):  
Sheppard ◽  
Groves ◽  
Andrews ◽  
Litt ◽  
Fry ◽  
...  

We used whole genome sequencing (WGS) analysis to investigate the population structure of 877 Streptococcus pneumoniae isolates from five carriage studies from 2002 (N = 346), 2010 (N = 127), 2013 (N = 153), 2016 (N = 187) and 2018 (N = 64) in UK households which covers the period pre-PCV7 to post-PCV13 implementation. The genomic lineages seen in the population were determined using multi-locus sequence typing (MLST) and PopPUNK (Population Partitioning Using Nucleotide K-mers) which was used for local and global comparisons. A Roary core genome alignment of all the carriage genomes was used to investigate phylogenetic relationships between the lineages. The results showed an influx of previously undetected sequence types after vaccination associated with non-vaccine serotypes. A small number of lineages persisted throughout, associated with both non-vaccine and vaccine types (such as ST199), or that could be an example of serotype switching from vaccine to non-vaccine types (ST177). Serotype 3 persisted throughout the study years, represented by ST180 and Global Pneumococcal Sequencing Cluster (GPSC) 12; the local PopPUNK analysis and core genome maximum likelihood phylogeny separated them into two clades, one of which is only seen in later study years. The genomic data showed that serotype replacement in the carriage studies was mostly due to a change in genotype as well as serotype, but that some important genetic lineages, previously associated with vaccine types, persisted.


Author(s):  
Yu. O. Goncharova ◽  
I. V. Bakhteeva ◽  
R. I. Mironova ◽  
A. G. Bogun ◽  
K. V. Khlopova ◽  
...  

Objective – genotyping by multilocus sequence typing (MLST) and phylogenetic analysis of 40 Bacillus anthracis strains isolated in Russia and neighboring countries. Materials and methods. In this study, the sequences of seven housekeeping genes of B. anthracis strains were assembled from next-generation whole-genome sequencing data, after which the identified mutations and their coordinates were described. The obtained sequences were used for genotyping of the investigated sample using the MLST method. The results were compared with the data presented in the PubMLST database. A phylogenetic analysis was performed for the in silico concatenated sequences of the seven loci of the identified sequence types. The MEGA 7.0 software package was used to build the dendrograms. Results and discussion. Two sequence types (ST) were found in the examined sample: 35 strains belong to ST-1, and five strains that differed by one common mutation at the glpF locus – to ST-3 (according to PubMLST coding), which emphasizes the genetic separation of this group of strains. One strain has a unique mutation in the gmk gene located outside the region used for MLST.


2021 ◽  
Vol 12 ◽  
Author(s):  
Min He ◽  
Tao Lei ◽  
Fufeng Jiang ◽  
Jumei Zhang ◽  
Haiyan Zeng ◽  
...  

Vibrio parahaemolyticus is a common foodborne pathogen that causes gastroenteritis worldwide. Determining its prevalence and genetic diversity will minimize the risk of infection and the associated economic burden. Multilocus sequence typing (MLST) is an important tool for molecular epidemiology and population genetic studies of bacteria. Here, we analyzed the genetic and evolutionary relationships of 162 V. parahaemolyticus strains isolated in the Guangdong Province, China, using MLST. In the study, 120 strains were isolated from food samples, and 42 strains were isolated from clinical samples. All strains were categorized into 100 sequence types (STs), of which 58 were novel (48 from the food isolates and 10 from the clinical isolates). ST415 was the most prevalent ST among the food isolates, while ST3 was the most prevalent ST among the clinical isolates. Further, 12 clonal complexes, 14 doublets, and 73 singletons were identified in all ST clusters, indicating high genetic diversity of the analyzed strains. At the concatenated sequence level, non-synonymous sites in both food and clinical isolates were associated with purifying selection. Of note, the dN/dS ratio was greater than 1 for some housekeeping genes in all isolates. This is the first time that some loci under positive selection were identified. These observations confirm frequent recombination events in V. parahaemolyticus. Recombination was much more important than mutation for genetic heterogeneity of the food isolates, but the probabilities of recombination and mutations were almost equal for the clinical isolates. Based on the phylogenetic analysis, the clinical isolates were concentrated in the maximum-likelihood tree, while the food isolates were heterogeneously distributed. In conclusion, the food and clinical isolates of V. parahaemolyticus from the Guangdong Province are similar, but show different evolutionary trends. This may help prevent large-scale spread of highly virulent strains and provides a genetic basis for the discovery of microevolutionary relationships in V. parahaemolyticus populations.


2021 ◽  
Vol 12 ◽  
Author(s):  
Shigan Yan ◽  
Wencheng Zhang ◽  
Chengyu Li ◽  
Xu Liu ◽  
Liping Zhu ◽  
...  

Salmonella enterica (S. enterica) is an important foodborne pathogen, causing food poisoning and human infection, and critically threatening food safety and public health. Salmonella typing is essential for bacterial identification, tracing, epidemiological investigation, and monitoring. Serotyping and multilocus sequence typing (MLST) analysis are standard bacterial typing methods despite their low resolution. Core genome MLST (cgMLST) is a high-resolution molecular typing method based on whole-genome sequencing for accurate bacterial tracing. We investigated 250 S. enterica isolates from poultry, livestock, food, and human sources in nine provinces of China from 2004 to 2019 using serotyping, MLST, and cgMLST analysis. All S. enterica isolates were divided into 36 serovars using slide agglutination. The major serovars in order were Enteritidis (31 isolates), Typhimurium (29 isolates), Mbandaka (23 isolates), and Indiana (22 isolates). All strains were assigned into 43 sequence types (STs) by MLST. Among them, ST11 (31 isolates) was the primary ST. Besides this, a novel ST, ST8016, was identified, and it was different from ST40 by position 317 C → T in dnaN. Furthermore, these 250 isolates were grouped into 185 cgMLST sequence types (cgSTs) by cgMLST. The major cgST was cgST235530 (11 isolates), and only three cgSTs contained isolates from human and other sources, indicating a possibility of cross-species infection. Phylogenetic analysis indicated that most of the same serovar strains were putatively homologous except Saintpaul and Derby due to their multilineage characteristics. In addition, serovar I 4,[5],12:i:- and Typhimurium isolates have similar genomic relatedness on the phylogenetic tree. In conclusion, we sorted out the phenotyping and genotyping diversity of S. enterica isolates in China during 2004–2019 and clarified the temporal and spatial distribution characteristics of Salmonella from different hosts in China over the past 16 years. These results greatly supplement Salmonella strain resources, genetic information, and traceability typing data; facilitate the typing, traceability, identification, and genetic evolution analysis of Salmonella; and therefore, improve the level of analysis, monitoring, and controlling of foodborne microorganisms in China.


2021 ◽  
Author(s):  
Carla Palacios-Gorba ◽  
Alexandra MOURA ◽  
Jesús Gomis ◽  
Alexandre Leclercq ◽  
Ángel Gómez-Martín ◽  
...  

The increasing prevalence of Listeria monocytogenes infections is a public health issue. Although studies have shown that ruminants constitute reservoirs of this foodborne pathogen, little is known about its epidemiology and genetic diversity within ruminant farms. Here we conducted a large-scale genomic and epidemiologic longitudinal study of Listeria spp. in dairy ruminants and their environments, comprising 19 farms monitored for three consecutive seasons (N=3251 samples). L. innocua was the most prevalent Listeria species, followed by L. monocytogenes. L. monocytogenes was detected in 52.6% of farms (prevalence in feces samples 3.8%, in farm environment samples 2.5%) and more frequently in cattle (4.1%) and sheep (4.5%) than in goat farms (0.2%). Lineage I accounted for 69% of L. monocytogenes isolates. Among animal samples, the most prevalent sublineages (SL) and clonal complexes (CC) were SL1/CC1, SL219/CC4, SL26/CC26 and SL87/CC87, whereas SL666/CC666 was prevalent in environmental samples. A total of 61 different L. monocytogenes CTs (cgMLST sequence types) were found, 17 of them (27.9%) common to different animals and/or surfaces within the same farms. L. monocytogenes prevalence was not affected by farm hygiene but by season: the overall prevalence of L. monocytogenes in cattle farms was higher during winter, and in sheep farms was higher during winter and spring. Cows in their second lactation had a higher probability of L. monocytogenes fecal shedding than other lactating cows. This study highlights that dairy farms constitute a reservoir for hypervirulent L. monocytogenes and the importance of continuous animal surveillance to reduce the burden of human listeriosis.



2020 ◽  
Vol 21 (24) ◽  
pp. 9419
Author(s):  
Kinga Wieczorek ◽  
Arkadiusz Bomba ◽  
Jacek Osek

Listeria monocytogenes, an important foodborne pathogen, may be present in different kinds of food and in food processing environments where it can persist for a long time. In this study, 28 L. monocytogenes isolates from fish and fish manufactures were characterized by whole genome sequencing (WGS). Core genome multilocus sequence typing (cgMLST) analysis was applied to compare the present isolates with publicly available genomes of L. monocytogenes strains recovered worldwide from food and from humans with listeriosis. All but one (96.4%) of the examined isolates belonged to molecular serogroup IIa, and one isolate (3.6%) was classified to serogroup IVb. The isolates of group IIa were mainly of MLST sequence types ST121 (13 strains) and ST8 (four strains) whereas the isolate of serogroup IVb was classified to ST1. Strains of serogroup IIa were further subtyped into eight different sublineages with the most numerous being SL121 (13; 48.1% strains) which belonged to six cgMLST types. The majority of strains, irrespective of the genotypic subtype, had the same antimicrobial resistance profile. The cluster analysis identified several molecular clones typical for L. monocytogenes isolated from similar sources in other countries; however, novel molecular cgMLST types not present in the Listeria database were also identified.



