MLSTar: automatic multilocus sequence typing of bacterial genomes in R

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5098 ◽  
Author(s):  
Ignacio Ferrés ◽  
Gregorio Iraola

Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a reduced number of housekeeping genes (typically seven) along the genome. This methodology assigns arbitrary integer identifiers to genetic variations at these loci which allows us to efficiently compare bacterial isolates using allele-based methods. Now, the increasing availability of whole-genome sequences for hundreds to thousands of strains from the same bacterial species has allowed us to apply and extend MLST schemes by automatic extraction of allele information from the genomes. The PubMLST database is the most comprehensive resource of described schemes available for a wide variety of species. Here we present MLSTar as the first R package that allows us to (i) connect with the PubMLST database to select a target scheme, (ii) screen a desired set of genomes to assign alleles and sequence types, and (iii) interact with other widely used R packages to analyze and produce graphical representations of the data. We applied MLSTar to analyze more than 2,500 bacterial genomes from different species, showing great accuracy, and comparable performance with previously published command-line tools. MLSTar can be freely downloaded from http://github.com/iferres/MLSTar.
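The allele-based logic described above can be sketched in a few lines. This is an illustrative Python sketch only, not MLSTar's code (MLSTar is an R package): the locus names, sequences, and profile table are hypothetical toy data, and exact string matching stands in for the sequence search a real tool would run against a genome.

```python
from typing import Dict, Optional, Tuple

# Toy scheme with three hypothetical loci (real schemes typically use seven).
LOCI = ("locusA", "locusB", "locusC")

# Known allele sequences per locus, each mapped to an arbitrary integer id.
ALLELES: Dict[str, Dict[str, int]] = {
    "locusA": {"ATGAAA": 1, "ATGAAG": 2},
    "locusB": {"ATGCCC": 1, "ATGCCT": 2},
    "locusC": {"ATGGGG": 1},
}

# Allelic profiles (one allele id per locus, in LOCI order) mapped to sequence types.
PROFILES: Dict[Tuple[int, ...], int] = {(1, 1, 1): 1, (2, 1, 1): 2}


def call_alleles(genes: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Return the allele id for each locus, or None if the locus is missing or unknown."""
    return {locus: ALLELES[locus].get(genes.get(locus, "")) for locus in LOCI}


def sequence_type(genes: Dict[str, str]) -> Optional[int]:
    """Return the ST for an isolate, or None if any locus is uncalled or the profile is new."""
    calls = call_alleles(genes)
    profile = tuple(calls[locus] for locus in LOCI)
    if None in profile:
        return None
    return PROFILES.get(profile)


# Hypothetical isolate: one extracted gene sequence per locus.
isolate = {"locusA": "ATGAAG", "locusB": "ATGCCC", "locusC": "ATGGGG"}
print(call_alleles(isolate))
print(sequence_type(isolate))
```

Because isolates are reduced to short integer tuples, comparing two isolates is a tuple comparison rather than a sequence alignment, which is what makes allele-based methods efficient.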

2018 ◽  
Author(s):  
Ignacio Ferrés ◽  
Gregorio Iraola

Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a reduced number of housekeeping genes (typically seven) along the genome. This methodology assigns arbitrary integer identifiers to genetic variations at these loci, allowing efficient comparison of bacterial isolates using allele-based methods. The increasing availability of whole-genome sequences for hundreds to thousands of strains from the same bacterial species has motivated upgrading the resolution of traditional MLST schemes by using larger gene sets or even the core genome (cgMLST). The PubMLST database is the most comprehensive resource of described MLST and cgMLST schemes available for a wide variety of species. Here we present MLSTar as the first R package that allows users to (i) connect with the PubMLST database to select a target scheme, (ii) screen a desired set of genomes to assign alleles and sequence types, and (iii) interact with other widely used R packages to analyze and produce graphical representations of the data. We applied MLSTar to analyze a set of 400 Campylobacter coli genomes, showing great accuracy and comparable performance with previously published command-line tools. MLSTar can be freely downloaded from http://github.com/iferres/MLSTar.


2011 ◽  
Vol 57 (12) ◽  
pp. 982-986 ◽  
Author(s):  
Michelle L. Shuel ◽  
Kathleen E. Karlowsky ◽  
Dennis K.S. Law ◽  
Raymond S.W. Tsang

Population biology of Haemophilus influenzae can be studied by multilocus sequence typing (MLST), and isolates are assigned sequence types (STs) based on nucleotide sequence variations in seven housekeeping genes, including fucK. However, the ST cannot be assigned if one of the housekeeping genes is absent or cannot be detected by the current protocol. Occasionally, strains of H. influenzae have been reported to lack the fucK gene. In this study, we examined the prevalence of this mutation among our collection of H. influenzae isolates. Of the 704 isolates studied, including 282 encapsulated and 422 nonencapsulated isolates, nine were not typeable by MLST owing to failure to detect the fucK gene. All nine fucK-negative isolates were nonencapsulated and belonged to various biotypes. DNA sequencing of the fucose operon region confirmed complete deletion of genes in the operon in seven of the nine isolates, while in the remaining two isolates, some of the genes were found intact or in parts. The significance of these findings is discussed.


Microbiology ◽  
2010 ◽  
Vol 156 (7) ◽  
pp. 2035-2045 ◽  
Author(s):  
Claudia Picozzi ◽  
Gaia Bonacina ◽  
Ileana Vigentini ◽  
Roberto Foschino

Lactobacillus sanfranciscensis is a lactic acid bacterium that characterizes the sourdough environment. The genetic differences of 24 strains isolated in different years from sourdoughs, mostly collected in Italy, were examined and compared by PFGE and multilocus sequence typing (MLST). The MLST scheme, based on the analysis of six housekeeping genes (gdh, gyrA, mapA, nox, pgmA and pta) was developed for this study. PFGE with the restriction enzyme ApaI proved to have higher discriminatory power, since it revealed 22 different pulsotypes, while 19 sequence types were recognized through MLST analysis. Notably, restriction profiles generated from three isolates collected from the same firm but in three consecutive years clustered in a single pulsotype and showed the same sequence type, emphasizing the fact that the main factors affecting the dominance of a strain are correlated with processing conditions and the manufacturing environment rather than the geographical area. All results indicated a limited recombination among genes and the presence of a clonal population in L. sanfranciscensis. The MLST scheme proposed in this work can be considered a useful tool for characterization of isolates and for in-depth examination of the strain diversity and evolution of this species.


2022 ◽  
Author(s):  
Mark Achtman ◽  
Zhemin Zhou ◽  
Jane Charlesworth ◽  
Laura A. Baxter

The definition of bacterial species is traditionally a taxonomic issue, while bacterial populations are defined using population genetics. These assignments are species specific and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST alleles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate the ability of HierCC to correctly assign hundreds of thousands of genomes to the species/subspecies and population levels for Salmonella, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI was uniformly consistent with classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and revealed differences in vertical inheritance of O serogroups. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
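To make the idea of clustering cgMLST allelic profiles at a distance threshold concrete, here is a generic single-linkage sketch in Python. It is not the EnteroBase HierCC implementation: the genome names, profiles, and thresholds are hypothetical, and the rule of skipping loci with missing calls is an assumption made for illustration.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence

Profile = Sequence[Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have a call and the calls differ."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage(profiles: Dict[str, Profile], threshold: int) -> Dict[str, int]:
    """Label genomes so that pairs within `threshold` differences share a cluster, transitively."""
    parent = {name: name for name in profiles}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    # Number the clusters 0, 1, 2, ... in order of first appearance.
    labels: Dict[str, int] = {}
    return {name: labels.setdefault(find(name), len(labels)) for name in profiles}


# Hypothetical four-locus profiles; real cgMLST schemes use thousands of loci.
genomes = {
    "g1": (1, 1, 1, 1),
    "g2": (1, 1, 1, 2),
    "g3": (1, 1, 2, 2),
    "g4": (5, 6, 7, 8),
}
for threshold in (0, 1):
    print(threshold, single_linkage(genomes, threshold))
```

Running the same clustering at several thresholds yields nested groupings: tight thresholds separate closely related genomes, while looser ones merge them into broader groups.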


Author(s):  
Yu. O. Goncharova ◽  
I. V. Bakhteeva ◽  
R. I. Mironova ◽  
A. G. Bogun ◽  
K. V. Khlopova ◽  
...  

Objective: genotyping by multilocus sequence typing (MLST) and phylogenetic analysis of 40 Bacillus anthracis strains isolated in Russia and neighboring countries. Materials and methods. In this study, the sequences of seven housekeeping genes of B. anthracis strains were assembled from whole-genome next-generation sequencing data, after which the identified mutations and their coordinates were described. The obtained sequences were used to genotype the investigated sample by the MLST method. The results were compared with the data in the PubMLST database. A phylogenetic analysis was performed on the in silico concatenated sequences of the seven loci of the identified sequence types. The MEGA 7.0 software package was used to build the dendrograms. Results and discussion. Two sequence types (STs) were found in the examined sample: 35 strains belong to ST-1, and five strains, which share one mutation at the glpF locus, belong to ST-3 (according to PubMLST coding), which emphasizes the genetic separation of this group of strains. One strain has a unique mutation in the gmk gene located outside the region used for MLST.


Microbiology ◽  
2011 ◽  
Vol 157 (3) ◽  
pp. 727-738 ◽  
Author(s):  
Kana Tanigawa ◽  
Koichi Watanabe

Currently, the species Lactobacillus delbrueckii is divided into four subspecies, L. delbrueckii subsp. delbrueckii, L. delbrueckii subsp. bulgaricus, L. delbrueckii subsp. indicus and L. delbrueckii subsp. lactis. These classifications were based mainly on phenotypic identification methods and few studies have used genotypic identification methods. As a result, these subspecies have not yet been reliably delineated. In this study, the four subspecies of L. delbrueckii were discriminated by phenotype and by genotypic identification [amplified-fragment length polymorphism (AFLP) and multilocus sequence typing (MLST)] methods. The MLST method developed here was based on the analysis of seven housekeeping genes (fusA, gyrB, hsp60, ileS, pyrG, recA and recG). The MLST method had good discriminatory ability: the 41 strains of L. delbrueckii examined were divided into 34 sequence types, with 29 sequence types represented by only a single strain. The sequence types were divided into eight groups. These groups could be discriminated as representing different subspecies. The results of the AFLP and MLST analyses were consistent. The type strain of L. delbrueckii subsp. delbrueckii, YIT 0080T, was clearly discriminated from the other strains currently classified as members of this subspecies, which were located close to strains of L. delbrueckii subsp. lactis. The MLST scheme developed in this study should be a useful tool for the identification of strains of L. delbrueckii to the subspecies level.


Author(s):  
Signe Nedergaard ◽  
Anne B. Jensen ◽  
Dorte Haubek ◽  
Niels Nørskov-Lauritsen

We developed a multilocus sequence typing (MLST) scheme for Aggregatibacter actinomycetemcomitans based on seven housekeeping genes: adk, atpG, frdB, mdh, pgi, recA, and zwf. A total of 188 strains of seven serotypes were separated into 57 sequence types.


2010 ◽  
Vol 77 (2) ◽  
pp. 537-544 ◽  
Author(s):  
Daniel P. Keymer ◽  
Alexandria B. Boehm

Vibrio cholerae consists of pathogenic strains that cause sporadic gastrointestinal illness or epidemic cholera disease and nonpathogenic strains that grow and persist in coastal aquatic ecosystems. Previous studies of disease-causing strains have shown V. cholerae to be a primarily clonal bacterial species, but isolates analyzed have been strongly biased toward pathogenic genotypes, while representing only a small sample of the vast diversity in environmental strains. In this study, we characterized homologous recombination and structure among 152 environmental V. cholerae isolates and 13 other putative Vibrio isolates from coastal waters and sediments in central California, as well as four clinical V. cholerae isolates, using multilocus sequence analysis of seven housekeeping genes. Recombinant regions were identified by at least three detection methods in 72% of our V. cholerae isolates. Despite frequent recombination, significant linkage disequilibrium was still detected among the V. cholerae sequence types. Incongruent but nonrandom associations were observed for maximum likelihood topologies from the individual loci. Overall, our estimated recombination rate in V. cholerae of 6.5 times the mutation rate is similar to those of other sexual bacteria and appears frequently enough to restrict selection from purging much of the neutral intraspecies diversity. These data suggest that frequent recombination among V. cholerae may hinder the identification of ecotypes in this bacterioplankton population.

