MLSTar: automatic multilocus and core genome sequence typing in R

10.7287/peerj.preprints.26630 ◽

2018 ◽

Author(s):

Ignacio Ferrés ◽

Gregorio Iraola

Keyword(s):

Core Genome ◽

Bacterial Species ◽

Housekeeping Genes ◽

R Package ◽

Great Accuracy ◽

Campylobacter Coli ◽

Gene Sets ◽

Standard Tool ◽

Comparable Performance ◽

Sequence Types

Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a small number of housekeeping genes (typically seven) along the genome. This methodology assigns arbitrary integer identifiers to the genetic variants at these loci, allowing bacterial isolates to be compared efficiently using allele-based methods. The increasing availability of whole-genome sequences for hundreds to thousands of strains from the same bacterial species has motivated efforts to increase the resolution of traditional MLST schemes using larger gene sets or even the core genome (cgMLST). The PubMLST database is the most comprehensive resource of described MLST and cgMLST schemes, covering a wide variety of species. Here we present MLSTar, the first R package that allows users to (i) connect with the PubMLST database to select a target scheme, (ii) screen a desired set of genomes to assign alleles and sequence types, and (iii) interact with other widely used R packages to analyze and produce graphical representations of the data. We applied MLSTar to analyze a set of 400 Campylobacter coli genomes, showing great accuracy and performance comparable to previously published command-line tools. MLSTar can be freely downloaded from http://github.com/iferres/MLSTar.
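The allele and sequence type (ST) assignment described in this abstract can be illustrated with a deliberately simplified Python sketch. It assumes exact string matching against a tiny hypothetical allele table (`ALLELES`) and profile table (`PROFILES`); the locus names, sequences and ST numbers are invented for illustration. Real callers typically rely on sequence alignment and must handle partial or novel alleles, so this is not MLSTar's implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy scheme (not real PubMLST data): locus -> {allele sequence: allele number}.
ALLELES: Dict[str, Dict[str, int]] = {
    "locus1": {"ACGTAC": 1, "ACGTAT": 2},
    "locus2": {"TTGACA": 1, "TTGACC": 2},
    "locus3": {"GGCATG": 1, "GGCATC": 2},
}
LOCI: Tuple[str, ...] = ("locus1", "locus2", "locus3")

# Hypothetical profile table: allele numbers in LOCI order -> sequence type (ST).
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 10,
    (2, 1, 2): 25,
}


def call_alleles(genome: str) -> Tuple[Optional[int], ...]:
    """Per locus, return the number of the first known allele that occurs
    verbatim in the genome, or None if no known allele is found."""
    calls = []
    for locus in LOCI:
        hit: Optional[int] = None
        for seq, number in ALLELES[locus].items():
            if seq in genome:
                hit = number
                break
        calls.append(hit)
    return tuple(calls)


def assign_st(profile: Tuple[Optional[int], ...]) -> Optional[int]:
    """Map a complete allelic profile to its ST; None if incomplete or unknown."""
    if None in profile:
        return None
    return PROFILES.get(profile)


if __name__ == "__main__":
    genome = "NNACGTATNNTTGACANNGGCATCNN"
    profile = call_alleles(genome)
    print(profile)             # (2, 1, 2)
    print(assign_st(profile))  # 25
```

Returning `None` for an unmatched locus reflects the fact that an ST can only be looked up for a complete allelic profile.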


MLSTar: automatic multilocus and core genome sequence typing in R

10.7287/peerj.preprints.26630v1 ◽

2018 ◽

Author(s):

Ignacio Ferrés ◽

Gregorio Iraola

Keyword(s):

Core Genome ◽

Bacterial Species ◽

Housekeeping Genes ◽

R Package ◽

Great Accuracy ◽

Campylobacter Coli ◽

Gene Sets ◽

Standard Tool ◽

Comparable Performance ◽

Sequence Types

Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a small number of housekeeping genes (typically seven) along the genome. This methodology assigns arbitrary integer identifiers to the genetic variants at these loci, allowing bacterial isolates to be compared efficiently using allele-based methods. The increasing availability of whole-genome sequences for hundreds to thousands of strains from the same bacterial species has motivated efforts to increase the resolution of traditional MLST schemes using larger gene sets or even the core genome (cgMLST). The PubMLST database is the most comprehensive resource of described MLST and cgMLST schemes, covering a wide variety of species. Here we present MLSTar, the first R package that allows users to (i) connect with the PubMLST database to select a target scheme, (ii) screen a desired set of genomes to assign alleles and sequence types, and (iii) interact with other widely used R packages to analyze and produce graphical representations of the data. We applied MLSTar to analyze a set of 400 Campylobacter coli genomes, showing great accuracy and performance comparable to previously published command-line tools. MLSTar can be freely downloaded from http://github.com/iferres/MLSTar.


MLSTar: automatic multilocus sequence typing of bacterial genomes in R

PeerJ ◽

10.7717/peerj.5098 ◽

2018 ◽

Vol 6 ◽

pp. e5098 ◽

Cited By ~ 6

Author(s):

Ignacio Ferrés ◽

Gregorio Iraola

Keyword(s):

Multilocus Sequence Typing ◽

Bacterial Species ◽

Housekeeping Genes ◽

R Package ◽

Great Accuracy ◽

Bacterial Genomes ◽

Standard Tool ◽

Comparable Performance ◽

R Packages ◽

Sequence Types

Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a small number of housekeeping genes (typically seven) along the genome. This methodology assigns arbitrary integer identifiers to the genetic variants at these loci, which allows us to efficiently compare bacterial isolates using allele-based methods. The increasing availability of whole-genome sequences for hundreds to thousands of strains from the same bacterial species has allowed us to apply and extend MLST schemes by automatically extracting allele information from the genomes. The PubMLST database is the most comprehensive resource of described schemes, covering a wide variety of species. Here we present MLSTar, the first R package that allows us to (i) connect with the PubMLST database to select a target scheme, (ii) screen a desired set of genomes to assign alleles and sequence types, and (iii) interact with other widely used R packages to analyze and produce graphical representations of the data. We applied MLSTar to analyze more than 2,500 bacterial genomes from different species, showing great accuracy and performance comparable to previously published command-line tools. MLSTar can be freely downloaded from http://github.com/iferres/MLSTar.
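Allele-based comparison of isolates is often done with a pairwise allelic distance: the number of loci at which two profiles carry different allele numbers. The sketch below assumes that convention and skips loci missing from either profile; other treatments of missing data are possible. The isolate names and profiles are hypothetical, and the code is not taken from MLSTar.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Profile = Sequence[Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have a call and the calls differ.
    Loci missing (None) in either profile are skipped (one possible convention)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], int]:
    """Pairwise allelic distances for every unordered pair of isolates."""
    return {
        (i, j): allelic_distance(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical seven-locus profiles; None marks a locus with no allele call.
    profiles = {
        "isoA": [1, 1, 1, 4, 2, 2, 5],
        "isoB": [1, 1, 2, 4, 2, 2, 5],
        "isoC": [3, None, 2, 4, 6, 2, 5],
    }
    for pair, d in distance_matrix(profiles).items():
        print(pair, d)
```

A matrix like this is the usual input for downstream clustering or minimum-spanning-tree visualizations in other R or Python packages.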


EnteroBase: Hierarchical clustering of 100,000s of bacterial genomes into species/sub-species and populations

10.1101/2022.01.11.475882 ◽

2022 ◽

Author(s):

Mark Achtman ◽

Zhemin Zhou ◽

Jane Charlesworth ◽

Laura A. Baxter

Keyword(s):

Core Genome ◽

Bacterial Species ◽

Automated Identification ◽

Bacterial Genomes ◽

Bacterial Populations ◽

Vertical Inheritance ◽

Definition Of ◽

Taxonomic Assignments ◽

Species Specific ◽

Sequence Types

The definition of bacterial species is traditionally a taxonomic issue, while defining bacterial populations is done with population genetics. These assignments are species-specific and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST alleles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short-read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate the ability of HierCC to correctly assign 100,000s of genomes to the species/subspecies and population levels for Salmonella, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI were uniformly consistent with classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and revealed differences in vertical inheritance of O serogroups. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
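The abstract does not spell out how HierCC builds its clusters. Purely as an illustrative assumption, the sketch below performs single-linkage clustering of pairwise cgMLST allelic distances at several thresholds, so each genome receives one cluster label per level, from fine (few allelic differences) to coarse. The thresholds, genome names and distances are hypothetical, and this is not EnteroBase's code.

```python
from typing import Dict, Iterable, Sequence, Tuple


def single_linkage_clusters(names: Sequence[str],
                            dist: Dict[Tuple[str, str], int],
                            threshold: int) -> Dict[str, int]:
    """Group genomes whose allelic distance is <= threshold, transitively
    (single linkage), using a union-find structure."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    # Number clusters 1, 2, ... in order of first appearance in `names`.
    labels: Dict[str, int] = {}
    out: Dict[str, int] = {}
    for n in names:
        root = find(n)
        if root not in labels:
            labels[root] = len(labels) + 1
        out[n] = labels[root]
    return out


def hierarchical_levels(names: Sequence[str],
                        dist: Dict[Tuple[str, str], int],
                        thresholds: Iterable[int]) -> Dict[int, Dict[str, int]]:
    """Cluster labels at each threshold, ordered from fine to coarse."""
    return {t: single_linkage_clusters(names, dist, t) for t in sorted(thresholds)}


if __name__ == "__main__":
    names = ["g1", "g2", "g3", "g4"]
    dist = {("g1", "g2"): 2, ("g1", "g3"): 15, ("g1", "g4"): 120,
            ("g2", "g3"): 14, ("g2", "g4"): 118, ("g3", "g4"): 110}
    for level, labels in hierarchical_levels(names, dist, [5, 20, 200]).items():
        print(level, labels)
```

Because single linkage only merges clusters as the threshold grows, the labels are nested: genomes that share a cluster at a fine level also share one at every coarser level.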


Comparative Analysis of New Zealand Campylobacter Isolates Using MLST, PFGE and flaA PCR RFLP Genotyping

10.26686/wgtn.16934848.v1 ◽

2021 ◽

Author(s):


Sharla McTavish

Keyword(s):

New Zealand ◽

Bacterial Species ◽

Housekeeping Genes ◽

Species Boundaries ◽

Campylobacter Coli ◽

Closely Related Species ◽

Pcr Rflp ◽

Population Structures ◽

Human Campylobacteriosis ◽

Dominant Genotype

Campylobacter jejuni and Campylobacter coli are the most commonly identified sources of campylobacteriosis in New Zealand, yet little is known about the distribution of genotypes within their respective population structures. Using multi-locus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE) and flaA genotyping, the current study identified the distribution of genotypes within New Zealand C. jejuni and C. coli isolates from an outbreak situation, as well as isolates present in the ESR Campylobacter collection. Although the most commonly identified MLST genotypes were similar to international genotypes, a number of genotypes that are internationally rare or unique to New Zealand were observed. One rare dominant genotype, ST-474, arising from a point-source outbreak, was found to cause a large proportion of human campylobacteriosis cases in New Zealand. A unique cluster of New Zealand genotypes was isolated only from river water, identifying a potentially water-adapted C. jejuni strain. Frequent homologous recombination and horizontal gene transfer events were identified within the seven housekeeping genes characterised in the New Zealand sample and the MLST C. jejuni/C. coli database. The genetic instability identified in the current study calls into question the legitimacy of bacterial species boundaries, especially when examining closely related species such as C. jejuni and C. coli.


MentaLiST – A fast MLST caller for large MLST schemes

10.1101/172858 ◽

2017 ◽

Cited By ~ 2

Author(s):

Pedro Feijao ◽

Hua-Ting Yao ◽

Dan Fornika ◽

Jennifer Gardy ◽

Will Hsiao ◽

...

Keyword(s):

Large Scale ◽

Core Genome ◽

Foodborne Pathogen ◽

Simulated Data ◽

Housekeeping Genes ◽

Whole Genome ◽

Large Size ◽

Classic Technique ◽

Computational Resources ◽

Sequence Types

MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing (WGS) data, MLST methods have evolved toward larger typing schemes, based on a few hundred genes (core genome MLST, cgMLST) to a few thousand genes (whole genome MLST, wgMLST). Such large-scale MLST schemes have been shown to provide finer resolution and are increasingly used in various contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes of up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at https://github.com/WGS-TB/MentaLiST.
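To give a feel for k-mer voting, here is a minimal Python sketch (not Julia, and not MentaLiST's implementation). It assumes the simplest possible rule: every k-mer occurrence in the reads adds one vote to each allele of a single locus that contains that k-mer, and the allele with the most votes is called, breaking ties toward the lowest allele number. The allele sequences, reads and the value of `k` are hypothetical; MentaLiST's own index and scoring may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(alleles: Dict[int, str], k: int) -> Dict[str, Set[int]]:
    """Map each k-mer to the set of allele numbers (of one locus) containing it."""
    index: Dict[str, Set[int]] = {}
    for number, seq in alleles.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(number)
    return index


def vote(reads: Iterable[str], index: Dict[str, Set[int]], k: int) -> int:
    """Each k-mer occurrence in the reads adds one vote to every allele that
    contains it; return the allele with the most votes (ties -> lowest number)."""
    votes: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            for number in index.get(read[i:i + k], ()):
                votes[number] += 1
    if not votes:
        raise ValueError("no k-mer of the reads matched any allele")
    best = max(votes.values())
    return min(n for n, v in votes.items() if v == best)


if __name__ == "__main__":
    alleles = {1: "ACGTACGTTA", 2: "ACGTACGATA"}  # hypothetical alleles of one locus
    index = build_index(alleles, k=4)
    print(vote(["GTACGATA", "ACGTAC"], index, k=4))
```

K-mers shared by all alleles add the same number of votes to each of them, so the call is decided by the allele-specific k-mers; this is what lets voting skip a full alignment step.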


Recombination Shapes the Structure of an Environmental Vibrio cholerae Population

Applied and Environmental Microbiology ◽

10.1128/aem.02062-10 ◽

2010 ◽

Vol 77 (2) ◽

pp. 537-544 ◽

Cited By ~ 26

Author(s):

Daniel P. Keymer ◽

Alexandria B. Boehm

Keyword(s):

Bacterial Species ◽

Housekeeping Genes ◽

Small Sample ◽

Detection Methods ◽

Significant Linkage Disequilibrium ◽

Central California ◽

Intraspecies Diversity ◽

The Individual ◽

Significant Linkage ◽

Sequence Types

Vibrio cholerae consists of pathogenic strains that cause sporadic gastrointestinal illness or epidemic cholera disease and nonpathogenic strains that grow and persist in coastal aquatic ecosystems. Previous studies of disease-causing strains have shown V. cholerae to be a primarily clonal bacterial species, but the isolates analyzed have been strongly biased toward pathogenic genotypes while representing only a small sample of the vast diversity in environmental strains. In this study, we characterized homologous recombination and structure among 152 environmental V. cholerae isolates and 13 other putative Vibrio isolates from coastal waters and sediments in central California, as well as four clinical V. cholerae isolates, using multilocus sequence analysis of seven housekeeping genes. Recombinant regions were identified by at least three detection methods in 72% of our V. cholerae isolates. Despite frequent recombination, significant linkage disequilibrium was still detected among the V. cholerae sequence types. Incongruent but nonrandom associations were observed for maximum-likelihood topologies from the individual loci. Overall, our estimated recombination rate in V. cholerae of 6.5 times the mutation rate is similar to those of other sexual bacteria and appears to occur frequently enough to restrict selection from purging much of the neutral intraspecies diversity. These data suggest that frequent recombination among V. cholerae may hinder the identification of ecotypes in this bacterioplankton population.
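The abstract does not state which statistic was used to detect linkage disequilibrium among sequence types. One widely used measure for MLST data is the standardized index of association, I_A^S = (V_D / V_E - 1) / (L - 1), where V_D is the observed variance of pairwise allelic mismatches, V_E is its expectation under linkage equilibrium and L is the number of loci. The Python sketch below computes it under that assumption, for complete profiles only and without the permutation test normally used to judge significance; it is not necessarily the method used in this study.

```python
from itertools import combinations
from typing import List, Sequence


def standardized_index_of_association(profiles: Sequence[Sequence[int]]) -> float:
    """I_A^S = (V_D / V_E - 1) / (L - 1) for complete allelic profiles."""
    n = len(profiles)
    n_loci = len(profiles[0])
    if n < 2 or n_loci < 2:
        raise ValueError("need at least two isolates and two loci")

    # Observed V_D: population variance of mismatch counts over all isolate pairs.
    distances: List[int] = [
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(profiles, 2)
    ]
    mean = sum(distances) / len(distances)
    v_d = sum(d * d for d in distances) / len(distances) - mean ** 2

    # Expected V_E under linkage equilibrium: sum over loci of h_j * (1 - h_j),
    # where h_j is the unbiased gene diversity at locus j.
    v_e = 0.0
    for j in range(n_loci):
        column = [p[j] for p in profiles]
        freqs = [column.count(a) / n for a in set(column)]
        h = n / (n - 1) * (1 - sum(f * f for f in freqs))
        v_e += h * (1 - h)
    if v_e == 0:
        raise ValueError("all loci are monomorphic")
    return (v_d / v_e - 1) / (n_loci - 1)


if __name__ == "__main__":
    # Hypothetical two-locus profiles in which the two loci are perfectly linked.
    print(standardized_index_of_association([(1, 1), (1, 1), (2, 2), (2, 2)]))
```

Values near 0 are expected when alleles at different loci associate at random, as frequent recombination would tend to produce; values significantly above 0 indicate linkage disequilibrium.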


Comparative Analysis of New Zealand Campylobacter Isolates Using MLST, PFGE and flaA PCR RFLP Genotyping

10.26686/wgtn.16934848 ◽

2021 ◽

Author(s):


Sharla McTavish

Keyword(s):

New Zealand ◽

Bacterial Species ◽

Housekeeping Genes ◽

Species Boundaries ◽

Campylobacter Coli ◽

Closely Related Species ◽

Pcr Rflp ◽

Population Structures ◽

Human Campylobacteriosis ◽

Dominant Genotype

Campylobacter jejuni and Campylobacter coli are the most commonly identified sources of campylobacteriosis in New Zealand, yet little is known about the distribution of genotypes within their respective population structures. Using multi-locus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE) and flaA genotyping, the current study identified the distribution of genotypes within New Zealand C. jejuni and C. coli isolates from an outbreak situation, as well as isolates present in the ESR Campylobacter collection. Although the most commonly identified MLST genotypes were similar to international genotypes, a number of genotypes that are internationally rare or unique to New Zealand were observed. One rare dominant genotype, ST-474, arising from a point-source outbreak, was found to cause a large proportion of human campylobacteriosis cases in New Zealand. A unique cluster of New Zealand genotypes was isolated only from river water, identifying a potentially water-adapted C. jejuni strain. Frequent homologous recombination and horizontal gene transfer events were identified within the seven housekeeping genes characterised in the New Zealand sample and the MLST C. jejuni/C. coli database. The genetic instability identified in the current study calls into question the legitimacy of bacterial species boundaries, especially when examining closely related species such as C. jejuni and C. coli.


Multi Locus Sequence Typing and spa Typing of Staphylococcus Aureus Isolated from the Milk of Cows with Subclinical Mastitis in Croatia

Microorganisms ◽

10.3390/microorganisms9040725 ◽

2021 ◽

Vol 9 (4) ◽

pp. 725

Author(s):

Luka Cvetnić ◽

Marko Samardžija ◽

Sanja Duvnjak ◽

Boris Habrun ◽

Marija Cvetnić ◽

...

Keyword(s):

Bacterial Species ◽

Housekeeping Genes ◽

Subclinical Mastitis ◽

Clinical Mastitis ◽

Multi Locus Sequence Typing ◽

Spa Typing ◽

Heterogenous Group ◽

Sequence Types ◽

Spa Types ◽

Frequent Allele

Background: The bacterial species S. aureus is the most common causative agent of mastitis in cows in most countries with a dairy industry. The prevalence of infection caused by S. aureus ranges from 2% to more than 50%, and it causes 10–12% of all cases of clinical mastitis. Aim: The objective was to analyze 237 strains of S. aureus isolated from the milk of cows with subclinical mastitis for the spa, mecA, mecC and pvl genes and to perform spa typing and multi-locus sequence typing (MLST). Methods: Amplified genes were sequenced at Macrogen Europe. Ridom StaphType and BioNumerics software were used to analyze the obtained sequences of spa and the seven housekeeping genes. Results: The spa fragment was present in 204 (86.1%) of the strains, while the mecA and mecC genes were detected in 10 strains, and the pvl gene was not detected. Spa typing successfully analyzed 153 of the tested isolates (64.3%), confirming 53 spa types, four of which were new. The most frequent spa type was t2678 (14%). MLST typed 198 (83.5%) of the tested strains and defined 32 different allele profiles, three of which were new. The most frequent allele profile was ST133 (20.7%). Six groups (G) and 15 singletons were defined. Conclusion: Taking the number of confirmed spa types and sequence types (STs) into account, it can be concluded that the strains of S. aureus isolated from the milk of cows with subclinical mastitis form a heterogeneous group. To assess the possible zoonotic potential of the isolates, it would be necessary to test people and other livestock on the farms.


NoRCE: non-coding RNA sets cis enrichment tool

BMC Bioinformatics ◽

10.1186/s12859-021-04112-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Gulden Olgun ◽

Afshan Nabi ◽

Oznur Tastan

Keyword(s):

Expression Patterns ◽

Target Prediction ◽

Enrichment Analysis ◽

Fruit Fly ◽

Relevant Information ◽

R Package ◽

Data Repository ◽

Biologically Relevant ◽

Gene Sets ◽

Data Files

Background: While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close by on the genome are often regulated together, so this genomic proximity can hint at a functional association. Results: We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information, such as topologically associating domain (TAD) boundaries, co-expression patterns and miRNA target predictions, can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets in its data repository, including cell-line-specific TAD boundaries, functional gene sets, and cancer-specific expression data for coding and non-coding RNAs. Additionally, users can supply custom data files. Enrichment results can be retrieved in tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm and yeast. Conclusions: NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of ncRNAs of any type. The tool offers the flexibility to conduct the users’ preferred set of analyses by designing their own analysis pipeline. NoRCE is available in Bioconductor and at https://github.com/guldenolgun/NoRCE.
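The core idea of cis enrichment (collect coding genes near the input ncRNAs, then test whether a gene set is over-represented among them) can be sketched in Python. Everything here is an assumption for illustration: the coordinates, the fixed window around each ncRNA and the choice of a one-sided hypergeometric test are not taken from NoRCE, whose own options and statistics are not reproduced.

```python
from math import comb
from typing import Dict, Set, Tuple

# Hypothetical coordinates: (chromosome, start, end).
Locus = Tuple[str, int, int]


def neighbours(ncrnas: Dict[str, Locus], genes: Dict[str, Locus], window: int) -> Set[str]:
    """Coding genes overlapping any ncRNA extended by `window` bp on each side."""
    hits: Set[str] = set()
    for chrom, start, end in ncrnas.values():
        for gene, (g_chrom, g_start, g_end) in genes.items():
            if g_chrom == chrom and g_start <= end + window and g_end >= start - window:
                hits.add(gene)
    return hits


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def cis_enrichment_p(hits: Set[str], gene_set: Set[str], universe: Set[str]) -> float:
    """One-sided p-value that `gene_set` is over-represented among the neighbour genes."""
    drawn = hits & universe
    annotated = gene_set & universe
    return hypergeom_upper_tail(len(drawn & annotated), len(universe), len(annotated), len(drawn))


if __name__ == "__main__":
    ncrnas = {"lnc1": ("chr1", 1000, 1500)}
    genes = {"geneA": ("chr1", 1800, 2500), "geneB": ("chr1", 5000, 6000),
             "geneC": ("chr2", 1000, 1200)}
    print(neighbours(ncrnas, genes, window=500))
```

In a real analysis, the p-values for many gene sets would also need multiple-testing correction.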
