FastMLST: A multi-core tool for multilocus sequence typing of draft genome assemblies

2020 ◽  
Author(s):  
Enzo Guerrero-Araya ◽  
Marina Muñoz ◽  
César Rodríguez ◽  
Daniel Paredes-Sabja

ABSTRACT Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiological and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of allelic sequences obtained for multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiological data in public repositories. However, data processing speed has become problematic due to datasets’ massive size. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach. Compared with mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of current multi-core computers to simultaneously type thousands of genome assemblies in minutes, reducing processing times by at least 4-fold and with more than 99.95% consistency. Availability and Implementation: The source code, installation instructions and documentation are available at https://github.com/EnzoAndree/FastMLST

2021 ◽  
Vol 15 ◽  
pp. 117793222110592
Author(s):  
Enzo Guerrero-Araya ◽  
Marina Muñoz ◽  
César Rodríguez ◽  
Daniel Paredes-Sabja

Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available), detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST, if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST
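The core idea in this abstract, mapping a combination of alleles to an ST and handling each assembly independently, can be illustrated with a minimal Python sketch. It is not FastMLST's code: the three-locus scheme, the profile table, and the function names are hypothetical, the BLASTn allele-calling step is assumed to have already produced allele numbers, and a thread pool stands in for the multi-core execution the abstract describes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scheme: locus order plus a table mapping allele tuples to STs.
LOCI = ("adk", "atpA", "dxr")
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 5): 7,
}


def assign_st(alleles):
    """Return the ST for a dict of locus -> allele number.

    Returns None when a locus is missing or the profile is not in the table.
    """
    try:
        profile = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        return None  # a missing locus means no ST can be assigned
    return PROFILES.get(profile)


def type_assemblies(calls_by_genome, workers=4):
    """Type every genome independently; each genome is one unit of work."""
    names = list(calls_by_genome)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sts = list(pool.map(assign_st, (calls_by_genome[n] for n in names)))
    return dict(zip(names, sts))


if __name__ == "__main__":
    calls = {
        "genome_a": {"adk": 1, "atpA": 2, "dxr": 1},
        "genome_b": {"adk": 3, "atpA": 2, "dxr": 5},
        "genome_c": {"adk": 1, "atpA": 2},  # dxr not detected
    }
    print(type_assemblies(calls))
```

Because each genome is typed without reference to any other, the work splits cleanly across workers; a CPU-bound implementation would more likely use separate processes than threads.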


2011 ◽  
Vol 57 (12) ◽  
pp. 982-986 ◽  
Author(s):  
Michelle L. Shuel ◽  
Kathleen E. Karlowsky ◽  
Dennis K.S. Law ◽  
Raymond S.W. Tsang

Population biology of Haemophilus influenzae can be studied by multilocus sequence typing (MLST), and isolates are assigned sequence types (STs) based on nucleotide sequence variations in seven housekeeping genes, including fucK. However, the ST cannot be assigned if one of the housekeeping genes is absent or cannot be detected by the current protocol. Occasionally, strains of H. influenzae have been reported to lack the fucK gene. In this study, we examined the prevalence of this mutation among our collection of H. influenzae isolates. Of the 704 isolates studied, including 282 encapsulated and 422 nonencapsulated isolates, nine were not typeable by MLST owing to failure to detect the fucK gene. All nine fucK-negative isolates were nonencapsulated and belonged to various biotypes. DNA sequencing of the fucose operon region confirmed complete deletion of genes in the operon in seven of the nine isolates, while in the remaining two isolates, some of the genes were found intact or in parts. The significance of these findings is discussed.


Author(s):  
Kiran Kirdat ◽  
Bhavesh Tiwarekar ◽  
Vipool Thorat ◽  
Shivaji Sathe ◽  
Yogesh Shouche ◽  
...  

Sugarcane Grassy Shoot (SCGS) disease is known to be related to Rice Yellow Dwarf (RYD) phytoplasmas (16SrXI-B group) which are found predominantly in sugarcane growing areas of the Indian subcontinent and South-East Asia. The 16S rRNA gene sequences of SCGS phytoplasma strains belonging to the 16SrXI-B group share 98.07 % similarity with ‘Ca. Phytoplasma cynodontis’ strain BGWL-C1 followed by 97.65 % similarity with ‘Ca. P. oryzae’ strain RYD-J. Because it is placed distinctly away from both phylogenetically related species, the taxonomic identity of SCGS phytoplasma is unclear. We attempted to resolve the phylogenetic position of SCGS phytoplasma based on the phylogenetic analysis of the 16S rRNA gene (>1500 bp), nine housekeeping genes (>3500 aa), core genome phylogeny (>10 000 aa) and OGRI values. The draft genome sequences of SCGS phytoplasma (strain SCGS) and Bermuda Grass White leaf (BGWL) phytoplasma (strain LW01), closely related to ‘Ca. P. cynodontis’, were obtained. The SCGS genome comprised 29 scaffolds corresponding to 505 173 bp while the LW01 assembly contained 21 scaffolds corresponding to 483 935 bp, with fold coverage over 330× and completeness over 90 % for both genomes. The G+C content of SCGS was 19.86 % while that of LW01 was 20.46 %. The orthoANI value for strain SCGS against strain LW01 was 79.42 %, and the dDDH value was 22. Overall analysis reveals that SCGS phytoplasma forms a distant clade in the RYD group of phytoplasmas. Based on phylogenetic analyses and OGRI values obtained from the genome sequences, a novel taxon ‘Candidatus Phytoplasma sacchari’ is proposed.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jiorgos Kourelis ◽  
Farnusch Kaschani ◽  
Friederike M. Grosse-Holz ◽  
Felix Homma ◽  
Markus Kaiser ◽  
...  

Abstract. Background: Nicotiana benthamiana is an important model organism of the Solanaceae (Nightshade) family. Several draft assemblies of the N. benthamiana genome have been generated, but many of the gene-models in these draft assemblies appear incorrect. Results: Here we present an improved proteome based on the Niben1.0.1 draft genome assembly guided by gene models from other Nicotiana species. Due to the fragmented nature of the Niben1.0.1 draft genome, many protein-encoding genes are missing or partial. We complement these missing proteins by similarly annotating other draft genome assemblies. This approach overcomes problems caused by mis-annotated exon-intron boundaries and mis-assigned short read transcripts to homeologs in polyploid genomes. With an estimated 98.1% completeness; only 53,411 protein-encoding genes; and improved protein lengths and functional annotations, this new predicted proteome is better in assigning spectra than the preceding proteome annotations. This dataset is more sensitive and accurate in proteomics applications, clarifying the detection by activity-based proteomics of proteins that were previously predicted to be inactive. Phylogenetic analysis of the subtilase family of hydrolases reveals inactivation of likely homeologs, associated with a contraction of the functional genome in this alloploid plant species. Finally, we use this new proteome annotation to characterize the extracellular proteome as compared to a total leaf proteome, which highlights the enrichment of hydrolases in the apoplast. Conclusions: This proteome annotation provides the community working with Nicotiana benthamiana with an important new resource for functional proteomics.


Microbiology ◽  
2010 ◽  
Vol 156 (7) ◽  
pp. 2035-2045 ◽  
Author(s):  
Claudia Picozzi ◽  
Gaia Bonacina ◽  
Ileana Vigentini ◽  
Roberto Foschino

Lactobacillus sanfranciscensis is a lactic acid bacterium that characterizes the sourdough environment. The genetic differences of 24 strains isolated in different years from sourdoughs, mostly collected in Italy, were examined and compared by PFGE and multilocus sequence typing (MLST). The MLST scheme, based on the analysis of six housekeeping genes (gdh, gyrA, mapA, nox, pgmA and pta) was developed for this study. PFGE with the restriction enzyme ApaI proved to have higher discriminatory power, since it revealed 22 different pulsotypes, while 19 sequence types were recognized through MLST analysis. Notably, restriction profiles generated from three isolates collected from the same firm but in three consecutive years clustered in a single pulsotype and showed the same sequence type, emphasizing the fact that the main factors affecting the dominance of a strain are correlated with processing conditions and the manufacturing environment rather than the geographical area. All results indicated a limited recombination among genes and the presence of a clonal population in L. sanfranciscensis. The MLST scheme proposed in this work can be considered a useful tool for characterization of isolates and for in-depth examination of the strain diversity and evolution of this species.
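The abstract compares the two methods by the number of distinct types each resolves (22 pulsotypes versus 19 sequence types among 24 strains). A common way to express discriminatory power as a single number is Simpson's index of diversity in the Hunter-Gaston form, D = 1 - [1 / (N(N-1))] * sum of n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates of type j. The sketch below assumes that measure; the abstract does not state whether the authors computed it, and the type labels in the usage example are hypothetical because the per-type counts are not given here.

```python
from collections import Counter


def hunter_gaston_index(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form) for a typing method.

    type_labels holds one label per isolate, e.g. its ST or pulsotype.
    The result is the probability that two isolates drawn without
    replacement carry different types.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels)
    # Ordered pairs of isolates that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical labels for four isolates, not data from the study.
    print(hunter_gaston_index(["ST1", "ST1", "ST2", "ST3"]))
```

A method that splits the same isolates into more, smaller groups gives a value closer to 1.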


2018 ◽  
Vol 6 (7) ◽  
Author(s):  
Abhishek Somani ◽  
Daniel Smith ◽  
Matthew Hegarty ◽  
Narcis Fernandez-Fuentes ◽  
Sreenivas R. Ravella ◽  
...  

ABSTRACT Non-albicans Candida species are growing in prominence in industrial biotechnology due to their ability to utilize hemicellulose. Here, we present the draft genome sequences of an inhibitor-tolerant Candida tropicalis strain (Y6604) and Candida boidinii NCAIM Y01308T.


Vaccines ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 665
Author(s):  
Andrea Matucci ◽  
Elisabetta Stefani ◽  
Michele Gastaldelli ◽  
Ilenia Rossi ◽  
Gelinda De Grandi ◽  
...  

Mycoplasma gallisepticum (MG) infects many avian species and leads to significant economic losses in the poultry industry. Transmission of this pathogen occurs both horizontally and vertically, and strategies to avoid the spread of MG rely on vaccination and the application of biosecurity measures to maintain breeder groups as pathogen-free. Two live attenuated MG vaccine strains are licensed in Italy: 6/85 and ts-11. After their introduction, the implementation of adequate genotyping tools became necessary to distinguish between field and vaccine strains and to guarantee proper infection monitoring activity. In this study, 40 Italian MG isolates collected between 2010 and 2019 from both vaccinated and unvaccinated farms were genotyped using gene-targeted sequencing (GTS) of the cytadhesin gene mgc2 and multilocus sequence typing (MLST) based on six housekeeping genes. The discriminatory power of GTS typing ensures 6/85-like strain identification, but the technique does not allow the identification of ts-11 strains; conversely, MLST differentiates both vaccine strains, describing more detailed interrelation structures. Our study describes the MG genetic scenario within a mixed farming context. In conclusion, the use of adequate typing methods is essential to understand the evolutionary dynamics of MG strains in a particular area and to conduct epidemiological investigations in the avian population.


2019 ◽  
Vol 85 (8) ◽  
Author(s):  
Fabian Pilet ◽  
Robert Nketsia Quaicoe ◽  
Isaac Jesuorobo Osagie ◽  
Marcos Freire ◽  
Xavier Foissac

ABSTRACT To sustain epidemiological studies on coconut lethal yellowing disease (CLYD), a devastating disease in Africa caused by a phytoplasma, we developed a multilocus sequence typing (MLST) scheme for “Candidatus Phytoplasma palmicola” based on eight housekeeping genes. At the continental level, eight different sequence types were identified among 132 “Candidatus Phytoplasma palmicola”-infected coconuts collected in Ghana, Nigeria, and Mozambique, where CLYD epidemics are still very active. “Candidatus Phytoplasma palmicola” appeared to be a bacterium that is subject to strong bottlenecks, reducing the fixation of positively selected beneficial mutations into the bacterial population. This phenomenon, as well as a limited plant host range, might explain the observed country-specific distribution of the eight haplotypes. As an alternative means to increase fitness, bacteria can also undergo genetic exchange; however, no evidence for such recombination events was found for “Candidatus Phytoplasma palmicola.” The implications for CLYD epidemiology and prophylactic control are discussed. The usefulness of seven housekeeping genes to investigate the genetic diversity in the genus “Candidatus Phytoplasma” is underlined. IMPORTANCE Coconut is an important crop for both industry and small stakeholders in many intertropical countries. Phytoplasma-associated lethal yellowing-like diseases have become one of the major pests that limit coconut cultivation as they have emerged in different parts of the world. We developed a multilocus sequence typing scheme (MLST) for tracking epidemics of “Ca. Phytoplasma palmicola,” which is responsible for coconut lethal yellowing disease (CLYD) on the African continent. MLST analysis applied to diseased coconut samples collected in western and eastern African countries also showed the existence of three distinct populations of “Ca. Phytoplasma palmicola” with low intrapopulation diversity. The reasons for the observed strong geographic patterns remain to be established but could result from the lethality of CLYD and the dominance of short-distance insect-mediated transmission.


2008 ◽  
Vol 190 (8) ◽  
pp. 2831-2840 ◽  
Author(s):  
Narjol González-Escalona ◽  
Jaime Martinez-Urtaza ◽  
Jaime Romero ◽  
Romilio T. Espejo ◽  
Lee-Ann Jaykus ◽  
...  

ABSTRACT Vibrio parahaemolyticus is an important human pathogen whose transmission is associated with the consumption of contaminated seafood. There is a growing public health concern due to the emergence of a pandemic strain causing severe outbreaks worldwide. Many questions remain unanswered regarding the evolution and population structure of V. parahaemolyticus. In this work, we describe a multilocus sequence typing (MLST) scheme for V. parahaemolyticus based on the internal fragment sequences of seven housekeeping genes. This MLST scheme was applied to 100 V. parahaemolyticus strains isolated from geographically diverse clinical (n = 37) and environmental (n = 63) sources. The sequences obtained from this work were deposited and are available in a public database (http://pubmlst.org/vparahaemolyticus). Sixty-two unique sequence types were identified, and most (50) were represented by a single isolate, suggesting a high level of genetic diversity. Three major clonal complexes were identified by eBURST analysis. Separate clonal complexes were observed for V. parahaemolyticus isolates originating from the Pacific and Gulf coasts of the United States, while a third clonal complex consisted of strains belonging to the pandemic clonal complex with worldwide distribution. The data reported in this study indicate that V. parahaemolyticus is genetically diverse with a semiclonal population structure and an epidemic structure similar to that of Vibrio cholerae. Genetic diversity in V. parahaemolyticus appears to be driven primarily by frequent recombination rather than mutation, with recombination ratios estimated at 2.5:1 and 8.8:1 by allele and site, respectively. Application of this MLST scheme to more V. parahaemolyticus strains and by different laboratories will facilitate production of a global picture of the epidemiology and evolution of this pathogen.

