SNPPar: identifying convergent evolution and other homoplasies from microbial whole-genome alignments

2021
Vol 7 (12)
Author(s):
David J. Edwards
Sebastián Duchene
Bernard Pope
Kathryn E. Holt

Homoplasic SNPs are considered important signatures of strong (positive) selective pressure, and hence of adaptive evolution for clinically relevant traits such as antibiotic resistance and virulence. Here we present a new tool, SNPPar, for efficient detection and analysis of homoplasic SNPs from large whole-genome sequencing datasets (>1000 isolates and/or >100 000 SNPs). SNPPar takes as input an SNP alignment, a tree and an annotated reference genome, and uses a combination of simple monophyly tests and ancestral state reconstruction (ASR, via TreeTime) to assign mutation events to branches and identify homoplasies. Mutations are annotated at the codon and gene levels to facilitate analysis of convergent evolution. Testing on simulated data (120 Mycobacterium tuberculosis alignments representing local and global samples) showed that SNPPar can detect homoplasic SNPs with very high specificity (zero false positives in all tests) and high sensitivity (zero false negatives in 89 % of tests). SNPPar analysis of three empirically sampled datasets (Elizabethkingia anophelis, Burkholderia dolosa and M. tuberculosis) produced results concordant with previous studies, both for individual homoplasies and for evidence of convergence at the codon and gene levels. SNPPar analysis of a simulated alignment of ~64 000 genome-wide SNPs from 2000 M. tuberculosis genomes took ~23 min and ~2.6 GB of RAM on a laptop to generate complete annotated results. This analysis required ASR to be conducted for only 1.25 % of SNPs, and the ASR step took ~23 s and 0.4 GB of RAM. SNPPar thus automates the detection and annotation of homoplasic SNPs from large SNP alignments efficiently and accurately. As the examples included here demonstrate, this information can readily be used to explore the role of homoplasy in parallel and/or convergent evolution at the nucleotide, codon and/or gene level.
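The idea behind a monophyly shortcut can be illustrated with a small sketch. This is not SNPPar's code: it assumes a rooted tree written as nested Python tuples, a biallelic site given as a mapping from isolate name to base, and the standard parsimony argument that a single mutation explains a biallelic site on a rooted tree exactly when the isolates carrying one of the two alleles form a clade. Sites failing that test are the kind that would need ancestral state reconstruction. The function names `collect_clades` and `needs_asr` are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples of isolate names, e.g. ((("A", "B"), "C"), ("D", "E")).
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return the leaf set of this node and the leaf sets of every node below it."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    leaves: Set[str] = set()
    clades: Set[FrozenSet[str]] = set()
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        clades |= child_clades
    node = frozenset(leaves)
    clades.add(node)
    return node, clades


def needs_asr(tree: Tree, site: Dict[str, str]) -> bool:
    """Hypothetical pre-filter: True if one mutation cannot explain the site.

    For a biallelic site on a rooted tree, a single change suffices exactly when
    the carriers of one allele are the full leaf set of some node (a clade).
    Sites with more than two alleles are simply flagged for ASR in this sketch.
    """
    alleles = set(site.values())
    if len(alleles) > 2:
        return True
    _, clades = collect_clades(tree)
    for allele in alleles:
        carriers = frozenset(name for name, base in site.items() if base == allele)
        if carriers in clades:
            return False
    return True


tree = ((("A", "B"), "C"), ("D", "E"))
print(needs_asr(tree, {"A": "T", "B": "T", "C": "C", "D": "C", "E": "C"}))  # False
print(needs_asr(tree, {"A": "T", "B": "C", "C": "C", "D": "T", "E": "C"}))  # True
```

In the second site, neither the T carriers (A, D) nor the C carriers (B, C, E) form a clade, so at least two mutation events are required and the site is a homoplasy candidate.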

2020
Vol 6 (11)
Author(s):
Eleonora Tassinari
Matt Bawn
Gaetan Thilliez
Oliver Charity
Luke Acton
...  

Epidemic and pandemic clones of bacterial pathogens with distinct characteristics continually emerge, replacing previously dominant clones through mechanisms that remain poorly characterized. Here, epidemiology powered by whole-genome sequencing linked horizontal transfer of a virulence gene, sopE, to the emergence and clonal expansion of a new epidemic clone of Salmonella enterica serovar Typhimurium (S. Typhimurium). The sopE gene is sporadically distributed within the genus Salmonella and rare in S. enterica Typhimurium lineages, but was acquired multiple times during clonal expansion of the currently dominant pandemic monophasic S. Typhimurium sequence type (ST) 34 clone. Ancestral state reconstruction and time-scaled phylogenetic analysis indicated that sopE was not present in the common ancestor of the epidemic clade, but that its later acquisition resulted in increased clonal expansion of sopE-containing clones, which was temporally associated with emergence of the epidemic and consistent with increased fitness. The sopE gene was mainly associated with a temperate bacteriophage, mTmV, but recombination with other bacteriophages and apparent horizontal transfer of the sopE gene cassette resulted in its distribution among at least four mobile genetic elements within the monophasic S. enterica Typhimurium ST34 epidemic clade. Lysogenic transfer of the mTmV prophage to other S. enterica serovars in vitro was limited, but included the common pig-associated serovar S. enterica Derby (S. Derby). This may explain the presence of mTmV in S. Derby co-circulating on farms with monophasic S. Typhimurium ST34, and highlights the potential for further transfer of the sopE virulence gene in nature. We conclude that whole-genome epidemiology pinpoints potential drivers of evolutionary and epidemiological dynamics during pathogen emergence, and identifies targets for subsequent research in epidemiology and bacterial pathogenesis.


Author(s):  
Hisami Kobayashi
Yasuhiro Tanizawa
Mitsuo Sakamoto
Moriya Ohkuma
Masanori Tohno

The taxonomic status of the species Clostridium methoxybenzovorans was assessed. The 16S rRNA gene sequence, whole-genome sequence and phenotypic characterization suggested that the type strain deposited in the American Type Culture Collection (C. methoxybenzovorans ATCC 700855T) is a member of the species Eubacterium callanderi. Hence, C. methoxybenzovorans ATCC 700855T cannot be used as a reference for taxonomic study. The type strain deposited in the German Collection of Microorganisms and Cell Cultures GmbH (DSM 12182T) is no longer listed in its online catalogue. Moreover, both the 16S rRNA gene and the whole-genome sequences of the original strain SR3T showed high sequence identity with those of Lacrimispora indolis (recently reclassified from Clostridium indolis), its most closely related species. Comparison of the two genomes gave average nucleotide identity based on blast (ANIb) and digital DNA–DNA hybridization (dDDH) values of 98.3 and 87.9 %, respectively. Based on these results, C. methoxybenzovorans SR3T was considered to be a member of L. indolis.
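The conclusion above rests on species delineation cut-offs that the abstract does not state. A minimal sketch, assuming the commonly cited boundaries of about 95–96 % ANI and 70 % dDDH (the function name `same_species` and its default thresholds are illustrative assumptions, not values taken from the study), shows why 98.3 % and 87.9 % place SR3T within L. indolis:

```python
# Commonly cited species boundaries (assumed here, not taken from the study):
# about 95-96 % average nucleotide identity and 70 % digital DNA-DNA hybridization.
ANI_CUTOFF = 96.0  # upper end of the ANI range, to keep the check conservative
DDDH_CUTOFF = 70.0


def same_species(ani: float, dddh: float,
                 ani_cutoff: float = ANI_CUTOFF,
                 dddh_cutoff: float = DDDH_CUTOFF) -> bool:
    """Return True when both genome-relatedness indices meet the species cut-offs."""
    return ani >= ani_cutoff and dddh >= dddh_cutoff


# Values reported above for C. methoxybenzovorans SR3T versus L. indolis.
print(same_species(ani=98.3, dddh=87.9))  # True
```

Requiring both indices to pass mirrors the usual practice of treating ANI and dDDH as complementary evidence rather than relying on either alone.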


2013
Vol 63 (Pt_10)
pp. 3920-3926
Author(s):
Julia S. Bennett
Keith A. Jolley
Martin C. J. Maiden

Phylogenies generated from whole-genome sequence (WGS) data provide a definitive means of characterizing bacterial isolates for typing and taxonomy. The species status of strains recently defined by conventional taxonomic approaches as Neisseria oralis was examined by analysing sequences derived from WGS data, specifically: (i) 53 Neisseria ribosomal protein subunit (rps) genes (ribosomal multi-locus sequence typing, rMLST); and (ii) 246 Neisseria core genes (core genome MLST, cgMLST). These data were compared with phylogenies derived from 16S and 23S rRNA gene sequences, demonstrating that the N. oralis strains were monophyletic with strains previously described as ‘Neisseria mucosa var. heidelbergensis’ and that this group was of equivalent taxonomic status to other well-described species of the genus Neisseria. Phylogenetic analyses also indicated that Neisseria sicca and Neisseria macacae should be considered the same species as Neisseria mucosa, and that Neisseria flavescens should be considered the same species as Neisseria subflava. Analyses using rMLST showed that some strains currently assigned to the genus Neisseria were more closely related to species of other genera within the family; however, whole-genome analysis of a more comprehensive selection of strains from the family Neisseriaceae would be necessary to confirm this. We suggest that strains previously identified as ‘N. mucosa var. heidelbergensis’ and deposited in culture collections should be renamed N. oralis. Finally, one of the N. oralis strains was able to ferment lactose owing to the presence of β-galactosidase and lactose permease genes, a characteristic previously thought to be unique to Neisseria lactamica; lactose fermentation therefore cannot be regarded as diagnostic for that species. Nevertheless, the rMLST and cgMLST analyses confirm that N. oralis is most closely related to N. mucosa.


2021
Vol 7 (7)
Author(s):
Casper Jamin
Sien De Koster
Stefanie van Koeveringe
Dieter De Coninck
Klaas Mensaert
...  

Whole-genome sequencing (WGS) is becoming the de facto standard for bacterial typing and outbreak surveillance of resistant bacterial pathogens. However, the interoperability of WGS for investigating bacterial outbreaks is poorly understood. We hypothesized that harmonization of WGS for outbreak surveillance is achievable through the use of identical protocols for both data generation and data analysis. A set of 30 bacterial isolates, comprising various species of the family Enterobacteriaceae and the genus Enterococcus, was selected and sequenced in each participating centre using the same protocol on the Illumina MiSeq platform. All generated sequencing data were analysed by one centre using BioNumerics (6.7.3) for (i) genotyping of origins of replication and antimicrobial resistance genes, (ii) core-genome multi-locus sequence typing (cgMLST) for Escherichia coli and Klebsiella pneumoniae, and (iii) whole-genome multi-locus sequence typing (wgMLST) for all species. Additionally, a split k-mer analysis was performed to determine the number of SNPs between samples. A precision of 99.0% and an accuracy of 99.2% were achieved for genotyping. Based on cgMLST, a discrepant allele was called in only 2/27 and 3/15 comparisons between two genomes for E. coli and K. pneumoniae, respectively. Based on wgMLST, the number of discrepant alleles ranged from 0 to 7 (average 1.6); for SNPs, it ranged from 0 to 11 (average 3.4). Furthermore, we demonstrate that using different de novo assemblers to analyse the same dataset introduces up to 150 SNPs, which exceeds most thresholds used for bacterial outbreaks. This shows the importance of harmonizing data processing for surveillance of bacterial outbreaks. In summary, multi-centre WGS for bacterial surveillance is achievable, but only if protocols are harmonized.
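The abstract does not describe how the split k-mer analysis counts SNPs. The sketch below illustrates the general idea under simplifying assumptions: a split k-mer is taken to be k bases on each side of a central base, the two flanks form the lookup key and the central base is the value, and two samples differ by a SNP wherever they share flanks but not the central base. Real split k-mer tools also canonicalize reverse complements and deal with repeats; this sketch scans the forward strand only and lets a repeated key keep its last middle base. The function names are hypothetical.

```python
from typing import Dict


def split_kmers(seq: str, k: int) -> Dict[str, str]:
    """Map each pair of k-base flanks to the base between them (forward strand only)."""
    span = 2 * k + 1
    table: Dict[str, str] = {}
    for i in range(len(seq) - span + 1):
        window = seq[i:i + span]
        flanks = window[:k] + window[k + 1:]
        table[flanks] = window[k]  # a repeated key keeps its last middle base
    return table


def split_kmer_snps(seq_a: str, seq_b: str, k: int) -> int:
    """Count shared split k-mers whose middle bases differ between two sequences."""
    a = split_kmers(seq_a, k)
    b = split_kmers(seq_b, k)
    return sum(1 for key in a.keys() & b.keys() if a[key] != b[key])


reference = "ACGTTGCAAGTCCATG"
variant = "ACGTTGCAGGTCCATG"  # single A->G substitution at position 8
print(split_kmer_snps(reference, variant, k=3))  # 1
```

Because the substitution changes the flanks of every other window that overlaps it, only the window centred on it is shared between the two samples, so the SNP is counted once. A consequence of this design is that SNPs closer together than k bases break each other's flanks and can be missed.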


2021
Vol 7 (11)
Author(s):
Isabelle Bernaquez
Christiane Gaudreau
Pierre A. Pilon
Sadjia Bekal

Many public health laboratories across the world have implemented whole-genome sequencing (WGS) for the surveillance and outbreak detection of foodborne pathogens. PulseNet-affiliated laboratories have determined that most single-strain foodborne outbreaks are contained within 0–10 multi-locus sequence typing (MLST)-based allele differences and/or core genome single-nucleotide variants (SNVs). Although Shigella spp. cause food- and travel-associated outbreaks, most cases occur through continuous person-to-person transmission, predominantly involving men who have sex with men (MSM), leading to long-term and recurrent outbreaks. Continuous transmission coupled with genetic evolution under antibiotic treatment pressure requires an assessment of existing WGS-based subtyping methods and of the interpretation criteria for cluster inclusion/exclusion. Four WGS-based subtyping methods [SNVPhyl, coreMLST, core genome MLST (cgMLST) and whole-genome MLST (wgMLST)] were evaluated on 9 foodborne-, travel- and MSM-related retrospective outbreaks from a collection of 91 Shigella flexneri and 232 Shigella sonnei isolates to determine the methods’ epidemiological concordance, discriminatory power, robustness and ability to generate stable interpretation criteria. The discriminatory powers ranked as follows: coreMLST<SNVPhyl<cgMLST<wgMLST (range: 0.970–1.000). The genetic differences observed for non-MSM-related Shigella spp. outbreaks respected the standard 0–10 allele/SNV guideline; however, with wgMLST, mobile genetic element (MGE)-encoded loci caused inflated genetic variation and discrepant phylogenies for prolonged MSM-related S. sonnei outbreaks. For S. sonnei, wgMLST also had the lowest correlation coefficients, at 0.680, 0.703 and 0.712 against SNVPhyl, coreMLST and cgMLST, respectively.
Plasmid maintenance, mobilization and conjugation-associated genes, together with prophage-related genes, were the main sources of genetic distance inflation. Duplicated alleles arising from repeated IS elements were also responsible for many false cg/wgMLST differences. For inter-laboratory comparability, the coreMLST approach was the most robust, followed by SNVPhyl and wgMLST. Our results highlight the need to validate species-specific subtyping methods in light of microbial genome plasticity and outbreak dynamics, and the importance of filtering confounding MGEs for cluster detection.
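The abstract reports discriminatory power on a 0–1 scale without naming the formula. A common choice for typing methods is the Hunter–Gaston discriminatory index (Simpson's index of diversity), D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where N is the number of isolates and n_j the number of isolates assigned to type j. Whether the study used exactly this index is an assumption; the sketch below computes it from a list of type assignments, and the cluster labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def hunter_gaston_index(types: Sequence[Hashable]) -> float:
    """Probability that two isolates drawn without replacement get different types."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    # Ordered pairs of distinct isolates that share a type, summed over types.
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical cluster assignments for six isolates under one subtyping method.
print(hunter_gaston_index(["c1", "c1", "c2", "c3", "c3", "c4"]))  # about 0.867
```

A method that gives every isolate its own type scores 1.0, and one that lumps all isolates together scores 0.0, which is why a finer-grained scheme such as wgMLST tends to sit at the top of such a ranking.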


2019
Vol 5 (7)
Author(s):
Charles H. D. Williamson
Nathan E. Stone
Amalee E. Nunnally
Heidie M. Hornstra
David M. Wagner
...  

Clostridioides difficile is a ubiquitous, diarrhoeagenic pathogen often associated with healthcare-acquired infections that can cause a range of symptoms from mild, self-limiting disease to toxic megacolon and death. Since the early 2000s, a large proportion of C. difficile cases have been attributed to the ribotype 027 (RT027) lineage, which is associated with sequence type 1 (ST1) in the C. difficile multilocus sequence typing scheme. The spread of ST1 has been attributed in part to resistance to fluoroquinolones, which are used to treat unrelated infections and thereby create conditions ideal for C. difficile colonization and proliferation. In this study, we analysed 27 isolates from a healthcare network in northern Arizona, USA, together with 1352 publicly available ST1 genomes, to place locally sampled isolates in a global context. Whole-genome single nucleotide polymorphism (SNP) analysis revealed at least six separate introductions of ST1 into healthcare facilities in northern Arizona over an 18-month sampling period. Reconstruction of transmission networks identified potential nosocomial transmission that was detectable only through whole-genome sequence analysis. Antibiotic resistance was heterogeneous among ST1 genomes, including variable resistance profiles among locally sampled ST1 isolates. To investigate why ST1 genomes are so common globally and in northern Arizona, we compared all high-quality C. difficile genomes and found that ST1 genomes have gained and lost a number of genomic regions relative to all other C. difficile genomes; analyses of other toxigenic C. difficile sequence types indicate that this loss may be anomalous and could be related to niche specialization. These results suggest that a combination of antimicrobial resistance and the gain and loss of specific genes may explain the prominent association of this sequence type with C. difficile infection cases worldwide.
The degree of genetic variability in ST1 suggests that classifying all ST1 genomes into a quinolone-resistant hypervirulent clone category may not be appropriate. Whole genome sequencing of clinical C. difficile isolates provides a high-resolution surveillance strategy for monitoring persistence and transmission of C. difficile and for assessing the performance of infection prevention and control strategies.


2020
Vol 6 (7)
Author(s):
Bede Constantinides
Kevin K. Chau
T. Phuong Quan
Gillian Rodger
Monique I. Andersson
...  

Escherichia coli and Klebsiella spp. are important human pathogens that cause a wide spectrum of clinical disease. In healthcare settings, sinks and other wastewater sites have been shown to be reservoirs of antimicrobial-resistant E. coli and Klebsiella spp., particularly in the context of outbreaks of resistant strains amongst patients. Without focusing exclusively on resistance markers or a clinical outbreak, we demonstrate that many hospital sink drains are abundantly and persistently colonized with diverse populations of E. coli, Klebsiella pneumoniae and Klebsiella oxytoca, including both antimicrobial-resistant and susceptible strains. Using whole-genome sequencing of 439 isolates, we show that environmental bacterial populations are largely structured by ward and sink, with only a handful of lineages, such as E. coli ST635, being widely distributed; this suggests different prevailing ecologies, which may vary as a result of different inputs and selection pressures. Whole-genome sequencing of 46 contemporaneous patient isolates identified one E. coli isolate associated with a urine infection (2 %; 95 % CI 0.05–11 %) that was highly similar to a prior sink isolate, suggesting that sinks may contribute to up to 10 % of infections caused by these organisms in patients on the ward over the same timeframe. Using metagenomic sequencing of 20 sink timepoints, we show that sinks also harbour many clinically relevant antimicrobial resistance genes, including blaCTX-M, blaSHV and mcr, and may act as niches for the exchange and amplification of these genes. Our study reinforces the potential role of sinks in contributing to Enterobacterales infection and antimicrobial resistance in hospital patients, a role that could be amenable to intervention. This article contains data hosted by Microreact.
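The interval quoted for 1 of 46 patient isolates (2 %; 95 % CI 0.05–11 %) appears consistent with an exact (Clopper–Pearson) binomial confidence interval; the abstract does not name the method, so this is an inference. The standard-library sketch below finds both bounds by bisection on the binomial distribution function; the function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Root of a function that increases on [lo, hi] and changes sign there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for a binomial proportion x/n."""
    # Lower bound solves P(X >= x) = alpha/2; P(X >= x) increases with p.
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x) = alpha/2; P(X <= x) decreases with p.
    upper = 1.0 if x == n else bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


low, high = clopper_pearson(1, 46)
print(f"{1 / 46:.1%} (95% CI {low:.3%} to {high:.1%})")
```

For 1 of 46 this gives bounds of roughly 0.055 % and 11.5 %, in line with the reported 0.05–11 %. The wide upper bound is what allows the abstract's "up to 10 %" reading from a single matched isolate.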


2021
Vol 7 (12)
Author(s):
Bojan Papić
Majda Golob
Irena Zdovc
Jana Avberšek
Metka Pislak Ocepek
...  

The spore-forming bacterium Paenibacillus larvae is the causative agent of American foulbrood (AFB), a devastating disease of honeybees (Apis mellifera). In the present study, we used whole-genome sequencing (WGS) to investigate an extensive outbreak of AFB in northwestern Slovenia in 2019. A total of 59 P. larvae isolates underwent WGS, 40 of which originated from a single beekeeping operation, to assess the diversity of P. larvae within the beekeeping operation, apiary and colony. Applying a case-specific single-linkage threshold of 34 allele differences (AD), whole-genome multilocus sequence typing (wgMLST) identified two outbreak clusters represented by ERIC II-ST11 clones. All isolates from the single beekeeping operation fell within cluster 1, with a median pairwise AD of 10 (range=1–22). The median pairwise AD for apiaries of the same beekeeping operation ranged from 8 to 11 (min.=1, max.=22). For colonies of the same apiary and honey samples from these colonies, the median pairwise AD ranged from 8 to 14 (min.=1, max.=20). The maximum within-cluster distance was 33 AD for cluster 1 and 44 AD for cluster 2. The minimum distance between outbreak-related and unrelated isolates was 37 AD, confirming the importance of associated epidemiological data for delineating outbreak clusters. The observed transmission events could be explained by the activities of honeybees and beekeepers. The present study provides insight into the genetic diversity of P. larvae at different levels and thus provides information for future AFB surveillance.
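Single-linkage clustering at an allele-difference threshold can be sketched with a union–find structure: two isolates are joined whenever their pairwise distance is at or below the threshold, and clusters are the resulting connected groups. Whether the study's 34 AD threshold is inclusive is not stated, so `<=` here is an assumption, and the distances below are hypothetical toy values rather than study data. The example also shows why a maximum within-cluster distance (44 AD for cluster 2) can exceed the linkage threshold: isolates can be chained together through intermediates.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def single_linkage(names: List[str], dist: Dict[FrozenSet[str], int],
                   threshold: int) -> List[List[str]]:
    """Group isolates linked by any chain of pairwise distances <= threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical allele differences between four isolates.
pairs = {("A", "B"): 20, ("B", "C"): 30, ("A", "C"): 44,
         ("A", "D"): 50, ("B", "D"): 52, ("C", "D"): 48}
dist = {frozenset(pair): d for pair, d in pairs.items()}
print(single_linkage(["A", "B", "C", "D"], dist, threshold=34))
# [['A', 'B', 'C'], ['D']]  (A and C are 44 AD apart but linked via B)
```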


2021
Vol 7 (12)
Author(s):
Kyrylo Bessonov
Chad Laing
James Robertson
Irene Yong
Kim Ziebell
...  

Escherichia coli is a priority foodborne pathogen of public health concern, and phenotypic serotyping provides critical information for surveillance and outbreak detection activities. Public health and food safety laboratories are increasingly adopting whole-genome sequencing (WGS) to characterize pathogens, but it is imperative to maintain serotype designations to minimize disruption to existing public health workflows. Multiple in silico tools have been developed for predicting serotypes from WGS data, including SRST2, SerotypeFinder and EToKi EBEis, but these tools were not designed around the specific requirements of diagnostic laboratories, which include speciation, input data flexibility (fasta/fastq), quality control information and easily interpretable results. To address these requirements, we developed ECTyper (https://github.com/phac-nml/ecoli_serotyping) for both speciation within Escherichia and Shigella and in silico serotype prediction. We compared the serotype prediction performance of each tool on a newly sequenced panel of 185 isolates with confirmed phenotypic serotype information. All tools were highly concordant, with 92–97 % concordance for O-antigens and 98–100 % for H-antigens, and ECTyper had the highest rate of concordance. We extended the benchmarking to a large panel of 6954 publicly available E. coli genomes to assess the performance of the tools on a more diverse dataset. On the public data, concordance dropped considerably, to 75–91 % for O-antigens and 62–90 % for H-antigens, with ECTyper and SerotypeFinder being the most concordant. This study shows that in silico predictions are highly concordant with phenotypic serotyping results, but that tool performance differs notably. ECTyper provides highly accurate and sensitive in silico serotype predictions, in addition to speciation, and is designed to be easily incorporated into bioinformatic workflows.
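Concordance figures like those above are per-antigen agreement rates between in silico predictions and phenotypic serotyping. A minimal sketch, assuming that isolates with no prediction count as discordant (the study's exact handling of missing or partial calls is not described here) and using hypothetical isolate IDs and calls:

```python
from typing import Dict, Optional


def concordance(predicted: Dict[str, Optional[str]], phenotypic: Dict[str, str]) -> float:
    """Percentage of phenotyped isolates whose predicted antigen matches the phenotype.

    Isolates missing from `predicted`, or predicted as None, count as discordant
    (an assumption of this sketch).
    """
    if not phenotypic:
        raise ValueError("no phenotyped isolates")
    matches = sum(1 for isolate, truth in phenotypic.items()
                  if predicted.get(isolate) == truth)
    return 100.0 * matches / len(phenotypic)


# Hypothetical O-antigen calls for four isolates.
phenotype_o = {"iso1": "O157", "iso2": "O26", "iso3": "O111", "iso4": "O103"}
predicted_o = {"iso1": "O157", "iso2": "O26", "iso3": "O45", "iso4": None}
print(concordance(predicted_o, phenotype_o))  # 50.0
```

The same function would be applied separately to H-antigen calls, which is how O- and H-antigen concordance can diverge for a single tool.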


2020
Vol 70 (12)
pp. 6313-6322
Author(s):
Kathryn A. Bernard
Alicia Vachon
Ana Luisa Pacheco
Tamara Burdz
Deborah Wiebe
...  

Twelve isolates, recovered from 10 patients (cystic fibrosis and other patient types) and from a variety of clinical sources, were referred to Canada's National Microbiology Laboratory over 7 years. These were assignable to the genus Pseudoxanthomonas but could not be identified to species level. The patients, five males and five females from two geographically separated provinces, ranged in age from 2 months to 84 years. In contrast, most Pseudoxanthomonas species described to date have been derived from water, plants or contaminated soils. By 16S rRNA gene sequencing, the patient strains shared ≥99.4 % similarity with each other but only 97.73–98.29 % with their closest relatives, Pseudoxanthomonas spadix and Pseudoxanthomonas helianthi. The strains were characterized by whole-genome sequencing, with analyses of average nucleotide identity by Blastn, digital DNA–DNA hybridization, average amino acid identity, core genome and single nucleotide variants, and by MALDI-TOF, biochemical and cellular fatty acid analyses and antimicrobial susceptibility testing. Bacterial structures were assessed using scanning and transmission electron microscopy. Strains were strict aerobes and yellowish-pigmented, oxidative, non-motile, Gram-stain-negative bacilli that were generally unable to reduce nitrate. Strains were susceptible to most of the antibiotics tested; some resistance was observed to carbapenems and several cephems, and all strains were resistant to nitrofurantoin. The single taxon group indicated by 16S rRNA gene sequencing was supported by whole-genome sequencing; genomes ranged in size from 4.36 to 4.73 Mb and had an average G+C content of 69.12 mol%. Based on this study, we propose the name Pseudoxanthomonas winnipegensis sp. nov. for this cluster. Pseudoxanthomonas spadix DSM 18855T, acquired for this study, was found to be non-motile both phenotypically and by electron microscopy; we therefore propose an emendation of the description of Pseudoxanthomonas spadix Young et al. 2007 to document this observation.

