Identification of Klebsiella capsule synthesis loci from whole genome data

2016
Author(s): Kelly L. Wyres, Ryan R. Wick, Claire Gorrie, Adam Jenney, Rainer Follador, ...

Background: Klebsiella pneumoniae and close relatives are a growing cause of healthcare-associated infections for which increasing rates of multi-drug resistance are a major concern. The Klebsiella polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology, since serological typing is not widely accessible and many isolates are serologically non-typeable. Molecular methods for capsular typing are needed, but existing methods lack sensitivity and specificity and fail to take advantage of the information available in whole-genome sequence data, which are increasingly being generated for surveillance and investigation of Klebsiella. Methods: We investigated the diversity of capsule synthesis loci (K loci) among a large, diverse collection of 2503 genome sequences of K. pneumoniae and closely related species. We incorporated analyses of both full-length K locus DNA sequences and clustered protein coding sequences to identify, annotate and compare K locus structures, and we propose a novel method for identifying K loci based on full locus information extracted from whole genome sequences. Results: A total of 134 distinct K loci were identified, including 31 novel types. Comparative analysis of K locus gene content detected 508 unique protein coding gene clusters that appear to reassort via homologous recombination, generating novel K locus types. Extensive nucleotide diversity was detected among the wzi and wzc genes, both within and between K loci, indicating that current typing schemes based on these genes are inadequate. As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K loci from large sets of Klebsiella genomes based on full locus information. Conclusions: This work highlights the extensive diversity of Klebsiella K loci and the proteins that they encode. We propose a standardised K locus nomenclature for Klebsiella, present a curated reference database of all known K loci, and introduce a tool for identifying K loci from genome data (https://github.com/katholt/Kaptive). These developments constitute important new resources for the Klebsiella community for use in genomic surveillance and epidemiology.
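The idea of typing by full-locus information can be illustrated with a simplified sketch. This is not Kaptive's implementation: it assumes that each reference K locus has already been aligned to a genome assembly and summarised as a hypothetical `LocusHit` record (bases of the reference locus covered, mean identity), and it simply picks the reference locus with the highest full-length coverage, breaking ties by identity. The 90% coverage cutoff and the example locus names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LocusHit:
    """Hypothetical summary of one reference K locus aligned to an assembly."""
    locus: str          # reference locus name (illustrative, e.g. "KL1")
    locus_length: int   # length of the full reference locus in bp
    covered_bases: int  # reference bases covered by alignments to the assembly
    identity: float     # mean nucleotide identity of those alignments (0-1)

    @property
    def coverage(self) -> float:
        # Fraction of the full-length reference locus found in the assembly
        return self.covered_bases / self.locus_length


def best_k_locus(hits, min_coverage=0.9):
    """Return the best-matching locus name, or None if no locus is covered enough.

    Ranking: highest coverage first, then highest identity.
    The min_coverage default is an assumption, not a published cutoff.
    """
    candidates = [h for h in hits if h.coverage >= min_coverage]
    if not candidates:
        return None
    return max(candidates, key=lambda h: (h.coverage, h.identity)).locus


hits = [
    LocusHit("KL1", 20000, 19800, 0.99),  # 99% of the locus covered
    LocusHit("KL2", 25000, 15000, 0.97),  # only 60% covered
]
print(best_k_locus(hits))  # KL1
```

Ranking on coverage of the whole locus, rather than on a single marker gene such as wzi, reflects the abstract's point that single-gene schemes miss much of the K locus diversity.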

2021
Author(s): James Tambong, Renlin Xu, Diane Cuppels, Julie T Chapados, Suzanne Gerdis, ...

Pseudomonas syringae pv. tomato is the causal agent of bacterial speck disease of field and greenhouse tomato plants. Only one Canadian whole genome sequence of this economically important pathogen is publicly available in NCBI GenBank. Here, we report 33 whole genome sequences of Canadian strains of P. syringae pv. tomato isolated in Ontario, Canada, between 1992 and 2008. The genome sequences exhibited average nucleotide identity values of 98.64-98.72% with P. syringae pv. tomato ICMP 2844PT and DC3000, validating the taxonomic standing of these Canadian strains. The genome sizes ranged from 6.20 to 6.39 Mbp, with a G+C content of 58.6%, and comprised 5,889-6,166 protein-coding sequences (CDSs). The strains had pan- and core-genomes of 6,808 and 4,993 gene clusters, respectively. Genome mining of the strains for virulence factors identified typical adherence genes, proteins related to antiphagocytosis, secretion system apparatuses and effectors. In addition, partial or complete achromobactin biosynthetic clusters and iron transport genes were identified in all of the Canadian strains but were absent from P. syringae pv. tomato DC3000 and ICMP 2844 (pathotype). These new whole genome data of Canadian strains of P. syringae pv. tomato could be a useful resource for understanding the evolution of this pathogen.
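The pan- and core-genome figures above come from gene-cluster presence/absence across strains. A minimal sketch, assuming clustering has already assigned each gene to a cluster ID: the pan-genome is the union of clusters over all strains, and the core genome is the set present in every strain. The `core_fraction` parameter is an assumption added here, since some pipelines relax "every strain" to, for example, 95% of strains; the strain and cluster names are hypothetical.

```python
from collections import Counter


def pan_core(presence, core_fraction=1.0):
    """Compute pan- and core-genome gene clusters.

    presence: dict mapping strain name -> set of gene-cluster IDs in that strain.
    core_fraction: fraction of strains a cluster must occur in to count as core
    (1.0 = strict core; lower values are an illustrative relaxation).
    """
    n_strains = len(presence)
    # Each strain contributes a set, so a cluster is counted at most once per strain
    counts = Counter(c for clusters in presence.values() for c in clusters)
    pan = set(counts)
    core = {c for c, k in counts.items() if k >= core_fraction * n_strains}
    return pan, core


presence = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2"},
    "strainC": {"g1", "g2", "g4"},
}
pan, core = pan_core(presence)
print(len(pan), sorted(core))  # 4 ['g1', 'g2']
```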


2018, Vol 3, pp. 118
Author(s): Anna Smielewska, Edward Emmott, Kyriaki Ranellou, Ashley Popay, Ian Goodfellow, ...

Background: Human parainfluenza virus type 3 (HPIV3) is a prominent cause of respiratory infection, with a significant impact in both pediatric and transplant patient cohorts. Currently there is a paucity of whole genome sequence data that would allow for detailed epidemiological and phylogenetic analysis of circulating strains in the UK. Although it is known that HPIV3 peaks annually in the UK, to date there are no whole genome sequences of HPIV3 UK strains available. Methods: Clinical strains were obtained from HPIV3-positive respiratory patient samples collected between 2011 and 2015. These were then amplified using an amplicon-based method, sequenced on the Illumina platform and assembled using a new robust bioinformatics pipeline. Phylogenetic analysis was carried out in the context of other epidemiological studies and whole genome sequence data currently available, with stringent exclusion of significantly culture-adapted strains of HPIV3. Results: In this paper we present twenty full genome sequences of UK circulating strains of HPIV3 and a detailed phylogenetic analysis thereof. We analysed the variability along the HPIV3 genome and identified a short hypervariable region in the non-coding segment between the M (matrix) and F (fusion) genes. The epidemiological classifications obtained using this region and using whole genome data were then compared and found to be identical. Conclusions: The majority of HPIV3 strains were observed at different geographical locations and with a wide temporal spread, reflecting the global distribution of HPIV3. Consistent with previous data, no particular subcluster or strain was identified as specific to the UK, suggesting that a number of genetically diverse strains circulate at any one time. A small hypervariable region in the HPIV3 genome was identified, and it was shown that, in the absence of full genome data, this region could be used for epidemiological surveillance of HPIV3.
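One common way to scan a genome for a hypervariable region is a sliding window over a multiple sequence alignment. The sketch below is an assumption for illustration, not the authors' pipeline: it scores each alignment column by Shannon entropy (in bits) and returns the start of the window with the highest mean entropy. The window size and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def most_variable_window(alignment, window=50):
    """Return (start, mean_entropy) of the most variable window.

    alignment: list of equal-length aligned sequences.
    Ties are resolved in favour of the earliest window.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    if window > length:
        raise ValueError("window is longer than the alignment")
    entropies = [column_entropy([seq[i] for seq in alignment]) for i in range(length)]
    best_start, best_score = 0, -1.0
    for start in range(length - window + 1):
        score = sum(entropies[start:start + window]) / window
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


toy_alignment = ["AAAAACGA", "AAAAATTC", "AAAAAGCG"]
print(most_variable_window(toy_alignment, window=3))  # start 5, entropy ~1.585
```

Real analyses usually ignore gap characters and use larger windows; both choices would need tuning for HPIV3 data.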



Author(s): Viola Kurm, Ilse Houwers, Claudia E. Coipan, Peter Bonants, Cees Waalwijk, ...

Identification and classification of members of the Ralstonia solanacearum species complex (RSSC) is challenging due to the heterogeneity of this complex. Whole genome sequence data of 225 strains were used to classify strains based on average nucleotide identity (ANI) and multilocus sequence analysis (MLSA). Based on the ANI score (>95%), 191 of 192 (99.5%) RSSC strains could be grouped into the three species R. solanacearum, R. pseudosolanacearum, and R. syzygii, and into the four phylotypes within the RSSC (I, II, III, and IV). R. solanacearum phylotype II could be split into two groups (IIA and IIB), of which IIB clustered into three subgroups (IIBa, IIBb and IIBc). This division by ANI was in accordance with MLSA. The IIB subgroups found by ANI and MLSA also differed in the number of SNPs in the primer and probe sites of various assays. An in silico analysis of eight TaqMan and 11 conventional PCR assays was performed using the whole genome sequences. Based on this analysis, several cases of potential false positives or false negatives can be expected when these assays are used for their intended target organisms. Two TaqMan assays and two PCR assays targeting the 16S rDNA sequence should be able to detect all phylotypes of the RSSC. We conclude that the increasing availability of whole genome sequences is not only useful for classification of strains, but also shows potential for the selection and evaluation of clade-specific nucleic acid-based amplification methods within the RSSC.
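Grouping strains by an ANI cutoff can be sketched as follows. This assumes pairwise ANI values have already been computed (one value per unordered pair) and uses single-linkage grouping with a union-find structure: two strains share a group if a chain of pairs with ANI above the threshold connects them. Single linkage is an assumption made here for illustration; the abstract does not state which grouping rule was used, and the strain names and ANI values are hypothetical.

```python
def ani_groups(strains, ani, threshold=95.0):
    """Single-linkage grouping of strains whose pairwise ANI exceeds a threshold.

    strains: iterable of strain names.
    ani: dict mapping (strain_a, strain_b) -> ANI in percent (one entry per pair).
    Returns a sorted list of sorted groups.
    """
    parent = {s: s for s in strains}

    def find(x):
        # Walk to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two groups

    groups = {}
    for s in parent:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


strains = ["s1", "s2", "s3", "s4"]
ani = {
    ("s1", "s2"): 98.7, ("s2", "s3"): 96.1, ("s1", "s3"): 94.8,
    ("s1", "s4"): 90.5, ("s2", "s4"): 90.9, ("s3", "s4"): 91.0,
}
print(ani_groups(strains, ani))  # [['s1', 's2', 's3'], ['s4']]
```

Note that single linkage places s1 and s3 together even though their own ANI (94.8%) is below the cutoff, because both are linked through s2; a stricter rule such as requiring every pair in a group to pass would split them.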


2019
Author(s): DJ Darwin R. Bandoy, B Carol Huang, Bart C. Weimer

Taxonomic classification is an essential step in the analysis of microbiome data that depends on a reference database of whole genome sequences. Taxonomic classifiers are built on databases of established reference species, such as the Human Microbiome Project database, which is growing rapidly. While constructing a population-wide pangenome of the bacterium Hungatella, we discovered that the Human Microbiome Project reference species Hungatella hathewayi (WAL 18680) was significantly different from other members of this genus. Specifically, the reference lacked the core genome shared by the other members. Further analysis, using average nucleotide identity (ANI) and 16S rRNA comparisons, indicated that WAL 18680 was misclassified as Hungatella. This classification error is being amplified through taxonomic classifiers and will have a compounding effect as further microbiome analyses are performed, resulting in inaccurate assignment of community members, which may lead to fallacious conclusions and possibly inappropriate treatment. As automated genome homology assessment expands for microbiome analysis and outbreak detection, and as public health reliance on whole genomes increases, this issue will likely occur at an increasing rate. These observations highlight the need for developing reference-free methods for epidemiological investigation using whole genome sequences, and the critical importance of accurate reference databases.


2019, Vol 8 (42)
Author(s): Gabriela Vuletin Selak, Marina Raboteg, Audrey Dubost, Danis Abrouk, Katja Žanić, ...

Here, we present the total genome sequence of Pantoea sp. strain paga, a plant-associated bacterium isolated from knots present on olive trees grown on the Adriatic Coast. The genome size of Pantoea sp. paga is 5.08 Mb, with a G+C content of 54%. The genome contains 4,776 predicted coding DNA sequences (CDSs), including 70 tRNA genes and 1 ribosomal operon. The genome sequence data obtained will provide insight into the physiology, ecology, and evolution of Pantoea spp.


2019, Vol 20 (5), pp. 1215
Author(s): Xavier Argemi, Yves Hansmann, Kevin Prola, Gilles Prévost

Coagulase-negative Staphylococci (CoNS) are skin commensal bacteria. Besides their role in maintaining homeostasis, CoNS have emerged as major pathogens in nosocomial settings. Several studies have investigated the molecular basis for this emergence and identified multiple putative virulence factors by analogy with Staphylococcus aureus pathogenicity. In the last decade, numerous CoNS whole-genome sequences have been released, leading to the identification of numerous putative virulence factors. Koch's postulates and the molecular rendition of these postulates, established by Stanley Falkow in 1988, do not explain the microbial pathogenicity of CoNS. However, whole-genome sequence data have shed new light on CoNS pathogenicity. In this review, we analyze the contribution of genomics to defining CoNS virulence, focusing on the most frequent and pathogenic CoNS species: S. epidermidis, S. haemolyticus, S. saprophyticus, S. capitis, and S. lugdunensis.


2013, Vol 63 (Pt 7), pp. 2742-2751
Author(s): Henryk Urbanczyk, Yoshitoshi Ogura, Tetsuya Hayashi

Use of inadequate methods for classification of bacteria in the so-called Harveyi clade (family Vibrionaceae, Gammaproteobacteria) has led to incorrect assignment of strains and proliferation of synonymous species. In order to resolve taxonomic ambiguities within the Harveyi clade and to test the usefulness of whole genome sequence data for classification of Vibrionaceae, draft genome sequences of 12 strains were determined and analysed. The sequencing included type strains of seven species: Vibrio sagamiensis NBRC 104589T, Vibrio azureus NBRC 104587T, Vibrio harveyi NBRC 15634T, Vibrio rotiferianus LMG 21460T, Vibrio campbellii NBRC 15631T, Vibrio jasicida LMG 25398T, and Vibrio owensii LMG 25443T. Draft genome sequences of strain LMG 25430, previously designated the type strain of [Vibrio communis], and of two strains (MWB 21 and 090810c) from the 'beijerinckii' lineage were also determined. Whole genomes of two additional strains (ATCC 25919 and 200612B) that previously could not be assigned to any Harveyi clade species were also sequenced. Analysis of the genome sequence data revealed a clear case of synonymy between V. owensii and [V. communis], confirming an earlier proposal to synonymize both species. Both strains from the 'beijerinckii' lineage were classified as V. jasicida, while strains ATCC 25919 and 200612B were classified as V. owensii and V. campbellii, respectively. We also found that two strains, AND4 and Ex25, are closely related to Harveyi clade bacteria but could not be assigned to any species of the family Vibrionaceae. The use of whole genome sequence data for the taxonomic classification of Harveyi clade bacteria and other members of the family Vibrionaceae is also discussed.


Author(s): Amnon Koren, Dashiell J Massey, Alexa N Bracci

Motivation: Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Although DNA replication is a central cellular process, studies of its timing have been limited in scale by technical challenges. Results: We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation: TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information: Supplementary data are available at Bioinformatics online.
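The principle that copy number along a chromosome reflects replication timing can be illustrated with a deliberately simplified sketch; it is not TIGER's algorithm, which also accounts for other sources of coverage variation. It assumes read counts have already been tallied in consecutive fixed-size windows along one chromosome, normalises them by the chromosome median, and applies a centred moving average. Windows with higher relative copy number would be interpreted as replicating earlier, since early-replicating DNA has already been duplicated in S-phase cells. The function name and smoothing width are hypothetical.

```python
import statistics


def relative_copy_number(window_counts, smooth=5):
    """Median-normalised, moving-average-smoothed coverage profile.

    window_counts: read counts in consecutive fixed-size windows along one chromosome.
    smooth: number of windows in the centred moving average (illustrative default).
    Higher values suggest earlier replication; no correction for GC content,
    mappability or copy-number variants is applied in this sketch.
    """
    median = statistics.median(window_counts)
    if median == 0:
        raise ValueError("median coverage is zero; cannot normalise")
    ratios = [count / median for count in window_counts]
    half = smooth // 2
    smoothed = []
    for i in range(len(ratios)):
        lo, hi = max(0, i - half), min(len(ratios), i + half + 1)
        smoothed.append(sum(ratios[lo:hi]) / (hi - lo))
    return smoothed


counts = [100, 100, 100, 120, 120, 120, 100, 100, 100]
profile = relative_copy_number(counts, smooth=3)
print([round(v, 2) for v in profile])  # middle windows rise to 1.2
```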

