Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks

Background: Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphism (SNP) analysis and genome assembly, are highly dependent upon WGS data quality. Methods: Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNP analysis results in E. coli and S. enterica clusters. Results: Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines to improve these four quality metrics and to see how well they improved SNP analysis and genome assembly. The most effective read-healing pipelines for SNP analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNP pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements in healed E. coli reads only, while Skesa yielded no significant improvements on healed reads. Conclusions: PHRED scores will continue to be a crucial quality metric, albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNP analyses, different read-healing approaches are likely needed for genome assembly or other, emerging WGS analysis methodologies.
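To make the quality metrics named in this abstract concrete, the following is a minimal Python sketch of how a PHRED score, an ambiguous-nucleotide fraction, and a simple quality-based 3' trim might be computed for a single read. It assumes Phred+33 encoded FASTQ quality strings and a trimming threshold of Q20; the function names and the threshold are illustrative assumptions and are not taken from the read-healing pipelines the study compared.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer PHRED scores.

    Assumes Phred+33 encoding (an assumption; older data may use +64).
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_phred(quality_line):
    """Average PHRED score of one read (0.0 for an empty read)."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) if scores else 0.0


def ambiguous_fraction(sequence):
    """Fraction of bases in the read that are the ambiguous call 'N'."""
    return sequence.upper().count("N") / len(sequence) if sequence else 0.0


def trim_3prime(sequence, quality_line, min_q=20):
    """Drop trailing bases whose PHRED score is below min_q.

    This is only one simple form of quality-based trimming; min_q=20 is a
    hypothetical threshold chosen for illustration.
    """
    scores = phred_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality_line[:end]


if __name__ == "__main__":
    seq, qual = "ACGTN", "II5##"
    print(mean_phred(qual), ambiguous_fraction(seq), trim_3prime(seq, qual))
```

Fixed-width trimming, the other approach the abstract mentions, would simply slice a constant number of bases from one or both ends regardless of quality.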


Whole-Genome Sequencing for Investigating a Health Care-Associated Outbreak of Carbapenem-Resistant Acinetobacter baumannii

Diagnostics ◽

10.3390/diagnostics11020201 ◽

2021 ◽

Vol 11 (2) ◽

pp. 201

Author(s):

Sang Mee Hwang ◽

Hee Won Cho ◽

Tae Yeul Kim ◽

Jeong Su Park ◽

Jongtak Jung ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Acinetobacter Baumannii ◽

Genome Sequencing ◽

Snp Analysis ◽

Whole Genome ◽

Phylogenetic Tree Analysis ◽

Web Based ◽

Hospital Outbreak ◽

Carbapenem Resistant ◽

Bioinformatics Tools

Carbapenem-resistant Acinetobacter baumannii (CRAB) outbreaks in hospital settings challenge the treatment of patients and infection control. Understanding the relatedness of clinical isolates is important in distinguishing outbreak isolates from sporadic cases. This study investigated 11 CRAB isolates from a hospital outbreak by whole-genome sequencing (WGS), utilizing various bioinformatics tools for outbreak analysis. The results of multilocus sequence typing (MLST), single nucleotide polymorphism (SNP) analysis, and phylogenetic tree analysis by WGS through web-based tools were compared, and repetitive element polymerase chain reaction (rep-PCR) typing was performed. Through the WGS of 11 A. baumannii isolates, three clonal lineages were identified from the outbreak. The coexistence of blaOXA-23, blaOXA-66, blaADC-25, and armA with additional aminoglycoside-inactivating enzymes, predicted to confer multidrug resistance, was identified in all isolates. The MLST Oxford scheme identified three types (ST191, ST369, and ST451), and, through whole-genome MLST and whole-genome SNP analyses, different clones were found to exist within the MLST types. wgSNP showed the highest discriminatory power with the lowest similarities among the isolates. Using the various bioinformatics tools for WGS, CRAB outbreak analysis was applicable and identified three discrete clusters differentiating the separate epidemiologic relationships among the isolates.
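As an illustration of the kind of comparison underlying whole-genome SNP analysis, the sketch below counts pairwise SNP differences between isolates from an already aligned set of sequences. It is not the web-based tooling used in the study; the isolate names, the toy sequences, and the choice to ignore positions with `N` or gap characters are all assumptions made for the example.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base or a gap are
    skipped (an assumption; tools differ in how they treat these sites).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_distances(alignment):
    """Return {(name1, name2): SNP distance} for every pair of isolates."""
    return {
        (n1, n2): snp_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the study.
    toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "TCGNACGA"}
    for pair, dist in pairwise_distances(toy).items():
        print(pair, dist)
```

Isolates separated by few SNPs would be candidates for the same clonal cluster, although the threshold used to call a cluster depends on the organism and the study design.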


A study of transposable element-associated structural variations (TASVs) using a de novo-assembled Korean genome

Experimental & Molecular Medicine ◽

10.1038/s12276-021-00586-y ◽

2021 ◽

Author(s):

Seyoung Mun ◽

Songmi Kim ◽

Wooseok Lee ◽

Keunsoo Kang ◽

Thomas J. Meyer ◽

...

Keyword(s):

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Personal Genome ◽

Human Populations ◽

Whole Genome ◽

Structural Variations ◽

Insert Size ◽

Human Genomes ◽

Next Generation Sequencing Ngs

Abstract: Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes.
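The scaffold N50 quoted in this abstract is a standard assembly contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name and the example lengths are hypothetical and unrelated to the KPGP9 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are taken from longest to shortest; N50 is the length of the
    scaffold at which the running total first reaches half the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical scaffold lengths in bp.
    print(n50([2, 3, 4, 5, 6]))
```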


E. coli NF73-1 Isolated From NASH Patients Aggravates NAFLD in Mice by Translocating Into the Liver and Stimulating M1 Polarization

Frontiers in Cellular and Infection Microbiology ◽

10.3389/fcimb.2020.535940 ◽

2020 ◽

Vol 10 ◽

Author(s):

Yifan Zhang ◽

Weiwei Jiang ◽

Jun Xu ◽

Na Wu ◽

Yang Wang ◽

...

Keyword(s):

Gene Expression ◽

Comparative Genomics ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Normal Diet ◽

Specific Gene ◽

Whole Genome ◽

Metabolic Switch ◽

M1 Macrophages ◽

E Coli

Objective: The gut microbiota is associated with nonalcoholic fatty liver disease (NAFLD). We isolated the Escherichia coli strain NF73-1 from the intestines of a NASH patient and then investigated its effect and underlying mechanism. Methods: 16S ribosomal RNA (16S rRNA) amplicon sequencing was used to detect bacterial profiles in healthy controls, NAFLD patients and NASH patients. Highly enriched E. coli strains were cultured and isolated from NASH patients. Whole-genome sequencing and comparative genomics were performed to investigate gene expression. Depending on the diet, male C57BL/6J mice were further grouped in normal diet (ND) and high-fat diet (HFD) groups. To avoid disturbing the bacterial microbiota, some of the ND and HFD mice were grouped as “bacteria-depleted” mice and treated with a cocktail of broad-spectrum antibiotic complex (ABX) from the 8th to 10th week. Then, E. coli NF73-1, the bacterial strain isolated from NASH patients, was administered transgastrically for 6 weeks to investigate its effect and mechanism in the pathogenic progression of NAFLD. Results: The relative abundance of Escherichia increased significantly in the mucosa of NAFLD patients, especially NASH patients. The results from whole-genome sequencing and comparative genomics showed a specific gene expression profile in E. coli strain NF73-1, which was isolated from the intestinal mucosa of NASH patients. E. coli NF73-1 accelerates NAFLD independently. Only in the HFD-NF73-1 and HFD-ABX-NF73-1 groups were EGFP-labeled E. coli NF73-1 detected in the liver and intestine. Subsequently, translocation of E. coli NF73-1 into the liver led to an increase in hepatic M1 macrophages via the TLR2/NLRP3 pathway. Hepatic M1 macrophages induced by E. coli NF73-1 activated mTOR-S6K1-SREBP-1/PPAR-α signaling, causing a metabolic switch from triglyceride oxidation toward triglyceride synthesis in NAFLD mice. Conclusions: E. coli NF73-1 is a critical trigger in the progression of NAFLD. E. coli NF73-1 might be a specific strain for NAFLD patients.


Comparison of whole genome sequencing to restriction endonuclease analysis and gel diffusion precipitin-based serotyping of Pasteurella multocida

Journal of Veterinary Diagnostic Investigation ◽

10.1177/1040638717732371 ◽

2017 ◽

Vol 30 (1) ◽

pp. 42-55 ◽

Cited By ~ 2

Author(s):

Karen J. LeCount ◽

Linda K. Schlater ◽

Tod Stuber ◽

Suelee Robbe Austerman ◽

Timothy S. Frana ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Restriction Endonuclease ◽

Genome Sequencing ◽

Pasteurella Multocida ◽

Restriction Endonuclease Analysis ◽

The United States ◽

Snp Analysis ◽

Whole Genome ◽

Gel Diffusion ◽

Endonuclease Analysis

The gel diffusion precipitin test (GDPT) and restriction endonuclease analysis (REA) have commonly been used in the serotyping and genotyping of Pasteurella multocida. Whole genome sequencing (WGS) and single nucleotide polymorphism (SNP) analysis has become the gold standard for other organisms, offering higher resolution than previously available methods. We compared WGS to REA and GDPT on 163 isolates of P. multocida to determine if WGS produced more precise results. The isolates used represented the 16 reference serovars, isolates with REA profiles matching an attenuated fowl cholera vaccine strain, and isolates from 10 different animal species. Isolates originated from across the United States and from Chile. Identical REA profiles clustered together in the phylogenetic tree. REA profiles that differed by only a few bands had fewer SNP differences than REA profiles with more differences, as expected. The GDPT results were diverse but it was common to see a single serovar show up repeatedly within clusters. Several errors were found when examining the REA profiles. WGS was able to confirm these errors and compensate for the subjectivity in analysis of REA. Also, results of WGS and SNP analysis correlated more closely with the epidemiologic data than GDPT. In silico results were also compared to a lipopolysaccharide rapid multiplex PCR test. From the data produced in our study, WGS and SNP analysis was superior to REA and GDPT and highlighted some of the issues with the older tests.


Genomic Investigation into the Virulome, Pathogenicity, Stress Response Factors, Clonal Lineages, and Phylogenetic Relationship of Escherichia coli Strains Isolated from Meat Sources in Ghana

Genes ◽

10.3390/genes11121504 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1504

Author(s):

Frederick Adzitey ◽

Jonathan Asante ◽

Hezekiel M. Kumalo ◽

Rene B. Khan ◽

Anou M. Somboro ◽

...

Keyword(s):

Escherichia Coli ◽

Stress Response ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome ◽

Response Factors ◽

Pathogenic Potential ◽

E Coli ◽

Relationship Of ◽

Sequence Types

Escherichia coli are among the most common foodborne pathogens associated with infections reported from meat sources. This study investigated the virulome, pathogenicity, stress response factors, clonal lineages, and the phylogenomic relationship of E. coli isolated from different meat sources in Ghana using whole-genome sequencing. Isolates were screened from five meat sources (beef, chevon, guinea fowl, local chicken, and mutton) and five areas (Aboabo, Central market, Nyorni, Victory cinema, and Tishegu) based in the Tamale Metropolis, Ghana. Following microbial identification, the E. coli strains were subjected to whole-genome sequencing. Comparative visualisation analyses showed different DNA synteny of the strains. The isolates consisted of diverse sequence types (STs) with the most common being ST155 (n = 3/14). Based Upon Related Sequence Types (eBURST) analyses of the study sequence types identified four similar clones, five single-locus variants, and two satellite clones (more distantly) with global curated E. coli STs. All the isolates possessed at least one restriction-modification (R-M) and CRISPR defence system. Further analysis revealed conserved stress response mechanisms (detoxification, osmotic, oxidative, and periplasmic stress) in the strains. Estimation of pathogenicity predicted a higher average probability score (Pscore ≈ 0.937), supporting their pathogenic potential to humans. Diverse virulence genes that were clonal-specific were identified. Phylogenomic tree analyses coupled with metadata insights depicted the high genetic diversity of the E. coli isolates with no correlation with their meat sources and areas. The findings of these bioinformatic analyses further our understanding of E. coli in meat sources and are broadly relevant to the design of contamination control strategies in meat retail settings in Ghana.


Comparative Genome Analysis of Extended-Spectrum-β-Lactamase-Producing Escherichia coli Sequence Type 131 Strains from Nepal and Japan

mSphere ◽

10.1128/msphere.00289-16 ◽

2016 ◽

Vol 1 (5) ◽

Cited By ~ 6

Author(s):

Tohru Miyoshi-Akiyama ◽

Jatan Bahadur Sherchan ◽

Yohei Doi ◽

Maki Nagamatsu ◽

Jeevan B. Sherchand ◽

...

Keyword(s):

Molecular Epidemiology ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Sequence Type ◽

Whole Genome ◽

Asian Countries ◽

Content Type ◽

E Coli ◽

Global Spread ◽

Low Prevalence

ABSTRACT The global spread of extended-spectrum-β-lactamase (ESBL)-producing Escherichia coli (ESBL-E. coli) has largely been driven by the pandemic sequence type 131 (ST131). This study aimed to determine the molecular epidemiology of their spread in two Asian countries with contrasting prevalence. We conducted whole-genome sequencing (WGS) of ESBL-E. coli ST131 strains collected prospectively from Nepal and Japan, two countries in Asia with a high and low prevalence of ESBL-E. coli, respectively. We also systematically compared these genomes with those reported from other regions using publicly available WGS data for E. coli ST131 strains. Further, we conducted phylogenetic analysis of these isolates and all genome sequence data for ST131 strains to determine sequence diversity. One hundred five unique ESBL-E. coli isolates from Nepal (February 2013 to July 2013) and 76 isolates from Japan (October 2013 to September 2014) were included. Of these isolates, 54 (51%) isolates from Nepal and 11 (14%) isolates from Japan were identified as ST131 by WGS. Phylogenetic analysis based on WGS suggested that the majority of ESBL-E. coli ST131 isolates from Nepal clustered together, whereas those from Japan were more diverse. Half of the ESBL-E. coli ST131 isolates from Japan belonged to virotype C, whereas half of the isolates from Nepal belonged to a virotype other than virotype A, B, C, D, or E (A/B/C/D/E). The dominant sublineage of E. coli ST131 was H30Rx, which was most prominent in ESBL-E. coli ST131 isolates from Nepal. Our results revealed distinct phylogenetic characteristics of ESBL-E. coli ST131 spread in the two geographical areas of Asia, indicating the involvement of multiple factors in its local spread in each region. IMPORTANCE The global spread of ESBL-E. coli has been driven in large part by pandemic sequence type 131 (ST131). A recent study suggested that, within E. coli ST131, certain sublineages have disseminated worldwide with little association with their geographical origin, highlighting the complexity of the epidemiology of this pandemic clone. ST131 bacteria have also been classified into four virotypes based on the distribution of certain virulence genes. Information on virotype distribution in Asian ST131 strains is limited. We conducted whole-genome sequencing of ESBL-E. coli ST131 strains collected in Nepal and Japan, two Asian countries with a high and low prevalence of ESBL-E. coli, respectively. We systematically compared these ST131 genomes with those reported from other regions to gain insights into the molecular epidemiology of their spread and found the distinct phylogenetic characteristics of the spread of ESBL-E. coli ST131 in these two geographical areas of Asia.


Whole-genome sequencing of 182 Bursaphelenchus xylophilus strains generates first long read based de novo genome assembly and reveals temperature associated population structure

10.22541/au.159352211.19983305 ◽

2020 ◽

Author(s):

Xiaolei Ding ◽

Yunfei Guo ◽

Jianren Ye ◽

Xiaoqin Wu ◽

Sixi Lin ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Bursaphelenchus Xylophilus ◽

Whole Genome ◽

De Novo Genome Assembly ◽

Long Read


807. Same-day Transmission Analysis of Nosocomial Transmission Using Nanopore Whole Genome Sequencing

Open Forum Infectious Diseases ◽

10.1093/ofid/ofab466.1003 ◽

2021 ◽

Vol 8 (Supplement_1) ◽

pp. S497-S498

Author(s):

Mohamad Sater ◽

Remy Schwab ◽

Ian Herriott ◽

Tim Farrell ◽

Miriam Huntley

Keyword(s):

High Resolution ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Cost Effective ◽

Error Rates ◽

Sequencing Error ◽

Snp Analysis ◽

Whole Genome ◽

Snp Calling

Background: Healthcare associated infections (HAIs) are a major contributor to patient morbidity and mortality worldwide. HAIs are increasingly important due to the rise of multidrug resistant pathogens which can lead to deadly nosocomial outbreaks. Current methods for investigating transmissions are slow, costly, or have poor detection resolution. A rapid, cost-effective and high-resolution method to identify transmission events is imperative to guide infection control. Whole genome sequencing of infecting pathogens paired with a single nucleotide polymorphism (SNP) analysis can provide high-resolution clonality determination, yet these methods typically have long turnaround times. Here we examined the utility of the Oxford Nanopore Technologies (ONT) platform, a rapid sequencing technology, for whole genome sequencing based transmission analysis. Methods: We developed a SNP calling pipeline customized for ONT data, which exhibit higher sequencing error rates and can therefore be challenging for transmission analysis. The pipeline leverages the latest basecalling tools as well as a suite of custom variant calling and filtering algorithms to achieve highest accuracy in clonality calls compared to Illumina-based sequencing. We also capitalize on ONT long reads by assembling outbreak-specific genomes in order to overcome the need for an external reference genome. Results: We examined 20 bacterial isolates from 5 HAI investigations previously performed at Day Zero Diagnostics as part of epiXact®, our commercialized Illumina-based HAI sequencing and analysis service. Using the ONT data and pipeline, we achieved greater than 90% SNP-calling sensitivity and precision, allowing 100% accuracy of clonality classification compared to Illumina-based results across common HAI species. We demonstrate the validity and increased resolution of our SNP analysis pipeline using assembled genomes from each outbreak. We also demonstrate that this ONT-based workflow can produce isolate to transmission determination (i.e. including WGS and analysis) in less than 24 hours. [Figure: SNP calling performance. ONT-based SNP calling sensitivity and precision compared to Illumina-based pipeline.] Conclusion: We demonstrate the utility of ONT for HAI investigation, establishing the potential to transform healthcare epidemiology with same-day high-resolution transmission determination. Disclosures: Mohamad Sater, PhD, Day Zero Diagnostics (Employee, Shareholder); Remy Schwab, MSc, Day Zero Diagnostics (Employee, Shareholder); Ian Herriott, BS, Day Zero Diagnostics (Employee, Shareholder); Tim Farrell, MS, Day Zero Diagnostics, Inc. (Employee, Shareholder); Miriam Huntley, PhD, Day Zero Diagnostics (Employee, Shareholder)
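The sensitivity and precision figures reported here compare one set of SNP calls against a reference set. A minimal Python sketch of that comparison is given below, assuming each SNP is represented as a (position, alternate base) pair and that the Illumina-based calls serve as the reference; this representation and the function name are assumptions for illustration, not the authors' pipeline.

```python
def snp_call_metrics(reference_calls, test_calls):
    """Return (sensitivity, precision) of test_calls against reference_calls.

    sensitivity = TP / (TP + FN): share of reference SNPs that were recovered
    precision   = TP / (TP + FP): share of test SNPs that are in the reference
    """
    reference_calls, test_calls = set(reference_calls), set(test_calls)
    true_pos = len(reference_calls & test_calls)
    false_pos = len(test_calls - reference_calls)
    false_neg = len(reference_calls - test_calls)
    sensitivity = (
        true_pos / (true_pos + false_neg) if reference_calls else float("nan")
    )
    precision = true_pos / (true_pos + false_pos) if test_calls else float("nan")
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical calls as (position, alternate base) pairs.
    illumina = {(10, "A"), (20, "T"), (30, "G"), (40, "C")}
    nanopore = {(10, "A"), (20, "T"), (30, "G"), (55, "A")}
    print(snp_call_metrics(illumina, nanopore))
```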


Genomic surveillance of Escherichia coli and Klebsiella spp. in hospital sink drains and patients

Microbial Genomics ◽

10.1099/mgen.0.000391 ◽

2020 ◽

Vol 6 (7) ◽

Author(s):

Bede Constantinides ◽

Kevin K. Chau ◽

T. Phuong Quan ◽

Gillian Rodger ◽

Monique I. Andersson ◽

...

Keyword(s):

Escherichia Coli ◽

Antimicrobial Resistance ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Type Species ◽

Whole Genome ◽

Content Type ◽

E Coli ◽

Link Type ◽

Antimicrobial Resistance Genes

Escherichia coli and Klebsiella spp. are important human pathogens that cause a wide spectrum of clinical disease. In healthcare settings, sinks and other wastewater sites have been shown to be reservoirs of antimicrobial-resistant E. coli and Klebsiella spp., particularly in the context of outbreaks of resistant strains amongst patients. Without focusing exclusively on resistance markers or a clinical outbreak, we demonstrate that many hospital sink drains are abundantly and persistently colonized with diverse populations of E. coli, Klebsiella pneumoniae and Klebsiella oxytoca, including both antimicrobial-resistant and susceptible strains. Using whole-genome sequencing of 439 isolates, we show that environmental bacterial populations are largely structured by ward and sink, with only a handful of lineages, such as E. coli ST635, being widely distributed, suggesting different prevailing ecologies, which may vary as a result of different inputs and selection pressures. Whole-genome sequencing of 46 contemporaneous patient isolates identified one (2%; 95% CI 0.05–11%) E. coli urine infection-associated isolate with high similarity to a prior sink isolate, suggesting that sinks may contribute to up to 10% of infections caused by these organisms in patients on the ward over the same timeframe. Using metagenomics from 20 sink-timepoints, we show that sinks also harbour many clinically relevant antimicrobial resistance genes including blaCTX-M, blaSHV and mcr, and may act as niches for the exchange and amplification of these genes. Our study reinforces the potential role of sinks in contributing to Enterobacterales infection and antimicrobial resistance in hospital patients, something that could be amenable to intervention. This article contains data hosted by Microreact.
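The abstract reports a confidence interval for a proportion of one in 46 but does not say how it was computed. As a sketch only, the Python below computes an exact binomial (Clopper-Pearson) interval, which is one common choice for small counts; whether the authors used this method is an assumption, so the output should be compared with the reported interval rather than taken as a reproduction of it.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(func, target, iterations=100):
    """Bisection on [0, 1] for func(p) == target, with func decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, trials, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound p solves P(X >= successes | p) = alpha / 2.
        lower = _solve_decreasing(
            lambda p: binom_cdf(successes - 1, trials, p), 1 - alpha / 2
        )
    if successes == trials:
        upper = 1.0
    else:
        # Upper bound p solves P(X <= successes | p) = alpha / 2.
        upper = _solve_decreasing(
            lambda p: binom_cdf(successes, trials, p), alpha / 2
        )
    return lower, upper


if __name__ == "__main__":
    low, high = clopper_pearson(1, 46)
    print(f"{low:.4%} to {high:.2%}")
```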


Epidemic Clostridioides difficile Ribotype 027 Lineages: Comparisons of Texas Versus Worldwide Strains

Open Forum Infectious Diseases ◽

10.1093/ofid/ofz013 ◽

2019 ◽

Vol 6 (2) ◽

Cited By ~ 7

Author(s):

Bradley T Endres ◽

Khurshida Begum ◽

Hua Sun ◽

Seth T Walk ◽

Ali Memariani ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Phylogenetic Trees ◽

Large Scale ◽

The United States ◽

Whole Genome Sequence ◽

Snp Analysis ◽

Whole Genome ◽

Ribotype 027 ◽

Clostridioides Difficile

Background: The epidemic Clostridioides difficile ribotype 027 strain resulted from the dissemination of 2 separate fluoroquinolone-resistant lineages: FQR1 and FQR2. Both lineages were reported to originate in North America; however, confirmatory large-scale investigations of C difficile ribotype 027 epidemiology using whole genome sequencing have not been undertaken in the United States. Methods: Whole genome sequencing and single-nucleotide polymorphism (SNP) analysis were performed on 76 clinical ribotype 027 isolates obtained from hospitalized patients in Texas with C difficile infection and compared with 32 previously sequenced worldwide strains. Maximum-likelihood phylogeny based on a set of core genome SNPs was used to construct phylogenetic trees investigating strain macro- and microevolution. Bayesian phylogenetic and phylogeographic analyses were used to incorporate temporal and geographic variables with the SNP strain analysis. Results: Whole genome sequence analysis identified 2841 SNPs including 900 nonsynonymous mutations, 1404 synonymous substitutions, and 537 intergenic changes. Phylogenetic analysis separated the strains into 2 prominent groups, which grossly differed by 28 SNPs: the FQR1 and FQR2 lineages. Five isolates were identified as pre-epidemic strains. Phylogeny demonstrated unique clustering and resistance genes in Texas strains, indicating that spatiotemporal bias has defined the microevolution of ribotype 027 genetics. Conclusions: Clostridioides difficile ribotype 027 lineages emerged earlier than previously reported, coinciding with increased use of fluoroquinolones. Both FQR1 and FQR2 ribotype 027 epidemic lineages are present in Texas, but they have evolved geographically to represent region-specific public health threats.
