Hash-based core genome multi-locus sequencing typing for Clostridium difficile

Abstract Background Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely-related genomes among a background of thousands of other genomes is challenging. Methods We describe a refinement to core-genome multi-locus sequence typing (cgMLST) where alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralised database of sequentially-numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to mapping-based approaches in Clostridium difficile using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Results Hash-cgMLST provided the same results as standard cgMLST with minimal performance penalty. Comparing 272 pairs of replicate sequences, using reference-based mapping there were 0, 1 or 2 SNPs between 262 (96%), 5 (2%) and 1 (<1%) pairs respectively. Using hash-cgMLST or standard cgMLST, 197 (72%) replicate pairs had zero gene differences, and 37 (14%), 8 (3%) and 30 (11%) pairs had 1, 2 and >2 differences respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies. Considering 413 pairs of infections within ≤2 SNPs, i.e. consistent with recent transmission, 266 (64%) had ≤2 gene differences and 50 (12%) ≥5 differences. Comparing a genome to 100,000 others took <1 minute using hash-cgMLST. Conclusion Hash-cgMLST is an effective surveillance tool that can rapidly identify clusters of related genomes. However, cgMLST/hash-cgMLST generates potentially more false variants than mapping-based analysis. Refined mapping-based variant calling is likely required to precisely define close genetic relationships.

Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile

Journal of Clinical Microbiology ◽

10.1128/jcm.01037-19 ◽

2019 ◽

Vol 58 (1) ◽

Cited By ~ 1

Author(s):

David W. Eyre ◽

Tim E. A. Peto ◽

Derrick W. Crook ◽

A. Sarah Walker ◽

Mark H. Wilcox

Keyword(s):

Clostridium Difficile ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Genetic Relationships ◽

Nucleotide Polymorphisms ◽

Content Type ◽

Recent Transmission ◽

A Genome ◽

Performance Penalty

ABSTRACT Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 (<1%) of the pairs, respectively. Using hash-cgMLST, 218 (80%) of replicate pairs assembled with SPAdes had zero gene differences, and 31 (11%), 5 (2%), and 18 (7%) pairs had 1, 2, and >2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
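The core idea of hash-cgMLST, replacing centrally numbered alleles with a reproducible hash of each allele sequence, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name a hash function, so the use of SHA-256 here is an assumption, as are the function names, the gene names, the toy sequences, and the choice to ignore genes missing from either profile.

```python
import hashlib


def allele_hash(sequence: str) -> str:
    """Hash one allele sequence; identical sequences always give identical hashes."""
    normalised = sequence.strip().upper()
    # SHA-256 is an assumption; any stable hash would serve the same purpose.
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def hash_profile(alleles: dict[str, str]) -> dict[str, str]:
    """Convert {gene: allele sequence} into {gene: allele hash}."""
    return {gene: allele_hash(seq) for gene, seq in alleles.items()}


def gene_differences(profile_a: dict[str, str], profile_b: dict[str, str]) -> int:
    """Count genes present in both profiles whose allele hashes differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for gene in shared if profile_a[gene] != profile_b[gene])


# Hypothetical toy data: geneA matches, geneB differs, geneC is missing in isolate_2.
isolate_1 = hash_profile({"geneA": "ATGAAATAA", "geneB": "ATGCCCTAA", "geneC": "ATGGGGTAA"})
isolate_2 = hash_profile({"geneA": "atgaaataa", "geneB": "ATGCCTTAA"})
print(gene_differences(isolate_1, isolate_2))  # prints 1
```

Because each hash depends only on the sequence, two laboratories can compute comparable profiles independently, without consulting a shared allele database.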

Whole Genome Sequencing Refines Knowledge on the Population Structure of Mycobacterium bovis from a Multi-Host Tuberculosis System

Microorganisms ◽

10.3390/microorganisms9081585 ◽

2021 ◽

Vol 9 (8) ◽

pp. 1585

Author(s):

Ana C. Reis ◽

Liliana C. M. Salvador ◽

Suelee Robbe-Austerman ◽

Rogério Tenreiro ◽

Ana Botelho ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Wild Boar ◽

Genome Sequencing ◽

Mycobacterium Bovis ◽

Red Deer ◽

Variable Number Tandem Repeat ◽

Variant Calling ◽

Whole Genome ◽

Network Analyses

Classical molecular analyses of Mycobacterium bovis based on spoligotyping and Variable Number Tandem Repeat (MIRU-VNTR) brought the first insights into the epidemiology of animal tuberculosis (TB) in Portugal, showing high genotypic diversity of circulating strains that mostly cluster within the European 2 clonal complex. Previous surveillance provided valuable information on the prevalence and spatial occurrence of TB and highlighted prevalent genotypes in areas where livestock and wild ungulates are sympatric. However, links at the wildlife–livestock interfaces were established mainly via classical genotype associations. Here, we apply whole genome sequencing (WGS) to cattle, red deer and wild boar isolates to reconstruct the M. bovis population structure in a multi-host, multi-region disease system and to explore links at a fine genomic scale between M. bovis from wildlife hosts and cattle. Whole genome sequences of 44 representative M. bovis isolates, obtained between 2003 and 2015 from three TB hotspots, were compared through single nucleotide polymorphism (SNP) variant calling analyses. Consistent with previous results combining classical genotyping with Bayesian population admixture modelling, SNP-based phylogenies support the branching of this M. bovis population into five genetic clades, three with apparent geographic specificities, as well as the establishment of an SNP catalogue specific to each clade, which may be explored in the future as phylogenetic markers. The core genome alignment of SNPs was integrated within a spatiotemporal metadata framework to further structure this M. bovis population by host species and TB hotspots, providing a baseline for network analyses in different epidemiological and disease control contexts. WGS of M. bovis isolates from Portugal is reported for the first time in this pilot study, refining the spatiotemporal context of TB at the wildlife–livestock interface and providing further support to the key role of red deer and wild boar on disease maintenance. The SNP diversity observed within this dataset supports the natural circulation of M. bovis for a long time period, as well as multiple introduction events of the pathogen in this Iberian multi-host system.

Clinical-grade whole-genome sequencing and 3′ transcriptome analysis of colorectal cancer patients

Genome Medicine ◽

10.1186/s13073-021-00852-8 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Agata Stodolna ◽

Miao He ◽

Mahesh Vasipalli ◽

Zoya Kingsbury ◽

Jennifer Becq ◽

...

Keyword(s):

Colorectal Cancer ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Transcriptome Analysis ◽

Variant Calling ◽

Standard Of Care ◽

Genomic Variation ◽

Whole Genome ◽

Clinical Grade ◽

Pathway Gene

Abstract Background Clinical-grade whole-genome sequencing (cWGS) has the potential to become the standard of care within the clinic because of its breadth of coverage and lack of bias towards certain regions of the genome. Colorectal cancer presents a difficult treatment paradigm, with over 40% of patients presenting at diagnosis with metastatic disease. We hypothesised that cWGS coupled with 3′ transcriptome analysis would give new insights into colorectal cancer. Methods Patients underwent PCR-free whole-genome sequencing and alignment and variant calling using a standardised pipeline to output SNVs, indels, SVs and CNAs. Additional insights into the mutational signatures and tumour biology were gained by the use of 3′ RNA-seq. Results Fifty-four patients were studied in total. Driver analysis identified the Wnt pathway gene APC as the only consistently mutated driver in colorectal cancer. Alterations in the PI3K/mTOR pathways were seen as previously observed in CRC. Multiple private CNAs, SVs and gene fusions were unique to individual tumours. Approximately 30% of patients had a tumour mutational burden of > 10 mutations/Mb of DNA, suggesting suitability for immunotherapy. Conclusions Clinical whole-genome sequencing offers a potential avenue for the identification of private genomic variation that may confer sensitivity to targeted agents and offer patients new options for targeted therapies.

Estimating sequencing error rates using families

BioData Mining ◽

10.1186/s13040-021-00259-6 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Kelley Paskov ◽

Jae-Yoon Jung ◽

Brianna Chrisman ◽

Nate T. Stockham ◽

Peter Washington ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Exome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Platform ◽

Whole Exome

Abstract Background As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites. Conclusion Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
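To make the notion of a Mendelian error concrete, the following Python sketch flags mother-father-child genotype calls that cannot arise by Mendelian inheritance. It is an illustration only and does not reproduce the statistical model described in the abstract, which goes further and estimates per-sample precision and recall. The genotype encoding (alternate-allele count 0, 1 or 2 at a biallelic autosomal site), the function names, and the toy trios are all assumptions.

```python
def transmissible_alleles(genotype: int) -> set[int]:
    """Alleles (0 = reference, 1 = alternate) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(mother: int, father: int, child: int) -> bool:
    """True if the child's genotype cannot be formed from one allele per parent."""
    possible_children = {
        m + f
        for m in transmissible_alleles(mother)
        for f in transmissible_alleles(father)
    }
    return child not in possible_children


def mendelian_error_rate(trios: list[tuple[int, int, int]]) -> float:
    """Fraction of (mother, father, child) sites showing a Mendelian error."""
    if not trios:
        return 0.0
    errors = sum(is_mendelian_error(*trio) for trio in trios)
    return errors / len(trios)


# Hypothetical calls at four sites; the last two are inconsistent with inheritance.
sites = [(0, 0, 0), (0, 1, 1), (2, 2, 1), (0, 0, 1)]
print(mendelian_error_rate(sites))  # prints 0.5
```

Only a fraction of genotyping errors show up as Mendelian errors (an error that yields a still-consistent trio is invisible to this check), which is why the raw rate above is a starting point rather than an error estimate in itself.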

Comparison of Multilocus Variable-Number Tandem-Repeat Analysis and Whole-Genome Sequencing for Investigation of Clostridium difficile Transmission

Journal of Clinical Microbiology ◽

10.1128/jcm.01095-13 ◽

2013 ◽

Vol 51 (12) ◽

pp. 4141-4149 ◽

Cited By ~ 48

Author(s):

D. W. Eyre ◽

W. N. Fawley ◽

E. L. Best ◽

D. Griffiths ◽

N. E. Stoesser ◽

...

Keyword(s):

Clostridium Difficile ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Tandem Repeat ◽

Variable Number Tandem Repeat ◽

Variable Number ◽

Whole Genome ◽

Repeat Analysis

Transmission of Multidrug/Rifampicin-Resistant Mycobacterium Tuberculosis in Chongqing, China: A Retrospective Observational Study Using Whole-Genome Sequencing

10.21203/rs.3.rs-717466/v1 ◽

2021 ◽

Author(s):

Bing Zhao ◽

Chunfa Liu ◽

Jiale Fan ◽

Aijing Ma ◽

Wencong He ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Resistance Mutations ◽

Treatment Information ◽

Recent Transmission ◽

Drugs Analysis ◽

Lineage 2 ◽

And Cluster Analysis

Abstract Background: Multidrug/rifampicin-resistant tuberculosis (MDR/RR-TB) is a global barrier to the ‘Stop TB’ plan. China has the second-highest MDR/RR-TB burden worldwide. Understanding transmission dynamics facilitates disease control. Methods: Whole genome sequencing (WGS) data from patients at the Chongqing tuberculosis control institute were used for phylogenetic classification, resistance prediction, and cluster analysis as an indicator of recent transmission (RT). Factors associated with MDR/RR-TB were identified using a logistic regression model. Results: A total of 223 cases of MDR/RR-TB were recorded between Jan 1, 2018 and Dec 31, 2020, and treatment information was available for 200 cases. Patients older than 55 years were more likely to die. WGS data were obtained for 178 MDR/RR strains, of which 152 were classified as lineage 2 strains. Eighty strains (44.9%, 80 of 178) fell into 20 genomic clusters whose members differed by 12 or fewer single nucleotide polymorphisms (SNPs), indicating RT. Infection with lineage 2 strains was a significant factor driving the epidemic towards MDR/RR-TB. Analysis of resistance mutations to first-line tuberculosis drugs found that 79 (98.8%) of the 80 strains defined as RT shared the same mutations within their cluster. 55% (44 of 80) of the MDR/RR-TB strains accumulated additional drug resistance mutations along the transmission chain, especially to fluoroquinolones (FQs) (63.6%, 28 of 44). Conclusions: Age is the most significant factor in the death of MDR/RR-TB patients. RT of MDR/RR strains not only drove the MDR/RR-TB epidemic but was also accompanied by the accumulation of more serious resistance along the transmission chains.
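Grouping strains into genomic clusters under a 12-SNP threshold can be sketched in Python as follows. The abstract does not state the clustering algorithm, so single-linkage grouping (a strain joins a cluster if it lies within the threshold of any member) is an assumption, as are the function name, the strain labels, and the toy distances.

```python
def snp_clusters(
    distances: dict[tuple[str, str], int], threshold: int = 12
) -> list[set[str]]:
    """Single-linkage clusters of strains whose pairwise SNP distance is <= threshold."""
    parent: dict[str, str] = {}

    def find(strain: str) -> str:
        # Union-find lookup with path halving; unseen strains start as their own root.
        parent.setdefault(strain, strain)
        while parent[strain] != strain:
            parent[strain] = parent[parent[strain]]
            strain = parent[strain]
        return strain

    for (strain_a, strain_b), distance in distances.items():
        root_a, root_b = find(strain_a), find(strain_b)
        if distance <= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups: dict[str, set[str]] = {}
    for strain in list(parent):
        groups.setdefault(find(strain), set()).add(strain)
    # Singletons are not clusters: report groups of two or more strains only.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical pairwise SNP distances between four strains.
example = {
    ("s1", "s2"): 3, ("s2", "s3"): 10, ("s1", "s3"): 13,
    ("s3", "s4"): 40, ("s1", "s4"): 45, ("s2", "s4"): 41,
}
clusters = snp_clusters(example)
print(clusters)  # one cluster containing s1, s2 and s3
```

Note that under single linkage s1 and s3 end up in the same cluster despite being 13 SNPs apart, because each is within 12 SNPs of s2; a stricter rule requiring every pair to fall under the threshold would give different clusters.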

Genomic analysis of an outbreak of bovine tuberculosis in a man-made multi-host species system: a call for action on wildlife in Brazil

10.22541/au.161958001.11990195/v1 ◽

2021 ◽

Author(s):

Daiane A. R. Lima ◽

Cristina K. Zimpel ◽

José Salvatore Patané ◽

Taiana T. Silva-Pereira ◽

Rodrigo N. Etges ◽

...

Keyword(s):

Environmental Factors ◽

Genome Sequencing ◽

Mycobacterium Bovis ◽

Bovine Tuberculosis ◽

Genomic Analysis ◽

Whole Genome ◽

System A ◽

Recent Transmission ◽

And Control ◽

Tuberculin Tests

We report on a 15-year-long outbreak of bovine tuberculosis (bTB) in wildlife from a Brazilian safari park. A timeline of diagnostic events and whole-genome sequencing (WGS) of 21 Mycobacterium bovis isolates from deer and llamas were analyzed. From 2003 to 2018, at least 16 animals from 8 species died of TB, which is likely an underestimate. On three occasions since 2013, the deer presented positive tuberculin tests, leading to closure of the park and culling of all deer. WGS indicated that multiple M. bovis strains were circulating, with at least three founding introductions since the park's inauguration in 1977. WGS found no evidence of recent transmission events between nearby farms and the park. Lastly, by discussing the socio-economic and environmental factors slipping through current regulatory gaps that were determinants of this outbreak, we call for the development of a plan to report and control bTB in wildlife in Brazil.

Fine Mapping Using Whole-Genome Sequencing Confirms Anti-Müllerian Hormone as a Major Gene for Sex Determination in Farmed Nile Tilapia (Oreochromis niloticus L.)

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400297 ◽

2019 ◽

Vol 9 (10) ◽

pp. 3213-3223 ◽

Cited By ~ 8

Author(s):

Giovanna Cáceres ◽

María E. López ◽

María I. Cádiz ◽

Grazyella M. Yoshida ◽

Ana Jedlicki ◽

...

Keyword(s):

Sex Determination ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Oreochromis Niloticus ◽

Nile Tilapia ◽

Major Gene ◽

Whole Genome ◽

Important Species ◽

Genome Wide ◽

A Genome

Nile tilapia (Oreochromis niloticus) is one of the most cultivated and economically important species in world aquaculture. Intensive production promotes the use of monosex animals, due to an important dimorphism that favors male growth. Currently, the main mechanism to obtain all-male populations is the use of hormones in feeding during larval and fry phases. Identifying genomic regions associated with sex determination in Nile tilapia is a research topic of great interest. The objective of this study was to identify genomic variants associated with sex determination in three commercial populations of Nile tilapia. Whole-genome sequencing of 326 individuals was performed, and a total of 2.4 million high-quality bi-allelic single nucleotide polymorphisms (SNPs) were identified after quality control. A genome-wide association study (GWAS) was conducted to identify markers associated with the binary sex trait (males = 1; females = 0). A mixed logistic regression GWAS model was fitted, and a genome-wide significant signal comprising 36 SNPs, spanning a genomic region of 536 kb in chromosome 23, was identified. Ten of these 36 genetic variants intersect the anti-Müllerian hormone (Amh) gene. Other significant SNPs were located in the neighboring Amh gene region. This gene has been strongly associated with sex determination in several vertebrate species, playing an essential role in the differentiation of male and female reproductive tissue in early stages of development. This finding provides useful information to better understand the genetic mechanisms underlying sex determination in Nile tilapia.

Comparison of routine field epidemiology and whole genome sequencing to identify tuberculosis transmission in a remote setting

Epidemiology and Infection ◽

10.1017/s0950268820000072 ◽

2020 ◽

Vol 148 ◽

Cited By ~ 1

Author(s):

J. L. Guthrie ◽

L. Strudwick ◽

B. Roberts ◽

M. Allen ◽

J. McFadzen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Tandem Repeats ◽

Variable Number ◽

Yukon Territory ◽

Contact Tracing ◽

Whole Genome ◽

Attitudes And Practices ◽

Genomic Epidemiology ◽

Recent Transmission

Abstract Yukon Territory (YT) is a remote region in northern Canada with ongoing spread of tuberculosis (TB). To explore the utility of whole genome sequencing (WGS) for TB surveillance and monitoring in a setting with detailed contact tracing and interview data, we used a mixed-methods approach. Our analysis included all culture-confirmed cases in YT (2005–2014) and incorporated data from 24-locus Mycobacterial Interspersed Repetitive Units-Variable Number of Tandem Repeats (MIRU-VNTR) genotyping, WGS and contact tracing. We compared field-based (contact investigation (CI) data + MIRU-VNTR) and genomic-based (WGS + MIRU-VNTR + basic case data) investigations to identify the most likely source of each person's TB and assessed the knowledge, attitudes and practices of programme personnel around genotyping and genomics using online, multiple-choice surveys (n = 4) and an in-person group interview (n = 5). Field- and genomics-based approaches agreed for 26 of 32 (81%) cases on likely location of TB acquisition. There was less agreement in the identification of specific source cases (13/22 or 59% of cases). Single-locus MIRU-VNTR variants and limited genetic diversity complicated the analysis. Qualitative data indicated that participants viewed genomic epidemiology as a useful tool to streamline investigations, particularly in differentiating latent TB reactivation from recent transmission. Based on this, genomic data could be used to enhance CIs, focus resources, target interventions and aid in TB programme evaluation.

Independent Microevolution Mediated by Mobile Genetic Elements of Individual Clostridium difficile Isolates from Clade 4 Revealed by Whole-Genome Sequencing

mSystems ◽

10.1128/msystems.00252-18 ◽

2019 ◽

Vol 4 (2) ◽

Cited By ~ 4

Author(s):

Yuan Wu ◽

Chen Liu ◽

Wen-Ge Li ◽

Jun-Li Xu ◽

Wen-Zhu Zhang ◽

...

Keyword(s):

Drug Resistance ◽

Clostridium Difficile ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Mobile Genetic Elements ◽

High Rate ◽

Whole Genome ◽

Content Type ◽

Genetic Elements ◽

Sequence Types

ABSTRACT Horizontal gene transfer of mobile genetic elements (MGEs) accounts for the mosaic genome of Clostridium difficile, leading to acquisition of new phenotypes, including drug resistance and reconstruction of the genomes. MGEs were analyzed according to the whole-genome sequences of 37 C. difficile isolates with a variety of sequence types (STs) within clade 4 from China. Great diversity was found in each transposon even within isolates with the same ST. Two novel transposons were identified in isolates ZR9 and ZR18, of which approximately one third to half of the genes showed heterogenous origins compared with the usual intestinal bacterial genes. Most importantly, catD, known to be harbored by Tn4453a/b, was replaced by aac(6′) aph(2′′) in isolates 2, 7, and 28. This phenomenon illustrated the frequent occurrence of gene exchanges between C. difficile and other enterobacteria with individual heterogeneity. Numerous prophages and CRISPR arrays were identified in C. difficile isolates of clade 4. Approximately 20% of spacers were located in prophage-carried CRISPR arrays, providing a new method for typing and tracing the origins of closely related isolates, as well as in-depth studies of the mechanism underlying genome remodeling. The rates of drug resistance were obviously higher than those reported previously around the world, although all isolates retained high sensitivity to vancomycin and metronidazole. The increasing number of C. difficile isolates resistant to all antibiotics tested here suggests the ease with which resistance is acquired in vivo. This study gives insights into the genetic mechanism of microevolution within clade 4. IMPORTANCE Mobile genetic elements play a key role in the continuing evolution of Clostridium difficile, resulting in the emergence of new phenotypes for individual isolates. On the basis of whole-genome sequencing analysis, we comprehensively explored transposons, CRISPR, prophage, and genetic sites for drug resistance within clade 4 C. difficile isolates with different sequence types. Great diversity in MGEs and a high rate of multidrug resistance were found within this clade, including new transposons, Tn4453a/b with aac(6′) aph(2′′) instead of catD, and a relatively high rate of prophage-carried CRISPR arrays. These findings provide important new insights into the mechanism of genome remodeling within clade 4 and offer a new method for typing and tracing the origins of closely related isolates.
