Comparing transmission reconstruction models with Mycobacterium tuberculosis whole genome sequence data

Pathogen genomic epidemiology is now routinely used worldwide to interrogate infectious disease dynamics. Multiple computational tools that reconstruct transmission networks by coupling genomic data with epidemiological modelling have been developed. The resulting inferences are often used to inform outbreak investigations, yet to date, the performance of these transmission reconstruction tools has not been compared specifically for tuberculosis, a disease process with complex epidemiology that includes variable latency periods and within-host heterogeneity. Here, we carried out a systematic comparison of seven publicly available transmission reconstruction tools, evaluating their accuracy in predicting transmission events in both simulated and real-world Mycobacterium tuberculosis outbreaks. No tool was able to fully resolve transmission networks, though both the single-tree and multi-tree input implementations of TransPhylo identified the most epidemiologically supported transmission events and the fewest false positive links. We observed a high degree of variability in the transmission networks inferred by each approach. Our findings may inform the choice of tools in future tuberculosis transmission analyses and underscore the need for caution when interpreting transmission networks produced using probabilistic approaches.
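As a rough illustration of what "identifying supported transmission events and false positive links" can mean in practice, the sketch below scores an inferred network against a reference set of links. It assumes each network is a set of directed (infector, infectee) pairs; the function name `score_links` and the toy case IDs are hypothetical, and this is not necessarily the scoring procedure used in the study.

```python
def score_links(inferred, truth):
    """Compare inferred directed transmission links with a reference set.

    Both arguments are iterables of (infector, infectee) tuples.
    """
    inferred, truth = set(inferred), set(truth)
    true_pos = inferred & truth    # links present in both sets
    false_pos = inferred - truth   # inferred links with no support
    missed = truth - inferred      # reference links that were not recovered
    precision = len(true_pos) / len(inferred) if inferred else 0.0
    recall = len(true_pos) / len(truth) if truth else 0.0
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "missed": len(missed),
        "precision": precision,
        "recall": recall,
    }


# Toy outbreak (hypothetical case IDs): A infects B, and B infects C and D.
truth = [("A", "B"), ("B", "C"), ("B", "D")]
# A hypothetical tool output with one reversed link and one spurious link.
inferred = [("A", "B"), ("C", "B"), ("B", "D"), ("A", "E")]
result = score_links(inferred, truth)
```

Treating links as directed means a reversed pair such as ("C", "B") counts as a false positive; an undirected comparison would be more lenient and is an equally reasonable design choice.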


Epidemiological information is key when interpreting whole genome sequence data – lessons learned from a large Legionella pneumophila outbreak in Warstein, Germany, 2013

Eurosurveillance ◽

10.2807/1560-7917.es.2017.22.45.17-00137 ◽

2017 ◽

Vol 22 (45) ◽

Cited By ~ 8

Author(s):

Markus Petzold ◽

Karola Prior ◽

Jacob Moran-Gilad ◽

Dag Harmsen ◽

Christian Lück

Keyword(s):

Legionella Pneumophila ◽

Sequence Data ◽

Lessons Learned ◽

Whole Genome Sequence ◽

Whole Genome ◽

Typing Method ◽

Epidemic Clone ◽

Genomic Epidemiology ◽

Outbreak Investigations ◽

Epidemiological Information

Introduction: Whole genome sequencing (WGS) is increasingly used in Legionnaires’ disease (LD) outbreak investigations, owing to its higher resolution than sequence-based typing, the gold standard typing method for Legionella pneumophila, in the analysis of endemic strains. Recently, a gene-by-gene typing approach based on 1,521 core genes called core genome multilocus sequence typing (cgMLST) was described that enables a robust and standardised typing of L. pneumophila. Methods: We applied this cgMLST scheme to isolates obtained during the largest outbreak of LD reported so far in Germany. In this outbreak, the epidemic clone ST345 had been isolated from patients and four different environmental sources. In total, 42 clinical and environmental isolates were retrospectively typed. Results: Epidemiologically unrelated ST345 isolates were clearly distinguishable from the epidemic clone. Remarkably, epidemic isolates split up into two distinct clusters, ST345-A and ST345-B, each respectively containing a mix of clinical and epidemiologically-related environmental samples. Discussion/conclusion: The outbreak was therefore likely caused by both variants of the single sequence type, which pre-existed in the environmental reservoirs. The two clusters differed by 40 alleles located in two neighbouring genomic regions of ca 42 and 26 kb. Additional analysis supported horizontal gene transfer of the two regions as responsible for the difference between the variants. Both regions comprise virulence genes and have previously been reported to be involved in recombination events. This corroborates the notion that genomic outbreak investigations should always take epidemiological information into consideration when making inferences. Overall, cgMLST proved helpful in disentangling the complex genomic epidemiology of the outbreak.
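The allele-based comparison behind a statement such as "the two clusters differed by 40 alleles" can be sketched as a pairwise count of loci whose allele numbers differ. The snippet below is a minimal illustration only: the locus names, isolate names and allele numbers are made up, and ignoring loci that are missing in either isolate is an assumption rather than the rule of any particular cgMLST scheme.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count shared loci at which two allelic profiles differ.

    Profiles map locus name -> allele number; None marks a missing call.
    Loci missing in either profile are skipped (an assumption of this sketch).
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical four-locus profiles; a real scheme would have 1,521 loci.
profiles = {
    "iso1": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "iso2": {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": None},
    "iso3": {"locus1": 3, "locus2": 5, "locus3": 2, "locus4": 7},
}

# Pairwise distance table keyed by isolate pair.
distances = {
    (a, b): allele_distance(profiles[a], profiles[b])
    for a, b in combinations(sorted(profiles), 2)
}
```

Because a recombination event can change many neighbouring alleles at once, a large allele distance of this kind does not by itself rule out an epidemiological link, which is the point the abstract makes.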


Whole genome sequence data of Mycobacterium tuberculosis XDR strain, isolated from patient in Kazakhstan

Data in Brief ◽

10.1016/j.dib.2020.106416 ◽

2020 ◽

Vol 33 ◽

pp. 106416

Author(s):

Asset Daniyarov ◽

Askhat Molkenov ◽

Saule Rakhimova ◽

Ainur Akhmetova ◽

Zhannur Nurkina ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data


Genomic variant identification methods alter Mycobacterium tuberculosis transmission inference

10.1101/733642 ◽

2019 ◽

Cited By ~ 3

Author(s):

Katharine S. Walter ◽

Caroline Colijn ◽

Ted Cohen ◽

Barun Mathema ◽

Qingyun Liu ◽

...

Keyword(s):

Sequence Data ◽

Variant Calling ◽

Genomic Variation ◽

Human Pathogens ◽

Phylogenetic Structure ◽

Genomic Epidemiology ◽

Tuberculosis Transmission ◽

Epidemiology Studies ◽

Global And Local ◽

Variant Identification

Abstract: Pathogen genomic data are increasingly used to characterize global and local transmission patterns of important human pathogens and to inform public health interventions. Yet there is no current consensus on how to measure genomic variation. We investigated the effects of variant identification approaches on transmission inferences for M. tuberculosis by comparing variants identified by five different groups in the same sequence data from a clonal outbreak. We then measured the performance of commonly used variant calling approaches in recovering variation in a simulated tuberculosis outbreak and tested the effect of applying increasingly stringent filters on transmission inferences and phylogenies. We found that variant calling approaches used by different groups do not recover consistent sets of variants, often leading to conflicting transmission inferences. Further, performance in recovering true outbreak variation varied widely across approaches. Finally, stringent filters rapidly eroded the accuracy of transmission inferences and quality of phylogenies reconstructed from outbreak variation. We conclude that measurements of genetic distance and phylogenetic structure are dependent on variant calling approach. Variant calling algorithms trained upon true sequence data outperform other approaches and enable inclusion of repetitive regions typically excluded from genomic epidemiology studies, maximizing the information gleaned from outbreak genomes.


Ethnically diverse urban transmission networks of Neisseria gonorrhoeae without evidence of HIV serosorting

Sexually Transmitted Infections ◽

10.1136/sextrans-2019-054025 ◽

2019 ◽

Vol 96 (2) ◽

pp. 106-109

Author(s):

Jayshree Dave ◽

John Paul ◽

Thomas Joshua Pasvol ◽

Andy Williams ◽

Fiona Warburton ◽

...

Keyword(s):

Neisseria Gonorrhoeae ◽

Ethnic Groups ◽

Antimicrobial Susceptibility ◽

Sequence Data ◽

Small Sample ◽

Whole Genome Sequence ◽

Whole Genome ◽

Sequencing Data ◽

Transmission Networks ◽

Hiv Serosorting

Objective: We aimed to characterise gonorrhoea transmission patterns in a diverse urban population by linking genomic, epidemiological and antimicrobial susceptibility data. Methods: Neisseria gonorrhoeae isolates from patients attending sexual health clinics at Barts Health NHS Trust, London, UK, during an 11-month period underwent whole-genome sequencing and antimicrobial susceptibility testing. We combined laboratory and patient data to investigate the transmission network structure. Results: One hundred and fifty-eight isolates from 158 patients were available with associated descriptive data. One hundred and twenty-nine (82%) patients identified as male and 25 (16%) as female; four (3%) records lacked gender information. Self-described ethnicities were: 51 (32%) English/Welsh/Scottish; 33 (21%) white, other; 23 (15%) black British/black African/black, other; 12 (8%) Caribbean; 9 (6%) South Asian; 6 (4%) mixed ethnicity; and 10 (6%) other; data were missing for 14 (9%). Self-reported sexual orientations were 82 (52%) men who have sex with men (MSM); 49 (31%) heterosexual; 2 (1%) bisexual; data were missing for 25 individuals. Twenty-two (14%) patients were HIV positive. Whole-genome sequence data were generated for 151 isolates, which linked 75 (50%) patients to at least one other case. Using sequencing data, we found no evidence of transmission networks related to specific ethnic groups (p=0.64) or of HIV serosorting (p=0.35). Of 82 MSM/bisexual patients with sequencing data, 45 (55%) belonged to clusters of ≥2 cases, compared with 16/44 (36%) heterosexuals with sequencing data (p=0.06). Conclusion: We demonstrate links between 50% of patients in transmission networks using a relatively small sample in a large cosmopolitan city. We found no evidence of HIV serosorting. Our results do not support assortative selectivity as an explanation for differences in gonorrhoea incidence between ethnic groups.


MTBseq: a comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates

PeerJ ◽

10.7717/peerj.5895 ◽

2018 ◽

Vol 6 ◽

pp. e5895 ◽

Cited By ~ 35

Author(s):

Thomas Andreas Kohl ◽

Christian Utpatel ◽

Viola Schleusener ◽

Maria Rosaria De Filippo ◽

Patrick Beckert ◽

...

Keyword(s):

Antibiotic Resistance ◽

Mycobacterium Tuberculosis ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome Sequencing Data ◽

Phylogenomic Analysis ◽

Whole Genome ◽

Sequencing Data ◽

Desktop Computer

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance with highest resolution up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference-mapping-based workflow, MTBseq reports detected variant positions annotated with known association to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.


Rapid statistical methods for inferring intra- and inter-hospital transmission of nosocomial pathogens from whole genome sequence data

10.1101/442319 ◽

2018 ◽

Cited By ~ 1

Author(s):

Marianne Aspbury ◽

James Sciberras ◽

Jukka Corander ◽

Sion C. Bayliss ◽

Tjibbe Donker ◽

...

Keyword(s):

Statistical Methods ◽

Genome Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Bacterial Pathogens ◽

Whole Genome Sequence ◽

Whole Genome ◽

Transmission Networks ◽

Genome Sequence Data ◽

Hospital Transmission

Abstract: Whole genome sequence (WGS) data for bacterial pathogens can provide evidence as to the source of nosocomial infection, and more specifically the ability to distinguish between intra- and inter-hospital transmission. This is currently achieved either through using SNP thresholds, which can lack statistical robustness, or by constructing phylogenetic trees, which can be computationally expensive and difficult to interpret. Here we compare two alternative statistical approaches using 1022 genomes of methicillin-resistant Staphylococcus aureus (MRSA) clone ST22. In 71% of cases both methods predict the same hospital origin, which is also supported by the ML tree. Robust assignments are divided approximately equally between intra-hospital transmission and inter-hospital transmission. Our approaches are rapid and produce intuitive output that could inform on immediate infection control priorities, as well as providing long-term data on inter-hospital transmission networks. We discuss the strengths and weaknesses of our methods, and the generalisability of this approach. One Sentence Summary: We present rapid statistical methods for distinguishing intra- versus inter-hospital transmission of bacterial pathogens using whole genome sequence data; these methods do not require the use of SNP thresholds or the generation and interpretation of phylogenetic trees.
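For context, the SNP-threshold approach that the abstract contrasts its methods with can be sketched as follows. This illustrates the baseline only, not the statistical methods the authors propose; the isolate names, toy sequences and the cut-off of 2 SNPs are hypothetical, and the sequences are assumed to be aligned, equal in length and upper case.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions with anything other than A, C, G or T in either sequence
    (gaps, N) are ignored, which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def threshold_clusters(alignment, max_snps):
    """Single-linkage clusters of isolates within max_snps of each other."""
    parent = {name: name for name in alignment}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= max_snps:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in alignment:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical isolates from two hospitals.
alignment = {
    "h1_a": "ACGTACGTAC",
    "h1_b": "ACGTACGTAT",
    "h2_a": "TCGAACCTAT",
}
clusters = threshold_clusters(alignment, max_snps=2)
```

The result depends entirely on the chosen cut-off, which is one way to see why the abstract describes SNP thresholds as potentially lacking statistical robustness.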


Bayesian inference of infectious disease transmission from whole genome sequence data

10.1101/001388 ◽

2013 ◽

Cited By ~ 1

Author(s):

Xavier Didelot ◽

Jennifer Gardy ◽

Caroline Colijn

Keyword(s):

Disease Transmission ◽

Sequence Data ◽

Disease Outbreaks ◽

Genomic Data ◽

Realistic Model ◽

Host Population ◽

Whole Genome Sequence ◽

Genomic Epidemiology ◽

Starting Point ◽

Source Case

Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered: how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely-sampled outbreaks from genomic data whilst considering within-host diversity. We infer a time-labelled phylogeny using BEAST, then infer a transmission network via Markov chain Monte Carlo (MCMC). We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology, but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.


The whole genome sequence data analyses of a Mycobacterium tuberculosis strain SBH321 isolated in Sabah, Malaysia, belongs to Ural family of Lineage 4

Data in Brief ◽

10.1016/j.dib.2020.106388 ◽

2020 ◽

Vol 33 ◽

pp. 106388

Author(s):

Jaeyres Jani ◽

Zainal Arifin Mustapha ◽

Chin Kai Ling ◽

Amabel Seow Ming Hui ◽

Roddy Teo ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Tuberculosis Strain ◽

Genome Sequence Data ◽

Data Analyses ◽

Mycobacterium Tuberculosis Strain


Multiple Merger Genealogies in Outbreaks of Mycobacterium tuberculosis

Molecular Biology and Evolution ◽

10.1093/molbev/msaa179 ◽

2020 ◽

Vol 38 (1) ◽

pp. 290-306 ◽

Cited By ~ 1

Author(s):

Fabrizio Menardo ◽

Sébastien Gagneux ◽

Fabian Freund

Keyword(s):

Mycobacterium Tuberculosis ◽

Reproductive Success ◽

Sequence Data ◽

Model Misspecification ◽

Null Model ◽

Whole Genome Sequence ◽

Data Sets ◽

Host Immune System ◽

Demographic Inference ◽

Approximate Bayesian

Abstract: The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.
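To make the null model concrete: under the standard Kingman coalescent, lineages merge strictly in pairs, and while k lineages remain the waiting time to the next merger is exponential with rate k(k-1)/2 in coalescent time units. The sketch below simulates the time to the most recent common ancestor (TMRCA) under that model and compares it with the closed-form expectation 2(1 - 1/n). It covers only the Kingman case, not the multiple-merger models or the approximate Bayesian computation used in the paper; the function names, seed and sample size are illustrative assumptions.

```python
import random


def expected_tmrca(n):
    """Expected TMRCA for n sampled lineages under the Kingman coalescent.

    Sums the mean waiting times 2 / (k * (k - 1)) for k = n, ..., 2,
    which simplifies to 2 * (1 - 1 / n).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one TMRCA by summing exponential waiting times between mergers."""
    total = 0.0
    for k in range(n, 1, -1):
        # With k lineages, each of the k*(k-1)/2 pairs merges at rate 1.
        total += rng.expovariate(k * (k - 1) / 2.0)
    return total


rng = random.Random(1)  # fixed seed so the run is reproducible
n_lineages = 10
n_reps = 20000
mean_sim = sum(simulate_tmrca(n_lineages, rng) for _ in range(n_reps)) / n_reps
print(f"simulated mean TMRCA: {mean_sim:.3f}")
print(f"expected TMRCA:       {expected_tmrca(n_lineages):.3f}")
```

A multiple-merger coalescent would allow three or more lineages to merge in a single event, which is how such models accommodate the large variation in reproductive success described in the abstract.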
