Multiple Merger Genealogies in Outbreaks of Mycobacterium tuberculosis

2020 ◽  
Vol 38 (1) ◽  
pp. 290-306 ◽  
Author(s):  
Fabrizio Menardo ◽  
Sébastien Gagneux ◽  
Fabian Freund

Abstract The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.
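The Kingman assumption described above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' method: it draws the time to the most recent common ancestor under the standard Kingman coalescent, where each event merges exactly two lineages. The function names, the seed, and the number of replicates are illustrative assumptions. Multiple-merger coalescents differ in that a single event may merge more than two lineages at once, which this sketch does not implement.

```python
import random


def kingman_tmrca(n, rng):
    """Draw the tree height (TMRCA) for n lineages under the Kingman coalescent.

    Time is measured in coalescent units. While k lineages remain, each of the
    k*(k-1)/2 pairs merges at rate 1, so the waiting time is exponential with
    that total rate, and exactly two lineages merge per event.
    """
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)
        k -= 1  # binary merger only: this is the Kingman restriction
    return t


def mean_tmrca(n, reps=20000, seed=42):
    """Monte Carlo average of the TMRCA (reps and seed are arbitrary choices)."""
    rng = random.Random(seed)
    return sum(kingman_tmrca(n, rng) for _ in range(reps)) / reps
```

Under these assumptions the simulated mean should be close to the known expectation 2(1 - 1/n), which gives a quick sanity check on the simulator.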

2019 ◽  
Author(s):  
F. Menardo ◽  
S. Gagneux ◽  
F. Freund

Abstract The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared to the average (“super-spreaders”). Here we used an Approximate Bayesian Computation approach to test whether multiple merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered eleven publicly available whole genome sequence data sets sampled from local MTB populations and outbreaks, and found that MMC had a better fit compared to the Kingman coalescent for ten of the eleven data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed, and that past findings based on the Kingman coalescent need to be revisited.


Data in Brief ◽  
2020 ◽  
Vol 33 ◽  
pp. 106416
Author(s):  
Asset Daniyarov ◽  
Askhat Molkenov ◽  
Saule Rakhimova ◽  
Ainur Akhmetova ◽  
Zhannur Nurkina ◽  
...  

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5895 ◽  
Author(s):  
Thomas Andreas Kohl ◽  
Christian Utpatel ◽  
Viola Schleusener ◽  
Maria Rosaria De Filippo ◽  
Patrick Beckert ◽  
...  

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance with highest resolution up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference mapping based workflow, MTBseq reports detected variant positions annotated with known association to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.


2020 ◽  
Author(s):  
Erik R Funk ◽  
Garth M Spellman ◽  
Kevin Winker ◽  
Jack J Withrow ◽  
Kristen C Ruegg ◽  
...  

Abstract Understanding how gene flow affects population divergence and speciation remains challenging. Differentiating one evolutionary process from another can be difficult because multiple processes can produce similar patterns, and more than one process can occur simultaneously. Although simple population models produce predictable results, how these processes balance in taxa with patchy distributions and complicated natural histories is less certain. These types of populations might be highly connected through migration (gene flow), but can experience stronger effects of genetic drift and inbreeding, or localized selection. Although different signals can be difficult to separate, the application of high-throughput sequence data can provide the resolution necessary to distinguish many of these processes. We present whole-genome sequence data for an avian species group with an alpine and arctic tundra distribution to examine the role that different population genetic processes have played in their evolutionary history. Rosy-finches inhabit high elevation mountaintop sky islands and high-latitude island and continental tundra. They exhibit extensive plumage variation coupled with low levels of genetic variation. Additionally, the number of species within the complex is debated, making them excellent for studying the forces involved in the process of diversification, as well as an important species group in which to investigate species boundaries. Total genomic variation suggests a broadly continuous pattern of allele frequency changes across the mainland taxa of this group in North America. However, phylogenomic analyses recover multiple distinct, well-supported groups that coincide with previously described morphological variation and current species-level taxonomy. Tests of introgression using D-statistics and approximate Bayesian computation reveal significant levels of introgression between multiple North American taxa. These results provide insight into the balance between divergent and homogenizing population genetic processes and highlight remaining challenges in interpreting conflict between different types of analytical approaches with whole-genome sequence data. [ABBA-BABA; approximate Bayesian computation; gene flow; phylogenomics; speciation; whole-genome sequencing.]
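The D-statistic (ABBA-BABA) test named in the abstract above compares counts of two discordant site patterns across four taxa. The following is a minimal sketch of the standard count-based form, D = (ABBA - BABA) / (ABBA + BABA), assuming one aligned haploid sequence per taxon in the order P1, P2, P3, outgroup. The function name and the toy input are hypothetical, and the sketch omits the significance testing (for example block jackknifing) that a real analysis would need.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Count-based Patterson's D from four aligned sequences.

    The outgroup allele is treated as ancestral (A) and the other allele as
    derived (B). ABBA sites: P2 and P3 share the derived allele. BABA sites:
    P1 and P3 share the derived allele. Sites that are not biallelic are skipped.
    Returns None when no informative site is found.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep only sites with exactly two alleles
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    total = abba + baba
    if total == 0:
        return None
    return (abba - baba) / total
```

A positive value indicates an excess of derived alleles shared between P2 and P3, a negative value an excess shared between P1 and P3; values near zero are what incomplete lineage sorting alone would be expected to produce.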


2016 ◽  
Author(s):  
Sebastian Duchêne ◽  
Kathryn E. Holt ◽  
François-Xavier Weill ◽  
Simon Le Hello ◽  
Jane Hawkey ◽  
...  

ABSTRACT Estimating the rates at which bacterial genomes evolve is critical to understanding major evolutionary and ecological processes such as disease emergence, long-term host-pathogen associations, and short-term transmission patterns. The surge in bacterial genomic data sets provides a new opportunity to estimate these rates and reveal the factors that shape bacterial evolutionary dynamics. For many organisms estimates of evolutionary rate display an inverse association with the time-scale over which the data are sampled. However, this relationship remains unexplored in bacteria due to the difficulty in estimating genome-wide evolutionary rates, which are impacted by the extent of temporal structure in the data and the prevalence of recombination. We collected 36 whole genome sequence data sets from 16 species of bacterial pathogens to systematically estimate and compare their evolutionary rates and assess the extent of temporal structure in the absence of recombination. The majority (28/36) of data sets possessed sufficient clock-like structure to robustly estimate evolutionary rates. However, in some species reliable estimates were not possible even with “ancient DNA” data sampled over many centuries, suggesting that they evolve very slowly or that they display extensive rate variation among lineages. The robustly estimated evolutionary rates spanned several orders of magnitude, from 10⁻⁶ to 10⁻⁸ nucleotide substitutions site⁻¹ year⁻¹. This variation was largely attributable to sampling time, which was strongly negatively associated with estimated evolutionary rates, with this relationship best described by an exponential decay curve. To avoid potential estimation biases such time-dependency should be considered when inferring evolutionary time-scales in bacteria.


2018 ◽  
Author(s):  
Flora Jay ◽  
Simon Boitard ◽  
Frédéric Austerlitz

Abstract Species generally undergo a complex demographic history, consisting, in particular, of multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing this demographic history. A crucial point is to extract the relevant information from these very large datasets. Here we designed an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our new approach uses Approximate Bayesian Computation (ABC), a simulation-based statistical framework that allows (i) identifying the best demographic scenario among several competing scenarios, and (ii) estimating the best-fitting parameters under the chosen scenario. ABC relies on the computation of summary statistics. Using a cross-validation approach, we showed that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity, Tajima’s D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrated the importance of simultaneously estimating the genotyping error rate. Applying our method to genome-wide human-sequence databases, we finally showed that a model consisting of a bottleneck followed by a Paleolithic and a Neolithic expansion was the most relevant for Eurasian populations.
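The scenario-choice step of ABC described above can be illustrated with a rejection scheme. The following is a minimal sketch under loud assumptions: the two "scenarios" are toy Gaussian simulators standing in for demographic models, the summary statistics are just a mean and a standard deviation, and all names and tuning values (`n_sims`, `keep`, the seed) are hypothetical. It is not the pipeline used in the paper, only the general accept-the-closest-simulations idea.

```python
import random
import statistics


def summarize(sample):
    """Reduce a data set to summary statistics (toy choice: mean and spread)."""
    return (statistics.fmean(sample), statistics.pstdev(sample))


def simulate(model, n, rng):
    """Toy simulators standing in for two competing scenarios (assumed)."""
    if model == "constant":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [rng.gauss(0.0, 3.0) for _ in range(n)]  # "expansion": wider spread


def abc_model_choice(observed, models, n_sims=2000, keep=0.05, seed=1):
    """Rejection ABC: approximate posterior model probabilities.

    Each simulation draws a model uniformly (equal priors), simulates data of
    the observed size, and records the Euclidean distance between simulated
    and observed summaries. The closest `keep` fraction is accepted, and the
    share of each model among accepted draws estimates its probability.
    """
    rng = random.Random(seed)
    obs = summarize(observed)
    draws = []
    for _ in range(n_sims):
        model = rng.choice(models)
        sim = summarize(simulate(model, len(observed), rng))
        dist = sum((s - o) ** 2 for s, o in zip(sim, obs)) ** 0.5
        draws.append((dist, model))
    draws.sort(key=lambda d: d[0])
    accepted = draws[: max(1, int(keep * n_sims))]
    return {m: sum(1 for _, am in accepted if am == m) / len(accepted)
            for m in models}
```

In practice the summaries would be population-genetic statistics such as those listed in the abstract, and they would usually be rescaled before distances are computed so that no single statistic dominates.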


2020 ◽  
Author(s):  
Francisco J. Pérez-Reche ◽  
Ovidiu Rotariu ◽  
Bruno S. Lopes ◽  
Ken J. Forbes ◽  
Norval J.C. Strachan

ABSTRACT Whole genome sequence (WGS) data could transform our ability to attribute individuals to source populations. However, methods that effectively mine these data are yet to be developed. We present a minimal multilocus distance (MMD) method which rapidly deals with these large data sets as well as methods for optimally selecting loci. This was applied to WGS data to determine the source of human campylobacteriosis and the geographical origin of diverse biological species including humans, and to proteomic data to classify breast cancer tumours. The MMD method provides a highly accurate attribution which is computationally efficient for extended genotypes. These methods are generic, easy to implement for WGS and proteomic data and have wide application.


2019 ◽  
Author(s):  
Kayzad Nilgiriwala ◽  
Vidushi Chitalia ◽  
Sanchi Shah ◽  
Akshata Papewar

ABSTRACT Toxin-antitoxin (TA) modules are among the prominent determinants that trigger a persistent state, helping Mycobacterium tuberculosis evade host-generated stresses. The 79 characterized and putative TA systems described in M. tuberculosis are dominated by the VapBC, MazEF, HigAB, RelBE and ParDE TA families, largely involved in persistence and cell arrest. Hence, there is a need to maintain and conserve the TA loci in the chromosome of the pathogen. It is essential to study the genomic differences of the TA systems in clinical isolates along with their association with drug susceptibility patterns and lineage. In the current study, the TA loci and their promoter sequences were analysed from the whole genome sequence data of 74 clinical isolates. Mykrobe Predictor was used for lineage identification and drug resistance predictions in the clinical isolates. Polymorphisms associated with 79.8% (63/79) of TA systems were observed across 72 clinical isolates. Among the TA systems, the isolates had a varying number of polymorphisms localised primarily in the toxin genes (58.7%), antitoxin genes (40.7%) and chaperones (0.6%), due to single nucleotide polymorphisms (SNPs) resulting in transition (67.3%), transversion or frameshift mutations. Our analysis suggests the presence of novel Phylo-SNPs by establishing high-confidence association of specific lineages with polymorphisms in the TA systems. Notably, polymorphisms in Rv1838c-1839c (VapBC13), Rv3358-3357 (YefM/YoeB) and Rv0240-0239 (VapBC24) were associated with the Delhi/Central Asia lineage. The polymorphic loci of the 3 TA systems are localised in the antitoxin gene of the Delhi/Central Asia strains, with a resultant silent mutation. The assessment of correlation between TA polymorphisms and the drug resistance profile revealed correlation of SNPs in VapBC35 with drug-resistant M. tuberculosis strains and of SNPs in VapBC24, VapBC13 and YefM/YoeB with drug-sensitive strains.

