Genome-scale rates of evolutionary change in bacteria

2016 ◽  
Author(s):  
Sebastian Duchêne ◽  
Kathryn E. Holt ◽  
François-Xavier Weill ◽  
Simon Le Hello ◽  
Jane Hawkey ◽  
...  

ABSTRACT Estimating the rates at which bacterial genomes evolve is critical to understanding major evolutionary and ecological processes such as disease emergence, long-term host-pathogen associations, and short-term transmission patterns. The surge in bacterial genomic data sets provides a new opportunity to estimate these rates and reveal the factors that shape bacterial evolutionary dynamics. For many organisms estimates of evolutionary rate display an inverse association with the time-scale over which the data are sampled. However, this relationship remains unexplored in bacteria due to the difficulty in estimating genome-wide evolutionary rates, which are impacted by the extent of temporal structure in the data and the prevalence of recombination. We collected 36 whole genome sequence data sets from 16 species of bacterial pathogens to systematically estimate and compare their evolutionary rates and assess the extent of temporal structure in the absence of recombination. The majority (28/36) of data sets possessed sufficient clock-like structure to robustly estimate evolutionary rates. However, in some species reliable estimates were not possible even with “ancient DNA” data sampled over many centuries, suggesting that they evolve very slowly or that they display extensive rate variation among lineages. The robustly estimated evolutionary rates spanned several orders of magnitude, from 10⁻⁶ to 10⁻⁸ nucleotide substitutions site⁻¹ year⁻¹. This variation was largely attributable to sampling time, which was strongly negatively associated with estimated evolutionary rates, with this relationship best described by an exponential decay curve. To avoid potential estimation biases, such time-dependency should be considered when inferring evolutionary time-scales in bacteria.

2017 ◽  
Author(s):  
K. Jun Tong ◽  
David A. Duchêne ◽  
Sebastián Duchêne ◽  
Jemma L. Geoghegan ◽  
Simon Y.W. Ho

Abstract The estimation of evolutionary rates from ancient DNA sequences can be negatively affected by among-lineage rate variation and non-random sampling. Using a simulation study, we compared the performance of three phylogenetic methods for inferring evolutionary rates from time-structured data sets: root-to-tip regression, least-squares dating, and Bayesian inference. Our results show that these methods produce reliable estimates when the substitution rate is high, rate variation is low, and samples of similar ages are not phylogenetically clustered. The interaction of these factors is particularly important for Bayesian estimation of evolutionary rates. We also inferred rates for time-structured mitogenomic data sets from six vertebrate species. Root-to-tip regression estimated a different rate from least-squares dating and Bayesian inference for mitogenomes from the horse, which has high levels of among-lineage rate variation. We recommend using multiple methods of inference and testing data for temporal signal, among-lineage rate variation, and phylo-temporal clustering.
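As an illustration of the simplest of the three methods named above, root-to-tip regression can be sketched as an ordinary least-squares fit of each tip's root-to-tip genetic distance against its sampling date. The slope is then read as the evolutionary rate and the x-intercept as the age of the root. This is a minimal sketch, not the authors' implementation: the function name `root_to_tip_rate` and its inputs are hypothetical, and it assumes the root-to-tip distances have already been extracted from a rooted tree.

```python
def root_to_tip_rate(dates, distances):
    """Least-squares regression of root-to-tip distance on sampling date.

    dates      -- sampling dates of the tips (e.g. decimal years)
    distances  -- root-to-tip distances (substitutions per site), same order
    Returns (rate, root_date, r_squared). root_date is None when the slope
    is not positive, i.e. when the regression shows no clock-like signal.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least three tips with matching dates and distances")

    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all tips share one sampling date; no temporal structure")

    rate = sxy / sxx                      # slope: substitutions/site/time unit
    intercept = mean_d - rate * mean_t
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / rate if rate > 0 else None  # x-intercept
    return rate, root_date, r_squared
```

Because the tips share ancestry, the points in such a regression are not statistically independent, so the fit is best treated as a visual and exploratory check rather than a formal test.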


2020 ◽  
Vol 6 (2) ◽  
Author(s):  
Sebastian Duchene ◽  
Leo Featherstone ◽  
Melina Haritopoulou-Sinanidou ◽  
Andrew Rambaut ◽  
Philippe Lemey ◽  
...  

Abstract The ongoing SARS-CoV-2 outbreak marks the first time that large amounts of genome sequence data have been generated and made publicly available in near real time. Early analyses of these data revealed low sequence variation, a finding that is consistent with a recently emerging outbreak, but which raises the question of whether such data are sufficiently informative for phylogenetic inferences of evolutionary rates and time scales. The phylodynamic threshold is a key concept that refers to the point in time at which sufficient molecular evolutionary change has accumulated in available genome samples to obtain robust phylodynamic estimates. For example, before the phylodynamic threshold is reached, genomic variation is so low that even large amounts of genome sequences may be insufficient to estimate the virus’s evolutionary rate and the time scale of an outbreak. We collected genome sequences of SARS-CoV-2 from public databases at eight different points in time and conducted a range of tests of temporal signal to determine if and when the phylodynamic threshold was reached, and the range of inferences that could be reliably drawn from these data. Our results indicate that by 2 February 2020, estimates of evolutionary rates and time scales had become possible. Analyses of subsequent data sets, which included between 47 and 122 genomes, converged at an evolutionary rate of about 1.1 × 10⁻³ subs/site/year and a time of origin of around late November 2019. Our study provides guidelines to assess the phylodynamic threshold and demonstrates that establishing this threshold constitutes a fundamental step for understanding the power and limitations of early data in outbreak genome surveillance.


2015 ◽  
Author(s):  
Remco Bouckaert ◽  
Peter Lockhart

Most methods for performing a phylogenetic analysis based on sequence alignments of gene data assume that the mechanism of evolution is constant through time. It is recognised that some sites do evolve somewhat faster than others, and this can be captured using a (gamma) rate heterogeneity model. Further, some species have shorter replication times than others, and this results in faster rates of substitution in some lineages. This feature of lineage-specific rate variation can be captured to some extent by using relaxed clock models. However, it is also clear that there are additional poorly characterised features of sequence data that can sometimes lead to extreme differences in lineage-specific rates. This variation is poorly captured by constant time reversible substitution models. The significance of extreme lineage-specific rate differences is that they lead both to errors in reconstructing evolutionary relationships and to biased estimates for the age of ancestral nodes. We propose a new model that allows gamma rate heterogeneity to change on branches, thus offering a more realistic model of sequence evolution. It adds negligible computational cost to likelihood calculations. We illustrate its effectiveness with an example of green algae and land plants. For many real world data sets, we find a much better fit with multi-gamma sites models as well as substantial differences in ancestral node date estimates.


2018 ◽  
Author(s):  
John Palmer ◽  
Art Poon

The transmission and pathogenesis of human immunodeficiency virus type 1 (HIV-1) are disproportionately influenced by evolution in the five variable regions of the virus surface envelope glycoprotein (gp120). Insertions and deletions (indels) are a significant source of evolutionary change in these regions. However, the influx of indels relative to nucleotide substitutions has not yet been quantified through a comparative analysis of HIV-1 sequence data. Here we develop and report results from a phylogenetic method to estimate indel rates for the gp120 variable regions across five major subtypes and two circulating recombinant forms (CRFs) of HIV-1 group M. We processed over 26,000 published HIV-1 gp120 sequences, from which we extracted 6,605 sequences for phylogenetic analysis. In brief, our method employs maximum likelihood to reconstruct phylogenies scaled in time and fits a Poisson model to the observed distribution of indels between closely related pairs of sequences in the tree (cherries). The rate estimates ranged from 3.0e-5 to 1.5e-3 indels/nt/year and varied significantly among variable regions and subtypes. Indel rates were significantly lower in the region encoding variable loop V3, and also lower for HIV-1 subtype B relative to other subtypes. We also found that variable loops V1, V2 and V4 tended to accumulate significantly longer indels. Further, we observed that the nucleotide composition of indel sequences was significantly distinct from that of the flanking sequence in HIV-1 gp120. Indels affected potential N-linked glycosylation sites substantially more often in V1 and V2 than expected by chance, which is consistent with positive selection on glycosylation patterns within these regions of gp120. These results represent the first comprehensive measures of indel rates in HIV-1 gp120 across multiple subtypes and CRFs, and identify novel and unexpected patterns for further research in the molecular evolution of HIV-1.
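The idea of fitting a Poisson model to indel counts in cherries can be illustrated with a deliberately simplified sketch. It assumes one constant rate per nucleotide per year, so that the number of indels in a cherry is Poisson-distributed with mean equal to the rate multiplied by the region length and by the time separating the two tips. Under that assumption the maximum-likelihood rate is total indels divided by total exposure. The function name, the tuple layout, and the constant-rate assumption are all hypothetical; the abstract does not give the details of the authors' model.

```python
def poisson_indel_rate(cherries):
    """Maximum-likelihood indel rate under a constant-rate Poisson model.

    cherries -- iterable of (indel_count, region_length_nt, divergence_years),
                where divergence_years is the total branch time separating
                the two tips of the cherry (an assumed input format).
    Returns the estimated rate in indels per nucleotide per year.
    """
    total_indels = 0
    total_exposure = 0.0
    for indel_count, region_length_nt, divergence_years in cherries:
        if indel_count < 0 or region_length_nt <= 0 or divergence_years <= 0:
            raise ValueError("counts must be >= 0; lengths and times must be > 0")
        total_indels += indel_count
        # Exposure: nucleotide-years over which an indel could have occurred.
        total_exposure += region_length_nt * divergence_years
    if total_exposure == 0:
        raise ValueError("no cherries supplied")
    # Poisson MLE: events divided by exposure.
    return total_indels / total_exposure
```

Pooling exposure in this way lets cherries with zero observed indels still inform the estimate, which matters when most closely related pairs show no indel at all.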


2020 ◽  
Author(s):  
Francisco J. Pérez-Reche ◽  
Ovidiu Rotariu ◽  
Bruno S. Lopes ◽  
Ken J. Forbes ◽  
Norval J.C. Strachan

ABSTRACT Whole genome sequence (WGS) data could transform our ability to attribute individuals to source populations. However, methods that effectively mine these data are yet to be developed. We present a minimal multilocus distance (MMD) method which rapidly deals with these large data sets as well as methods for optimally selecting loci. This was applied to WGS data to determine the source of human campylobacteriosis and the geographical origin of diverse biological species including humans, and to proteomic data to classify breast cancer tumours. The MMD method provides a highly accurate attribution which is computationally efficient for extended genotypes. These methods are generic, easy to implement for WGS and proteomic data and have wide application.


2020 ◽  
Vol 38 (1) ◽  
pp. 290-306 ◽  
Author(s):  
Fabrizio Menardo ◽  
Sébastien Gagneux ◽  
Fabian Freund

Abstract The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.


2020 ◽  
Vol 37 (11) ◽  
pp. 3363-3379 ◽  
Author(s):  
Sebastian Duchene ◽  
Philippe Lemey ◽  
Tanja Stadler ◽  
Simon Y W Ho ◽  
David A Duchene ◽  
...  

Abstract Phylogenetic methods can use the sampling times of molecular sequence data to calibrate the molecular clock, enabling the estimation of evolutionary rates and timescales for rapidly evolving pathogens and data sets containing ancient DNA samples. A key aspect of such calibrations is whether a sufficient amount of molecular evolution has occurred over the sampling time window, that is, whether the data can be treated as having come from a measurably evolving population. Here, we investigate the performance of a fully Bayesian evaluation of temporal signal (BETS) in sequence data. The method involves comparing the fit to the data of two models: a model in which the data are accompanied by the actual (heterochronous) sampling times, and a model in which the samples are constrained to be contemporaneous (isochronous). We conducted simulations under a wide range of conditions to demonstrate that BETS accurately classifies data sets according to whether they contain temporal signal or not, even when there is substantial among-lineage rate variation. We explore the behavior of this classification in analyses of five empirical data sets: modern samples of A/H1N1 influenza virus, the bacterium Bordetella pertussis, coronaviruses from mammalian hosts, ancient DNA from Hepatitis B virus, and mitochondrial genomes of dog species. Our results indicate that BETS is an effective alternative to other tests of temporal signal. In particular, this method has the key advantage of allowing a coherent assessment of the entire model, including the molecular clock and tree prior which are essential aspects of Bayesian phylodynamic analyses.
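The model comparison at the heart of BETS can be sketched as a log Bayes factor: the log marginal likelihood of the model with the real (heterochronous) sampling times minus that of the model with the samples forced to be contemporaneous (isochronous). The sketch below assumes both log marginal likelihoods have already been estimated by external software. The function names and the `threshold` parameter are hypothetical, and the cut-off for calling a signal "present" is left to the analyst.

```python
def bets_log_bayes_factor(log_ml_heterochronous, log_ml_isochronous):
    """Log Bayes factor in favour of the model that uses real sampling times.

    Both arguments are natural-log marginal likelihoods estimated elsewhere
    (an assumed input; this function does no phylogenetic inference).
    """
    return log_ml_heterochronous - log_ml_isochronous


def has_temporal_signal(log_ml_heterochronous, log_ml_isochronous, threshold=0.0):
    """True when the heterochronous model is favoured by more than `threshold`.

    `threshold` is an analyst-chosen cut-off on the log Bayes factor; the
    default of 0.0 simply asks which model has the higher marginal likelihood.
    """
    log_bf = bets_log_bayes_factor(log_ml_heterochronous, log_ml_isochronous)
    return log_bf > threshold
```

Because the two models differ only in whether sampling times are used, the comparison assesses the clock model and tree prior together with the dates, which is the advantage the abstract highlights.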


2016 ◽  
Author(s):  
J. Arvid Ågren ◽  
Hui-Run Huang ◽  
Stephen I. Wright

Abstract Premise of the study: Shifts in ploidy affect the evolutionary dynamics of genomes in a myriad of ways. Population genetic theory predicts that transposable element (TE) proliferation may follow because the genome-wide efficacy of selection should be reduced and the increase in gene copies may mask the deleterious effects of TE insertions. Moreover, in allopolyploids TEs may further accumulate because of hybrid breakdown of TE silencing. However, to date the evidence of TE proliferation following an increase in ploidy is mixed, and the relative importance of relaxed selection vs. silencing breakdown remains unclear. Methods: We used high-coverage whole genome sequence data to evaluate the abundance, genomic distribution, and population frequencies of TEs in the self-fertilizing recent allotetraploid Capsella bursa-pastoris (Brassicaceae). We then compared the C. bursa-pastoris TE profile with that of its two parental diploid species, outcrossing C. grandiflora and self-fertilizing C. orientalis. Key results: We found no evidence that C. bursa-pastoris has experienced a large genome-wide proliferation of TEs relative to its parental species. However, when centromeric regions are excluded, we find evidence of significantly higher abundance of retrotransposons in C. bursa-pastoris along the gene-rich chromosome arms, compared to C. grandiflora and C. orientalis. Conclusions: The lack of a genome-wide effect of allopolyploidy on TE abundance, combined with the increased TE abundance in gene-rich regions, suggests that relaxed selection rather than hybrid breakdown of host silencing explains the TE accumulation in C. bursa-pastoris.


Author(s):  
Sebastian Duchene ◽  
Leo Featherstone ◽  
Melina Haritopoulou-Sinanidou ◽  
Andrew Rambaut ◽  
Philippe Lemey ◽  
...  

Abstract The ongoing SARS-CoV-2 outbreak marks the first time that large amounts of genome sequence data have been generated and made publicly available in near real-time. Early analyses of these data revealed low sequence variation, a finding that is consistent with a recently emerging outbreak, but which raises the question of whether such data are sufficiently informative for phylogenetic inferences of evolutionary rates and time scales. The phylodynamic threshold is a key concept that refers to the point in time at which sufficient molecular evolutionary change has accumulated in available genome samples to obtain robust phylodynamic estimates. For example, before the phylodynamic threshold is reached, genomic variation is so low that even large amounts of genome sequences may be insufficient to estimate the virus’s evolutionary rate and the time scale of an outbreak. We collected genome sequences of SARS-CoV-2 from public databases at 8 different points in time and conducted a range of tests of temporal signal to determine if and when the phylodynamic threshold was reached, and the range of inferences that could be reliably drawn from these data. Our results indicate that by February 2nd 2020, estimates of evolutionary rates and time scales had become possible. Analyses of subsequent data sets, which included between 47 and 122 genomes, converged at an evolutionary rate of about 1.1 × 10⁻³ subs/site/year and a time of origin of around late November 2019. Our study provides guidelines to assess the phylodynamic threshold and demonstrates that establishing this threshold constitutes a fundamental step for understanding the power and limitations of early data in outbreak genome surveillance.


2021 ◽  
Author(s):  
Joel T. Nelson ◽  
Omar E. Cornejo ◽  

Abstract Recombination is one of the main evolutionary mechanisms responsible for changing the genomic architecture of populations; in essence, it is the main mechanism by which novel combinations of alleles (haplotypes) are formed. A clear picture that has emerged across study systems is that recombination is highly variable, even among closely related species. However, it is only very recently that we have started to understand how recombination variation between populations of the same species impacts genetic diversity and divergence. Here, we used whole-genome sequence data to build fine-scale recombination maps for nine populations within two species of Anopheles, Anopheles gambiae and Anopheles coluzzii. The genome-wide recombination averages were on the same order of magnitude for all populations except one. Yet, we identified significant differences in fine-scale recombination rates among all population comparisons. We report that effective population sizes and the presence of a chromosomal inversion make major contributions to recombination rate variation along the genome and across populations. We identified over 400 highly variable recombination hotspots across all populations, of which only 9.6% are shared between two or more populations. Additionally, our results are consistent with recombination hotspots contributing to both genetic diversity and absolute divergence (dxy) between populations and species of Anopheles. However, we also show that recombination has a small impact on population genetic differentiation as estimated with FST. The minimal impact that recombination has on genetic differentiation across populations represents the first empirical evidence against recent theoretical work suggesting that variation in recombination along the genome can mask or impair our ability to detect signatures of selection. Our findings add new understanding of how recombination rates vary within species, and how this major evolutionary mechanism can maintain and contribute to genetic variation and divergence within a prominent malaria vector.

