Temporal signal and the phylodynamic threshold of SARS-CoV-2

Author(s):  
Sebastian Duchene ◽  
Leo Featherstone ◽  
Melina Haritopoulou-Sinanidou ◽  
Andrew Rambaut ◽  
Philippe Lemey ◽  
...  

Abstract The ongoing SARS-CoV-2 outbreak marks the first time that large amounts of genome sequence data have been generated and made publicly available in near real-time. Early analyses of these data revealed low sequence variation, a finding that is consistent with a recently emerging outbreak, but which raises the question of whether such data are sufficiently informative for phylogenetic inferences of evolutionary rates and time scales. The phylodynamic threshold is a key concept that refers to the point in time at which sufficient molecular evolutionary change has accumulated in available genome samples to obtain robust phylodynamic estimates. For example, before the phylodynamic threshold is reached, genomic variation is so low that even large amounts of genome sequences may be insufficient to estimate the virus’s evolutionary rate and the time scale of an outbreak. We collected genome sequences of SARS-CoV-2 from public databases at 8 different points in time and conducted a range of tests of temporal signal to determine if and when the phylodynamic threshold was reached, and the range of inferences that could be reliably drawn from these data. Our results indicate that by February 2nd 2020, estimates of evolutionary rates and time scales had become possible. Analyses of subsequent data sets, which included between 47 and 122 genomes, converged at an evolutionary rate of about 1.1 × 10⁻³ subs/site/year and a time of origin of around late November 2019. Our study provides guidelines to assess the phylodynamic threshold and demonstrates that establishing this threshold constitutes a fundamental step for understanding the power and limitations of early data in outbreak genome surveillance.

2020 ◽  
Vol 6 (2) ◽  
Author(s):  
Sebastian Duchene ◽  
Leo Featherstone ◽  
Melina Haritopoulou-Sinanidou ◽  
Andrew Rambaut ◽  
Philippe Lemey ◽  
...  

Abstract The ongoing SARS-CoV-2 outbreak marks the first time that large amounts of genome sequence data have been generated and made publicly available in near real time. Early analyses of these data revealed low sequence variation, a finding that is consistent with a recently emerging outbreak, but which raises the question of whether such data are sufficiently informative for phylogenetic inferences of evolutionary rates and time scales. The phylodynamic threshold is a key concept that refers to the point in time at which sufficient molecular evolutionary change has accumulated in available genome samples to obtain robust phylodynamic estimates. For example, before the phylodynamic threshold is reached, genomic variation is so low that even large amounts of genome sequences may be insufficient to estimate the virus’s evolutionary rate and the time scale of an outbreak. We collected genome sequences of SARS-CoV-2 from public databases at eight different points in time and conducted a range of tests of temporal signal to determine if and when the phylodynamic threshold was reached, and the range of inferences that could be reliably drawn from these data. Our results indicate that by 2 February 2020, estimates of evolutionary rates and time scales had become possible. Analyses of subsequent data sets, which included between 47 and 122 genomes, converged at an evolutionary rate of about 1.1 × 10⁻³ subs/site/year and a time of origin of around late November 2019. Our study provides guidelines to assess the phylodynamic threshold and demonstrates that establishing this threshold constitutes a fundamental step for understanding the power and limitations of early data in outbreak genome surveillance.
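
To give a sense of scale for the rate quoted in this abstract, the following minimal sketch converts a per-site rate into an expected number of substitutions per genome. The genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference sequence) is an assumption that the abstract does not state, and the function name is hypothetical.

```python
# Back-of-the-envelope conversion of a per-site rate to per-genome changes.
RATE_PER_SITE_PER_YEAR = 1.1e-3  # subs/site/year, as quoted in the abstract
GENOME_LENGTH = 29_903           # assumption: Wuhan-Hu-1 reference length


def expected_substitutions(rate, sites, years):
    """Expected number of substitutions along one lineage over `years`."""
    return rate * sites * years


per_year = expected_substitutions(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH, 1.0)
per_month = expected_substitutions(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH, 1.0 / 12)

print(f"{per_year:.1f} substitutions per genome per year")
print(f"{per_month:.1f} substitutions per genome per month")
```

Under these assumptions a lineage accrues roughly 33 substitutions per year, or two to three per month, which is consistent with the low sequence variation reported for genomes sampled only weeks apart.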


2010 ◽  
Vol 84 (14) ◽  
pp. 7412-7415 ◽  
Author(s):  
Helena Skar ◽  
Pedro Borrego ◽  
Timothy C. Wallstrom ◽  
Mattias Mild ◽  
José Maria Marcelino ◽  
...  

ABSTRACT The objective of this study was to estimate and compare the evolutionary rates of HIV-2 and HIV-1. Two HIV-2 data sets from patients with advanced disease were compared to matched HIV-1 data sets. The estimated mean evolutionary rate of HIV-2 was significantly higher than the estimated rate of HIV-1, both in the gp125 and in the V3 region of the env gene. In addition, the rate of synonymous substitutions in gp125 was significantly higher for HIV-2 than for HIV-1, possibly indicating a shorter generation time or higher mutation rate of HIV-2. Thus, the lower virulence of HIV-2 does not appear to translate into a lower rate of evolution.


2016 ◽  
Author(s):  
Sebastian Duchêne ◽  
Kathryn E. Holt ◽  
François-Xavier Weill ◽  
Simon Le Hello ◽  
Jane Hawkey ◽  
...  

ABSTRACT Estimating the rates at which bacterial genomes evolve is critical to understanding major evolutionary and ecological processes such as disease emergence, long-term host-pathogen associations, and short-term transmission patterns. The surge in bacterial genomic data sets provides a new opportunity to estimate these rates and reveal the factors that shape bacterial evolutionary dynamics. For many organisms, estimates of evolutionary rate display an inverse association with the time-scale over which the data are sampled. However, this relationship remains unexplored in bacteria due to the difficulty in estimating genome-wide evolutionary rates, which are impacted by the extent of temporal structure in the data and the prevalence of recombination. We collected 36 whole genome sequence data sets from 16 species of bacterial pathogens to systematically estimate and compare their evolutionary rates and assess the extent of temporal structure in the absence of recombination. The majority (28/36) of data sets possessed sufficient clock-like structure to robustly estimate evolutionary rates. However, in some species reliable estimates were not possible even with “ancient DNA” data sampled over many centuries, suggesting that they evolve very slowly or that they display extensive rate variation among lineages. The robustly estimated evolutionary rates spanned several orders of magnitude, from 10⁻⁶ to 10⁻⁸ nucleotide substitutions site⁻¹ year⁻¹. This variation was largely attributable to sampling time, which was strongly negatively associated with estimated evolutionary rates, with this relationship best described by an exponential decay curve. To avoid potential estimation biases, such time-dependency should be considered when inferring evolutionary time-scales in bacteria.
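
As a rough illustration of what fitting an exponential decay of rate against sampling time can look like, the sketch below fits rate = a · exp(−b · t) by least squares on the logarithm of the rate. The functional form, the function name, and the data points are all assumptions made for illustration; the abstract does not give the fitted equation, and the numbers are not taken from the study.

```python
import math


def fit_exponential_decay(times, rates):
    """Fit rate = a * exp(-b * t) by ordinary least squares on ln(rate)."""
    logs = [math.log(r) for r in rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = stl / stt                      # slope of ln(rate) against t
    a = math.exp(mean_l - slope * mean_t)  # rate extrapolated to t = 0
    b = -slope                             # decay constant per year
    return a, b


# Hypothetical data: sampling window (years) and estimated rate (subs/site/year).
sampling_years = [5.0, 20.0, 50.0, 100.0, 200.0]
estimated_rates = [9.05e-7, 6.70e-7, 3.68e-7, 1.35e-7, 1.83e-8]

a, b = fit_exponential_decay(sampling_years, estimated_rates)
print(f"a = {a:.2e} subs/site/year, b = {b:.4f} per year")
```

Fitting on the log scale keeps the example dependency-free, at the cost of weighting small rates more heavily than a direct nonlinear fit would.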


2019 ◽  
Vol 35 (22) ◽  
pp. 4815-4817 ◽  
Author(s):  
Amanda Kowalczyk ◽  
Wynn K Meyer ◽  
Raghavendran Partha ◽  
Weiguang Mao ◽  
Nathan L Clark ◽  
...  

Abstract Motivation: When different lineages of organisms independently adapt to similar environments, selection often acts repeatedly upon the same genes, leading to signatures of convergent evolutionary rate shifts at these genes. With the increasing availability of genome sequences for organisms displaying a variety of convergent traits, the ability to identify genes with such convergent rate signatures would enable new insights into the molecular basis of these traits. Results: Here we present the R package RERconverge, which tests for association between relative evolutionary rates of genes and the evolution of traits across a phylogeny. RERconverge can perform associations with binary and continuous traits, and it contains tools for visualization and enrichment analyses of association results. Availability and implementation: RERconverge source code, documentation and a detailed usage walk-through are freely available at https://github.com/nclark-lab/RERconverge. Datasets for mammals, Drosophila and yeast are available at https://bit.ly/2J2QBnj. Supplementary information: Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
K. Jun Tong ◽  
David A. Duchêne ◽  
Sebastián Duchêne ◽  
Jemma L. Geoghegan ◽  
Simon Y.W. Ho

Abstract The estimation of evolutionary rates from ancient DNA sequences can be negatively affected by among-lineage rate variation and non-random sampling. Using a simulation study, we compared the performance of three phylogenetic methods for inferring evolutionary rates from time-structured data sets: root-to-tip regression, least-squares dating, and Bayesian inference. Our results show that these methods produce reliable estimates when the substitution rate is high, rate variation is low, and samples of similar ages are not phylogenetically clustered. The interaction of these factors is particularly important for Bayesian estimation of evolutionary rates. We also inferred rates for time-structured mitogenomic data sets from six vertebrate species. Root-to-tip regression estimated a different rate from least-squares dating and Bayesian inference for mitogenomes from the horse, which has high levels of among-lineage rate variation. We recommend using multiple methods of inference and testing data for temporal signal, among-lineage rate variation, and phylo-temporal clustering.
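
Of the three methods named in this abstract, root-to-tip regression is the simplest to sketch. The example below regresses root-to-tip distance on sampling date, reading the slope as the evolutionary rate and the x-intercept as the date of the root. The data are invented for illustration and the function name is hypothetical. The sketch also treats the tips as independent points, which real phylogenetic data are not.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: sampling dates (decimal years) and
# root-to-tip distances (substitutions per site) read off a rooted tree.
sampling_dates = [2000.0, 2002.0, 2004.0, 2006.0, 2008.0, 2010.0]
root_to_tip = [0.0102, 0.0119, 0.0143, 0.0158, 0.0181, 0.0200]

rate, intercept, r2 = linear_regression(sampling_dates, root_to_tip)
root_date = -intercept / rate  # date at which the fitted distance is zero

print(f"rate = {rate:.2e} subs/site/year")
print(f"estimated root date = {root_date:.1f}")
print(f"R^2 = {r2:.3f}")
```

A positive slope with a high R² is often read as an informal indication of temporal signal. A flat or negative slope suggests the sampling window is too narrow, or the rate too variable, for a reliable estimate.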


2016 ◽  
Author(s):  
Jose Aguilar-Rodriguez ◽  
Beatriz Sabater-Munoz ◽  
Victor Berlanga ◽  
David Alvarez-Ponce ◽  
Andreas Wagner ◽  
...  

Molecular chaperones, also known as heat-shock proteins, refold misfolded proteins and help other proteins reach their native conformation. Thanks to these abilities, some chaperones, such as the Hsp90 protein or the chaperonin GroEL, can buffer the deleterious phenotypic effects of mutations that alter protein structure and function. Hsp70 chaperones use a chaperoning mechanism different from Hsp90 and GroEL, and it is not known whether they can also buffer mutations. Here, we show that they can. To this end, we performed a mutation accumulation experiment in Escherichia coli, followed by whole-genome resequencing. Our sequence data show that overexpression of the Hsp70 chaperone DnaK increases the tolerance of its clients for nonsynonymous nucleotide substitutions and nucleotide insertions and deletions. We also show that this elevated mutational buffering on short evolutionary time scales translates into differences in evolutionary rates on intermediate and long evolutionary time scales. To this end, we compared the evolutionary rates of DnaK clients and nonclients using the genomes of E. coli, Salmonella typhimurium, and 83 other gamma-proteobacterial species. We find that clients that interact strongly with DnaK evolve faster than weakly interacting clients. Our results imply that all three major chaperone classes can buffer mutations and affect protein evolution. They illustrate how an individual protein like a chaperone can have a disproportionate effect on proteome evolution.


2018 ◽  
Author(s):  
Amanda Kowalczyk ◽  
Wynn K Meyer ◽  
Raghavendran Partha ◽  
Weiguang Mao ◽  
Nathan L Clark ◽  
...  

Abstract Motivation: When different lineages of organisms independently adapt to similar environments, selection often acts repeatedly upon the same genes, leading to signatures of convergent evolutionary rate shifts at these genes. With the increasing availability of genome sequences for organisms displaying a variety of convergent traits, the ability to identify genes with such convergent rate signatures would enable new insights into the molecular basis of these traits. Results: Here we present the R package RERconverge, which tests for association between relative evolutionary rates of genes and the evolution of traits across a phylogeny. RERconverge can perform associations with binary and continuous traits, and it contains tools for visualization and enrichment analyses of association results. Availability: RERconverge source code, documentation, and a detailed usage walk-through are freely available at https://github.com/nclark-lab/RERconverge. Datasets for mammals, Drosophila, and yeast are available at https://bit.ly/2J2QBnj. Contact: [email protected] Supplementary information: Supplementary information, containing detailed vignettes for usage of RERconverge, is available at Bioinformatics online.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Eleanor F. Miller ◽  
Andrea Manica

Abstract Background: Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms, classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results: Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions: There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.


2021 ◽  
Vol 13 (13) ◽  
pp. 2559
Author(s):  
Daniele Cerra ◽  
Miguel Pato ◽  
Kevin Alonso ◽  
Claas Köhler ◽  
Mathias Schneider ◽  
...  

Spectral unmixing represents both an application per se and a pre-processing step for several applications involving data acquired by imaging spectrometers. However, there is still a lack of publicly available reference data sets suitable for the validation and comparison of different spectral unmixing methods. In this paper, we introduce the DLR HyperSpectral Unmixing (DLR HySU) benchmark dataset, acquired over German Aerospace Center (DLR) premises in Oberpfaffenhofen. The dataset includes airborne hyperspectral and RGB imagery of targets of different materials and sizes, complemented by simultaneous ground-based reflectance measurements. The DLR HySU benchmark allows a separate assessment of all spectral unmixing main steps: dimensionality estimation, endmember extraction (with and without pure pixel assumption), and abundance estimation. Results obtained with traditional algorithms for each of these steps are reported. To the best of our knowledge, this is the first time that real imaging spectrometer data with accurately measured targets are made available for hyperspectral unmixing experiments. The DLR HySU benchmark dataset is openly available online and the community is welcome to use it for spectral unmixing and other applications.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract Background: The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results: The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions: Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth many novel, endemic and medically related alleles.

