Estimating epidemic incidence and prevalence from genomic data

2017 ◽  
Author(s):  
Timothy G. Vaughan ◽  
Gabriel E. Leventhal ◽  
David A. Rasmussen ◽  
Alexei J. Drummond ◽  
David Welch ◽  
...  

Abstract Modern phylodynamic methods interpret an inferred phylogenetic tree as a partial transmission chain providing information about the dynamic process of transmission and removal (where removal may be due to recovery, death or behaviour change). Birth-death and coalescent processes have been introduced to model the stochastic dynamics of epidemic spread under common epidemiological models such as the SIS and SIR models, and are successfully used to infer phylogenetic trees together with transmission (birth) and removal (death) rates. These methods either integrate analytically over past incidence and prevalence to infer rate parameters, and thus cannot explicitly infer past incidence or prevalence, or allow such inference only in the coalescent limit of large population size. Here we introduce a particle filtering framework to explicitly infer prevalence and incidence trajectories along with phylogenies and epidemiological model parameters from genomic sequences and case count data in a manner consistent with the underlying birth-death model. After demonstrating the accuracy of this method on simulated data, we use it to assess the prevalence through time of the early 2014 Ebola outbreak in Sierra Leone.

2019 ◽  
Vol 36 (8) ◽  
pp. 1804-1816 ◽  
Author(s):  
Timothy G Vaughan ◽  
Gabriel E Leventhal ◽  
David A Rasmussen ◽  
Alexei J Drummond ◽  
David Welch ◽  
...  

Abstract Modern phylodynamic methods interpret an inferred phylogenetic tree as a partial transmission chain providing information about the dynamic process of transmission and removal (where removal may be due to recovery, death, or behavior change). Birth–death and coalescent processes have been introduced to model the stochastic dynamics of epidemic spread under common epidemiological models such as the SIS and SIR models and are successfully used to infer phylogenetic trees together with transmission (birth) and removal (death) rates. These methods either integrate analytically over past incidence and prevalence to infer rate parameters, and thus cannot explicitly infer past incidence or prevalence, or allow such inference only in the coalescent limit of large population size. Here, we introduce a particle filtering framework to explicitly infer prevalence and incidence trajectories along with phylogenies and epidemiological model parameters from genomic sequences and case count data in a manner consistent with the underlying birth–death model. After demonstrating the accuracy of this method on simulated data, we use it to assess the prevalence through time of the early 2014 Ebola outbreak in Sierra Leone.
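
To make the idea of particle filtering over prevalence trajectories concrete, the following is a minimal sketch of a bootstrap particle filter for a linear birth-death process observed through case counts. It is not the authors' algorithm: it ignores the phylogeny and sequence data entirely, and it assumes (purely for illustration) that the reported cases in each interval are a binomial sample of the removals in that interval. The names `simulate_interval`, `binomial_pmf`, `particle_filter`, and all parameter names are hypothetical.

```python
import math
import random


def simulate_interval(prevalence, birth, death, dt, rng):
    """Gillespie simulation of a linear birth-death process over one interval.

    Returns (prevalence at the end of the interval, removals during it).
    """
    t = 0.0
    removals = 0
    while prevalence > 0:
        total_rate = (birth + death) * prevalence
        t += rng.expovariate(total_rate)
        if t > dt:
            break
        if rng.random() < birth / (birth + death):
            prevalence += 1          # transmission (birth)
        else:
            prevalence -= 1          # removal (death)
            removals += 1
    return prevalence, removals


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def particle_filter(case_counts, birth, death, sample_prob, i0,
                    dt=1.0, n_particles=200, seed=0):
    """Bootstrap particle filter; returns (mean prevalence per interval, log-likelihood).

    Assumption (illustrative only): each removal is reported as a case
    independently with probability sample_prob.
    """
    rng = random.Random(seed)
    particles = [i0] * n_particles
    log_lik = 0.0
    mean_prevalence = []
    for y in case_counts:
        # Propagate every particle through the stochastic model.
        moved = [simulate_interval(p, birth, death, dt, rng) for p in particles]
        # Weight each particle by the probability of the observed case count.
        weights = [binomial_pmf(y, rem, sample_prob) for _, rem in moved]
        total = sum(weights)
        if total == 0.0:
            return mean_prevalence, float("-inf")
        log_lik += math.log(total / n_particles)
        # Resample particles in proportion to their weights.
        particles = rng.choices([p for p, _ in moved], weights=weights, k=n_particles)
        mean_prevalence.append(sum(particles) / n_particles)
    return mean_prevalence, log_lik
```

Calling `particle_filter([2, 3, 5], birth=1.0, death=0.5, sample_prob=0.3, i0=5)` returns one filtered mean prevalence per interval together with a Monte Carlo estimate of the log-likelihood, which is the quantity an outer MCMC over rate parameters would use.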


2020 ◽  
Vol 69 (5) ◽  
pp. 973-986 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Timothy G Vaughan ◽  
Tanja Stadler

Abstract Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is necessary to obtain accurate estimates of the underlying population dynamics. We present a new multitype birth–death model (MTBD) that can estimate lineage-specific birth and death rates. This corresponds to estimating lineage-dependent speciation and extinction rates for species phylogenies, and lineage-dependent transmission and recovery rates for pathogen transmission trees. In contrast with previous models, we do not presume to know the trait driving the rate differences, nor do we prohibit the same rates from appearing in different parts of the phylogeny. Using simulated data sets, we show that the MTBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a reanalysis of two empirical data sets and compare the results obtained by MTBD and by the existing software BAMM. We compare two implementations of the model, one exact and one approximate (assuming that no rate changes occur in the extinct parts of the tree), and show that the approximation only slightly affects results. The MTBD model is implemented as a package in the Bayesian inference software BEAST 2 and allows joint inference of the phylogeny and the model parameters. [Birth–death; lineage-specific rates; multi-type model.]


2018 ◽  
Vol 35 (11) ◽  
pp. 1852-1861 ◽  
Author(s):  
Niema Moshiri ◽  
Manon Ragonnet-Cronin ◽  
Joel O Wertheim ◽  
Siavash Mirarab

Abstract Motivation: The ability to simulate epidemics as a function of model parameters allows insights that are unobtainable from real datasets. Further, reconstructing transmission networks for fast-evolving viruses like Human Immunodeficiency Virus (HIV) may have the potential to greatly enhance epidemic intervention, but transmission network reconstruction methods have been inadequately studied, largely because it is difficult to obtain ‘truth’ sets on which to test them and properly measure their performance. Results: We introduce FrAmework for VIral Transmission and Evolution Simulation (FAVITES), a robust framework for simulating realistic datasets for epidemics that are caused by fast-evolving pathogens like HIV. FAVITES creates a generative model to produce contact networks, transmission networks, phylogenetic trees and sequence datasets, and to add error to the data. FAVITES is designed to be extensible by dividing the generative model into modules, each of which is expressed as a fixed API that can be implemented using various models. We use FAVITES to simulate HIV datasets and study the realism of the simulated datasets. We then use the simulated data to study the impact of the increased treatment efforts on epidemiological outcomes. We also study two transmission network reconstruction methods and their effectiveness in detecting fast-growing clusters. Availability and implementation: FAVITES is available at https://github.com/niemasd/FAVITES, and a Docker image can be found on DockerHub (https://hub.docker.com/r/niemasd/favites). Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Timothy G. Vaughan ◽  
Tanja Stadler

Abstract Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is thus critical to obtain accurate estimates of the underlying population dynamics. We present a new multi-state birth-death model (MSBD) that can estimate lineage-specific birth and death rates. For species phylogenies, this corresponds to estimating lineage-dependent speciation and extinction rates. Contrary to existing models, we do not require a prior hypothesis on a trait driving the rate differences and we allow the same rates to be present in different parts of the phylogeny. Using simulated datasets, we show that the MSBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a re-analysis of two empirical datasets and compare the results obtained by MSBD and by the existing software BAMM. The MSBD model is implemented as a package in the Bayesian inference software BEAST2, which allows joint inference of the phylogeny and the model parameters. Significance statement: Phylogenetic trees can inform about the underlying speciation and extinction processes within a species clade. Many different factors, for instance environmental changes or morphological changes, can lead to differences in macroevolutionary dynamics within a clade. We present here a new multi-state birth-death (MSBD) model that can detect these differences and estimate both the position of changes in the tree and the associated macroevolutionary parameters. The MSBD model does not require a prior hypothesis on which trait is driving the changes in dynamics and is thus applicable to a wide range of datasets. It is implemented as an extension to the existing framework BEAST2.


2013 ◽  
Vol 06 (02) ◽  
pp. 1350008 ◽  
Author(s):  
ANTTI SOLONEN ◽  
HEIKKI HAARIO ◽  
JEAN MICHEL TCHUENCHE ◽  
HERIETH RWEZAURA

The theoretical properties of epidemiological models have been widely studied, while numerical studies, and especially the calibration of models against measured data, have received less attention; such models are often complicated and loaded with a high number of unknown parameters. In this paper, we describe how a combination of simulated data and Markov Chain Monte Carlo (MCMC) methods can be used to study the identifiability of model parameters with different types of measurements. Three known models are used as case studies to illustrate the importance of parameter identifiability: a basic SIR model, an influenza model with vaccination and treatment, and an HIV–Malaria co-infection model. The analysis reveals that calibration of complex models commonly studied in mathematical epidemiology, such as the HIV–Malaria co-dynamics model, can be difficult or impossible, even if the system were fully observed. The presented approach provides a tool for the design and optimization of real-life field campaigns for collecting data, as well as for model selection.


2018 ◽  
Author(s):  
Niema Moshiri ◽  
Manon Ragonnet-Cronin ◽  
Joel O. Wertheim ◽  
Siavash Mirarab

Abstract Motivation: The ability to simulate epidemics as a function of model parameters allows insights that are unobtainable from real datasets. Further, reconstructing transmission networks for fast-evolving viruses like HIV may have the potential to greatly enhance epidemic intervention, but transmission network reconstruction methods have been inadequately studied, largely because it is difficult to obtain “truth” sets on which to test them and properly measure their performance. Results: We introduce FAVITES, a robust framework for simulating realistic datasets for epidemics that are caused by fast-evolving pathogens like HIV. FAVITES creates a generative model to produce contact networks, transmission networks, phylogenetic trees, and sequence datasets, and to add error to the data. FAVITES is designed to be extensible by dividing the generative model into modules, each of which is expressed as a fixed API that can be implemented using various models. We use FAVITES to simulate HIV datasets and study the realism of the simulated datasets. We then use the simulated data to study the impact of the increased treatment efforts on epidemiological outcomes. We also study two transmission network reconstruction methods and their effectiveness in detecting fast-growing clusters. Availability and implementation: FAVITES is available at https://github.com/niemasd/FAVITES, and a Docker image can be found on DockerHub (https://hub.docker.com/r/niemasd/favites).


2019 ◽  
Author(s):  
Sebastian Höhna ◽  
William A. Freyman ◽  
Zachary Nolen ◽  
John P. Huelsenbeck ◽  
Michael R. May ◽  
...  

Abstract Species richness varies considerably across the tree of life, which can only be explained by heterogeneous rates of diversification (speciation and extinction). Previous approaches use phylogenetic trees to estimate branch-specific diversification rates. However, all previous approaches disregard diversification-rate shifts on extinct lineages, although 99% of species that ever existed are now extinct. Here we describe a lineage-specific birth-death-shift process where lineages, both extant and extinct, may have heterogeneous rates of diversification. To facilitate probability computation we discretize the base distribution on speciation and extinction rates into k rate categories. The fixed number of rate categories allows us to extend the theory of state-dependent speciation and extinction models (e.g., BiSSE and MuSSE) to compute the probability of an observed phylogeny given the set of speciation and extinction rates. To estimate branch-specific diversification rates, we develop two independent and theoretically equivalent approaches: numerical integration with stochastic character mapping and data-augmentation with reversible-jump Markov chain Monte Carlo sampling. We validate the implementation of the two approaches in RevBayes using simulated data and an empirical example study of primates. In the empirical example, we show that estimates of the number of diversification-rate shifts are, unsurprisingly, very sensitive to the choice of prior distribution. In contrast, branch-specific diversification rate estimates are less sensitive to the assumed prior distribution on the number of diversification-rate shifts and consistently infer an increased rate of diversification for Old World Monkeys. Additionally, we observe that as few as 10 diversification-rate categories are sufficient to approximate a continuous base distribution on diversification rates. In conclusion, our implementation of the lineage-specific birth-death-shift model in RevBayes provides biologists with a method to estimate branch-specific diversification rates under a mathematically consistent model.


2018 ◽  
Author(s):  
Josephine Ann Urquhart ◽  
Akira O'Connor

Receiver operating characteristics (ROCs) are plots which provide a visual summary of a classifier’s decision response accuracy at varying discrimination thresholds. Typical practice, particularly within psychological studies, involves plotting an ROC from a limited number of discrete thresholds before fitting signal detection parameters to the plot. We propose that additional insight into decision-making could be gained through increasing ROC resolution, using trial-by-trial measurements derived from a continuous variable, in place of discrete discrimination thresholds. Such continuous ROCs are not yet routinely used in behavioural research, which we attribute to issues of practicality (i.e. the difficulty of applying standard ROC model-fitting methodologies to continuous data). Consequently, the purpose of the current article is to provide a documented method of fitting signal detection parameters to continuous ROCs. This method reliably produces model fits equivalent to the unequal variance least squares method of model-fitting (Yonelinas et al., 1998), irrespective of the number of data points used in ROC construction. We present the suggested method in three main stages: I) building continuous ROCs, II) model-fitting to continuous ROCs and III) extracting model parameters from continuous ROCs. Throughout the article, procedures are demonstrated in Microsoft Excel, using an example continuous variable: reaction time, taken from a single-item recognition memory. Supplementary MATLAB code used for automating our procedures is also presented in Appendix B, with a validation of the procedure using simulated data shown in Appendix C.
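
The first stage described above, building a continuous ROC, can be sketched as follows. This is a minimal illustration rather than the authors' Excel or MATLAB procedure: every distinct value of the continuous variable is treated as its own threshold, and the function names (`continuous_roc`, `area_under_curve`) and the convention that a higher score means more evidence for a target are assumptions made here.

```python
def continuous_roc(scores, labels):
    """Return ROC points (false-alarm rate, hit rate), one per distinct score.

    labels: 1 for targets, 0 for lures. Higher scores are assumed to mean
    stronger evidence that the item is a target.
    """
    n_targets = sum(labels)
    n_lures = len(labels) - n_targets
    if n_targets == 0 or n_lures == 0:
        raise ValueError("need at least one target and one lure")
    # Sweep the threshold from the strictest (highest score) downwards.
    trials = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    hits = false_alarms = 0
    i = 0
    while i < len(trials):
        threshold = trials[i][0]
        # Trials with tied scores enter the curve together.
        while i < len(trials) and trials[i][0] == threshold:
            if trials[i][1] == 1:
                hits += 1
            else:
                false_alarms += 1
            i += 1
        points.append((false_alarms / n_lures, hits / n_targets))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With a variable such as reaction time, where smaller values may indicate stronger evidence, the scores would need to be negated or otherwise transformed before being passed in; that mapping is a modelling decision the sketch leaves open.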


Author(s):  
Leila Taghizadeh ◽  
Ahmad Karimi ◽  
Clemens Heitzinger

Abstract The main goal of this paper is to develop the forward and inverse modeling of the Coronavirus (COVID-19) pandemic using novel computational methodologies in order to accurately estimate and predict the pandemic. This supports governmental decision-making in implementing effective protective measures and preventing new outbreaks. To this end, we use the logistic equation and the SIR system of ordinary differential equations to model the spread of the COVID-19 pandemic. For the inverse modeling, we propose Bayesian inversion techniques, which are robust and reliable approaches, in order to estimate the unknown parameters of the epidemiological models. We use an adaptive Markov-chain Monte-Carlo (MCMC) algorithm for the estimation of a posteriori probability distribution and confidence intervals for the unknown model parameters as well as for the reproduction number. Furthermore, we present a fatality analysis for COVID-19 in Austria, which is also of importance for governmental protective decision making. We perform our analyses on the publicly available data for Austria to estimate the main epidemiological model parameters and to study the effectiveness of the protective measures by the Austrian government. The estimated parameters and the analysis of fatalities provide useful information for decision makers and make it possible to perform more realistic forecasts of future outbreaks.
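
As an illustration of the forward model mentioned above, here is a minimal sketch of the standard SIR system integrated with a fixed-step fourth-order Runge-Kutta scheme. It is not the authors' code: the frequency-dependent transmission term `beta * S * I / N`, the function names, and the parameter values are assumptions chosen for the example, and the Bayesian inversion step is not shown.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR equations for state = (S, I, R)."""
    s, i, r = state
    n = s + i + r
    new_infections = beta * s * i / n   # assumed frequency-dependent transmission
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def simulate_sir(s0, i0, r0, beta, gamma, t_end, dt=0.1):
    """Integrate the SIR model with classical RK4; returns a list of (S, I, R)."""
    def shifted(state, slope, factor):
        return tuple(x + factor * k for x, k in zip(state, slope))

    state = (float(s0), float(i0), float(r0))
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        k1 = sir_rhs(state, beta, gamma)
        k2 = sir_rhs(shifted(state, k1, dt / 2), beta, gamma)
        k3 = sir_rhs(shifted(state, k2, dt / 2), beta, gamma)
        k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
        state = tuple(
            x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        trajectory.append(state)
    return trajectory
```

In an inverse-modeling setting, a function like `simulate_sir` would be called inside the likelihood for each proposed `(beta, gamma)` pair, and the basic reproduction number under these assumptions is `beta / gamma`.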


2016 ◽  
Author(s):  
Kassian Kobert ◽  
Alexandros Stamatakis ◽  
Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection and divergence times estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted for improving run-time and, using appropriate data structures, reducing memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory saving attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
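
The redundancy described above can be illustrated with a small sketch that counts how many conditional likelihood computations are actually distinct at each inner node. This is only a toy version of the general idea, not the authors' data structures or implementation: the tree is assumed to be a rooted binary tree written as nested tuples of tip names, and the function names are hypothetical.

```python
def assign_class_ids(keys):
    """Map each key to a small integer so that identical keys share an id."""
    seen = {}
    return [seen.setdefault(key, len(seen)) for key in keys]


def count_clv_updates(tree, alignment):
    """Count per-site likelihood updates over all inner nodes.

    tree: nested 2-tuples of tip names, e.g. (("A", "B"), ("C", "D")).
    alignment: dict mapping tip name to its aligned sequence.
    Returns (updates needed when repeats are skipped, updates done naively).
    """
    counts = {"needed": 0, "naive": 0}

    def visit(node):
        if isinstance(node, str):
            # At a tip, two sites are identical if they show the same character.
            return assign_class_ids(alignment[node])
        left_ids = visit(node[0])
        right_ids = visit(node[1])
        # At an inner node, two sites have identical subtree data exactly when
        # they fall in the same class in both child subtrees.
        ids = assign_class_ids(zip(left_ids, right_ids))
        counts["needed"] += len(set(ids))
        counts["naive"] += len(ids)
        return ids

    visit(tree)
    return counts["needed"], counts["naive"]
```

For the toy alignment `{"A": "AACA", "B": "AACC", "C": "GGTG", "D": "GGTG"}` and the tree `(("A", "B"), ("C", "D"))`, `count_clv_updates` returns `(8, 12)`: only 8 of the 12 per-site updates at inner nodes involve data not already seen at that node.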

