A Multitype Birth–Death Model for Bayesian Inference of Lineage-Specific Birth and Death Rates

Joëlle Barido-Sottani; Timothy G Vaughan; Tanja Stadler

doi:10.1093/sysbio/syaa016

A Multitype Birth–Death Model for Bayesian Inference of Lineage-Specific Birth and Death Rates

Systematic Biology ◽

10.1093/sysbio/syaa016 ◽

2020 ◽

Vol 69 (5) ◽

pp. 973-986 ◽

Cited By ~ 3

Author(s):

Joëlle Barido-Sottani ◽

Timothy G Vaughan ◽

Tanja Stadler

Keyword(s):

Bayesian Inference ◽

Simulated Data ◽

Pathogen Transmission ◽

Type Model ◽

Model Parameters ◽

Data Sets ◽

Death Rates ◽

Joint Inference ◽

Extinction Rates ◽

Birth Death

Abstract Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is necessary to obtain accurate estimates of the underlying population dynamics. We present a new multitype birth–death model (MTBD) that can estimate lineage-specific birth and death rates. This corresponds to estimating lineage-dependent speciation and extinction rates for species phylogenies, and lineage-dependent transmission and recovery rates for pathogen transmission trees. In contrast with previous models, we do not presume to know the trait driving the rate differences, nor do we prohibit the same rates from appearing in different parts of the phylogeny. Using simulated data sets, we show that the MTBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a reanalysis of two empirical data sets and compare the results obtained by MTBD and by the existing software BAMM. We compare two implementations of the model, one exact and one approximate (assuming that no rate changes occur in the extinct parts of the tree), and show that the approximation only slightly affects results. The MTBD model is implemented as a package in the Bayesian inference software BEAST 2 and allows joint inference of the phylogeny and the model parameters. [Birth–death; lineage-specific rates; multi-type model.]
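
As a rough illustration of the kind of process this abstract describes, the Python sketch below simulates lineage counts under a birth-death process in which every lineage carries a type, each type has its own birth and death rate, and lineages switch type at a constant rate. The function name, the parameter names, and the single shared type-change rate are assumptions made for this example. It is a forward simulation only, not the BEAST 2 package, and it performs no inference.

```python
import random

def simulate_multitype_bd(birth, death, change_rate, t_max,
                          start_type=0, seed=0, max_lineages=10_000):
    """Gillespie-style simulation of lineage counts per type.

    birth[i], death[i]: per-lineage birth and death rates of type i
    (hypothetical inputs). change_rate: per-lineage rate of switching to
    another type chosen uniformly at random. Returns the number of
    lineages of each type when the simulation stops.
    """
    rng = random.Random(seed)
    n_types = len(birth)
    # Type changes are impossible when only one type exists.
    change = change_rate if n_types > 1 else 0.0
    counts = [0] * n_types
    counts[start_type] = 1
    t = 0.0
    while 0 < sum(counts) < max_lineages:
        # Total event rate contributed by the lineages of each type.
        rates = [counts[i] * (birth[i] + death[i] + change)
                 for i in range(n_types)]
        total = sum(rates)
        if total == 0.0:
            break  # no event can ever happen
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            break
        # Pick the type of the affected lineage, then the kind of event.
        i = rng.choices(range(n_types), weights=rates)[0]
        u = rng.random() * (birth[i] + death[i] + change)
        if u < birth[i]:
            counts[i] += 1                      # birth
        elif u < birth[i] + death[i]:
            counts[i] -= 1                      # death
        else:
            j = rng.choice([k for k in range(n_types) if k != i])
            counts[i] -= 1                      # type change i -> j
            counts[j] += 1
    return counts
```

Calling `simulate_multitype_bd([1.0, 3.0], [0.5, 0.5], 0.2, t_max=2.0)` returns one count per type. The `max_lineages` cap is only a safeguard against runaway growth in this sketch.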

A Multi-State Birth-Death model for Bayesian inference of lineage-specific birth and death rates

10.1101/440982 ◽

2018 ◽

Cited By ~ 4

Author(s):

Joëlle Barido-Sottani ◽

Timothy G. Vaughan ◽

Tanja Stadler

Keyword(s):

Bayesian Inference ◽

Phylogenetic Trees ◽

Environmental Changes ◽

Morphological Changes ◽

Model Parameters ◽

Death Rates ◽

Joint Inference ◽

Wide Range ◽

Prior Hypothesis ◽

Birth Death

Abstract Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is thus critical to obtain accurate estimates of the underlying population dynamics. We present a new multi-state birth-death model (MSBD) that can estimate lineage-specific birth and death rates. For species phylogenies, this corresponds to estimating lineage-dependent speciation and extinction rates. Contrary to existing models, we do not require a prior hypothesis on a trait driving the rate differences and we allow the same rates to be present in different parts of the phylogeny. Using simulated datasets, we show that the MSBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a re-analysis of two empirical datasets and compare the results obtained by MSBD and by the existing software BAMM. The MSBD model is implemented as a package in the Bayesian inference software BEAST2, which allows joint inference of the phylogeny and the model parameters.

Significance statement Phylogenetic trees can inform about the underlying speciation and extinction processes within a species clade. Many different factors, for instance environmental changes or morphological changes, can lead to differences in macroevolutionary dynamics within a clade. We present here a new multi-state birth-death (MSBD) model that can detect these differences and estimate both the position of changes in the tree and the associated macroevolutionary parameters. The MSBD model does not require a prior hypothesis on which trait is driving the changes in dynamics and is thus applicable to a wide range of datasets. It is implemented as an extension to the existing framework BEAST2.

Bayesian Inference of Species Trees using Diffusion Models

Systematic Biology ◽

10.1093/sysbio/syaa051 ◽

2020 ◽

Vol 70 (1) ◽

pp. 145-161 ◽

Cited By ~ 1

Author(s):

Marnus Stoltz ◽

Boris Baeumer ◽

Remco Bouckaert ◽

Colin Fox ◽

Gordon Hiscott ◽

...

Keyword(s):

Bayesian Inference ◽

Numerical Algorithms ◽

Diffusion Models ◽

Model Parameters ◽

Data Sets ◽

Species Trees ◽

Computationally Efficient ◽

Data Set ◽

Snp Data ◽

Binary Markers

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 fresh water turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]

Efficient detection of repeating sites to accelerate phylogenetic likelihood calculations

10.1101/035873 ◽

2016 ◽

Cited By ~ 2

Author(s):

Kassian Kobert ◽

Alexandros Stamatakis ◽

Tomáš Flouri

Keyword(s):

Evolutionary Biology ◽

Likelihood Function ◽

Simulated Data ◽

Evolutionary Model ◽

Identical Result ◽

Model Parameters ◽

Data Sets ◽

Efficient Detection ◽

Novel Method ◽

Computational Bottleneck

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection and divergence times estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted for improving run-time and, using appropriate data structures, reducing memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory saving attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
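
The redundancy described in this abstract can be illustrated with a small Python sketch. It labels the alignment sites at every node of a rooted binary tree so that two sites receive the same label exactly when the tip states below that node are identical, which is the situation in which the conditional likelihood entries coincide. The dictionary-based labelling, the function names, and the toy tree are assumptions made for this example. It is a simple stand-in for the idea, not the authors' algorithm or data structures.

```python
def site_classes(tree, root, tip_states):
    """Label sites by the pattern of tip states below each node.

    tree: dict mapping each inner node to its (left, right) children.
    tip_states: dict mapping each tip to its sequence (a string).
    Returns a dict node -> list of integer labels, one per site; two sites
    share a label at a node iff their subtree patterns there are identical.
    """
    classes = {}

    def visit(node):
        ids = {}
        if node in tip_states:
            # At a tip the pattern is just the observed character.
            keys = tip_states[node]
        else:
            left, right = tree[node]
            visit(left)
            visit(right)
            # At an inner node the pattern is the pair of child labels.
            keys = zip(classes[left], classes[right])
        # setdefault hands out a fresh label the first time a key is seen.
        classes[node] = [ids.setdefault(key, len(ids)) for key in keys]

    visit(root)
    return classes


def unique_patterns(classes):
    """Number of distinct site patterns at each node."""
    return {node: len(set(labels)) for node, labels in classes.items()}


# Toy example: three tips, four sites.
toy_tree = {"AB": ("A", "B"), "root": ("AB", "C")}
toy_tips = {"A": "AACA", "B": "CCGC", "C": "TTTA"}
toy_classes = site_classes(toy_tree, "root", toy_tips)
print(unique_patterns(toy_classes))
```

In the toy alignment, node `AB` sees only two distinct patterns across four sites, so a likelihood implementation would need to compute two conditional entries there rather than four.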

Bayesian Planet Searches for the 10 cm/s Radial Velocity Era

Proceedings of the International Astronomical Union ◽

10.1017/s1743921316002817 ◽

2015 ◽

Vol 11 (A29A) ◽

pp. 205-207

Author(s):

Philip C. Gregory

Keyword(s):

Radial Velocity ◽

State Of The Art ◽

Simulated Data ◽

Model Parameters ◽

Data Sets ◽

Stellar Activity ◽

Bayesian Fusion ◽

Multiple State ◽

Simulated Data Sets ◽

Apodization Function

Abstract A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'HK), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.
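
To make the role of the apodization window concrete, the Python sketch below multiplies a periodic RV signal by a Gaussian window with a free center and width. The abstract does not give the exact parameterization, so the circular-orbit (pure cosine) signal, the function name, and the parameter names are all assumptions made for this example. The idea it illustrates is that a very wide window leaves a persistent signal, as expected for a planet, while a narrow window confines the signal in time, as expected for stellar activity.

```python
import math

def apodized_signal(t, amplitude, period, phase, center, width):
    """Gaussian-apodized sinusoid evaluated at time t.

    amplitude: RV semi-amplitude; period, phase: describe the oscillation
    (circular orbit assumed for simplicity); center, width: location and
    standard deviation of the Gaussian apodization window.
    """
    window = math.exp(-0.5 * ((t - center) / width) ** 2)
    return amplitude * window * math.cos(2.0 * math.pi * t / period + phase)


# Same oscillation under a narrow and a very wide window.
times = [0.0, 10.0, 50.0, 100.0]
narrow = [apodized_signal(t, 3.0, 10.0, 0.0, 0.0, 5.0) for t in times]
wide = [apodized_signal(t, 3.0, 10.0, 0.0, 0.0, 1e9) for t in times]
```

With the narrow window the amplitude decays quickly away from the window center, whereas the wide window returns essentially the unmodified sinusoid at every sampled time.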

A New Double Truncated Generalized Gamma Model with Some Applications

Journal of Mathematics ◽

10.1155/2021/5500631 ◽

2021 ◽

Vol 2021 ◽

pp. 1-27

Author(s):

Awad A. Bakery ◽

Wael Zakaria ◽

OM Kalthum S. K. Mohamed

Keyword(s):

Likelihood Function ◽

Survival Function ◽

Gaussian Model ◽

Simulated Data ◽

Weibull Model ◽

Approximate Model ◽

Model Parameters ◽

Data Sets ◽

Gamma Model ◽

Generalized Gamma

The generalized Gamma model has been applied in a variety of research fields, including reliability engineering and lifetime analysis. Its support, however, is unbounded above, whereas the data in many applications lie in a bounded range. This paper presents a new five-parameter bounded generalized Gamma model, which includes as special cases the bounded Weibull model with four parameters, the bounded Gamma model with four parameters, the bounded generalized Gaussian model with three parameters, the bounded exponential model with three parameters, and the bounded Rayleigh model with two parameters. Working on a bounded support allows a great deal of versatility in fitting various shapes of observed data. Numerous properties of the proposed distribution are derived, including explicit expressions for the moments, quantiles, mode, moment generating function, mean variance, mean residual lifespan, entropies, skewness, kurtosis, hazard function, survival function, r-th order statistic, and median distributions. The hazard rate of the distribution can be monotonically increasing or decreasing, bathtub-shaped, or upside-down bathtub-shaped. We use the Newton-Raphson approach to estimate the model parameters that maximize the log-likelihood function; some of the parameters have a closed iterative structure. Six real data sets and six simulated data sets were used to demonstrate how the proposed model works in practice. We illustrate why the model is more stable and less affected by sample size. Additionally, the proposed model is very accurate for wavelet histogram fitting of images and sounds.
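
The general idea of bounding a lifetime distribution can be shown on the simplest special case named in this abstract, an exponential model restricted to an interval. The Python sketch below uses the standard truncation construction, in which the density is divided by the probability mass that the untruncated distribution places on the interval. The paper's own parameterization is not given in the abstract, so the function names and the (rate, lower, upper) parameters are assumptions made for this example.

```python
import math

def trunc_exp_pdf(x, rate, lower, upper):
    """Density of an exponential(rate) variable restricted to [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    # Mass of the untruncated exponential on [lower, upper].
    mass = math.exp(-rate * lower) - math.exp(-rate * upper)
    return rate * math.exp(-rate * x) / mass

def trunc_exp_cdf(x, rate, lower, upper):
    """Distribution function of the doubly truncated exponential."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    e_low = math.exp(-rate * lower)
    e_up = math.exp(-rate * upper)
    return (e_low - math.exp(-rate * x)) / (e_low - e_up)

def trunc_exp_quantile(p, rate, lower, upper):
    """Inverse of trunc_exp_cdf for 0 <= p <= 1."""
    e_low = math.exp(-rate * lower)
    e_up = math.exp(-rate * upper)
    return -math.log(e_low - p * (e_low - e_up)) / rate
```

For example, `trunc_exp_quantile(0.5, 0.5, 1.0, 4.0)` gives the median of an exponential with rate 0.5 restricted to the interval from 1 to 4, and the quantile function can also be used to draw samples by inverse transform.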

Estimating Epidemic Incidence and Prevalence from Genomic Data

Molecular Biology and Evolution ◽

10.1093/molbev/msz106 ◽

2019 ◽

Vol 36 (8) ◽

pp. 1804-1816 ◽

Cited By ~ 6

Author(s):

Timothy G Vaughan ◽

Gabriel E Leventhal ◽

David A Rasmussen ◽

Alexei J Drummond ◽

David Welch ◽

...

Keyword(s):

Sierra Leone ◽

Particle Filtering ◽

Phylogenetic Trees ◽

Stochastic Dynamics ◽

Large Population ◽

Simulated Data ◽

Model Parameters ◽

Epidemiological Models ◽

Case Count ◽

Birth Death

Abstract Modern phylodynamic methods interpret an inferred phylogenetic tree as a partial transmission chain providing information about the dynamic process of transmission and removal (where removal may be due to recovery, death, or behavior change). Birth–death and coalescent processes have been introduced to model the stochastic dynamics of epidemic spread under common epidemiological models such as the SIS and SIR models and are successfully used to infer phylogenetic trees together with transmission (birth) and removal (death) rates. These methods either integrate analytically over past incidence and prevalence to infer rate parameters, and thus cannot explicitly infer past incidence or prevalence, or allow such inference only in the coalescent limit of large population size. Here, we introduce a particle filtering framework to explicitly infer prevalence and incidence trajectories along with phylogenies and epidemiological model parameters from genomic sequences and case count data in a manner consistent with the underlying birth–death model. After demonstrating the accuracy of this method on simulated data, we use it to assess the prevalence through time of the early 2014 Ebola outbreak in Sierra Leone.

Personalized Dynamic Pricing with Machine Learning: High-Dimensional Features and Heterogeneous Elasticity

Management Science ◽

10.1287/mnsc.2020.3680 ◽

2021 ◽

Author(s):

Gah-Yi Ban ◽

N. Bora Keskin

Keyword(s):

Dynamic Pricing ◽

Simulated Data ◽

The United States ◽

Model Parameters ◽

Data Sets ◽

Demand Model ◽

Pricing Policy ◽

Data Set ◽

Customized Pricing ◽

The Individual

We consider a seller who can dynamically adjust the price of a product at the individual customer level, by utilizing information about customers’ characteristics encoded as a d-dimensional feature vector. We assume a personalized demand model, parameters of which depend on s out of the d features. The seller initially does not know the relationship between the customer features and the product demand but learns this through sales observations over a selling horizon of T periods. We prove that the seller’s expected regret, that is, the revenue loss against a clairvoyant who knows the underlying demand relationship, is at least of order [Formula: see text] under any admissible policy. We then design a near-optimal pricing policy for a semiclairvoyant seller (who knows which s of the d features are in the demand model) who achieves an expected regret of order [Formula: see text]. We extend this policy to a more realistic setting, where the seller does not know the true demand predictors, and show that this policy has an expected regret of order [Formula: see text], which is also near-optimal. Finally, we test our theory on simulated data and on a data set from an online auto loan company in the United States. On both data sets, our experimentation-based pricing policy is superior to intuitive and/or widely-practiced customized pricing methods, such as myopic pricing and segment-then-optimize policies. Furthermore, our policy improves upon the loan company’s historical pricing decisions by 47% in expected revenue over a six-month period. This paper was accepted by Noah Gans, stochastic models and simulation.

A two-type branching process model of gene family evolution

10.1101/2021.03.18.435925 ◽

2021 ◽

Author(s):

Arthur Zwaenepoel ◽

Yves Van de Peer

Keyword(s):

Gene Family ◽

Process Model ◽

Branching Process ◽

Gene Loss ◽

Simulated Data ◽

Gene Families ◽

Gene Family Evolution ◽

Comparative Genomic ◽

Model Parameters ◽

Data Sets

Abstract Phylogenetic models of gene family evolution based on birth-death processes (BDPs) provide an awkward fit to comparative genomic data sets. A central assumption of these models is the constant per-gene loss rate in any particular family. Because of the possibility of partial functional redundancy among gene family members, gene loss dynamics are however likely to be dependent on the number of genes in a family, and different variations of commonly employed BDP models indeed suggest this is the case. We propose a simple two-type branching process model to better approximate the stochastic evolution of gene families by gene duplication and loss and perform Bayesian statistical inference of model parameters in a phylogenetic context. We evaluate the statistical methods using simulated data sets and apply the model to gene family data for Drosophila, yeasts and primates, providing new quantitative insights into the long-term maintenance of duplicated genes.

Model-Based Geostatistics from a Bayesian Perspective: Investigating Area-to-Point Kriging with Small Data Sets

Mathematical Geosciences ◽

10.1007/s11004-019-09840-6 ◽

2019 ◽

Vol 52 (3) ◽

pp. 397-423

Author(s):

Luc Steinbuch ◽

Thomas G. Orton ◽

Dick J. Brus

Keyword(s):

Prior Distribution ◽

Crop Yields ◽

Simulated Data ◽

Monte Carlo Sampling ◽

Model Parameters ◽

Small Data ◽

Data Sets ◽

Data Set ◽

Minimal Impact ◽

The Impact

Abstract Area-to-point kriging (ATPK) is a geostatistical method for creating high-resolution raster maps using data of the variable of interest with a much lower resolution. The data set of areal means is often considerably smaller (< 50 observations) than data sets conventionally dealt with in geostatistical analyses. In contemporary ATPK methods, uncertainty in the variogram parameters is not accounted for in the prediction; this issue can be overcome by applying ATPK in a Bayesian framework. Commonly in Bayesian statistics, posterior distributions of model parameters and posterior predictive distributions are approximated by Markov chain Monte Carlo sampling from the posterior, which can be computationally expensive. Therefore, a partly analytical solution is implemented in this paper, in order to (i) explore the impact of the prior distribution on predictions and prediction variances, (ii) investigate whether certain aspects of uncertainty can be disregarded, simplifying the necessary computations, and (iii) test the impact of various model misspecifications. Several approaches using simulated data, aggregated real-world point data, and a case study on aggregated crop yields in Burkina Faso are compared. The prior distribution is found to have minimal impact on the disaggregated predictions. In most cases with known short-range behaviour, an approach that disregards uncertainty in the variogram distance parameter gives a reasonable assessment of prediction uncertainty. However, some severe effects of model misspecification in terms of overly conservative or optimistic prediction uncertainties are found, highlighting the importance of model choice or integration into ATPK.

Improved estimation of macroevolutionary rates from fossil data using a Bayesian framework

Paleobiology ◽

10.1017/pab.2019.23 ◽

2019 ◽

Vol 45 (4) ◽

pp. 546-570 ◽

Cited By ~ 15

Author(s):

Daniele Silvestro ◽

Nicolas Salamin ◽

Alexandre Antonelli ◽

Xavier Meyer

Keyword(s):

Temporal Variation ◽

Marine Mammals ◽

Simulated Data ◽

Large Data ◽

Bayesian Framework ◽

Monte Carlo Algorithm ◽

Alternative Methods ◽

Data Sets ◽

Data Set ◽

Extinction Rates

Abstract The estimation of origination and extinction rates and their temporal variation is central to understanding diversity patterns and the evolutionary history of clades. The fossil record provides the only direct evidence of extinction and biodiversity changes through time and has long been used to infer the dynamics of diversity changes in deep time. The software PyRate implements a Bayesian framework to analyze fossil occurrence data to estimate the rates of preservation, origination, and extinction while incorporating several sources of uncertainty. Building upon this framework, we present a suite of methodological advances including more complex and realistic models of preservation and the first likelihood-based test to compare the fit across different models. Further, we develop a new reversible jump Markov chain Monte Carlo algorithm to estimate origination and extinction rates and their temporal variation, which provides more reliable results and includes an explicit estimation of the number and temporal placement of statistically significant rate changes. Finally, we implement a new C++ library that speeds up the analyses by orders of magnitude, therefore facilitating the application of the PyRate methods to large data sets. We demonstrate the new functionalities through extensive simulations and with the analysis of a large data set of Cenozoic marine mammals. We compare our analytical framework against two widely used alternative methods to infer origination and extinction rates, revealing that PyRate decisively outperforms them across a range of simulated data sets. Our analyses indicate that explicit statistical model testing, which is often neglected in fossil-based macroevolutionary analyses, is crucial to obtain accurate and robust results.
