Estimating Diversification Rates on Incompletely Sampled Phylogenies: Theoretical Concerns and Practical Solutions

2019 ◽  
Vol 69 (3) ◽  
pp. 602-611 ◽  
Author(s):  
Jonathan Chang ◽  
Daniel L Rabosky ◽  
Michael E Alfaro

Abstract Molecular phylogenies are a key source of information about the tempo and mode of species diversification. However, most empirical phylogenies do not contain representatives of all species, such that diversification rates are typically estimated from incompletely sampled data. Most researchers recognize that incomplete sampling can lead to biased rate estimates, but the statistical properties of methods for accommodating incomplete sampling remain poorly known. In this point of view, we demonstrate theoretical concerns with the widespread use of analytical sampling corrections for sparsely sampled phylogenies of higher taxonomic groups. In particular, corrections based on “sampling fractions” can lead to low statistical power to infer rate variation when it is present, depending on the likelihood function used for inference. In the extreme, the sampling fraction correction can lead to spurious patterns of diversification that are driven solely by unbalanced sampling across the tree in concert with low overall power to infer shifts. Stochastic polytomy resolution provides an alternative to sampling fraction approaches that avoids some of these biases. We show that stochastic polytomy resolvers can greatly improve the power of common analyses to estimate shifts in diversification rates. We introduce a new stochastic polytomy resolution method (Taxonomic Addition for Complete Trees [TACT]) that uses birth–death-sampling estimators across an ultrametric phylogeny to estimate branching times for unsampled taxa, with taxonomic information to compatibly place new taxa onto a backbone phylogeny. We close with practical recommendations for diversification inference under several common scenarios of incomplete sampling. [Birth–death process; diversification; incomplete sampling; phylogenetic uncertainty; rate heterogeneity; rate shifts; stochastic polytomy resolution.]

2019 ◽  
Author(s):  
Andrew F. Magee ◽  
Sebastian Höhna ◽  
Tetyana I. Vasylyeva ◽  
Adam D. Leaché ◽  
Vladimir N. Minin

Abstract Birth-death processes have given biologists a model-based framework to answer questions about changes in the birth and death rates of lineages in a phylogenetic tree. Therefore birth-death models are central to macroevolutionary as well as phylodynamic analyses. Early approaches to studying temporal variation in birth and death rates using birth-death models faced difficulties due to the restrictive choices of birth and death rate curves through time. Sufficiently flexible time-varying birth-death models are still lacking. We use a piecewise-constant birth-death model, combined with both Gaussian Markov random field (GMRF) and horseshoe Markov random field (HSMRF) prior distributions, to approximate arbitrary changes in birth rate through time. We implement these models in the widely used statistical phylogenetic software platform RevBayes, allowing us to jointly estimate birth-death process parameters, phylogeny, and nuisance parameters in a Bayesian framework. We test both GMRF-based and HSMRF-based models on a variety of simulated diversification scenarios, and then apply them to both a macroevolutionary and an epidemiological dataset. We find that both models are capable of inferring variable birth rates and correctly rejecting variable models in favor of effectively constant models. In general the HSMRF-based model has higher precision than its GMRF counterpart, with little to no loss of accuracy. Applied to a macroevolutionary dataset of the Australian gecko family Pygopodidae (where birth rates are interpretable as speciation rates), the GMRF-based model detects a slow decrease whereas the HSMRF-based model detects a rapid speciation-rate decrease in the last 12 million years. Applied to an infectious disease phylodynamic dataset of sequences from HIV subtype A in Russia and Ukraine (where birth rates are interpretable as the rate of accumulation of new infections), our models detect a strongly elevated rate of infection in the 1990s.

Author summary Both the growth of groups of species and the spread of infectious diseases through populations can be modeled as birth-death processes. Birth events correspond either to speciation or infection, and death events to extinction or becoming noninfectious. The rates of birth and death may vary over time, and by examining this variation researchers can pinpoint important events in the history of life on Earth or in the course of an outbreak. Time-calibrated phylogenies track the relationships between a set of species (or infections) and the times of all speciation (or infection) events, and can thus be used to infer birth and death rates. We develop two phylogenetic birth-death models with the goal of discerning signal of rate variation from noise due to the stochastic nature of birth-death models. Using a variety of simulated datasets, we show that one of these models can accurately infer slow and rapid rate shifts without sacrificing precision. Using real data, we demonstrate that our new methodology can be used for simultaneous inference of phylogeny and rates through time.


1986 ◽  
Vol 23 (04) ◽  
pp. 1013-1018
Author(s):  
B. G. Quinn ◽  
H. L. MacGillivray

Sufficient conditions are presented for the limiting normality of sequences of discrete random variables possessing unimodal distributions. The conditions are applied to obtain normal approximations directly for the hypergeometric distribution and the stationary distribution of a special birth-death process.
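
As a numerical illustration of the kind of normal approximation the abstract refers to, the sketch below compares the exact hypergeometric probability mass function with a normal density that has the same mean and variance. It does not reproduce the paper's sufficient conditions; the function names and the parameter values (N, K, n) are chosen here purely for illustration.

```python
from math import comb, exp, pi, sqrt


def hypergeom_pmf(k, N, K, n):
    """Exact P(X = k) when drawing n items without replacement
    from a population of N items, K of which are 'successes'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def normal_approx(k, N, K, n):
    """Normal density at k with the hypergeometric mean and variance."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return exp(-((k - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def max_abs_error(N, K, n):
    """Largest pointwise gap between the exact pmf and the normal density."""
    lo, hi = max(0, n - (N - K)), min(n, K)  # support of the distribution
    return max(
        abs(hypergeom_pmf(k, N, K, n) - normal_approx(k, N, K, n))
        for k in range(lo, hi + 1)
    )


if __name__ == "__main__":
    # Illustrative values only: population 1000, 400 successes, sample of 200.
    print(max_abs_error(1000, 400, 200))
```

For a large population and a moderate sampling fraction the pointwise gap is small, which is the practical content of a limiting-normality result.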


Author(s):  
Majid Asadi ◽  
Antonio Di Crescenzo ◽  
Farkhondeh A. Sajadi ◽  
Serena Spina

Abstract In this paper, we propose a flexible growth model that constitutes a suitable generalization of the well-known Gompertz model. We perform an analysis of various features of interest, including a sensitivity analysis of the initial value and the three parameters of the model. We show that the considered model provides a good fit to some real datasets concerning the growth of the number of individuals infected during the COVID-19 outbreak, and software failure data. The goodness of fit is established on the ground of the ISRP metric and the $d_2$-distance. We also analyze two time-inhomogeneous stochastic processes, namely a birth-death process and a birth process, whose means are equal to the proposed growth curve. In the first case we obtain the probability of ultimate extinction, 0 being an absorbing endpoint. We also deal with a threshold crossing problem both for the proposed growth curve and the corresponding birth process. A simulation procedure for the latter process is also exploited.


Genetics ◽  
1997 ◽  
Vol 147 (4) ◽  
pp. 1855-1861 ◽  
Author(s):  
Montgomery Slatkin ◽  
Bruce Rannala

Abstract A theory is developed that provides the sampling distribution of low frequency alleles at a single locus under the assumption that each allele is the result of a unique mutation. The number of copies of each allele is assumed to follow a linear birth-death process with sampling. If the population is of constant size, standard results from theory of birth-death processes show that the distribution of numbers of copies of each allele is logarithmic and that the joint distribution of numbers of copies of k alleles found in a sample of size n follows the Ewens sampling distribution. If the population from which the sample was obtained was increasing in size, if there are different selective classes of alleles, or if there are differences in penetrance among alleles, the Ewens distribution no longer applies. Likelihood functions for a given set of observations are obtained under different alternative hypotheses. These results are applied to published data from the BRCA1 locus (associated with early onset breast cancer) and the factor VIII locus (associated with hemophilia A) in humans. In both cases, the sampling distribution of alleles allows rejection of the null hypothesis, but relatively small deviations from the null model can account for the data. In particular, roughly the same population growth rate appears consistent with both data sets.
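
For readers unfamiliar with the Ewens sampling distribution named in the abstract, here is a minimal sketch of the standard Ewens sampling formula, which is the constant-population null model. It is not the paper's likelihood for growing populations or selective classes. The function name, the `counts` encoding, and the parameter name `theta` are assumptions made for this illustration.

```python
from math import factorial


def ewens_probability(counts, theta):
    """Probability of an allele configuration under the Ewens sampling formula.

    counts[j - 1] is the number of distinct alleles observed in exactly
    j copies, so the sample size is n = sum(j * counts[j - 1]).
    theta is the scaled mutation parameter (must be positive).
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i

    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


if __name__ == "__main__":
    # All configurations for a sample of size 3 (illustrative theta = 1):
    # three singletons, one singleton plus one doubleton, one tripleton.
    for config in ([3, 0, 0], [1, 1, 0], [0, 0, 1]):
        print(config, ewens_probability(config, theta=1.0))
```

Summing the formula over every configuration of a given sample size gives 1, which is a convenient sanity check before using it as a null model.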


Author(s):  
Michel Mandjes ◽  
Birgit Sollie

Abstract This paper considers a continuous-time quasi birth-death (qbd) process, which informally can be seen as a birth-death process of which the parameters are modulated by an external continuous-time Markov chain. The aim is to numerically approximate the time-dependent distribution of the resulting bivariate Markov process in an accurate and efficient way. An approach based on the Erlangization principle is proposed and formally justified. Its performance is investigated and compared with two existing approaches: one based on numerical evaluation of the matrix exponential underlying the qbd process, and one based on the uniformization technique. It is shown that in many settings the approach based on Erlangization is faster than the other approaches, while still being highly accurate. In the last part of the paper, we demonstrate the use of the developed technique in the context of the evaluation of the likelihood pertaining to a time series, which can then be optimized over its parameters to obtain the maximum likelihood estimator. More specifically, through a series of examples with simulated and real-life data, we show how it can be deployed in model selection problems that involve the choice between a qbd and its non-modulated counterpart.
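
The uniformization technique mentioned as one of the comparison methods can be sketched for a plain finite birth-death chain. This is a simplification chosen for illustration, not the modulated qbd process or the Erlangization scheme of the paper. The constant birth and death rates, the function names, and the truncation rule are all assumptions of this sketch.

```python
import math


def birth_death_generator(n_states, birth, death):
    """Generator matrix of a birth-death chain on states 0..n_states-1
    with constant birth and death rates (an illustrative choice)."""
    Q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            Q[i][i + 1] = birth
        if i > 0:
            Q[i][i - 1] = death
        Q[i][i] = -sum(Q[i])  # diagonal is still 0 here, so rows sum to zero
    return Q


def transient_distribution(Q, p0, t, tol=1e-12, max_terms=10_000):
    """Approximate p(t) = p0 * exp(Q t) by uniformization.

    The chain is embedded in a Poisson process of rate `rate`; the result is
    a Poisson-weighted sum of powers of the stochastic matrix I + Q / rate.
    Very large rate * t would need a more careful (e.g. log-scale) scheme.
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))
    if rate == 0.0 or t == 0.0:
        return list(p0)
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]

    weight = math.exp(-rate * t)  # Poisson probability of zero jumps
    vec = list(p0)
    result = [weight * v for v in vec]
    total, k = weight, 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k  # Poisson probability of k jumps
        total += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


if __name__ == "__main__":
    Q = birth_death_generator(5, birth=2.0, death=3.0)
    print(transient_distribution(Q, [1.0, 0.0, 0.0, 0.0, 0.0], t=0.7))
```

Because every term in the sum is non-negative, the truncated series never produces negative probabilities, which is one reason uniformization is a common baseline against which other transient-distribution methods are compared.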


Author(s):  
Ajay Jasra ◽  
Maria De Iorio ◽  
Marc Chadeau-Hyam

In this paper, we consider a simulation technique for stochastic trees. One of the most important areas in computational genetics is the calculation and subsequent maximization of the likelihood function associated with such models. This typically consists of using importance sampling and sequential Monte Carlo techniques. The approach proceeds by simulating the tree, backward in time from observed data, to a most recent common ancestor. However, in many cases, the computational time and variance of estimators are often too high to make standard approaches useful. In this paper, we propose to stop the simulation, subsequently yielding biased estimates of the likelihood surface. The bias is investigated from a theoretical point of view. Results from simulation studies are also given to investigate the balance between loss of accuracy, saving in computing time and variance reduction.


Author(s):  
Phil Diamond

Abstract Competition between a finite number of searching insect parasites is modelled by differential equations and birth-death processes. In the one species case of intraspecific competition, the deterministic equilibrium is globally stable and, for large populations, approximates the mean of the stationary distribution of the process. For two species, both inter- and intraspecific competition occurs and the deterministic equilibrium is globally stable. When the birth-death process is reversible, it is shown that the mean of the stationary distribution is approximated by the equilibrium. Confluent hypergeometric functions of two variables are important to the theory.

