Mistreating birth-death models as priors in phylogenetic analysis compromises our ability to compare models

2021 ◽  
Author(s):  
Michael R May ◽  
Carl Rothfels

Time-calibrated phylogenetic trees are fundamental to a wide range of evolutionary studies. Typically, these trees are inferred in a Bayesian framework, with the phylogeny itself treated as a parameter with a prior distribution (a "tree prior"). This prior distribution is often a variant of the stochastic birth-death process, which models speciation events, extinction events, and sampling events (of extinct and/or extant lineages). However, the samples produced by this process are observations, so their probability should be viewed as a likelihood rather than a prior probability. We show that treating the samples as part of the prior results in incorrect marginal likelihood estimates and can result in model-comparison approaches disfavoring the best model within a set of candidate models. The ability to correctly compare the fit of competing tree models is critical to accurate phylogenetic estimates, especially of divergence times, and also to studying the processes that govern lineage diversification. We outline potential remedies, and provide guidance for researchers interested in comparing the fit of competing tree models.
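The model-comparison quantities mentioned above can be written in a generic form. This is the standard textbook expression, not the derivation given by the authors; the symbols $D$ (character data), $\Psi$ (tree), $\theta$ (model parameters), and $M_i$ (candidate tree model) are notation assumed here for illustration.

```latex
p(D \mid M_i) = \int\!\!\int p(D \mid \Psi, \theta)\, p(\Psi \mid \theta, M_i)\, p(\theta \mid M_i)\, \mathrm{d}\Psi\, \mathrm{d}\theta,
\qquad
\mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}
```

In this conventional form the birth-death term $p(\Psi \mid \theta, M_i)$ is treated entirely as a prior. The abstract's argument is that the part of this term describing the samples is a probability of observations and therefore belongs with the likelihood when marginal likelihoods are compared.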

2019 ◽  
Vol 69 (2) ◽  
pp. 325-344 ◽  
Author(s):  
Arong Luo ◽  
David A Duchêne ◽  
Chi Zhang ◽  
Chao-Dong Zhu ◽  
Simon Y W Ho

Abstract Bayesian molecular dating is widely used to study evolutionary timescales. This procedure usually involves phylogenetic analysis of nucleotide sequence data, with fossil-based calibrations applied as age constraints on internal nodes of the tree. An alternative approach is tip-dating, which explicitly includes fossil data in the analysis. This can be done, for example, through the joint analysis of molecular data from present-day taxa and morphological data from both extant and fossil taxa. In the context of tip-dating, an important development has been the fossilized birth–death process, which allows non-contemporaneous tips and sampled ancestors while providing a model of lineage diversification for the prior on the tree topology and internal node times. However, tip-dating with fossils faces a number of considerable challenges, especially those associated with fossil sampling and evolutionary models for morphological characters. We conducted a simulation study to evaluate the performance of tip-dating using the fossilized birth–death model. We simulated fossil occurrences and the evolution of nucleotide sequences and morphological characters under a wide range of conditions. Our analyses of these data show that the number and the maximum age of fossil occurrences have a greater influence than the degree of among-lineage rate variation or the number of morphological characters on estimates of node times and the tree topology. Tip-dating with the fossilized birth–death model generally performs well in recovering the relationships among extant taxa but has difficulties in correctly placing fossil taxa in the tree and identifying the number of sampled ancestors. The method yields accurate estimates of the ages of the root and crown group, although the precision of these estimates varies with the probability of fossil occurrence.
The exclusion of morphological characters results in a slight overestimation of node times, whereas the exclusion of nucleotide sequences has a negative impact on inference of the tree topology. Our results provide an overview of the performance of tip-dating using the fossilized birth–death model, which will inform further development of the method and its application to key questions in evolutionary biology.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mariusz A. Salamon ◽  
Tomasz Brachaniec ◽  
Dorota Kołbuk ◽  
Anwesha Saha ◽  
Przemysław Gorzelak

Abstract Crinoids were among the most abundant marine benthic animals throughout the Palaeozoic, but their body size evolution has received little attention. Here, we compiled a comprehensive database on crinoid calyx biovolumes throughout the Palaeozoic. A model comparison approach revealed contrasting and complex patterns in body size dynamics between the two major crinoid clades (Camerata and Pentacrinoidea). Interestingly, two major drops in mean body size around two mass extinction events (during the late Ordovician and the late Devonian, respectively) are observed, which is reminiscent of current patterns of shrinking body size in a wide range of organisms as a result of climate change. The context of some trends (marked declines during extinctions) suggests the cardinal role of abiotic factors (dramatic climate change associated with extinctions) in crinoid body size evolution; however, other patterns (two intervals with either relative stability or steady size increase in periods between mass extinctions) are more consistent with biotic drivers.


2018 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Timothy G. Vaughan ◽  
Tanja Stadler

Abstract Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is thus critical to obtain accurate estimates of the underlying population dynamics. We present a new multi-state birth-death model (MSBD) that can estimate lineage-specific birth and death rates. For species phylogenies, this corresponds to estimating lineage-dependent speciation and extinction rates. Contrary to existing models, we do not require a prior hypothesis on a trait driving the rate differences and we allow the same rates to be present in different parts of the phylogeny. Using simulated datasets, we show that the MSBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a re-analysis of two empirical datasets and compare the results obtained by MSBD and by the existing software BAMM. The MSBD model is implemented as a package in the Bayesian inference software BEAST2, which allows joint inference of the phylogeny and the model parameters.
Significance statement Phylogenetic trees can inform about the underlying speciation and extinction processes within a species clade. Many different factors, for instance environmental changes or morphological changes, can lead to differences in macroevolutionary dynamics within a clade. We present here a new multi-state birth-death (MSBD) model that can detect these differences and estimate both the position of changes in the tree and the associated macroevolutionary parameters. The MSBD model does not require a prior hypothesis on which trait is driving the changes in dynamics and is thus applicable to a wide range of datasets. It is implemented as an extension to the existing framework BEAST2.


2018 ◽  
Author(s):  
Arong Luo ◽  
David A. Duchêne ◽  
Chi Zhang ◽  
Chao-Dong Zhu ◽  
Simon Y.W. Ho

Abstract Bayesian molecular dating is widely used to study evolutionary timescales. This procedure usually involves phylogenetic analysis of nucleotide sequence data, with fossil-based calibrations applied as age constraints on internal nodes of the tree. An alternative approach is Bayesian total-evidence dating, which involves the joint analysis of molecular data from present-day taxa and morphological data from both extant and fossil taxa. Part of its appeal stems from the fossilized birth-death process, which provides a model of lineage diversification for the prior on the tree topology and node times. However, total-evidence dating faces a number of considerable challenges, especially those associated with fossil sampling and evolutionary models for morphological characters. We conducted a simulation study to evaluate the performance of total-evidence dating with the fossilized birth-death model. We simulated fossil occurrences and the evolution of nucleotide sequences and morphological characters under a wide range of conditions. Our analyses show that fossil occurrences have a greater influence than the degree of among-lineage rate variation or the number of morphological characters on estimates of node times and the tree topology. Total-evidence dating generally performs well in recovering the relationships among extant taxa, but has difficulties in correctly placing fossil taxa in the tree and identifying the number of sampled ancestors. The method yields accurate estimates of the origin time of the fossilized birth-death process and the ages of the root and crown group, although the precision of these estimates varies with the probability of fossil occurrence. The exclusion of morphological characters results in a slight overestimation of node times, whereas the exclusion of nucleotide sequences has a negative impact on inference of the tree topology.
Overall, our results provide a detailed view of the performance of total-evidence dating, which will inform further development of the method and its application to key questions in evolutionary biology.


2019 ◽  
Author(s):  
Sebastian Höhna ◽  
William A. Freyman ◽  
Zachary Nolen ◽  
John P. Huelsenbeck ◽  
Michael R. May ◽  
...  

Abstract Species richness varies considerably across the tree of life, which can only be explained by heterogeneous rates of diversification (speciation and extinction). Previous approaches use phylogenetic trees to estimate branch-specific diversification rates. However, all previous approaches disregard diversification-rate shifts on extinct lineages, although 99% of species that ever existed are now extinct. Here we describe a lineage-specific birth-death-shift process where lineages, both extant and extinct, may have heterogeneous rates of diversification. To facilitate probability computation, we discretize the base distribution on speciation and extinction rates into k rate categories. The fixed number of rate categories allows us to extend the theory of state-dependent speciation and extinction models (e.g., BiSSE and MuSSE) to compute the probability of an observed phylogeny given the set of speciation and extinction rates. To estimate branch-specific diversification rates, we develop two independent and theoretically equivalent approaches: numerical integration with stochastic character mapping and data-augmentation with reversible-jump Markov chain Monte Carlo sampling. We validate the implementation of the two approaches in RevBayes using simulated data and an empirical example study of primates. In the empirical example, we show that estimates of the number of diversification-rate shifts are, unsurprisingly, very sensitive to the choice of prior distribution. In contrast, branch-specific diversification rate estimates are less sensitive to the assumed prior distribution on the number of diversification-rate shifts and consistently infer an increased rate of diversification for Old World Monkeys. Additionally, we observe that as few as 10 diversification-rate categories are sufficient to approximate a continuous base distribution on diversification rates.
In conclusion, our implementation of the lineage-specific birth-death-shift model in RevBayes provides biologists with a method to estimate branch-specific diversification rates under a mathematically consistent model.
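The discretization step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which base distribution or which discretization rule is used, so the lognormal base distribution, the midpoint-quantile rule, and the function name `discretize_lognormal` below are all assumptions made for illustration, not the RevBayes implementation.

```python
import math
from statistics import NormalDist


def discretize_lognormal(k, mean_log=0.0, sd_log=1.0):
    """Return k rate categories approximating a lognormal base distribution.

    Assumption: each category is the quantile at the midpoint of one of k
    equal-probability bins, i.e. at probabilities (i + 0.5) / k.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    standard_normal = NormalDist()
    return [
        math.exp(mean_log + sd_log * standard_normal.inv_cdf((i + 0.5) / k))
        for i in range(k)
    ]


# Ten categories, matching the number the abstract reports as sufficient.
rates = discretize_lognormal(10, mean_log=math.log(0.2), sd_log=0.5)
```

Each category carries equal probability mass 1/k under this rule, so a larger k gives a finer approximation of the continuous distribution at the cost of more states to integrate over.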


1986 ◽  
Vol 23 (04) ◽  
pp. 1013-1018
Author(s):  
B. G. Quinn ◽  
H. L. MacGillivray

Sufficient conditions are presented for the limiting normality of sequences of discrete random variables possessing unimodal distributions. The conditions are applied to obtain normal approximations directly for the hypergeometric distribution and the stationary distribution of a special birth-death process.
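The normal approximation to the hypergeometric distribution mentioned in this abstract can be checked numerically. The sketch below does not reproduce the paper's sufficient conditions; it only compares the exact probability mass function with a normal density that has the same mean and variance. The function names are hypothetical.

```python
import math


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when n items are drawn without replacement from N, K of them successes."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def normal_approx_pmf(k, N, K, n):
    """Normal density with the hypergeometric mean and variance, evaluated at k."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return math.exp(-((k - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def max_abs_error(N, K, n):
    """Largest pointwise gap between the exact pmf and the normal approximation."""
    return max(
        abs(hypergeom_pmf(k, N, K, n) - normal_approx_pmf(k, N, K, n))
        for k in range(n + 1)
    )
```

Calling `max_abs_error(1000, 400, 200)` gives a quick sense of how close the two are for a moderately large sample.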


Author(s):  
Majid Asadi ◽  
Antonio Di Crescenzo ◽  
Farkhondeh A. Sajadi ◽  
Serena Spina

Abstract In this paper, we propose a flexible growth model that constitutes a suitable generalization of the well-known Gompertz model. We perform an analysis of various features of interest, including a sensitivity analysis of the initial value and the three parameters of the model. We show that the considered model provides a good fit to some real datasets concerning the growth of the number of individuals infected during the COVID-19 outbreak, and software failure data. The goodness of fit is established on the ground of the ISRP metric and the $d_2$-distance. We also analyze two time-inhomogeneous stochastic processes, namely a birth-death process and a birth process, whose means are equal to the proposed growth curve. In the first case we obtain the probability of ultimate extinction, 0 being an absorbing endpoint. We also deal with a threshold crossing problem both for the proposed growth curve and the corresponding birth process. A simulation procedure for the latter process is also exploited.
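For orientation, the classical Gompertz curve that this abstract generalizes can be sketched as follows. The abstract does not give the generalized model, so this shows only the classical form under one common parametrization, dx/dt = a * exp(-b * t) * x with x(0) = x0; the parameter names `x0`, `a`, and `b` are assumptions made here.

```python
import math


def gompertz(t, x0, a, b):
    """Classical Gompertz curve solving dx/dt = a * exp(-b * t) * x, x(0) = x0."""
    return x0 * math.exp((a / b) * (1.0 - math.exp(-b * t)))


def gompertz_limit(x0, a, b):
    """Limiting value of the curve as t grows without bound (requires b > 0)."""
    return x0 * math.exp(a / b)


# Example: growth from 10 individuals toward the limiting size.
trajectory = [gompertz(t, x0=10.0, a=1.0, b=0.5) for t in range(0, 21, 5)]
```

The curve starts at `x0`, grows fastest early on, and levels off at `gompertz_limit(x0, a, b)`.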


Author(s):  
Hui Wang ◽  
Hanbo Zhao ◽  
Yujia Chu ◽  
Jiang Feng ◽  
Keping Sun

Abstract High-frequency hearing is particularly important for echolocating bats and toothed whales. Previously, studies of the hearing-related genes Prestin, KCNQ4, and TMC1 documented that adaptive evolution of high-frequency hearing has taken place in echolocating bats and toothed whales. In this study, we present two additional candidate hearing-related genes, Shh and SK2, that may also have contributed to the evolution of echolocation in mammals. Shh is a member of the vertebrate Hedgehog gene family and is required in the specification of the mammalian cochlea. SK2 is expressed in both inner and outer hair cells, and it plays an important role in the auditory system. The coding region sequences of Shh and SK2 were obtained from a wide range of mammals with and without echolocating ability. The topologies of phylogenetic trees constructed using Shh and SK2 were different; however, multiple molecular evolutionary analyses showed that those two genes experienced different selective pressures in echolocating bats and toothed whales compared to non-echolocating mammals. In addition, several nominally significant positively selected sites were detected in the non-functional domain of the SK2 gene, indicating that different selective pressures were acting on different parts of the SK2 gene. This study has expanded our knowledge of the adaptive evolution of high-frequency hearing in echolocating mammals.


Genetics ◽  
1997 ◽  
Vol 147 (4) ◽  
pp. 1855-1861 ◽  
Author(s):  
Montgomery Slatkin ◽  
Bruce Rannala

Abstract A theory is developed that provides the sampling distribution of low frequency alleles at a single locus under the assumption that each allele is the result of a unique mutation. The number of copies of each allele is assumed to follow a linear birth-death process with sampling. If the population is of constant size, standard results from the theory of birth-death processes show that the distribution of numbers of copies of each allele is logarithmic and that the joint distribution of numbers of copies of k alleles found in a sample of size n follows the Ewens sampling distribution. If the population from which the sample was obtained was increasing in size, if there are different selective classes of alleles, or if there are differences in penetrance among alleles, the Ewens distribution no longer applies. Likelihood functions for a given set of observations are obtained under different alternative hypotheses. These results are applied to published data from the BRCA1 locus (associated with early onset breast cancer) and the factor VIII locus (associated with hemophilia A) in humans. In both cases, the sampling distribution of alleles allows rejection of the null hypothesis, but relatively small deviations from the null model can account for the data. In particular, roughly the same population growth rate appears consistent with both data sets.
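The Ewens sampling distribution named in this abstract has a closed form that is easy to evaluate. The sketch below uses the standard formula for the probability of an allele-count configuration; the function name and the parameter name `theta` (the scaled mutation parameter) are labels chosen here, and the code does not implement the paper's extensions for growing populations, selection, or penetrance.

```python
import math


def ewens_probability(counts, theta):
    """Probability of an allelic configuration under the Ewens sampling formula.

    counts[j - 1] is the number of alleles represented exactly j times in the
    sample, so the sample size is n = sum(j * counts[j - 1]).
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    prob = math.factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * math.factorial(a))
    return prob


# A sample of size 3 has three possible configurations:
# three singletons, one singleton plus one doubleton, or one allele seen 3 times.
configs = [(3, 0, 0), (1, 1, 0), (0, 0, 1)]
probs = [ewens_probability(c, theta=1.0) for c in configs]
```

Summing the probabilities over all configurations of a given sample size should give 1, which is a convenient sanity check on any implementation.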

