Decision letter: A multispecies coalescent model for quantitative traits

2018
Author(s): Matthew Pennell

eLife, 2018, Vol 7
Author(s): Fábio K Mendes, Jesualdo A Fuentes-González, Joshua G Schraiber, Matthew W Hahn

We present a multispecies coalescent model for quantitative traits that allows for evolutionary inferences at micro- and macroevolutionary scales. A major advantage of this model is its ability to incorporate genealogical discordance underlying a quantitative trait. We show that discordance causes a decrease in the expected trait covariance between more closely related species relative to more distantly related species. If unaccounted for, this outcome can lead to an overestimation of a trait’s evolutionary rate, to a decrease in its phylogenetic signal, and to errors when examining shifts in mean trait values. The number of loci controlling a quantitative trait appears to be irrelevant to all trends reported, and discordance also affected discrete, threshold traits. Our model and analyses point to the conditions under which different methods should fare better or worse, in addition to indicating current and future approaches that can mitigate the effects of discordance.
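As a rough illustration of how much genealogical discordance the multispecies coalescent predicts, the following sketch uses the standard three-species result: for a species tree ((A,B),C) with one sampled lineage per species, the A and B lineages fail to coalesce in the internal branch with probability exp(-t), after which each of the three topologies is equally likely. This is not the authors' trait model; the function name is hypothetical, and the internal branch length `t` is assumed to be in coalescent units.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene-tree topology probabilities for the species tree ((A,B),C).

    Assumes one sampled lineage per species and an internal branch of
    length t in coalescent units (an assumed parameterization).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages do NOT coalesce in the internal branch.
    p_no_coal = math.exp(-t)
    # If they do not coalesce, the three lineages enter the root population
    # together and each topology is equally likely.
    discordant = p_no_coal / 3.0
    concordant = 1.0 - 2.0 * discordant
    return {
        "((A,B),C)": concordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 3.0):
        probs = three_taxon_gene_tree_probs(t)
        total_discordance = 1.0 - probs["((A,B),C)"]
        print(f"t = {t:>3}: total discordance = {total_discordance:.3f}")
```

Short internal branches give discordance close to two thirds, which is the regime in which the abstract's warnings about trait covariance and rate estimates are most relevant.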


2016, Vol 94, pp. 447-462
Author(s): Scott V. Edwards, Zhenxiang Xi, Axel Janke, Brant C. Faircloth, John E. McCormack, ...

Author(s): John A Rhodes, Hector Baños, Jonathan D Mitchell, Elizabeth S Allman

Abstract Summary: MSCquartets is an R package for species tree hypothesis testing, inference of species trees, and inference of species networks under the Multispecies Coalescent model of incomplete lineage sorting and its network analog. Input for these analyses is a collection of metric or topological locus trees, which are then summarized by the quartets displayed on them. Results of hypothesis tests at user-supplied levels are displayed in a simplex plot by color-coded points. The package implements the QDC and WQDC algorithms for topological and metric species tree inference, and the NANUQ algorithm for level-1 topological species network inference, all of which give statistically consistent estimators under the model.
Availability: MSCquartets is available through the Comprehensive R Archive Network: https://CRAN.R-project.org/package=MSCquartets.
Supplementary information: Supplementary materials, including example data and analyses, are incorporated into the package.


2015, Vol 61 (5), pp. 854-865
Author(s): Ziheng Yang

Abstract This paper provides an overview and a tutorial of the BPP program, which is a Bayesian MCMC program for analyzing multi-locus genomic sequence data under the multispecies coalescent model. An example dataset of five nuclear loci from the East Asian brown frogs is used to illustrate four different analyses: estimation of species divergence times and population size parameters under the multispecies coalescent model on a fixed species phylogeny (A00), species tree estimation when the assignment and species delimitation are fixed (A01), species delimitation using a fixed guide tree (A10), and joint species delimitation and species-tree estimation, or unguided species delimitation (A11). For the joint analysis (A11), two new priors are introduced that assign uniform probabilities to the different numbers of delimited species; these may be useful when assignment, species delimitation, and species phylogeny are all inferred in one joint analysis. The paper ends with a discussion of the assumptions, strengths, and weaknesses of the BPP analysis.


2019, Vol 37 (4), pp. 1211-1223
Author(s): Tomáš Flouri, Xiyun Jiao, Bruce Rannala, Ziheng Yang

Abstract Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. Using genomic sequence data from six mosquito species in the Anopheles gambiae species complex, we estimated the introgression probability, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.


2020, Vol 69 (4), pp. 795-812
Author(s): Xiaodong Jiang, Scott V Edwards, Liang Liu

Abstract A statistical framework of model comparison and model validation is essential to resolving the debates over concatenation and coalescent models in phylogenomic data analysis. A set of statistical tests is here developed and applied to evaluate and compare the adequacy of substitution, concatenation, and multispecies coalescent (MSC) models across 47 phylogenomic data sets collected across the tree of life. Tests for substitution models and the concatenation assumption of topologically congruent gene trees suggest that a poor fit of substitution models, rejected by 44% of loci, and concatenation models, rejected by 38% of loci, is widespread. Logistic regression shows that the proportions of GC content and informative sites are both negatively correlated with the fit of substitution models across loci. Moreover, a substantial violation of the concatenation assumption of congruent gene trees is consistently observed across six major groups (birds, mammals, fish, insects, reptiles, and others, including other invertebrates). In contrast, among those loci adequately described by a given substitution model, the proportion of loci rejecting the MSC model is 11%, significantly lower than those rejecting the substitution and concatenation models. Although conducted on reduced data sets due to computational constraints, Bayesian model validation and comparison both strongly favor the MSC over concatenation across all data sets; the concatenation assumption of congruent gene trees rarely holds for phylogenomic data sets with more than 10 loci. Thus, for large phylogenomic data sets, model comparisons are expected to consistently and more strongly favor the coalescent model over the concatenation model. We also found that loci rejecting the MSC have little effect on species tree estimation. Our study reveals the value of model validation and comparison in phylogenomic data analysis, as well as the need for further improvements of multilocus models and computational tools for phylogenetic inference. [Bayes factor; Bayesian model validation; coalescent prior; congruent gene trees; independent prior; Metazoa; posterior predictive simulation.]


2016
Author(s): Ziheng Yang, Bruce Rannala

A number of methods have been developed to use genetic sequence data to identify and delineate species. Some methods are based on heuristics, such as DNA barcoding, which is based on a sequence-distance threshold, while others use Bayesian model comparison under the multispecies coalescent model. Here we use mathematical analysis and computer simulation to demonstrate large differences in statistical performance of species identification between DNA barcoding and Bayesian inference under the multispecies coalescent model as implemented in the bpp program. We show that a fixed genetic-distance threshold as used in DNA barcoding is problematic for delimiting species, even if the threshold is "optimized", because different species have different population sizes and different divergence times, and therefore display different amounts of intra-species versus inter-species variation. In contrast, bpp can reliably delimit species in such situations with only one locus and rarely supports a wrong assignment with high posterior probability. While under-sampling or rare specimens may pose problems for heuristic methods, bpp can delimit species with high power when multi-locus data are used, even if the species is represented by a single specimen. Finally, we demonstrate that bpp may be powerful for delimiting cryptic species using specimens that are misidentified as a single species in the barcoding library.
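The argument against a fixed distance threshold can be made concrete with expected pairwise distances. The sketch below assumes the usual coalescent parameterization in which θ = 4Nμ is the expected distance between two sequences from the same population and τ is the divergence time in expected mutations per site, so that two sequences from different species differ by 2τ + θ_ancestor on average. The function names and all numeric values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def expected_within(theta):
    """Expected distance between two sequences from one population (theta = 4*N*mu)."""
    return theta


def expected_between(tau, theta_ancestor):
    """Expected distance between sequences from two species that split at tau.

    The lineages are separate for time tau and then need, on average,
    theta_ancestor / 2 more to coalesce in the ancestor; doubling the total
    time gives the distance 2 * tau + theta_ancestor.
    """
    return 2.0 * tau + theta_ancestor


def usable_threshold_range(theta, tau, theta_ancestor):
    """Open interval of thresholds separating within- from between-species distance."""
    return expected_within(theta), expected_between(tau, theta_ancestor)


# Hypothetical species pairs with the same divergence time but different population sizes.
large_pop = usable_threshold_range(theta=0.02, tau=0.002, theta_ancestor=0.02)
small_pop = usable_threshold_range(theta=0.001, tau=0.002, theta_ancestor=0.001)

# A single threshold works for both pairs only if the two intervals overlap.
overlap_low = max(large_pop[0], small_pop[0])
overlap_high = min(large_pop[1], small_pop[1])
shared_threshold_exists = overlap_low < overlap_high

if __name__ == "__main__":
    print("large populations:", large_pop)
    print("small populations:", small_pop)
    print("shared threshold exists:", shared_threshold_exists)
```

With these illustrative values, any threshold that separates the large-population pair would lump the small-population pair into one species, which is the qualitative failure the abstract describes. Real barcoding decisions depend on observed rather than expected distances, so this only captures the mean behaviour.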

