scholarly journals Species delimitation and phylogeny estimation under the multispecies coalescent

2014 ◽  
Author(s):  
Graham R Jones

This article describes a Bayesian method for inferring both species delimitations and species trees under the multispecies coalescent model using DNA sequences from multiple loci. The focus here is on species delimitation with no a priori assignment of individuals to species, and no guide tree. The method uses a new model for the population sizes along the branches of the species tree, and three new operators for sampling from the posterior using the Markov chain Monte Carlo (MCMC) algorithm. The correctness of the moves is demonstrated both by proofs and by tests of the implementation. Current practice, using a pipeline approach to species delimitation under the multispecies coalescent, has been shown to have major problems on simulated data (Olave et al, 2014). The same simulated data set is used to demonstrate the accuracy and efficiency of the present method. The method is implemented in a package called STACEY for BEAST2.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3724 ◽  
Author(s):  
Ana C. Afonso Silva ◽  
Natali Santos ◽  
Huw A. Ogilvie ◽  
Craig Moritz

While methods for genetic species delimitation have noticeably improved in the last decade, this remains a work in progress. Ideally, model based approaches should be applied and considered jointly with other lines of evidence, primarily morphology and geography, in an integrative taxonomy framework. Deep phylogeographic divergences have been reported for several species ofCarliaskinks, but only for some eastern taxa have species boundaries been formally tested. The present study does this and revises the taxonomy for two species from northern Australia,Carlia johnstoneiandC. triacantha. We introduce an approach that is based on the recently published method StarBEAST2, which uses multilocus data to explore the support for alternative species delimitation hypotheses using Bayes Factors (BFD). We apply this method, jointly with two other multispecies coalescent methods, using an extensive (from 2,163 exons) data set along with measures of 11 morphological characters. We use this integrated approach to evaluate two new candidate species previously revealed in phylogeographic analyses of rainbow skinks (genusCarlia) in Western Australia. The results based on BFD StarBEAST2, BFD* SNAPP and BPP genetic delimitation, together with morphology, support each of the four recently identifiedCarlialineages as separate species. The BFD StarBEAST2 approach yielded results highly congruent with those from BFD* SNAPP and BPP. This supports use of the robust multilocus multispecies coalescent StarBEAST2 method for species delimitation, which does not requirea prioriresolved species or gene trees. Compared to the situation inC. triacantha, morphological divergence was greater between the two lineages within Kimberley endemicC. johnstonei, which also had deeper divergent histories. This congruence supports recognition of two species withinC. johnstonei. Nevertheless, the combined evidence also supports recognition of two taxa within the more widespreadC. triacantha. With this work, we describe two new species,Carlia insularissp. nov andCarlia isostriacanthasp. nov. in the northwest of Australia. This contributes to increasing recognition that this region of tropical Australia has a rich and unique fauna.



Genetics ◽  
2003 ◽  
Vol 164 (4) ◽  
pp. 1645-1656 ◽  
Author(s):  
Bruce Rannala ◽  
Ziheng Yang

Abstract The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be ∼20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.



2021 ◽  
Vol 4 (1) ◽  
pp. 251524592095492
Author(s):  
Marco Del Giudice ◽  
Steven W. Gangestad

Decisions made by researchers while analyzing data (e.g., how to measure variables, how to handle outliers) are sometimes arbitrary, without an objective justification for choosing one alternative over another. Multiverse-style methods (e.g., specification curve, vibration of effects) estimate an effect across an entire set of possible specifications to expose the impact of hidden degrees of freedom and/or obtain robust, less biased estimates of the effect of interest. However, if specifications are not truly arbitrary, multiverse-style analyses can produce misleading results, potentially hiding meaningful effects within a mass of poorly justified alternatives. So far, a key question has received scant attention: How does one decide whether alternatives are arbitrary? We offer a framework and conceptual tools for doing so. We discuss three kinds of a priori nonequivalence among alternatives—measurement nonequivalence, effect nonequivalence, and power/precision nonequivalence. The criteria we review lead to three decision scenarios: Type E decisions (principled equivalence), Type N decisions (principled nonequivalence), and Type U decisions (uncertainty). In uncertain scenarios, multiverse-style analysis should be conducted in a deliberately exploratory fashion. The framework is discussed with reference to published examples and illustrated with the help of a simulated data set. Our framework will help researchers reap the benefits of multiverse-style methods while avoiding their pitfalls.



2021 ◽  
pp. gr.273631.120
Author(s):  
Xinhao Liu ◽  
Huw A Ogilvie ◽  
Luay Nakhleh

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, are coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and through a divide-and-conquer approach with more taxa. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that our method has comparable or better accuracy to previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation it is flexible enough to enable future implementations of all kinds of population models.



Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 802
Author(s):  
Chun-xiao Sun ◽  
Yu Yang ◽  
Hua Wang ◽  
Wen-hu Wang

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To effectively and efficiently discover TFBSs in the thousand or more DNA sequences generated by a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and further filter the cluster subsets. Then, we use Affinity Propagation (AP) clustering on the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that the AP-ChIP algorithm is able to make an almost accurate prediction of TFBSs in a reasonable time. Also, the validity of the AP-ChIP algorithm is tested on a real ChIP-Seq data set.



2020 ◽  
Vol 18 (1) ◽  
pp. 2-23
Author(s):  
Ross M. Gosky ◽  
Joel Sanqui

Capture-Recapture models are useful in estimating unknown population sizes. A common modeling challenge for closed population models involves modeling unequal animal catchability in each capture period, referred to as animal heterogeneity. Inference about population size N is dependent on the assumed distribution of animal capture probabilities in the population, and that different models can fit a data set equally well but provide contradictory inferences about N. Three common Bayesian Capture-Recapture heterogeneity models are studied with simulated data to study the prevalence of contradictory inferences is in different population sizes with relatively low capture probabilities, specifically at different numbers of capture periods in the study.



2017 ◽  
Author(s):  
Graham Jones

AbstractThis paper focuses on the problem of estimating a species tree from multilocus data in the presence of incomplete lineage sorting and migration. We develop a mathematical model similar to IMa2 (Hey 2010) for the relevant evolutionary processes which allows both the the population size parameters and the migration rates between pairs of species tree branches to be integrated out. We then describe a BEAST2 package DENIM which based on this model, and which uses an approximation to sample from the posterior. The approximation is based on the assumption that migrations are rare, and it only samples from certain regions of the posterior which seem likely given this assumption. The method breaks down if there is a lot of migration. Using simulations, Leaché et al 2014 showed migration causes problems for species tree inference using the multispecies coalescent when migration is present but ignored. We re-analyze this simulated data to explore DENIM’s performance, and demonstrate substantial improvements over *BEAST. We also re-analyze an empirical data set. [isolation-with-migration; incomplete lineage sorting; multispecies coalescent; species tree; phylogenetic analysis; Bayesian; Markov chain Monte Carlo]



2012 ◽  
Vol 279 (1743) ◽  
pp. 3678-3686 ◽  
Author(s):  
Jacob A. Esselstyn ◽  
Ben J. Evans ◽  
Jodi L. Sedlock ◽  
Faisal Ali Anwarali Khan ◽  
Lawrence R. Heaney

Prospects for a comprehensive inventory of global biodiversity would be greatly improved by automating methods of species delimitation. The general mixed Yule–coalescent (GMYC) was recently proposed as a potential means of increasing the rate of biodiversity exploration. We tested this method with simulated data and applied it to a group of poorly known bats ( Hipposideros ) from the Philippines. We then used echolocation call characteristics to evaluate the plausibility of species boundaries suggested by GMYC. In our simulations, GMYC performed relatively well (errors in estimated species diversity less than 25%) when the product of the haploid effective population size ( N e ) and speciation rate (SR; per lineage per million years) was less than or equal to 10 5 , while interspecific variation in N e was twofold or less. However, at higher but also biologically relevant values of N e × SR and when N e varied tenfold among species, performance was very poor. GMYC analyses of mitochondrial DNA sequences from Philippine Hipposideros suggest actual diversity may be approximately twice the current estimate, and available echolocation call data are mostly consistent with GMYC delimitations. In conclusion, we consider the GMYC model useful under some conditions, but additional information on N e , SR and/or corroboration from independent character data are needed to allow meaningful interpretation of results.



2014 ◽  
Author(s):  
Graham Jones ◽  
Bengt Oxelman

Motivation: The multispecies coalescent model provides a formal framework for the assignment of individual organisms to species, where the species are modeled as the branches of the species tree. None of the available approaches so far have simultaneously co-estimated all the relevant parameters in the model, without restricting the parameter space by requiring a guide tree and/or prior assignment of individuals to clusters or species. Results: We present DISSECT, which explores the full space of possible clusterings of individuals and species tree topologies in a Bayesian framework. It uses an approximation to avoid the need for reversible-jump MCMC, in the form of a prior that is a modification of the birth-death prior for the species tree. It incorporates a spike near zero in the density for node heights. The model has two extra parameters: one controls the degree of approximation, and the second controls the prior distribution on the numbers of species. It is implemented as part of BEAST and requires only a few changes from a standard *BEAST analysis. The method is evaluated on simulated data and demonstrated on an empirical data set. The method is shown to be insensitive to the degree of approximation, but quite sensitive to the second parameter, suggesting that large numbers of sequences are needed to draw firm conclusions. Availability:http://code.google.com/p/beast-mcmc/, http://www.indriid.com/dissectinbeast.html Contact:[email protected], www.indriid.com Supplementary information: Supplementary material is available.



2018 ◽  
Author(s):  
Huw A. Ogilvie ◽  
Timothy G. Vaughan ◽  
Nicholas J. Matzke ◽  
Graham J. Slater ◽  
Tanja Stadler ◽  
...  

AbstractBayesian methods can be used to accurately estimate species tree topologies, times and other parameters, but only when the models of evolution which are available and utilized sufficiently account for the underlying evolutionary processes. Multispecies coalescent (MSC) models have been shown to accurately account for the evolution of genes within species in the absence of strong gene flow between lineages, and fossilized birth-death (FBD) models have been shown to estimate divergence times from fossil data in good agreement with expert opinion. Until now dating analyses using the MSC have been based on a fixed clock or informally derived node priors instead of the FBD. On the other hand, dating analyses using an FBD process have concatenated all gene sequences and ignored coalescence processes. To address these mirror-image deficiencies in evolutionary models, we have developed an integrative model of evolution which combines both the FBD and MSC models. By applying concatenation and the MSC (without employing the FBD process) to an exemplar data set consisting of molecular sequence data and morphological characters from the dog and fox subfamily Caninae, we show that concatenation causes predictable biases in estimated branch lengths. We then applied concatenation using the FBD process and the combined FBD-MSC model to show that the same biases are still observed when the FBD process is employed. These biases can be avoided by using the FBD-MSC model, which coherently models fossilization and gene evolution, and does not require an a priori substitution rate estimate to calibrate the molecular clock. We have implemented the FBD-MSC in a new version of StarBEAST2, a package developed for the BEAST2 phylogenetic software.



Sign in / Sign up

Export Citation Format

Share Document