The impact of ancestral population size and incomplete lineage sorting on Bayesian estimation of species divergence times

2015 ◽  
Vol 61 (5) ◽  
pp. 874-885 ◽  
Author(s):  
Konstantinos Angelis ◽  
Mario Dos Reis

Abstract Although the effects of the coalescent process on sequence divergence and genealogies are well understood, the vast majority of studies that use molecular sequences to estimate times of divergence among species have failed to account for the coalescent process. Here we study the impact of ancestral population size and incomplete lineage sorting on Bayesian estimates of species divergence times under the molecular clock when the inference model ignores the coalescent process. Using a combination of mathematical analysis, computer simulations and analysis of real data, we find that the errors on estimates of times and the molecular rate can be substantial when ancestral populations are large and when there is substantial incomplete lineage sorting. For example, in a simple three-species case, we find that if the most precise fossil calibration is placed on the root of the phylogeny, the age of the internal node is overestimated, while if the most precise calibration is placed on the internal node, then the age of the root is underestimated. In both cases, the molecular rate is overestimated. Using simulations on a phylogeny of nine species, we show that substantial errors in time and rate estimates can be obtained even when dating ancient divergence events. We analyse the hominoid phylogeny and show that estimates of the neutral mutation rate obtained while ignoring the coalescent are too high. Using a coalescent-based technique to obtain geological times of divergence, we obtain estimates of the mutation rate that are within experimental estimates and we also obtain substantially older divergence times within the phylogeny.
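
The direction of the biases in the three-species case can be illustrated with a deliberately simplified calculation. The sketch below is not the authors' analysis. It assumes point calibrations, a strict clock, no incomplete lineage sorting, the same ancestral population size at both nodes, and an expected waiting time of 2N generations for two lineages to coalesce in a diploid ancestral population. All function names and parameter values are hypothetical.

```python
def expected_distance(tau_years, n_anc, gen_time, mu):
    """Expected pairwise distance (substitutions per site) between two species.

    Gene lineages coalesce, on average, 2N generations before the species
    split, so the expected gene divergence time is tau + 2 * N * gen_time.
    """
    coalescent_wait_years = 2 * n_anc * gen_time
    return 2 * mu * (tau_years + coalescent_wait_years)


def naive_clock_estimates(tau_root, tau_int, n_anc, gen_time, mu):
    """Rate and age estimates from a clock model that ignores the coalescent."""
    d_root = expected_distance(tau_root, n_anc, gen_time, mu)
    d_int = expected_distance(tau_int, n_anc, gen_time, mu)

    # Case A: exact calibration on the root, internal node age estimated.
    rate_root_cal = d_root / (2 * tau_root)
    internal_age = d_int / (2 * rate_root_cal)

    # Case B: exact calibration on the internal node, root age estimated.
    rate_int_cal = d_int / (2 * tau_int)
    root_age = d_root / (2 * rate_int_cal)

    return {
        "rate_root_calibrated": rate_root_cal,
        "internal_age": internal_age,
        "rate_internal_calibrated": rate_int_cal,
        "root_age": root_age,
    }


if __name__ == "__main__":
    # Illustrative values only: root at 10 My, internal node at 6 My,
    # ancestral N = 50,000, 20-year generations, 1e-9 substitutions/site/year.
    for name, value in naive_clock_estimates(10e6, 6e6, 50_000, 20, 1e-9).items():
        print(name, value)
```

Under these assumptions the estimated rate exceeds the true rate in both cases, the internal node comes out too old when the root is calibrated, and the root comes out too young when the internal node is calibrated, which matches the pattern described in the abstract.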

Genetics ◽  
2003 ◽  
Vol 164 (4) ◽  
pp. 1645-1656 ◽  
Author(s):  
Bruce Rannala ◽  
Ziheng Yang

Abstract The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be ∼20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.
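
For context, the tree-mismatch approach that this abstract criticizes rests on a standard coalescent result for three species: with an internal branch of t generations and a diploid ancestral population of effective size N, a gene tree disagrees with the species tree with probability (2/3)exp(-t/(2N)). The sketch below inverts that relation. It is the simple mismatch estimator, not the Bayesian MCMC method described in the abstract, and it treats gene trees as known without error, which is the source of bias the authors point to. Function names and numerical values are illustrative assumptions.

```python
import math


def mismatch_probability(t_generations, n_anc):
    """Probability that a three-species gene tree conflicts with the species tree."""
    return (2.0 / 3.0) * math.exp(-t_generations / (2.0 * n_anc))


def ancestral_size_from_mismatch(p_mismatch, t_generations):
    """Invert the mismatch probability to get the ancestral effective size N."""
    if not 0.0 < p_mismatch < 2.0 / 3.0:
        raise ValueError("mismatch proportion must lie strictly between 0 and 2/3")
    return -t_generations / (2.0 * math.log(1.5 * p_mismatch))


if __name__ == "__main__":
    # Hypothetical values: internal branch of 100,000 generations, N = 20,000.
    p = mismatch_probability(100_000, 20_000)
    print("expected mismatch proportion:", p)
    print("recovered N:", ancestral_size_from_mismatch(p, 100_000))
```

Because the observed mismatch proportion also includes gene trees that were reconstructed incorrectly, plugging it into this inversion tends to inflate the estimate of N.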


2022 ◽  
Author(s):  
XiaoXu Pang ◽  
Da-Yong Zhang

The species studied in any evolutionary investigation generally constitute a very small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has been rarely studied and is thus poorly understood. In this study, we use mathematical analysis and simulations to examine the robustness of species tree methods based on a multispecies coalescent model under gene flow sourcing from an extant or ghost lineage. We found that very low levels of extant or ghost introgression can result in anomalous gene trees (AGTs) on three-taxon rooted trees if accompanied by strong incomplete lineage sorting (ILS). In contrast, even massive introgression, with more than half of the recipient genome descending from the donor lineage, may not necessarily lead to AGTs. In cases involving an ingroup lineage (defined as one that diverged no earlier than the most basal species under investigation) acting as the donor of introgression, the time of root divergence among the investigated species was either underestimated or remained unaffected, but for the cases of outgroup ghost lineages acting as donors, the divergence time was generally overestimated. Under many conditions of ingroup introgression, the stronger the ILS was, the higher was the accuracy of estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.


2004 ◽  
Vol 59 (4) ◽  
pp. 478-487 ◽  
Author(s):  
Yoko Satta ◽  
Michael Hickerson ◽  
Hidemi Watanabe ◽  
Colm O’hUigin ◽  
Jan Klein

2020 ◽  
Author(s):  
Michael J. Sanderson ◽  
Michelle M. McMahon ◽  
Mike Steel

Abstract Terraces in phylogenetic tree space are sets of trees with identical optimality scores for a given data set, arising from missing data. These were first described for multilocus phylogenetic data sets in the context of maximum parsimony inference and maximum likelihood inference under certain model assumptions. Here we show how the mathematical properties that lead to terraces extend to gene tree - species tree problems in which the gene trees are incomplete. Inference of species trees from either sets of gene family trees subject to duplication and loss, or allele trees subject to incomplete lineage sorting, can exhibit terraces in their solution space. First, we show conditions that lead to a new kind of terrace, which stems from subtree operations that appear in reconciliation problems for incomplete trees. Then we characterize when terraces of both types can occur when the optimality criterion for tree search is based on duplication, loss or deep coalescence scores. Finally, we examine the impact of assumptions about the causes of losses: whether they are due to imperfect sampling or true evolutionary deletion.


2016 ◽  
Author(s):  
Dingqiao Wen ◽  
Luay Nakhleh

Abstract The multispecies network coalescent (MSNC) is a stochastic process that captures how gene trees grow within the branches of a phylogenetic network. Coupling the MSNC with a stochastic mutational process that operates along the branches of the gene trees gives rise to a generative model of how multiple loci from within and across species evolve in the presence of both incomplete lineage sorting (ILS) and reticulation (e.g., hybridization). We report on a Bayesian method for sampling the parameters of this generative model, including the species phylogeny, gene trees, divergence times, and population sizes, from DNA sequences of multiple independent loci. We demonstrate the utility of our method by analyzing simulated data and reanalyzing three biological data sets. Our results demonstrate the significance of not only co-estimating species phylogenies and gene trees, but also accounting for reticulation and ILS simultaneously. In particular, we show that when gene flow occurs, our method accurately estimates the evolutionary histories, coalescence times, and divergence times. Tree inference methods, on the other hand, underestimate divergence times and overestimate coalescence times when the evolutionary history is reticulate. While the MSNC corresponds to an abstract model of “intermixture,” we study the performance of the model and method on simulated data generated under a gene flow model. We show that the method accurately infers the most recent time at which gene flow occurs. Finally, we demonstrate the application of the new method to a 106-locus yeast data set. [Multispecies network coalescent; reticulation; incomplete lineage sorting; phylogenetic network; Bayesian inference; RJMCMC.]


2020 ◽  
Vol 111 (6) ◽  
pp. 573-582
Author(s):  
Zachary B Hancock ◽  
Heath Blackmon

Abstract Isolation-by-distance is a widespread pattern in nature that describes the reduction of genetic correlation between subpopulations with increased geographic distance. In the population ancestral to modern sister species, this pattern may hypothetically inflate population divergence time estimation due to allele frequency differences in subpopulations at the ends of the ancestral population. In this study, we analyze the relationship between the time to the most recent common ancestor and the population divergence time when the ancestral population model is a linear stepping-stone. Using coalescent simulations, we compare the coalescent time to the population divergence time for various ratios of the divergence time over the population size. Next, we simulate whole genomes to obtain single nucleotide polymorphisms (SNPs), and use the Bayesian coalescent program SNAPP to estimate divergence times. We find that as the rate of migration between neighboring demes decreases, the coalescent time becomes significantly greater than the population divergence time when sampled from end demes. Divergence-time overestimation in SNAPP becomes severe when the divergence-to-population size ratio < 10 and migration is low. Finally, we demonstrate the impact of ancestral isolation-by-distance on divergence-time estimation using an empirical dataset of squamates (Tropidurus) endemic to Brazil. We conclude that studies estimating divergence times should be cognizant of the potential ancestral population structure in an explicitly spatial context or risk dramatically overestimating the timing of population splits.


Genetics ◽  
2003 ◽  
Vol 163 (1) ◽  
pp. 395-404 ◽  
Author(s):  
Jeffrey D Wall

Abstract This article presents a new method for jointly estimating species divergence times and ancestral population sizes. The method improves on previous ones by explicitly incorporating intragenic recombination, by utilizing orthologous sequence data from closely related species, and by using a maximum-likelihood framework. The latter allows for efficient use of the available information and provides a way of assessing how much confidence we should place in the estimates. I apply the method to recently collected intergenic sequence data from humans and the great apes. The results suggest that the human-chimpanzee ancestral population size was four to seven times larger than the current human effective population size and that the current human effective population size is slightly >10,000. These estimates are similar to previous ones, and they appear relatively insensitive to assumptions about the recombination rates or mutation rates across loci.


2017 ◽  
Author(s):  
Erin K. Molloy ◽  
Tandy Warnow

Abstract Species tree estimation from loci sampled from multiple genomes is now common, but is challenged by the heterogeneity across the genome due to multiple processes, such as gene duplication and loss, horizontal gene transfer, and incomplete lineage sorting. Although methods for estimating species trees have been developed that address gene tree heterogeneity due to incomplete lineage sorting, many of these methods operate by combining estimated gene trees and are hence vulnerable to gene tree quality. There is also the added concern that missing data, which is frequently encountered in genome-scale datasets, will impact species tree estimation. Our study addresses the impact of gene filtering on species trees inferred from multi-gene datasets. We address these questions using a large and heterogeneous collection of simulated datasets both with and without missing data. We compare several established coalescent-based methods (ASTRAL, ASTRID, MP-EST, and SVDquartets within PAUP*) as well as unpartitioned concatenation using maximum likelihood (RAxML). Our study shows that gene tree error and missing data impact all methods (and some methods degrade more than others), but the degree of incomplete lineage sorting and gene tree estimation error impacts the absolute and relative performance of methods as well as their response to gene filtering strategies. We find that filtering genes based on the degree of missing data is either neutral or else reduces the accuracy of all five methods examined, and so is not recommended. Filtering genes based on gene tree estimation error shows somewhat different trends. Under low levels of incomplete lineage sorting, removing genes with high gene tree estimation error can improve the accuracy of summary methods, but only if not too many genes are removed. Otherwise, filtering genes tends to increase error, especially under high levels of incomplete lineage sorting. Hence, while filtering genes based on missing data is not recommended, there are conditions under which removing high error gene trees can improve species tree estimation. This study provides insights into prior studies and suggests approaches for analyzing phylogenomic datasets.


2020 ◽  
Author(s):  
Zachary B. Hancock ◽  
Heath Blackmon

Abstract Isolation by distance is a widespread pattern in nature that describes the reduction of genetic correlation between subpopulations with increased geographic distance. In the population ancestral to modern sister species, this pattern may hypothetically inflate population divergence time estimation due to the potential for allele frequency differences in subpopulations at the ends of the ancestral population. In this study, we analyze the relationship between the time to the most recent common ancestor and the population divergence time when the ancestral population model is a linear stepping-stone. Using coalescent simulations, we compare the coalescent time to the population divergence time for various ratios of the divergence time over the product of the population size and the deme number. Next, we simulate whole genomes to obtain SNPs, and use the Bayesian coalescent program SNAPP to estimate divergence times. We find that as the rate of migration between neighboring demes decreases, the coalescent time becomes significantly greater than the population divergence time when sampled from end demes. Divergence-time overestimation in SNAPP becomes severe when the divergence-to-population size ratio < 10 and migration is low. We conclude that studies estimating divergence times should be cognizant of the potential ancestral population structure in an explicitly spatial context or risk dramatically overestimating the timing of population splits.


2021 ◽  
Author(s):  
Federica Valerio ◽  
Nicola Zadra ◽  
Omar Rota Stabelli ◽  
Lino Ometto

Abstract True fruit flies (Tephritidae) include several species that cause extensive damage to agriculture worldwide. Among them, species of the genus Bactrocera are widely studied to understand the traits associated with their invasiveness and ecology. Comparative approaches based on a reliable phylogenetic framework are particularly effective, but, to date, molecular phylogenies of Bactrocera are still controversial. Here, we employed a comprehensive genomic dataset to infer a robust backbone phylogeny of eleven representative Bactrocera species and two outgroups. We further provide the first genome-scale inference of their divergence using a calibrated relaxed clock. The results of our analyses support a closer relationship of B. dorsalis to B. latifrons than to B. tryoni, in contrast to all mitochondrial-based phylogenies. By comparing different evolutionary models, we show that this incongruence likely derives from the fast and recent radiation of these species that occurred around 2 million years ago, which may be associated with incomplete lineage sorting and possibly (ongoing) hybridization. These results can serve as a basis for future comparative analyses and highlight the utility of using large datasets and efficient phylogenetic approaches to study the evolutionary history of species of economic importance.

