Inference of species histories in the presence of gene flow

Abstract When populations become isolated, their members can diverge genetically over time. If the isolation persists, the genetic differences between individuals of these populations increase over time. This process can be counteracted when genes are exchanged between populations. To study speciation processes in the presence of gene flow, isolation-with-migration methods have been developed. These methods typically assume that the ranked topology of the species history is already known. However, this is often not the case, and the species tree is therefore of interest in itself. Currently, it can only be inferred under the assumption of no gene flow. This assumption can lead to wrongly inferred speciation times and species tree topologies. Building on a recently introduced structured coalescent approach, we introduce a new method that allows inference of the species tree while explicitly modelling the flow of genes between coexisting species. Using Markov chain Monte Carlo sampling, we co-infer the species tree alongside evolutionary parameters of interest. Using simulations, we show that our newly introduced approach is able to reliably infer the species tree and the parameters of the isolation-with-migration model from genetic sequence data. We then infer the species history of six great ape species, including gene flow after population isolation. With this dataset, we show that our new method is able to infer the correct species tree not only on simulated data but also on a real data set whose species history has already been well studied. In line with previous results, we find some support for gene flow between bonobos and common chimpanzees.

A Distance Method to Reconstruct Species Trees In the Presence of Gene Flow

10.1101/007955 ◽

2014 ◽

Author(s):

Lingfei Cui ◽

Laura Kubatko

Keyword(s):

Gene Flow ◽

Evolutionary Biology ◽

Sequence Data ◽

Divergence Time ◽

Species Tree ◽

Estimation Accuracy ◽

Species Trees ◽

Distance Method ◽

Data Set ◽

Divergence Time Estimates

One of the central tasks in evolutionary biology is to reconstruct the evolutionary relationships among species from sequence data, particularly from multilocus data. In the last ten years, many methods have been proposed that use the variance in gene histories to estimate species trees by explicitly modeling deep coalescence. However, gene flow, another process that can produce variance in gene histories, has been less studied. In this paper, we propose a simple yet innovative method for species tree estimation in the presence of gene flow. Our method, called STEST (Species Tree Estimation from Speciation Times), constructs species tree estimates from pairwise estimates of speciation times (species divergence times). By using methods that estimate speciation times in the presence of gene flow (for example, M1 (Yang 2010) or SIM3s (Zhu and Yang 2012)), STEST is able to estimate species trees from data subject to gene flow. We develop two methods, called STEST (M1) and STEST (SIM3s), for this purpose. Additionally, we consider STEST (M0), which instead uses M0 (Yang 2002), a coalescent-based method that does not assume gene flow, to estimate speciation times; it is therefore designed to estimate species trees in the absence of gene flow. Our simulation studies show that, when the degree of gene flow is small, STEST (M0) outperforms STEST (M1), STEST (SIM3s) and STEM in terms of estimation accuracy, and outperforms *BEAST in terms of running time. When the degree of gene flow is large, STEST (M1) outperforms STEST (M0), STEST (SIM3s), STEM and *BEAST in terms of estimation accuracy. An empirical data set analyzed with these methods gives species tree estimates that are consistent with previous results.
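
As a rough illustration of building a species tree from pairwise speciation-time estimates, the sketch below clusters species agglomeratively, always joining the pair of clusters with the most recent estimated divergence. The average-linkage rule, the function name `build_species_tree` and the dictionary input format are assumptions made for this example; the abstract does not state the exact clustering rule STEST uses, and the time estimates themselves (from M0, M1 or SIM3s) are taken as given.

```python
from itertools import combinations


def build_species_tree(taxa, times):
    """Agglomerative species-tree sketch from pairwise speciation times.

    taxa  : list of species names
    times : dict mapping frozenset({a, b}) to the estimated speciation time
    Returns (tree as nested tuples, time of the root merge).
    """
    # Each cluster is (subtree, set of species it contains).
    clusters = [(name, frozenset([name])) for name in taxa]

    def linkage(c1, c2):
        # Average pairwise time between clusters (UPGMA-style; an assumption,
        # not necessarily the rule STEST applies).
        pair_times = [times[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(pair_times) / len(pair_times)

    root_time = 0.0
    while len(clusters) > 1:
        # Join the two clusters with the most recent estimated divergence.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        root_time = linkage(clusters[i], clusters[j])
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0], root_time


taxa = ["A", "B", "C", "D"]
times = {frozenset(p): t for p, t in [("AB", 1.0), ("CD", 1.5), ("AC", 3.0),
                                       ("AD", 3.0), ("BC", 3.0), ("BD", 3.0)]}
tree, root_time = build_species_tree(taxa, times)
print(tree, root_time)  # (('A', 'B'), ('C', 'D')) 3.0
```

When the input times are consistent with a single ultrametric tree, any reasonable linkage rule recovers that tree; the choice of linkage only matters when noisy estimates conflict.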

Frequent Closed Partial Orders Mining in Sequences

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.846-847.1304 ◽

2013 ◽

Vol 846-847 ◽

pp. 1304-1307

Author(s):

Ye Wang ◽

Yan Jia ◽

Lu Min Zhang

Keyword(s):

Sequence Data ◽

Real Data ◽

Partial Orders ◽

Hard Problem ◽

Important Data ◽

Data Set ◽

Pruning Algorithm ◽

Equal Chance ◽

Np Hard Problem ◽

General Sequences

Mining partial orders from sequence data is an important data mining task with broad applications. Because mining partial orders is an NP-hard problem, many efficient pruning algorithms have been proposed. In this paper, we improve a classical algorithm for discovering frequent closed partial orders (FCPO) from strings. For general sequences, we treat items that appear together as having an equal chance when calculating the detecting matrix used for pruning. Experimental evaluations on a real data set show that our algorithm can effectively mine FCPO from sequences.
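
The abstract does not specify how the detecting matrix is built. One plausible reading, used here purely as a hypothetical illustration, scores each ordered item pair by how many sequences place the first occurrence of one item before the other, splitting the credit 0.5/0.5 when both first appear in the same itemset; ordered pairs below a support threshold can then be excluded as candidate order relations. The names `detecting_matrix` and `prunable_pairs` and the scoring rule are assumptions, not the authors' algorithm.

```python
from collections import defaultdict


def detecting_matrix(sequences):
    """Pairwise precedence scores over sequences of itemsets (hypothetical rule).

    For each sequence, compare the first occurrence of every pair of items:
    the pair (a, b) scores 1 if a first occurs before b, and both (a, b) and
    (b, a) score 0.5 if they first occur in the same itemset.
    """
    scores = defaultdict(float)
    for seq in sequences:
        first = {}
        for pos, itemset in enumerate(seq):
            for item in itemset:
                first.setdefault(item, pos)
        for a, pos_a in first.items():
            for b, pos_b in first.items():
                if a == b:
                    continue
                if pos_a < pos_b:
                    scores[(a, b)] += 1.0
                elif pos_a == pos_b:
                    scores[(a, b)] += 0.5   # equal chance for either order
    return dict(scores)


def prunable_pairs(scores, items, min_support):
    """Ordered pairs whose precedence score is below the support threshold."""
    return {(a, b) for a in items for b in items
            if a != b and scores.get((a, b), 0.0) < min_support}


sequences = [[{"x"}, {"y", "z"}], [{"x", "y"}, {"z"}], [{"z"}, {"x"}]]
scores = detecting_matrix(sequences)
print(sorted(prunable_pairs(scores, "xyz", min_support=1.0)))  # [('y', 'x'), ('z', 'y')]
```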

Disentangling Sources of Gene Tree Discordance in Phylogenomic Data Sets: Testing Ancient Hybridizations in Amaranthaceae s.l.

Systematic Biology ◽

10.1093/sysbio/syaa066 ◽

2020 ◽

Cited By ~ 1

Author(s):

Diego F Morales-Briones ◽

Gudrun Kadereit ◽

Delphine T Tefarikis ◽

Michael J Moore ◽

Stephen A Smith ◽

...

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Species Tree ◽

Data Sets ◽

Species Trees ◽

Lineage Sorting ◽

Data Set ◽

Gene Tree Discordance

Abstract Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights potential identifiability problems associated with the sources of gene tree discordance, in particular for phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.
[Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]
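
The abstract does not name the site pattern tests it applied. One widely used example is Patterson's D (the ABBA-BABA test), sketched below for four aligned sequences with the outgroup assumed to carry the ancestral allele. This is an illustration of the idea only: the function name is hypothetical, and a real analysis would also assess significance, typically with a block jackknife, which is omitted here.

```python
def patterson_d(p1, p2, p3, outgroup):
    """ABBA-BABA counts and Patterson's D for four aligned sequences.

    Only biallelic sites where P3 differs from the outgroup (taken as the
    ancestral state) are informative:
      ABBA: P1 matches the outgroup, P2 shares P3's derived allele
      BABA: P2 matches the outgroup, P1 shares P3's derived allele
    """
    abba = baba = 0
    for a1, a2, a3, o in zip(p1, p2, p3, outgroup):
        if len({a1, a2, a3, o}) != 2 or a3 == o:
            continue  # skip non-biallelic sites and sites where P3 is ancestral
        if a1 == o and a2 == a3:
            abba += 1
        elif a2 == o and a1 == a3:
            baba += 1
    d = (abba - baba) / (abba + baba) if abba + baba else 0.0
    return abba, baba, d


print(patterson_d("AAGAC", "GGAAT", "GGGAG", "AAAAA"))  # (2, 1, 0.3333333333333333)
```

Under incomplete lineage sorting alone, ABBA and BABA sites are expected in equal numbers, so D is near zero; a positive D reflects an excess of derived alleles shared by P2 and P3, which is consistent with introgression between them.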

Uprooting phylogenetic uncertainty in coalescent species delimitation: A meta-analysis of empirical studies

Current Zoology ◽

10.1093/czoolo/61.5.866 ◽

2015 ◽

Vol 61 (5) ◽

pp. 866-873 ◽

Cited By ~ 13

Author(s):

Itzue W. Caviedes-Solis ◽

Nassima M. Bouzid ◽

Barbara L. Banbury ◽

Adam D. Leaché

Keyword(s):

Species Delimitation ◽

Sequence Data ◽

Meta Analysis ◽

Empirical Studies ◽

Species Tree ◽

Species Trees ◽

Posterior Probabilities ◽

Guide Tree ◽

Tree Topologies ◽

Accurate Quantification

Abstract Phylogenetic and phylogeographic studies rely on the accurate quantification of biodiversity. In recent studies of taxonomically ambiguous groups, species boundaries are often determined based on multi-locus sequence data. Bayesian Phylogenetics and Phylogeography (BPP) is a coalescent-based method frequently used to delimit species; however, empirical studies suggest that the requirement of a user-specified guide tree biases the range of possible outcomes. We evaluate fifteen multi-locus datasets using the most recent iteration of BPP, which eliminates the need for a user-specified guide tree and reconstructs the species tree in synchrony with species delimitation (= unguided species delimitation). We found that the number of species recovered with guided versus unguided species delimitation was the same except for two cases, and that posterior probabilities were generally lower for the unguided analyses as a result of searching across species trees in addition to species delimitation models. The guide trees used in previous studies were often discordant with the species tree topologies estimated by BPP. We also compared species trees estimated using BPP and *BEAST and found that when the topologies are the same, BPP tends to give higher posterior probabilities.

Terraces in Gene Tree Reconciliation-Based Species Tree Inference

10.1101/2020.04.17.047092 ◽

2020 ◽

Author(s):

Michael J. Sanderson ◽

Michelle M. McMahon ◽

Mike Steel

Keyword(s):

Incomplete Lineage Sorting ◽

Gene Tree ◽

Solution Space ◽

Species Tree ◽

Gene Trees ◽

Species Trees ◽

Lineage Sorting ◽

Data Set ◽

Tree Reconciliation ◽

The Impact

Abstract Terraces in phylogenetic tree space are sets of trees with identical optimality scores for a given data set, arising from missing data. These were first described for multilocus phylogenetic data sets in the context of maximum parsimony inference and maximum likelihood inference under certain model assumptions. Here we show how the mathematical properties that lead to terraces extend to gene tree-species tree problems in which the gene trees are incomplete. Inference of species trees from either sets of gene family trees subject to duplication and loss, or allele trees subject to incomplete lineage sorting, can exhibit terraces in their solution space. First, we show conditions that lead to a new kind of terrace, which stems from subtree operations that appear in reconciliation problems for incomplete trees. Then we characterize when terraces of both types can occur when the optimality criterion for tree search is based on duplication, loss or deep coalescence scores. Finally, we examine the impact of assumptions about the causes of losses: whether they are due to imperfect sampling or true evolutionary deletion.
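
The basic terrace condition can be made concrete: if a tree's score depends only on the subtrees it induces on each locus's sampled taxa, then two trees inducing the same subtrees on every locus must score identically. The sketch below checks this for rooted trees written as nested tuples, using the fact that a rooted tree is determined by its set of clusters (leaf sets of its subtrees). It illustrates only this classical, missing-data kind of terrace, not the reconciliation-based terrace introduced in the paper; all function names are hypothetical.

```python
def clusters(tree):
    """Return (leaf set, list of leaf sets of all subtrees) for a nested-tuple tree."""
    if isinstance(tree, str):          # a leaf
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def restricted_clusters(tree, taxa):
    """Nontrivial clusters of the subtree that `tree` induces on `taxa`."""
    _, all_clusters = clusters(tree)
    return frozenset(c & taxa for c in all_clusters if len(c & taxa) >= 2)


def same_terrace(tree1, tree2, locus_taxa):
    """True if both trees induce identical subtrees on every locus's taxon set."""
    return all(restricted_clusters(tree1, frozenset(s)) ==
               restricted_clusters(tree2, frozenset(s))
               for s in locus_taxa)


loci = [{"A", "B", "C"}, {"C", "D", "E"}]   # taxa sampled at each locus
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = (("A", "B"), ("C", ("D", "E")))
t3 = (("A", "C"), ("B", ("D", "E")))
print(same_terrace(t1, t2, loci), same_terrace(t1, t3, loci))  # True False
```

Trees `t1` and `t2` differ only in whether C groups with (A, B) or with (D, E), and neither locus samples taxa from both sides, so no per-locus score can tell them apart.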

Divergence estimation in the presence of incomplete lineage sorting and migration

10.1101/174342 ◽

2017 ◽

Author(s):

Graham Jones

Keyword(s):

Incomplete Lineage Sorting ◽

Simulated Data ◽

Species Tree ◽

Lineage Sorting ◽

Data Set ◽

Migration Rates ◽

Isolation With Migration ◽

Multispecies Coalescent ◽

Tree Inference ◽

And Migration

Abstract This paper focuses on the problem of estimating a species tree from multilocus data in the presence of incomplete lineage sorting and migration. We develop a mathematical model similar to IMa2 (Hey 2010) for the relevant evolutionary processes, which allows both the population size parameters and the migration rates between pairs of species tree branches to be integrated out. We then describe DENIM, a BEAST2 package that is based on this model and uses an approximation to sample from the posterior. The approximation is based on the assumption that migrations are rare, and it only samples from those regions of the posterior that seem likely given this assumption. The method breaks down if there is a lot of migration. Using simulations, Leaché et al. (2014) showed that migration causes problems for species tree inference under the multispecies coalescent when it is present but ignored. We re-analyze these simulated data to explore DENIM’s performance and demonstrate substantial improvements over *BEAST. We also re-analyze an empirical data set. [isolation-with-migration; incomplete lineage sorting; multispecies coalescent; species tree; phylogenetic analysis; Bayesian; Markov chain Monte Carlo]

Inferring Species Trees Using Integrative Models of Species Evolution

10.1101/242875 ◽

2018 ◽

Cited By ~ 9

Author(s):

Huw A. Ogilvie ◽

Timothy G. Vaughan ◽

Nicholas J. Matzke ◽

Graham J. Slater ◽

Tanja Stadler ◽

...

Keyword(s):

Gene Evolution ◽

Sequence Data ◽

A Priori ◽

Morphological Characters ◽

Mirror Image ◽

Integrative Model ◽

Species Trees ◽

Data Set ◽

Molecular Sequence Data ◽

Molecular Sequence

Abstract Bayesian methods can be used to accurately estimate species tree topologies, times and other parameters, but only when the models of evolution which are available and utilized sufficiently account for the underlying evolutionary processes. Multispecies coalescent (MSC) models have been shown to accurately account for the evolution of genes within species in the absence of strong gene flow between lineages, and fossilized birth-death (FBD) models have been shown to estimate divergence times from fossil data in good agreement with expert opinion. Until now, dating analyses using the MSC have been based on a fixed clock or informally derived node priors instead of the FBD. On the other hand, dating analyses using an FBD process have concatenated all gene sequences and ignored coalescence processes. To address these mirror-image deficiencies in evolutionary models, we have developed an integrative model of evolution which combines both the FBD and MSC models. By applying concatenation and the MSC (without employing the FBD process) to an exemplar data set consisting of molecular sequence data and morphological characters from the dog and fox subfamily Caninae, we show that concatenation causes predictable biases in estimated branch lengths. We then applied concatenation using the FBD process and the combined FBD-MSC model to show that the same biases are still observed when the FBD process is employed. These biases can be avoided by using the FBD-MSC model, which coherently models fossilization and gene evolution, and does not require an a priori substitution rate estimate to calibrate the molecular clock. We have implemented the FBD-MSC in a new version of StarBEAST2, a package developed for the BEAST2 phylogenetic software.

Distribution of gene tree histories under the coalescent model with gene flow

10.1101/023937 ◽

2015 ◽

Author(s):

Yuan Tian ◽

Laura Kubatko

Keyword(s):

Gene Flow ◽

Maximum Likelihood ◽

Gene Tree ◽

Species Tree ◽

Tree Topology ◽

Effective Population ◽

Sequence Alignments ◽

Data Set ◽

Coalescent Model ◽

Population Sizes

We propose a coalescent model for three species that allows gene flow between both pairs of sister populations. The model is designed to analyze multilocus genomic sequence alignments, with one sequence sampled from each of the three species. The model is formulated using a Markov chain representation, which allows use of matrix exponentiation to compute analytical expressions for the probability density of gene tree genealogies. The gene tree history distribution as well as the gene tree topology distribution under this coalescent model with gene flow are then calculated via numerical integration. We analyze the model to compare the distributions of gene tree topologies and gene tree histories for species trees with differing effective population sizes and gene flow rates. Our results suggest conditions under which the species tree and associated parameters are not identifiable from the gene tree topology distribution when gene flow is present, but indicate that the gene tree history distribution may identify the species tree and associated parameters. Thus, the gene tree history distribution can be used to infer parameters such as the ancestral effective population sizes and the rates of gene flow in a maximum likelihood (ML) framework. We conduct computer simulations to evaluate the performance of our method in estimating these parameters, and we apply our method to an Afrotropical mosquito data set (Fontaine et al., 2015) to demonstrate the usefulness of our method for the analysis of empirical data. Key words: coalescent, gene flow, migration, hybridization, gene tree, topology, history, maximum likelihood, speciation.
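
The Markov chain formulation can be illustrated on a smaller problem than the paper's three-species model: two lineages sampled from two populations that exchange migrants. The sketch below builds the rate matrix Q of this structured coalescent, obtains transition probabilities as P(t) = exp(Qt) with `scipy.linalg.expm`, and turns them into a density for the coalescence time. The symmetric migration rate, the rate parameterization and all function and state names are assumptions for illustration, not the authors' model.

```python
import numpy as np
from scipy.linalg import expm

# States of two sampled lineages in two populations (hypothetical labels).
BOTH_IN_1, ONE_EACH, BOTH_IN_2, COALESCED = range(4)


def generator(c1, c2, m):
    """Rate matrix of a two-population structured coalescent for two lineages.

    c1, c2 : pairwise coalescence rates within populations 1 and 2
    m      : per-lineage migration rate (assumed symmetric for simplicity)
    """
    q = np.zeros((4, 4))
    q[BOTH_IN_1, ONE_EACH] = 2 * m      # either lineage leaves population 1
    q[BOTH_IN_1, COALESCED] = c1
    q[ONE_EACH, BOTH_IN_1] = m          # lineage in population 2 moves to 1
    q[ONE_EACH, BOTH_IN_2] = m          # lineage in population 1 moves to 2
    q[BOTH_IN_2, ONE_EACH] = 2 * m
    q[BOTH_IN_2, COALESCED] = c2
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of a generator sum to zero
    return q


def coalescence_density(t, start, c1, c2, m):
    """Probability density that the two lineages coalesce at time t."""
    p = expm(generator(c1, c2, m) * t)   # transition probabilities P(t) = exp(Qt)
    # Coalescence at t requires both lineages in the same population at t.
    return p[start, BOTH_IN_1] * c1 + p[start, BOTH_IN_2] * c2


print(coalescence_density(0.5, BOTH_IN_1, c1=2.0, c2=1.0, m=0.0))  # ≈ 0.7358, i.e. 2 * exp(-1)
print(coalescence_density(0.5, ONE_EACH, c1=1.0, c2=1.0, m=0.5))   # positive only because m > 0
```

Starting from `ONE_EACH` (one lineage per population), the lineages can only meet after a migration event, so with m = 0 the density is zero; gene flow is what allows them to coalesce at all in this phase.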

Estimating phylogenetic relationships despite discordant gene trees across loci: the species tree of a diverse species group of feather mites (Acari: Proctophyllodidae)

Parasitology ◽

10.1017/s003118201100031x ◽

2011 ◽

Vol 138 (13) ◽

pp. 1750-1759 ◽

Cited By ~ 19

Author(s):

LACEY L. KNOWLES ◽

PAVEL B. KLIMOV

Keyword(s):

Phylogenetic Relationships ◽

Species Interactions ◽

Sequence Data ◽

Species Group ◽

Species Tree ◽

Terminal Branch ◽

Gene Trees ◽

Species Trees ◽

Feather Mites ◽

History Of

SUMMARY With the increased availability of multilocus sequence data, the lack of concordance among gene trees estimated for independent loci has focused attention on both the biological processes producing the discord and the methodologies used to estimate phylogenetic relationships. What has emerged is a suite of new analytical tools for phylogenetic inference: species tree approaches. In contrast to traditional phylogenetic methods, which are stymied by the idiosyncrasies of gene trees, approaches for estimating species trees explicitly take into account the cause of discord among loci and, in the process, provide a direct estimate of phylogenetic history (i.e. the history of species divergence, not the divergence of specific loci). We illustrate the utility of species tree estimates with an analysis of a diverse group of feather mites, the pinnatus species group (genus Proctophyllodes). Discord among four sequenced nuclear loci is consistent with theoretical expectations, given the short time separating speciation events (as evidenced by short internodes relative to terminal branch lengths in the trees). Nevertheless, many of the relationships are well resolved in a Bayesian estimate of the species tree; the analysis also highlights ambiguous aspects of the phylogeny that require additional loci. The broad utility of species tree approaches is discussed, specifically their application to groups with high speciation rates, a history of diversification that is particularly prevalent in host/parasite systems, where species interactions can drive rapid diversification.

ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets

Journal of Heredity ◽

10.1093/jhered/esz076 ◽

2019 ◽

Vol 111 (2) ◽

pp. 147-168 ◽

Cited By ~ 5

Author(s):

Mark S Springer ◽

Erin K Molloy ◽

Daniel B Sloan ◽

Mark P Simmons ◽

John Gatesy

Keyword(s):

Dna Sequences ◽

Species Tree ◽

Data Sets ◽

Species Trees ◽

Split Decomposition ◽

Sequence Alignments ◽

Data Set ◽

Short Branch ◽

Network Methods ◽

Branch Lengths

Abstract DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with two ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
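
The reasoning behind a quartet asymmetry test can be sketched with a simple count comparison. Under incomplete lineage sorting alone, the two minor (non-species-tree) resolutions of a quartet are expected at equal frequencies, so a marked imbalance between them points to introgression. The sketch below assumes a chi-square goodness-of-fit test with one degree of freedom on the two minor counts; it illustrates the idea rather than reproducing the authors' exact Quartet-Asymmetry test, and the function name is hypothetical.

```python
import math


def quartet_asymmetry(minor1, minor2):
    """Chi-square test (1 df) that the two minor quartet topologies are equally frequent.

    Under incomplete lineage sorting alone both minor topologies are expected
    equally often; a significant imbalance is consistent with introgression.
    Returns (chi-square statistic, p-value).
    """
    n = minor1 + minor2
    if n == 0:
        return 0.0, 1.0
    stat = (minor1 - minor2) ** 2 / n          # sum of (obs - n/2)^2 / (n/2)
    p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square, 1 df
    return stat, p_value


print(quartet_asymmetry(30, 10))  # statistic 10.0, p ≈ 0.0016
print(quartet_asymmetry(20, 20))  # (0.0, 1.0)
```

When many quartets are tested, as in a whole data set, the p-values would need a multiple-testing correction before individual quartets are flagged.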
