DGEN: A Test Statistic for Detection of General Introgression Scenarios

AbstractWhen two species hybridize, one outcome is the integration of genetic material from one species into the genome of the other, a process known as introgression. Detecting introgression in genomic data is a very important question in evolutionary biology. However, given that hybridization occurs between closely related species, a compli-cating factor for introgression detection is the presence of incomplete lineage sorting, or ILS. The D-statistic, famously referred to as the “ABBA-BABA” test, was pro-posed for introgression detection in the presence of ILS in data sets that consist of four genomes. More recently, DFOIL—a set of statistics—was introduced to extend the D-statistic to data sets of five genomes.The major contribution of this paper is demonstrating that the invariants underly-ing both the D-statistic and DFOIL can be derived automatically from the probability mass functions of gene tree topologies under the null species tree model and alterna-tive phylogenetic network model. Computational requirements aside, this automatic derivation provides a way to generalize these statistics to data sets of any size and with any scenarios of introgression. We demonstrate the accuracy of the general statistic, which we call DGEN, on simulated data sets with varying rates of introgression, and apply it to an empirical data set of mosquito genomes.We have implemented DGEN and made it available, both as a graphical user interface tool and as a command-line tool, as part of the freely available, open-source software package ALPHA (https://github.com/chilleo/ALPHA).

Download Full-text

Disentangling Sources of Gene Tree Discordance in Phylogenomic Data Sets: Testing Ancient Hybridizations in Amaranthaceae s.l

Systematic Biology ◽

10.1093/sysbio/syaa066 ◽

2020 ◽

Cited By ~ 1

Author(s):

Diego F Morales-Briones ◽

Gudrun Kadereit ◽

Delphine T Tefarikis ◽

Michael J Moore ◽

Stephen A Smith ◽

...

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Species Tree ◽

Data Sets ◽

Species Trees ◽

Lineage Sorting ◽

Data Set ◽

Gene Tree Discordance

Abstract Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations. [Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]

Download Full-text

Detecting destabilizing species in the phylogenetic backbone of Potentilla (Rosaceae) using low-copy nuclear markers

AoB Plants ◽

10.1093/aobpla/plaa017 ◽

2020 ◽

Vol 12 (3) ◽

Cited By ~ 1

Author(s):

Nannie L Persson ◽

Ingrid Toresen ◽

Heidi Lie Andersen ◽

Jenny E E Smedmark ◽

Torsten Eriksson

Keyword(s):

Incomplete Lineage Sorting ◽

Gene Tree ◽

Data Sets ◽

Gene Trees ◽

Lineage Sorting ◽

Nuclear Markers ◽

Data Set ◽

Multispecies Coalescent ◽

Single Marker ◽

Named Groups

Abstract The genus Potentilla (Rosaceae) has been subjected to several phylogenetic studies, but resolving its evolutionary history has proven challenging. Previous analyses recovered six, informally named, groups: the Argentea, Ivesioid, Fragarioides, Reptans, Alba and Anserina clades, but the relationships among some of these clades differ between data sets. The Reptans clade, which includes the type species of Potentilla, has been noticed to shift position between plastid and nuclear ribosomal data sets. We studied this incongruence by analysing four low-copy nuclear markers, in addition to chloroplast and nuclear ribosomal data, with a set of Bayesian phylogenetic and Multispecies Coalescent (MSC) analyses. A selective taxon removal strategy demonstrated that the included representatives from the Fragarioides clade, P. dickinsii and P. fragarioides, were the main sources of the instability seen in the trees. The Fragarioides species showed different relationships in each gene tree, and were only supported as a monophyletic group in a single marker when the Reptans clade was excluded from the analysis. The incongruences could not be explained by allopolyploidy, but rather by homoploid hybridization, incomplete lineage sorting or taxon sampling effects. When P. dickinsii and P. fragarioides were removed from the data set, a fully resolved, supported backbone phylogeny of Potentilla was obtained in the MSC analysis. Additionally, indications of autopolyploid origins of the Reptans and Ivesioid clades were discovered in the low-copy gene trees.

Download Full-text

Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes

Systematic Biology ◽

10.1093/sysbio/syab081 ◽

2021 ◽

Author(s):

Zhi Yan ◽

Zhen Cao ◽

Yushu Liu ◽

Huw A Ogilvie ◽

Luay Nakhleh

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Biological Data ◽

Phylogenetic Networks ◽

Data Sets ◽

Polyploid Species ◽

Lineage Sorting ◽

Work Done

Abstract Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this paper, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package.

Download Full-text

Partitioned Gene-Tree Analyses and Gene-Based Topology Testing Help Resolve Incongruence in a Phylogenomic Study of Host-Specialist Bees (Apidae: Eucerinae)

Molecular Biology and Evolution ◽

10.1093/molbev/msaa277 ◽

2020 ◽

Author(s):

Felipe V Freitas ◽

Michael G Branstetter ◽

Terry Griswold ◽

Eduardo A B Almeida

Keyword(s):

Incomplete Lineage Sorting ◽

Gene Tree ◽

Species Tree ◽

Data Sets ◽

Gene Trees ◽

Lineage Sorting ◽

Data Set ◽

Analytical Strategy ◽

Analytical Approaches ◽

Phylogenomic Study

Abstract Incongruence among phylogenetic results has become a common occurrence in analyses of genome-scale data sets. Incongruence originates from uncertainty in underlying evolutionary processes (e.g., incomplete lineage sorting) and from difficulties in determining the best analytical approaches for each situation. To overcome these difficulties, more studies are needed that identify incongruences and demonstrate practical ways to confidently resolve them. Here, we present results of a phylogenomic study based on the analysis 197 taxa and 2,526 ultraconserved element (UCE) loci. We investigate evolutionary relationships of Eucerinae, a diverse subfamily of apid bees (relatives of honey bees and bumble bees) with >1,200 species. We sampled representatives of all tribes within the group and >80% of genera, including two mysterious South American genera, Chilimalopsis and Teratognatha. Initial analysis of the UCE data revealed two conflicting hypotheses for relationships among tribes. To resolve the incongruence, we tested concatenation and species tree approaches and used a variety of additional strategies including locus filtering, partitioned gene-trees searches, and gene-based topological tests. We show that within-locus partitioning improves gene tree and subsequent species-tree estimation, and that this approach, confidently resolves the incongruence observed in our data set. After exploring our proposed analytical strategy on eucerine bees, we validated its efficacy to resolve hard phylogenetic problems by implementing it on a published UCE data set of Adephaga (Insecta: Coleoptera). Our results provide a robust phylogenetic hypothesis for Eucerinae and demonstrate a practical strategy for resolving incongruence in other phylogenomic data sets.

Download Full-text

A test statistic to quantify treelikeness in phylogenetics

10.1101/2021.02.16.431544 ◽

2021 ◽

Author(s):

Caitlin Cherryh ◽

Bui Quang Minh ◽

Rob Lanfear

Keyword(s):

Evolutionary History ◽

Incomplete Lineage Sorting ◽

Phylogenetic Analyses ◽

Phylogenetic Network ◽

Parametric Bootstrap ◽

Lineage Sorting ◽

Test Statistic ◽

Sequence Alignments ◽

Wide Range ◽

History Of

AbstractMost phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called the treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness

Download Full-text

Terraces in Gene Tree Reconciliation-Based Species Tree Inference

10.1101/2020.04.17.047092 ◽

2020 ◽

Author(s):

Michael J. Sanderson ◽

Michelle M. McMahon ◽

Mike Steel

Keyword(s):

Incomplete Lineage Sorting ◽

Gene Tree ◽

Solution Space ◽

Species Tree ◽

Gene Trees ◽

Species Trees ◽

Lineage Sorting ◽

Data Set ◽

Tree Reconciliation ◽

The Impact

AbstractTerraces in phylogenetic tree space are sets of trees with identical optimality scores for a given data set, arising from missing data. These were first described for multilocus phylogenetic data sets in the context of maximum parsimony inference and maximum likelihood inference under certain model assumptions. Here we show how the mathematical properties that lead to terraces extend to gene tree - species tree problems in which the gene trees are incomplete. Inference of species trees from either sets of gene family trees subject to duplication and loss, or allele trees subject to incomplete lineage sorting, can exhibit terraces in their solution space. First, we show conditions that lead to a new kind of terrace, which stems from subtree operations that appear in reconciliation problems for incomplete trees. Then we characterize when terraces of both types can occur when the optimality criterion for tree search is based on duplication, loss or deep coalescence scores. Finally, we examine the impact of assumptions about the causes of losses: whether they are due to imperfect sampling or true evolutionary deletion.

Download Full-text

Practical Aspects of Phylogenetic Network Analysis Using PhyloNet

10.1101/746362 ◽

2019 ◽

Author(s):

Zhen Cao ◽

Xinhao Liu ◽

Huw A. Ogilvie ◽

Zhi Yan ◽

Luay Nakhleh

Keyword(s):

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Synthetic Data ◽

Simulated Data ◽

Single Species ◽

Phylogenetic Networks ◽

Lineage Sorting ◽

Data Set ◽

Types Of Information ◽

Analyze Data

AbstractPhylogenetic networks extend trees to enable simultaneous modeling of both vertical and horizontal evolutionary processes. PhyloNet is a software package that has been under constant development for over 10 years and includes a wide array of functionalities for inferring and analyzing phylogenetic networks. These functionalities differ in terms of the input data they require, the criteria and models they employ, and the types of information they allow to infer about the networks beyond their topologies. Furthermore, PhyloNet includes functionalities for simulating synthetic data on phylogenetic networks, quantifying the topological differences between phylogenetic networks, and evaluating evolutionary hypotheses given in the form of phylogenetic networks.In this paper, we use a simulated data set to illustrate the use of several of PhyloNet’s functionalities and make recommendations on how to analyze data sets and interpret the results when using these functionalities. All inference methods that we illustrate are incomplete lineage sorting (ILS) aware; that is, they account for the potential of ILS in the data while inferring the phylogenetic network. While the models do not include gene duplication and loss, we discuss how the methods can be used to analyze data in the presence of polyploidy.The concept of species is irrelevant for the computational analyses enabled by PhyloNet in that species-individuals mappings are user-defined. Consequently, none of the functionalities in PhyloNet deals with the task of species delimitation. In this sense, the data being analyzed could come from different individuals within a single species, in which case population structure along with potential gene flow is inferred (assuming the data has sufficient signal), or from different individuals sampled from different species, in which case the species phylogeny is being inferred.

Download Full-text

An ABBA-BABA Test for Introgression Using Retroposon Insertion Data

10.1101/709477 ◽

2019 ◽

Cited By ~ 2

Author(s):

Mark S. Springer ◽

John Gatesy

Keyword(s):

Dna Sequences ◽

Incomplete Lineage Sorting ◽

Species Tree ◽

Data Sets ◽

Lineage Sorting ◽

Sequence Alignments ◽

Data Set ◽

Short Branch ◽

Phylogenetic Methods ◽

Branch Lengths

AbstractDNA sequence alignments provide the majority of data for inferring phylogenetic relationships with both concatenation and coalescence methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroposon insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroposon insertions satisfy the ‘no intralocus recombination’ assumption of summary coalescence methods because they are singular events and better approximate neutrality relative to DNA sequences commonly applied in phylogenomic work. Retroposons have traditionally been analyzed with phylogenetic methods that ignore incomplete lineage sorting (ILS). Here, we analyze three retroposon data sets for mammals (Placentalia, Laurasiatheria, Balaenopteroidea) with two different ILS-aware methods. The first approach constructs a species tree from retroposon bipartitions with ASTRAL, and the second is a modification of SVD-Quartets. We also develop a χ2 Quartet-Asymmetry Test to detect hybridization using retroposon data. Both coalescence methods recovered the same topology for each of the three data sets. The ASTRAL species tree for Laurasiatheria has consecutive short branch lengths that are consistent with an anomaly zone situation. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both coalescence methods recovered a topology that supports the paraphyly of Balaenopteridae. Application of the χ2 Quartet-Asymmetry Test to this data set detected 16 different quartets of species for which historical hybridization may be inferred, but significant asymmetry was not detected in the placental root and Laurasiatheria analyses.

Download Full-text

Harnessing the power of phylogenomics to disentangle the directionality and signatures of interkingdom host jumping in the parasitic fungal genus Tolypocladium

Mycologia ◽

10.1080/00275514.2018.1442618 ◽

2018 ◽

Vol 110 (1) ◽

pp. 104-117 ◽

Cited By ~ 3

Author(s):

Quandt ◽

Patterson ◽

Spatafora

Keyword(s):

Incomplete Lineage Sorting ◽

Gene Tree ◽

Gene Clusters ◽

Host Specialization ◽

Comparative Genomic ◽

Character State ◽

Lineage Sorting ◽

Additional Species ◽

Data Set ◽

Host Jump

Host specialization is common among parasitic fungi; however, there are examples when transitions in host specificity between disparately related hosts have occurred. Here, we examine the interkingdom host jump from insect pathogenicity and mycoparasitism in Tolypocladium. Previous phylogenetic inferences made using only a few genes and with poor support reconstructed an ancestral character state of insect pathogenesis, a transition to mycoparasitism, and reversions to insect pathogenesis. To further explore the directionality and genes underlying the transitions in host, we sequenced two additional species of Tolypocladium (T. capitatum and T. paradoxum) and used phylogenomics to compare two insect pathogens and two mycoparasites. Our whole-genome-scale analysis suggests that the diversification of Tolypocladium species happened relatively quickly and that the truffle parasites form a monophyletic, derived lineage within the genus that is the result of a single ecological transition or host jump from insects to fungi. A significant amount of gene tree/species tree discordance occurs within the data set, and we infer this to be the product of both an historical hybridization event and incomplete lineage sorting that was likely because of the rapid diversification of the clade. Furthermore, comparative genomic analyses revealed a set of genes that are exclusive to the mycoparasitic species. These potentially mycoparasitic gene clusters were characterized by a reduced proportion of secreted proteins when compared with entomopathogen-enriched genes and involved the reshaping of the fungal secretome in the ecological context of mycoparasitism.

Download Full-text

Disentangling Sources of Gene Tree Discordance in Phylogenomic Datasets: Testing Ancient Hybridizations in Amaranthaceae s.l

10.1101/794370 ◽

2019 ◽

Cited By ~ 2

Author(s):

Diego F. Morales-Briones ◽

Gudrun Kadereit ◽

Delphine T. Tefarikis ◽

Michael J. Moore ◽

Stephen A. Smith ◽

...

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Molecular Data ◽

Species Trees ◽

Lineage Sorting ◽

Multiple Sources ◽

Gene Tree Discordance ◽

Phylogenomic Analyses

AbstractGene tree discordance in large genomic datasets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The dataset included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.

Download Full-text