The rigid hybrid number for two phylogenetic trees

Abstract Recently there has been considerable interest in the problem of finding a phylogenetic network with a minimum number of reticulation vertices which displays a given set of phylogenetic trees, that is, a network with minimum hybrid number. Such networks are useful for representing the evolution of species whose genomes have undergone processes such as lateral gene transfer and recombination that cannot be represented appropriately by a phylogenetic tree. Even so, as was recently pointed out in the literature, insisting that a network displays the set of trees can be an overly restrictive assumption when modeling certain evolutionary phenomena such as incomplete lineage sorting. In this paper, we thus consider the less restrictive notion of rigidly displaying which we introduce and study here. More specifically, we characterize when two trees can be rigidly displayed by a certain type of phylogenetic network called a temporal tree-child network in terms of fork-picking sequences. These are sequences of special subconfigurations of the two trees related to the well-studied cherry-picking sequences. We also show that, in case it exists, the rigid hybrid number for two phylogenetic trees is given by a minimum weight fork-picking sequence for the trees. Finally, we consider the relationship between the rigid hybrid number and three closely related numbers: the weak, beaded, and temporal hybrid numbers. In particular, we show that these numbers can all be different even for a fixed pair of trees, and also present an infinite family of pairs of trees which demonstrates that the difference between the rigid hybrid number and the temporal hybrid number for two phylogenetic trees on the same set of n leaves can grow at least linearly with n.

Phylogenetic Trees and Networks Can Serve as Powerful and Complementary Approaches for Analysis of Genomic Data

Systematic Biology ◽

10.1093/sysbio/syz056 ◽

2019 ◽

Vol 69 (3) ◽

pp. 593-601 ◽

Cited By ~ 9

Author(s):

Christopher Blair ◽

Cécile Ané

Keyword(s):

Gene Flow ◽

Phylogenetic Trees ◽

Evolutionary History ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Genomic Data ◽

Point Of View ◽

Phylogenetic Networks ◽

Lineage Sorting

Abstract Genomic data have had a profound impact on nearly every biological discipline. In systematics and phylogenetics, the thousands of loci that are now being sequenced can be analyzed under the multispecies coalescent model (MSC) to explicitly account for gene tree discordance due to incomplete lineage sorting (ILS). However, the MSC assumes no gene flow post divergence, calling for additional methods that can accommodate this limitation. Explicit phylogenetic network methods have emerged, which can simultaneously account for ILS and gene flow by representing evolutionary history as a directed acyclic graph. In this point of view, we highlight some of the strengths and limitations of phylogenetic networks and argue that tree-based inference should not be blindly abandoned in favor of networks simply because they represent more parameter rich models. Attention should be given to model selection of reticulation complexity, and the most robust conclusions regarding evolutionary history are likely obtained when combining tree- and network-based inference.

A test statistic to quantify treelikeness in phylogenetics

10.1101/2021.02.16.431544 ◽

2021 ◽

Author(s):

Caitlin Cherryh ◽

Bui Quang Minh ◽

Rob Lanfear

Keyword(s):

Evolutionary History ◽

Incomplete Lineage Sorting ◽

Phylogenetic Analyses ◽

Phylogenetic Network ◽

Parametric Bootstrap ◽

Lineage Sorting ◽

Test Statistic ◽

Sequence Alignments ◽

Wide Range ◽

History Of

Abstract Most phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness
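
As a rough illustration of the tree proportion idea described in this abstract, the sketch below computes the share of the total split weight of a network that also appears in a tree. The split encoding, the helper `canonical`, the function name `tree_proportion`, and the toy weights are all assumptions made for illustration. They are not taken from the authors' scripts, which may define or compute the statistic differently.

```python
from typing import Dict, FrozenSet, Iterable, Set

Split = FrozenSet[str]  # one side of a bipartition of the taxa (hypothetical encoding)


def canonical(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Return the side of the bipartition that does NOT contain the smallest taxon,
    so that each split has exactly one representation."""
    side_set, taxa_set = frozenset(side), frozenset(taxa)
    return taxa_set - side_set if min(taxa_set) in side_set else side_set


def tree_proportion(network_weights: Dict[Split, float], tree_splits: Set[Split]) -> float:
    """Fraction of the network's total split weight carried by splits found in the tree."""
    total = sum(network_weights.values())
    if total <= 0:
        raise ValueError("network must have positive total split weight")
    shared = sum(w for split, w in network_weights.items() if split in tree_splits)
    return shared / total


# Toy example (made-up weights): two conflicting splits on four taxa.
taxa = ["A", "B", "C", "D"]
network = {
    canonical({"A", "B"}, taxa): 3.0,  # split AB|CD, also present in the tree
    canonical({"A", "C"}, taxa): 1.0,  # split AC|BD, conflicts with the tree
}
tree = {canonical({"C", "D"}, taxa)}   # the tree AB|CD, given from the other side
print(tree_proportion(network, tree))
```

Under this reading, a value near 1 means almost all of the network's signal fits the tree, while lower values indicate more conflicting signal.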

Disentangling Sources of Gene Tree Discordance in Phylogenomic Data Sets: Testing Ancient Hybridizations in Amaranthaceae s.l

Systematic Biology ◽

10.1093/sysbio/syaa066 ◽

2020 ◽

Cited By ~ 1

Author(s):

Diego F Morales-Briones ◽

Gudrun Kadereit ◽

Delphine T Tefarikis ◽

Michael J Moore ◽

Stephen A Smith ◽

...

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Species Tree ◽

Data Sets ◽

Species Trees ◽

Lineage Sorting ◽

Data Set ◽

Gene Tree Discordance

Abstract Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations. 
[Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]

Co-estimating Reticulate Phylogenies and Gene Trees from Multi-locus Sequence Data

10.1101/095539 ◽

2016 ◽

Cited By ~ 6

Author(s):

Dingqiao Wen ◽

Luay Nakhleh

Keyword(s):

Gene Flow ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Simulated Data ◽

Biological Data ◽

Generative Model ◽

Divergence Times ◽

Gene Trees ◽

Lineage Sorting ◽

Coalescence Times

Abstract The multispecies network coalescent (MSNC) is a stochastic process that captures how gene trees grow within the branches of a phylogenetic network. Coupling the MSNC with a stochastic mutational process that operates along the branches of the gene trees gives rise to a generative model of how multiple loci from within and across species evolve in the presence of both incomplete lineage sorting (ILS) and reticulation (e.g., hybridization). We report on a Bayesian method for sampling the parameters of this generative model, including the species phylogeny, gene trees, divergence times, and population sizes, from DNA sequences of multiple independent loci. We demonstrate the utility of our method by analyzing simulated data and reanalyzing three biological data sets. Our results demonstrate the significance of not only co-estimating species phylogenies and gene trees, but also accounting for reticulation and ILS simultaneously. In particular, we show that when gene flow occurs, our method accurately estimates the evolutionary histories, coalescence times, and divergence times. Tree inference methods, on the other hand, underestimate divergence times and overestimate coalescence times when the evolutionary history is reticulate. While the MSNC corresponds to an abstract model of “intermixture,” we study the performance of the model and method on simulated data generated under a gene flow model. We show that the method accurately infers the most recent time at which gene flow occurs. Finally, we demonstrate the application of the new method to a 106-locus yeast data set. [Multispecies network coalescent; reticulation; incomplete lineage sorting; phylogenetic network; Bayesian inference; RJMCMC.]

A Divide-and-Conquer Method for Scalable Phylogenetic Network Inference from Multi-locus Data

10.1101/587725 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jiafan Zhu ◽

Xinhao Liu ◽

Huw A. Ogilvie ◽

Luay K. Nakhleh

Keyword(s):

Large Scale ◽

Network Inference ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Biological Data ◽

Phylogenetic Networks ◽

Divide And Conquer ◽

Lineage Sorting ◽

Step Method ◽

Sequence Alignments

Abstract Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting (ILS). However, these methods can only handle a small number of loci from a handful of genomes. In this paper, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological data sets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet).
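
The abstract does not say which heuristic is used for the Hitting Set formulation. For orientation only, the sketch below shows the textbook greedy heuristic for the generic Hitting Set problem: repeatedly pick the element that occurs in the most sets that are not yet hit. The function name `greedy_hitting_set` and the toy input are assumptions for illustration, and this is not claimed to be the authors' heuristic or part of PhyloNet.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def greedy_hitting_set(sets: Iterable[Iterable[Hashable]]) -> List[Hashable]:
    """Return elements such that every input set contains at least one of them.

    Greedy rule: choose the element appearing in the most still-unhit sets.
    Ties are broken by the elements' natural order, so elements must be comparable.
    """
    remaining = [set(s) for s in sets]
    if any(not s for s in remaining):
        raise ValueError("an empty set can never be hit")
    chosen = []
    while remaining:
        counts = Counter(e for s in remaining for e in s)
        best = min(counts, key=lambda e: (-counts[e], e))
        chosen.append(best)
        # Drop every set that the newly chosen element hits.
        remaining = [s for s in remaining if best not in s]
    return chosen


# Toy example: three overlapping sets are hit by two elements.
print(greedy_hitting_set([{"a", "b"}, {"b", "c"}, {"c", "d"}]))
```

The greedy rule does not guarantee a minimum hitting set, but it is simple and typically produces a small one, which is the usual reason to prefer a heuristic for this NP-hard problem.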

Practical Aspects of Phylogenetic Network Analysis Using PhyloNet

10.1101/746362 ◽

2019 ◽

Author(s):

Zhen Cao ◽

Xinhao Liu ◽

Huw A. Ogilvie ◽

Zhi Yan ◽

Luay Nakhleh

Keyword(s):

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Synthetic Data ◽

Simulated Data ◽

Single Species ◽

Phylogenetic Networks ◽

Lineage Sorting ◽

Data Set ◽

Types Of Information ◽

Analyze Data

Abstract Phylogenetic networks extend trees to enable simultaneous modeling of both vertical and horizontal evolutionary processes. PhyloNet is a software package that has been under constant development for over 10 years and includes a wide array of functionalities for inferring and analyzing phylogenetic networks. These functionalities differ in terms of the input data they require, the criteria and models they employ, and the types of information they allow to infer about the networks beyond their topologies. Furthermore, PhyloNet includes functionalities for simulating synthetic data on phylogenetic networks, quantifying the topological differences between phylogenetic networks, and evaluating evolutionary hypotheses given in the form of phylogenetic networks. In this paper, we use a simulated data set to illustrate the use of several of PhyloNet’s functionalities and make recommendations on how to analyze data sets and interpret the results when using these functionalities. All inference methods that we illustrate are incomplete lineage sorting (ILS) aware; that is, they account for the potential of ILS in the data while inferring the phylogenetic network. While the models do not include gene duplication and loss, we discuss how the methods can be used to analyze data in the presence of polyploidy. The concept of species is irrelevant for the computational analyses enabled by PhyloNet in that species-individuals mappings are user-defined. Consequently, none of the functionalities in PhyloNet deals with the task of species delimitation. In this sense, the data being analyzed could come from different individuals within a single species, in which case population structure along with potential gene flow is inferred (assuming the data has sufficient signal), or from different individuals sampled from different species, in which case the species phylogeny is being inferred.

Disentangling Sources of Gene Tree Discordance in Phylogenomic Datasets: Testing Ancient Hybridizations in Amaranthaceae s.l

10.1101/794370 ◽

2019 ◽

Cited By ~ 2

Author(s):

Diego F. Morales-Briones ◽

Gudrun Kadereit ◽

Delphine T. Tefarikis ◽

Michael J. Moore ◽

Stephen A. Smith ◽

...

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Molecular Data ◽

Species Trees ◽

Lineage Sorting ◽

Multiple Sources ◽

Gene Tree Discordance ◽

Phylogenomic Analyses

Abstract Gene tree discordance in large genomic datasets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The dataset included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.

Diagnosis of the mechanisms of different types of discordances between phylogenies inferred from nuclear and mitochondrial markers

Vavilov Journal of Genetics and Breeding ◽

10.18699/vj20.634 ◽

2020 ◽

Vol 24 (4) ◽

pp. 420-426

Author(s):

A. A. Poroshina ◽

D. Y. Sherbakov ◽

T. E. Peretolchina

Keyword(s):

Phylogenetic Trees ◽

Environmental Changes ◽

Incomplete Lineage Sorting ◽

Adaptive Dynamics ◽

Introgressive Hybridization ◽

Sister Species ◽

Complex Case ◽

Special Focus ◽

Large Species ◽

Lineage Sorting

In ancient freshwater lakes, an abnormally large species diversity is observed. The mechanisms that generated extremely high biodiversity in the ancient lakes have not been sufficiently studied and remain only partially known. Sequences of environmental changes in highly complex ecosystems such as Lake Baikal may induce sophisticated combinations of microevolutionary processes. These processes are likely to result in unusual “patterns” of genetic variability of species. The most unusual patterns include the ones when speciation is followed by incomplete lineage sorting as well as mitochondrial or nuclear introgression. All these phenomena are diagnosed by comparing the topologies of phylogenetic trees inferred from molecular markers of evolution located in mitochondria and nuclei. Mitochondrial and nuclear introgression is a particularly interesting and complex case, which is the process of incorporating the gene alleles of one species into the gene pool of a sister species due to interspecific hybridization (introgressive hybridization). In many cases, existing methods for molecular phylogenetic analysis do not automatically allow the observed patterns of polymorphism to be explained and, therefore, cannot provide hypotheses that would explain the mechanisms which resulted in these patterns. Here we use adaptive dynamics models to study neutral molecular evolution under various scenarios of interaction between sister species and the environment. We propose and justify a set of criteria for detecting how two evolutionary trees may differ, with a special focus on comparing a tree inferred from nuclear DNA to one from mitochondrial DNA. The criteria react to branching pattern and branch lengths, including relative distances from ancestral lineages. Simulations show that the criteria allow fast and automated detection of various types of introgression, secondary breaches of reproductive barriers, and incomplete lineage sorting.

Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes

10.1101/2020.09.28.317651 ◽

2020 ◽

Author(s):

Zhi Yan ◽

Zhen Cao ◽

Yushu Liu ◽

Luay Nakhleh

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Biological Data ◽

Phylogenetic Networks ◽

Data Sets ◽

Gene Trees ◽

Polyploid Species ◽

Lineage Sorting ◽

Work Done

Abstract Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this paper, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene trees as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. The method could also infer trees, thus potentially distinguishing between auto- and allo-polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating given phylogenetic networks are implemented and publicly available in the PhyloNet software package.

Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes

Systematic Biology ◽

10.1093/sysbio/syab081 ◽

2021 ◽

Author(s):

Zhi Yan ◽

Zhen Cao ◽

Yushu Liu ◽

Huw A Ogilvie ◽

Luay Nakhleh

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Phylogenetic Network ◽

Biological Data ◽

Phylogenetic Networks ◽

Data Sets ◽

Polyploid Species ◽

Lineage Sorting ◽

Work Done

Abstract Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this paper, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package.
