scholarly journals NetRAX: Accurate and Fast Maximum Likelihood Phylogenetic Network Inference⋆

2021 ◽  
Author(s):  
Sarah Lutteropp ◽  
Céline Scornavacca ◽  
Alexey M. Kozlov ◽  
Benoit Morel ◽  
Alexandros Stamatakis

AbstractPhylogenetic networks are used to represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity. Hence, current tools can only analyze small data sets.We present NetRAX, a tool for maximum likelihood inference of phylogenetic networks in the absence of incomplete lineage sorting. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of “displayed trees”. NetRAX can infer maximum likelihood phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format.On simulated data, our results show a very low relative difference in BIC score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8, 000 sites, 30 taxa, and 3 reticulations completes within a few minutes on a standard laptop.Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX.

2019 ◽  
Author(s):  
Jiafan Zhu ◽  
Xinhao Liu ◽  
Huw A. Ogilvie ◽  
Luay K. Nakhleh

AbstractReticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting (ILS). However, these methods can only handle a small number of loci from a handful of genomes.In this paper, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological data sets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference.We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet)[email protected]


2019 ◽  
Vol 35 (14) ◽  
pp. i370-i378 ◽  
Author(s):  
Jiafan Zhu ◽  
Xinhao Liu ◽  
Huw A Ogilvie ◽  
Luay K Nakhleh

Abstract Motivation Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. Results In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. Availability and implementation We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet). Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Dingqiao Wen ◽  
Yun Yu ◽  
Jiafan Zhu ◽  
Luay Nakhleh

AbstractPhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or bi-allelic markers) is implemented. Maximum parsimony is based on an extension of the “minimizing deep coalescences” criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudo-likelihood measure. PhyloNet summarizes the results of the various analyses, and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software, [phylogenetic networks; reticulation; incomplete lineage sorting; multispecies network coalescent; Bayesian inference; maximum likelihood; maximum parsimony.]


2020 ◽  
Author(s):  
Zhi Yan ◽  
Zhen Cao ◽  
Yushu Liu ◽  
Luay Nakhleh

AbstractPhylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this paper, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene trees as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. The method could also infer trees, thus potentially distinguishing between auto- and allo-polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating given phylogenetic networks are implemented and publicly available in the PhyloNet software package.


2021 ◽  
Author(s):  
Zhi Yan ◽  
Zhen Cao ◽  
Yushu Liu ◽  
Huw A Ogilvie ◽  
Luay Nakhleh

Abstract Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this paper, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package.


2021 ◽  
Author(s):  
Caitlin Cherryh ◽  
Bui Quang Minh ◽  
Rob Lanfear

AbstractMost phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called the treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness


Author(s):  
Diego F Morales-Briones ◽  
Gudrun Kadereit ◽  
Delphine T Tefarikis ◽  
Michael J Moore ◽  
Stephen A Smith ◽  
...  

Abstract Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations. [Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]


2019 ◽  
Author(s):  
Zhen Cao ◽  
Xinhao Liu ◽  
Huw A. Ogilvie ◽  
Zhi Yan ◽  
Luay Nakhleh

AbstractPhylogenetic networks extend trees to enable simultaneous modeling of both vertical and horizontal evolutionary processes. PhyloNet is a software package that has been under constant development for over 10 years and includes a wide array of functionalities for inferring and analyzing phylogenetic networks. These functionalities differ in terms of the input data they require, the criteria and models they employ, and the types of information they allow to infer about the networks beyond their topologies. Furthermore, PhyloNet includes functionalities for simulating synthetic data on phylogenetic networks, quantifying the topological differences between phylogenetic networks, and evaluating evolutionary hypotheses given in the form of phylogenetic networks.In this paper, we use a simulated data set to illustrate the use of several of PhyloNet’s functionalities and make recommendations on how to analyze data sets and interpret the results when using these functionalities. All inference methods that we illustrate are incomplete lineage sorting (ILS) aware; that is, they account for the potential of ILS in the data while inferring the phylogenetic network. While the models do not include gene duplication and loss, we discuss how the methods can be used to analyze data in the presence of polyploidy.The concept of species is irrelevant for the computational analyses enabled by PhyloNet in that species-individuals mappings are user-defined. Consequently, none of the functionalities in PhyloNet deals with the task of species delimitation. In this sense, the data being analyzed could come from different individuals within a single species, in which case population structure along with potential gene flow is inferred (assuming the data has sufficient signal), or from different individuals sampled from different species, in which case the species phylogeny is being inferred.


2019 ◽  
Author(s):  
Diego F. Morales-Briones ◽  
Gudrun Kadereit ◽  
Delphine T. Tefarikis ◽  
Michael J. Moore ◽  
Stephen A. Smith ◽  
...  

AbstractGene tree discordance in large genomic datasets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The dataset included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.


2019 ◽  
Vol 69 (3) ◽  
pp. 593-601 ◽  
Author(s):  
Christopher Blair ◽  
Cécile Ané

Abstract Genomic data have had a profound impact on nearly every biological discipline. In systematics and phylogenetics, the thousands of loci that are now being sequenced can be analyzed under the multispecies coalescent model (MSC) to explicitly account for gene tree discordance due to incomplete lineage sorting (ILS). However, the MSC assumes no gene flow post divergence, calling for additional methods that can accommodate this limitation. Explicit phylogenetic network methods have emerged, which can simultaneously account for ILS and gene flow by representing evolutionary history as a directed acyclic graph. In this point of view, we highlight some of the strengths and limitations of phylogenetic networks and argue that tree-based inference should not be blindly abandoned in favor of networks simply because they represent more parameter rich models. Attention should be given to model selection of reticulation complexity, and the most robust conclusions regarding evolutionary history are likely obtained when combining tree- and network-based inference.


Sign in / Sign up

Export Citation Format

Share Document