Reconstructing k-Reticulated Phylogenetic Network from a Set of Gene Trees

Abstract: In phylogenetics, a set of gene trees is often summarized by a consensus tree, such as the majority consensus, which is based on the set of all splits that are present in more than 50% of the input trees. A “consensus network” is obtained by lowering the threshold and considering all splits that are contained in 10% of the trees, say, and then computing the corresponding splits network. By construction and in practice, a consensus network usually shows the majority tree, extended by a number of rectangles that represent local rearrangements around internal nodes of the consensus tree. This may lead to the false conclusion that the input trees do not differ in a significant way because “even a phylogenetic network” does not display any large discrepancies. To harness the full potential of a phylogenetic network, we introduce the new concept of an anti-consensus network that aims at representing the largest interesting discrepancies found in a set of gene trees. We provide an efficient algorithm for computing an anti-consensus and illustrate its application using a set of gene trees from species and strains in the genus Kosakonia.
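The thresholding step described in the abstract above can be illustrated with a minimal sketch. It assumes that trees are given as nested Python tuples of taxon names and that every tree has the same taxon set; the function names `leaf_set`, `splits`, and `frequent_splits` are hypothetical and not taken from the paper. It only selects the splits above a frequency threshold. It does not build the splits network and does not implement the anti-consensus method.

```python
from collections import Counter


def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the nontrivial splits of a tree.

    Each split is stored as the side that does NOT contain the
    alphabetically smallest taxon, so equal splits compare equal.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if anchor in below else below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def frequent_splits(trees, threshold):
    """Return splits present in more than `threshold` (a fraction) of trees."""
    counts = Counter(s for t in trees for s in splits(t))
    return {s for s, c in counts.items() if c / len(trees) > threshold}


# Four toy gene trees on taxa a..e (illustrative data, not from the paper).
trees = [
    (("a", "b"), "c", ("d", "e")),
    (("a", "b"), "c", ("d", "e")),
    (("a", "c"), "b", ("d", "e")),
    (("a", "d"), "b", ("c", "e")),
]
majority = frequent_splits(trees, 0.5)   # splits for a majority consensus
relaxed = frequent_splits(trees, 0.2)    # splits for a consensus network
```

In this toy input only the split separating d and e from the rest passes the 50% threshold, while the lower threshold also admits the conflicting minority splits that a consensus network would draw as rectangles.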

Species Phylogeny versus Gene Trees: A Case Study of an Incongruent Data Matrix Based on Paphiopedilum Pfitz. (Orchidaceae)

International Journal of Molecular Sciences ◽

10.3390/ijms222111393 ◽

2021 ◽

Vol 22 (21) ◽

pp. 11393

Author(s):

Marcin Górniak ◽

Dariusz L. Szlachetko ◽

Natalia Olędrzyńska ◽

Aleksandra M. Naczk ◽

Agata Mieszkowska ◽

...

Keyword(s):

Phylogenetic Network ◽

Morphological Characteristics ◽

Nuclear Gene ◽

Data Matrix ◽

Gene Trees ◽

Nuclear Markers ◽

Protein Coding ◽

Versus Gene ◽

Tree Topologies

The phylogeny of the genus Paphiopedilum based on the plastome is consistent with morphological analysis. However, to date, none of the analyzed nuclear markers has confirmed this. Topology incongruence among the trees of different nuclear markers concerns entire sections of the subgenus Paphiopedilum. The low-copy nuclear protein-coding gene PHYC was obtained for 22 species representing all sections and subgenera of Paphiopedilum. The nuclear-based phylogeny is supported by morphological characteristics and plastid data analysis. We assumed that an incongruence in nuclear gene trees is caused by ancestral homoploid hybridization. We present a model for inferring the phylogeny of the species despite the incongruence of the different tree topologies. Our analysis, based on six low-copy nuclear genes, is congruent with plastome phylogeny and has been confirmed by phylogenetic network analysis.

Co-estimating Reticulate Phylogenies and Gene Trees from Multi-locus Sequence Data

10.1101/095539 ◽

2016 ◽

Cited By ~ 6

Author(s):

Dingqiao Wen ◽

Luay Nakhleh

Keyword(s):

Gene Flow ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Simulated Data ◽

Biological Data ◽

Generative Model ◽

Divergence Times ◽

Gene Trees ◽

Lineage Sorting ◽

Coalescence Times

Abstract: The multispecies network coalescent (MSNC) is a stochastic process that captures how gene trees grow within the branches of a phylogenetic network. Coupling the MSNC with a stochastic mutational process that operates along the branches of the gene trees gives rise to a generative model of how multiple loci from within and across species evolve in the presence of both incomplete lineage sorting (ILS) and reticulation (e.g., hybridization). We report on a Bayesian method for sampling the parameters of this generative model, including the species phylogeny, gene trees, divergence times, and population sizes, from DNA sequences of multiple independent loci. We demonstrate the utility of our method by analyzing simulated data and reanalyzing three biological data sets. Our results demonstrate the significance of not only co-estimating species phylogenies and gene trees, but also accounting for reticulation and ILS simultaneously. In particular, we show that when gene flow occurs, our method accurately estimates the evolutionary histories, coalescence times, and divergence times. Tree inference methods, on the other hand, underestimate divergence times and overestimate coalescence times when the evolutionary history is reticulate. While the MSNC corresponds to an abstract model of “intermixture,” we study the performance of the model and method on simulated data generated under a gene flow model. We show that the method accurately infers the most recent time at which gene flow occurs. Finally, we demonstrate the application of the new method to a 106-locus yeast data set. [Multispecies network coalescent; reticulation; incomplete lineage sorting; phylogenetic network; Bayesian inference; RJMCMC.]

Phylogenomic analyses using genomes and transcriptomes do not resolve relationships among major clades in Phrymaceae

10.1101/2021.11.17.468996 ◽

2021 ◽

Author(s):

Diego F. Morales-Briones ◽

Nan Lin ◽

Eileen Y. Huang ◽

Dena L. Grossenbacher ◽

James M. Sobel ◽

...

Keyword(s):

Phylogenetic Relationships ◽

Evolutionary Ecology ◽

Gene Tree ◽

Phylogenetic Network ◽

Reticulate Evolution ◽

Small Scale ◽

Gene Trees ◽

Network Analyses ◽

Gene Tree Discordance ◽

Close Relatives

Premise of the study: Phylogenomic datasets using genomes and transcriptomes provide rich opportunities beyond resolving bifurcating phylogenetic relationships. Monkeyflower (Phrymaceae) is a model system for evolutionary ecology. However, it lacks a well-supported phylogeny for a stable taxonomy and for macroevolutionary comparisons. Methods: We sampled 24 genomes and transcriptomes in Phrymaceae and closely related families, including eight newly sequenced transcriptomes. We reconstructed the phylogeny using IQ-TREE and ASTRAL, evaluated gene tree discordance using PhyParts, Quartet Sampling, and cloudogram, and carried out phylogenetic network analyses using PhyloNet and HyDe. We searched for whole genome duplication (WGD) events using chromosome numbers, synonymous distance, and gene duplication events. Key results: Most gene trees support the monophyly of Phrymaceae and each of its tribes. Most gene trees also support the tribe Mimuleae being sister to Phrymeae + Diplaceae + Leucocarpeae, with extensive gene tree discordance among the latter three. Despite the discordance, polyphyly of Mimulus s.l. is strongly supported, and no particular reticulation event among the Phrymaceae tribes is well supported. Reticulation likely occurred among Erythranthe bicolor and close relatives. No ancient WGD event was detected in Phrymaceae. Instead, small-scale duplications are among potential drivers of macroevolutionary diversification of Phrymaceae. Conclusions: We show that analysis of reticulate evolution is sensitive to taxon sampling and methods used. We also demonstrate that genome-scale data do not always fully "resolve" phylogenetic relationships. They present rich opportunities to investigate reticulate evolution, and gene and genome evolution involved in lineage diversification and adaptation.

Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes

10.1101/2020.09.28.317651 ◽

2020 ◽

Author(s):

Zhi Yan ◽

Zhen Cao ◽

Yushu Liu ◽

Luay Nakhleh

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Biological Data ◽

Phylogenetic Networks ◽

Data Sets ◽

Gene Trees ◽

Polyploid Species ◽

Lineage Sorting ◽

Work Done

Abstract: Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids, as were parsimony methods that can handle polyploidy more generally but exclude processes such as incomplete lineage sorting. In this paper, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene trees as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. The method could also infer trees, thus potentially distinguishing between auto- and allo-polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating given phylogenetic networks are implemented and publicly available in the PhyloNet software package.
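The "minimizes deep coalescences" criterion can be illustrated in its simplest setting. The sketch below is a deliberate simplification and not the method of the paper: it scores a gene tree against a species tree rather than a network, ignores polyploidy, and assumes one sampled allele per species with identical taxon sets. Trees are nested Python tuples, and the function names `leaf_set`, `lineages_leaving`, and `extra_lineages` are hypothetical. The score is the number of extra lineages: for each non-root species-tree branch, the number of gene lineages that leave it minus one, where the lineages leaving a branch are the maximal gene-tree clades contained in the cluster below that branch.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def lineages_leaving(gene_tree, cluster):
    """Count maximal gene-tree clades whose leaves all lie in `cluster`."""
    if leaf_set(gene_tree) <= cluster:
        return 1          # whole clade has coalesced inside the cluster
    if isinstance(gene_tree, str):
        return 0          # a leaf outside the cluster contributes nothing
    return sum(lineages_leaving(child, cluster) for child in gene_tree)


def extra_lineages(species_tree, gene_tree):
    """Sum, over non-root species branches, of (lineages leaving - 1)."""
    total = 0

    def visit(node, is_root):
        nonlocal total
        if not is_root:
            total += lineages_leaving(gene_tree, leaf_set(node)) - 1
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(species_tree, True)
    return total


# Toy example (illustrative data): a gene tree that conflicts with the species tree.
species = ((("a", "b"), "c"), "d")
concordant_gene = ((("a", "b"), "c"), "d")
discordant_gene = (("a", "c"), ("b", "d"))
score_concordant = extra_lineages(species, concordant_gene)
score_discordant = extra_lineages(species, discordant_gene)
```

A parsimony search in this simplified setting would prefer the species tree with the smallest total score over all input gene trees; extending the score to networks and polyploid species is what the abstract describes and is not attempted here.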

Bayesian Inference of Species Networks from Multilocus Sequence Data

10.1101/124982 ◽

2017 ◽

Cited By ~ 6

Author(s):

Chi Zhang ◽

Huw A. Ogilvie ◽

Alexei J. Drummond ◽

Tanja Stadler

Keyword(s):

Bayesian Inference ◽

Gene Evolution ◽

Sequence Data ◽

Phylogenetic Network ◽

Orthologous Gene ◽

Reticulate Evolution ◽

Gene Trees ◽

Hybridization Event ◽

Yeast Data ◽

Species Evolution

Abstract: Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent (MSNC) prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large datasets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.

TREEasy: an automated workflow to infer gene trees, species trees, and phylogenetic networks from multilocus data

10.1101/706390 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yafei Mao ◽

Siqing Hou ◽

Evan P. Economo

Keyword(s):

Tree Species ◽

Network Inference ◽

Gene Tree ◽

Phylogenetic Network ◽

Handling Time ◽

Phylogenetic Networks ◽

Recent Analysis ◽

Gene Trees ◽

Species Trees ◽

Tree Inference

Abstract: Multilocus genomic datasets can be used to infer a rich set of information about the evolutionary history of a lineage, including gene trees, species trees, and phylogenetic networks. However, user-friendly tools to run such integrated analyses are lacking, and workflows often require tedious reformatting and handling time to shepherd data through a series of individual programs. Here, we present TREEasy, a tool written in Python that performs automated sequence alignment (with MAFFT), gene tree inference (with IQ-Tree), species tree inference from concatenated data (with IQ-Tree), species tree inference from gene trees (with ASTRAL, MP-EST, and STELLS2), and phylogenetic network inference (with SNaQ and PhyloNet). The tool requires only FASTA files and nine parameters as inputs. It can be run from the command line or through a graphical user interface (GUI). As examples, we reproduced a recent analysis of staghorn coral evolution, and performed a new analysis on the evolution of the WGD clade of yeast. The latter revealed novel inferences that were not identified by previous analyses. TREEasy represents a reliable and simple tool to accelerate research in systematic biology (https://github.com/MaoYafei/TREEasy).

A phylotranscriptome study using silica gel-dried leaf tissues produces an updated robust phylogeny of Ranunculaceae

10.1101/2021.07.29.454256 ◽

2021 ◽

Author(s):

Jian He ◽

Rudan Lyu ◽

Yike Luo ◽

Jiamin Xiao ◽

Lei Xie ◽

...

Keyword(s):

Silica Gel ◽

Incomplete Lineage Sorting ◽

Rna Extraction ◽

Phylogenetic Reconstruction ◽

Phylogenetic Network ◽

Molecular Dating ◽

Limiting Factor ◽

Gene Trees ◽

Lineage Sorting ◽

Leaf Tissues

The use of transcriptome data in plant phylogenetics has gained popularity in recent years. However, because RNA degrades much more easily than DNA, the logistics of obtaining fresh tissues has become a major limiting factor for widely applying this method. Here, we used Ranunculaceae to test whether silica-dried plant tissues could be used for RNA extraction and subsequent phylogenomic studies. We sequenced 27 transcriptomes, 21 from silica gel-dried (SD-samples) and six from liquid nitrogen-preserved (LN-samples) leaf tissues, and downloaded 27 additional transcriptomes from GenBank. Our results showed that although the LN-samples produced slightly better reads than the SD-samples, there were no significant differences in RNA quality and quantity, assembled contig lengths and numbers, and BUSCO comparisons between the two treatments. Using these data, we conducted phylogenomic analyses, including concatenated- and coalescent-based phylogenetic reconstruction, molecular dating, coalescent simulation, phylogenetic network estimation, and whole genome duplication (WGD) inference. The resulting phylogeny was consistent with previous studies, with higher resolution and statistical support. The 11 core Ranunculaceae tribes grouped into two chromosome type clades (T- and R-types), with high support. Discordance among gene trees is likely due to hybridization and introgression, ancient genetic polymorphism, and incomplete lineage sorting. Our results strongly support one ancient hybridization event within the R-type clade and three WGD events in Ranunculales. Evolution of the three Ranunculaceae chromosome types is likely not directly related to WGD events. By clearly resolving the Ranunculaceae phylogeny, we demonstrated that SD-samples can be used for RNA-seq and phylotranscriptomic studies of angiosperms.

Inference of Species Phylogenies from Bi-allelic Markers Using Pseudo-likelihood

10.1101/289207 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jiafan Zhu ◽

Luay Nakhleh

Keyword(s):

Network Inference ◽

Sequence Data ◽

Phylogenetic Network ◽

Simulated Data ◽

Biological Data ◽

Phylogenetic Networks ◽

Gene Trees ◽

Multispecies Coalescent ◽

Pseudo Likelihood ◽

Computational Bottleneck

Abstract. Motivation: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g., single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method’s applicability. Results: In this paper, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger data sets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss. Availability: The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).

Gene trees: A powerful tool for exploring the evolutionary biology of species and speciation

Plant Species Biology ◽

10.1046/j.1442-1984.2000.00041.x ◽

2000 ◽

Vol 15 (3) ◽

pp. 211-222 ◽

Cited By ~ 21

Author(s):

Alan R. Templeton ◽

Stephanie D. Maskas ◽

Mitchell B. Cruzan

Keyword(s):

Evolutionary Biology ◽

Gene Trees
