Reconstructing k-Reticulated Phylogenetic Network from a Set of Gene Trees

Author(s):  
Hoa Vu ◽  
Francis Chin ◽  
W. K. Hon ◽  
Henry Leung ◽  
K. Sadakane ◽  
...  
2019 ◽  
Author(s):  
Daniel H. Huson ◽  
Benjamin Albrecht ◽  
Sascha Patz ◽  
Mike Steel

Abstract: In phylogenetics, a set of gene trees is often summarized by a consensus tree, such as the majority consensus, which is based on the set of all splits that are present in more than 50% of the input trees. A “consensus network” is obtained by lowering the threshold and considering all splits that are contained in 10% of the trees, say, and then computing the corresponding splits network. By construction and in practice, a consensus network usually shows the majority tree, extended by a number of rectangles that represent local rearrangements around internal nodes of the consensus tree. This may lead to the false conclusion that the input trees do not differ in a significant way because “even a phylogenetic network” does not display any large discrepancies. To harness the full potential of a phylogenetic network, we introduce the new concept of an anti-consensus network that aims at representing the largest interesting discrepancies found in a set of gene trees. We provide an efficient algorithm for computing an anti-consensus and illustrate its application using a set of gene trees from species and strains in the genus Kosakonia.
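The split-frequency thresholding described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nested-tuple tree encoding, the function names `leaves`, `splits`, and `frequent_splits`, and the strict "greater than" comparison are all assumptions made for illustration. It only selects the splits that pass a threshold; it does not build the splits network or the anti-consensus.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial splits of a tree.

    Each split is stored as the side that does NOT contain the reference
    taxon (the alphabetically smallest label), so that the same bipartition
    always gets the same representation.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(visit(child) for child in node))
        side = taxa - cluster if ref in cluster else cluster
        # Skip trivial splits (empty side, single leaf, or all but one leaf).
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return cluster

    visit(tree)
    return found


def frequent_splits(trees, threshold):
    """Return the splits present in more than `threshold` (a fraction) of the trees."""
    counts = Counter(s for t in trees for s in splits(t))
    return {s for s, c in counts.items() if c / len(trees) > threshold}


# Toy input: three identical trees and one that rearranges taxa a, b, c.
t_main = ((("a", "b"), "c"), ("d", "e"))
t_alt = ((("a", "c"), "b"), ("d", "e"))
trees = [t_main, t_main, t_main, t_alt]

majority = frequent_splits(trees, 0.5)  # splits for a majority consensus
relaxed = frequent_splits(trees, 0.1)   # splits for a consensus network
```

In this toy input, lowering the threshold from 0.5 to 0.1 admits one extra split that is incompatible with a majority split, which is the kind of local rearrangement a consensus network draws as a rectangle.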


2021 ◽  
Vol 22 (21) ◽  
pp. 11393
Author(s):  
Marcin Górniak ◽  
Dariusz L. Szlachetko ◽  
Natalia Olędrzyńska ◽  
Aleksandra M. Naczk ◽  
Agata Mieszkowska ◽  
...  

The phylogeny of the genus Paphiopedilum based on the plastome is consistent with morphological analysis. However, to date, none of the analyzed nuclear markers has confirmed this. Topology incongruence among the trees of different nuclear markers concerns entire sections of the subgenus Paphiopedilum. The low-copy nuclear protein-coding gene PHYC was obtained for 22 species representing all sections and subgenera of Paphiopedilum. The nuclear-based phylogeny is supported by morphological characteristics and plastid data analysis. We assumed that an incongruence in nuclear gene trees is caused by ancestral homoploid hybridization. We present a model for inferring the phylogeny of the species despite the incongruence of the different tree topologies. Our analysis, based on six low-copy nuclear genes, is congruent with plastome phylogeny and has been confirmed by phylogenetic network analysis.


2016 ◽  
Author(s):  
Dingqiao Wen ◽  
Luay Nakhleh

Abstract: The multispecies network coalescent (MSNC) is a stochastic process that captures how gene trees grow within the branches of a phylogenetic network. Coupling the MSNC with a stochastic mutational process that operates along the branches of the gene trees gives rise to a generative model of how multiple loci from within and across species evolve in the presence of both incomplete lineage sorting (ILS) and reticulation (e.g., hybridization). We report on a Bayesian method for sampling the parameters of this generative model, including the species phylogeny, gene trees, divergence times, and population sizes, from DNA sequences of multiple independent loci. We demonstrate the utility of our method by analyzing simulated data and reanalyzing three biological data sets. Our results demonstrate the significance of not only co-estimating species phylogenies and gene trees, but also accounting for reticulation and ILS simultaneously. In particular, we show that when gene flow occurs, our method accurately estimates the evolutionary histories, coalescence times, and divergence times. Tree inference methods, on the other hand, underestimate divergence times and overestimate coalescence times when the evolutionary history is reticulate. While the MSNC corresponds to an abstract model of “intermixture,” we study the performance of the model and method on simulated data generated under a gene flow model. We show that the method accurately infers the most recent time at which gene flow occurs. Finally, we demonstrate the application of the new method to a 106-locus yeast data set. [Multispecies network coalescent; reticulation; incomplete lineage sorting; phylogenetic network; Bayesian inference; RJMCMC.]


2021 ◽  
Author(s):  
Diego F. Morales-Briones ◽  
Nan Lin ◽  
Eileen Y. Huang ◽  
Dena L. Grossenbacher ◽  
James M. Sobel ◽  
...  

Premise of the study: Phylogenomic datasets using genomes and transcriptomes provide rich opportunities beyond resolving bifurcating phylogenetic relationships. Monkeyflower (Phrymaceae) is a model system for evolutionary ecology. However, it lacks a well-supported phylogeny for a stable taxonomy and for macroevolutionary comparisons. Methods: We sampled 24 genomes and transcriptomes in Phrymaceae and closely related families, including eight newly sequenced transcriptomes. We reconstructed the phylogeny using IQ-TREE and ASTRAL, evaluated gene tree discordance using PhyParts, Quartet Sampling, and cloudogram, and carried out phylogenetic network analyses using PhyloNet and HyDe. We searched for whole genome duplication (WGD) events using chromosome numbers, synonymous distance, and gene duplication events. Key results: Most gene trees support the monophyly of Phrymaceae and each of its tribes. Most gene trees also support the tribe Mimuleae being sister to Phrymeae + Diplaceae + Leucocarpeae, with extensive gene tree discordance among the latter three. Despite the discordance, polyphyly of Mimulus s.l. is strongly supported, and no particular reticulation event among the Phrymaceae tribes is well supported. Reticulation likely occurred among Erythranthe bicolor and close relatives. No ancient WGD event was detected in Phrymaceae. Instead, small-scale duplications are among potential drivers of macroevolutionary diversification of Phrymaceae. Conclusions: We show that analysis of reticulate evolution is sensitive to taxon sampling and methods used. We also demonstrate that genome-scale data do not always fully "resolve" phylogenetic relationships. They present rich opportunities to investigate reticulate evolution, and gene and genome evolution involved in lineage diversification and adaptation.


2020 ◽  
Author(s):  
Zhi Yan ◽  
Zhen Cao ◽  
Yushu Liu ◽  
Luay Nakhleh

Abstract: Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids, as were parsimony methods that could handle polyploidy more generally while excluding processes such as incomplete lineage sorting. In this paper, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene trees as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. The method could also infer trees, thus potentially distinguishing between auto- and allo-polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating given phylogenetic networks are implemented and publicly available in the PhyloNet software package.


2017 ◽  
Author(s):  
Chi Zhang ◽  
Huw A. Ogilvie ◽  
Alexei J. Drummond ◽  
Tanja Stadler

Abstract: Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent (MSNC) prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large datasets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.


2019 ◽  
Author(s):  
Yafei Mao ◽  
Siqing Hou ◽  
Evan P. Economo

Abstract: Multilocus genomic datasets can be used to infer a rich set of information about the evolutionary history of a lineage, including gene trees, species trees, and phylogenetic networks. However, user-friendly tools to run such integrated analyses are lacking, and workflows often require tedious reformatting and handling time to shepherd data through a series of individual programs. Here, we present a tool written in Python, TREEasy, that performs automated sequence alignment (with MAFFT), gene tree inference (with IQ-Tree), species tree inference from concatenated data (with IQ-Tree), species tree inference from gene trees (with ASTRAL, MP-EST, and STELLS2), and phylogenetic network inference (with SNaQ and PhyloNet). The tool only requires FASTA files and nine parameters as inputs. The tool can be run from the command line or through a Graphical User Interface (GUI). As examples, we reproduced a recent analysis of staghorn coral evolution, and performed a new analysis on the evolution of the WGD clade of yeast. The latter revealed novel inferences that were not identified by previous analyses. TREEasy represents a reliable and simple tool to accelerate research in systematic biology (https://github.com/MaoYafei/TREEasy).


2021 ◽  
Author(s):  
Jian He ◽  
Rudan Lyu ◽  
Yike Luo ◽  
Jiamin Xiao ◽  
Lei Xie ◽  
...  

The utility of transcriptome data in plant phylogenetics has gained popularity in recent years. However, because RNA degrades much more easily than DNA, the logistics of obtaining fresh tissues has become a major limiting factor for widely applying this method. Here, we used Ranunculaceae to test whether silica-dried plant tissues could be used for RNA extraction and subsequent phylogenomic studies. We sequenced 27 transcriptomes, 21 from silica gel-dried (SD-samples) and six from liquid nitrogen-preserved (LN-samples) leaf tissues, and downloaded 27 additional transcriptomes from GenBank. Our results showed that although the LN-samples produced slightly better reads than the SD-samples, there were no significant differences in RNA quality and quantity, assembled contig lengths and numbers, and BUSCO comparisons between the two treatments. Using these data, we conducted phylogenomic analyses, including concatenated- and coalescent-based phylogenetic reconstruction, molecular dating, coalescent simulation, phylogenetic network estimation, and whole genome duplication (WGD) inference. The resulting phylogeny was consistent with previous studies with higher resolution and statistical support. The 11 core Ranunculaceae tribes grouped into two chromosome type clades (T- and R-types), with high support. Discordance among gene trees is likely due to hybridization and introgression, ancient genetic polymorphism and incomplete lineage sorting. Our results strongly support one ancient hybridization event within the R-type clade and three WGD events in Ranunculales. Evolution of the three Ranunculaceae chromosome types is likely not directly related to WGD events. By clearly resolving the Ranunculaceae phylogeny, we demonstrated that SD-samples can be used for RNA-seq and phylotranscriptomic studies of angiosperms.


2018 ◽  
Author(s):  
Jiafan Zhu ◽  
Luay Nakhleh

Abstract: Motivation: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g., single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method’s applicability. Results: In this paper, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger data sets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss. Availability: The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).


2000 ◽  
Vol 15 (3) ◽  
pp. 211-222 ◽  
Author(s):  
Alan R. Templeton ◽  
Stephanie D. Maskas ◽  
Mitchell B. Cruzan
