supertree method
Recently Published Documents

Using Robinson-Foulds supertrees in divide-and-conquer phylogeny estimation

Abstract: One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and the subset trees are then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on GitHub at https://github.com/yuxilin51/GreedyRFS.

Phylogenetic supertree reveals detailed evolution of SARS-CoV-2

Scientific Reports ◽ 10.1038/s41598-020-79484-8 ◽ 2020 ◽ Vol 10 (1)

Author(s): Tingting Li ◽ Dongxia Liu ◽ Yadi Yang ◽ Jiali Guo ◽ Yujie Feng ◽ ...

Keyword(s): Phylogenetic Analysis ◽ Phylogenetic Tree ◽ Virus Disease ◽ Evolutionary Relationship ◽ Critical Issue ◽ Last Common Ancestor ◽ Phylogenetic Tree Analysis ◽ Origin And Evolution ◽ Supertree Method ◽ The Matrix

Abstract: Coronavirus disease 2019 (COVID-19), caused by the emergent coronavirus SARS-CoV-2, is spreading globally. The origin of SARS-CoV-2 and its evolutionary relationships remain ambiguous. Several reports have attempted to resolve this critical issue through genome-based phylogenetic analysis, yet progress has been limited, principally owing to the inability of these methods to reasonably integrate phylogenetic information from all genes of SARS-CoV-2. A supertree method based on multiple trees can produce an overall reasonable phylogenetic tree. However, supertree methods have rarely been used for phylogenetic analysis of viruses. Here we applied matrix representation with parsimony (MRP) pseudo-sequence supertree analysis to study the origin and evolution of SARS-CoV-2. Compared with other phylogenetic analysis methods, the supertree method showed greater resolving power for phylogenetic analysis of coronaviruses. In particular, the MRP pseudo-sequence supertree analysis firmly disputes that bat coronavirus RaTG13 is the last common ancestor of SARS-CoV-2, as was implied by other phylogenetic tree analyses based on viral genome sequences. Furthermore, the MRP pseudo-sequence supertree analysis revealed evolution and mutation in SARS-CoV-2. Taken together, the MRP pseudo-sequence supertree provided more information for inferring SARS-CoV-2 evolution than the conventional phylogenetic tree based on full-length genomic sequences.
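
To make the matrix representation step concrete, the sketch below builds an MRP matrix using the standard Baum-Ragan coding: each internal clade of each source tree becomes one column, with `1` for taxa inside the clade, `0` for taxa in that tree but outside the clade, and `?` for taxa absent from that tree. This is a minimal illustration, not the pipeline used in the study; the nested-tuple tree encoding and the function names `clades` and `mrp_matrix` are assumptions made for the example. The resulting rows are the pseudo-sequences that a parsimony program would then analyze.

```python
def clades(tree):
    """Return the leaf sets of all internal nodes in post-order (root last).

    A tree is a nested tuple of leaf-name strings (hypothetical encoding).
    """
    found = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        found.append(clade)
        return clade

    visit(tree)
    return found


def mrp_matrix(trees):
    """Encode rooted source trees as a taxon -> pseudo-sequence mapping."""
    columns = []  # one (taxa of the source tree, clade) pair per column
    for tree in trees:
        node_clades = clades(tree)
        tree_taxa = node_clades[-1]      # the root clade holds every taxon
        for clade in node_clades[:-1]:   # the root column is uninformative
            columns.append((tree_taxa, clade))

    all_taxa = sorted(set().union(*(taxa for taxa, _ in columns)))
    return {
        taxon: "".join(
            "1" if taxon in clade else "0" if taxon in taxa else "?"
            for taxa, clade in columns
        )
        for taxon in all_taxa
    }


source_trees = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
for taxon, row in mrp_matrix(source_trees).items():
    print(taxon, row)
```

Taxa that occur in only some source trees receive `?` in the columns of the other trees, which is how MRP combines trees with partially overlapping taxon sets.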

Advancing Divide-and-Conquer Phylogeny Estimation using Robinson-Foulds Supertrees

10.1101/2020.05.16.099895 ◽ 2020 ◽ Cited By ~ 1

Author(s): Xilin Yu ◽ Thien Le ◽ Sarah A. Christensen ◽ Erin K. Molloy ◽ Tandy Warnow

Keyword(s): Optimization Problems ◽ Polynomial Time Algorithm ◽ Time Algorithm ◽ Tree Of Life ◽ Divide And Conquer ◽ Greedy Heuristic ◽ MCMC Methods ◽ Supertree Method ◽ Phylogeny Estimation ◽ Source Form

Abstract: One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and the subset trees are then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS, a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees until all the trees are merged into a single supertree. We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on GitHub at github.com/yuxilin51/GreedyRFS.
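
As an illustration of the RFS criterion named in this abstract, the sketch below scores a candidate supertree by summing, over the source trees, the Robinson-Foulds distance between each source tree and the candidate restricted to that source tree's leaf set. It only evaluates a given candidate; it is not Exact-RFS-2 or GreedyRFS and does not search for an optimum. The nested-tuple tree encoding and all function names are assumptions made for the example.

```python
def leaves(tree):
    """Leaf set of a tree given as a nested tuple of leaf-name strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def restrict(tree, keep):
    """Prune leaves outside `keep` and suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def bipartitions(tree):
    """Non-trivial bipartitions (both sides have >= 2 leaves) of the unrooted tree."""
    everything = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = everything - clade
        if len(clade) >= 2 and len(other) >= 2:
            splits.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance between two trees on the same leaf set."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def rfs_score(supertree, source_trees):
    """Sum of RF distances between each source tree and the restricted supertree."""
    return sum(
        rf_distance(restrict(supertree, leaves(source)), source)
        for source in source_trees
    )
```

For example, with source trees `(("A", "B"), ("C", "D"))` and `(("B", "C"), ("D", "E"))`, the candidate `((("A", "B"), "C"), ("D", "E"))` agrees with both sources after restriction, while `((("A", "C"), "B"), ("D", "E"))` disagrees with the first source on one bipartition in each tree.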

Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge

10.1101/469130 ◽ 2018 ◽ Cited By ~ 3

Author(s): Erin K. Molloy ◽ Tandy Warnow

Keyword(s): Large Scale ◽ Optimization Problems ◽ Distance Matrix ◽ Divide And Conquer ◽ Estimation Methods ◽ Supertree Method ◽ Base Method ◽ Time Extension ◽ Phylogeny Estimation ◽ Computational Resources

Abstract

Background: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches.

Results: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and “concatenation” using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases.

Conclusions: Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on GitHub (http://github.com/ekmolloy/njmerge).

Integrative modeling of gene and genome evolution roots the archaeal tree of life

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1618463114 ◽ 2017 ◽ Vol 114 (23) ◽ pp. E4602-E4611 ◽ Cited By ~ 104

Author(s): Tom A. Williams ◽ Gergely J. Szöllősi ◽ Anja Spang ◽ Peter G. Foster ◽ Sarah E. Heaps ◽ ...

Keyword(s): Genome Evolution ◽ Single Gene ◽ Rooted Tree ◽ Gene Families ◽ Sister Group ◽ Work Place ◽ Gene Duplications ◽ Gene Trees ◽ Supertree Method ◽ Horizontal Transfers

A root for the archaeal tree is essential for reconstructing the metabolism and ecology of early cells and for testing hypotheses that propose that the eukaryotic nuclear lineage originated from within the Archaea; however, published studies based on outgroup rooting disagree regarding the position of the archaeal root. Here we constructed a consensus unrooted archaeal topology using protein concatenation and a multigene supertree method based on 3,242 single gene trees, and then rooted this tree using a recently developed model of genome evolution. This model uses evidence from gene duplications, horizontal transfers, and gene losses contained in 31,236 archaeal gene families to identify the most likely root for the tree. Our analyses support the monophyly of DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaea), a recently discovered cosmopolitan and genetically diverse lineage, and, in contrast to previous work, place the tree root between DPANN and all other Archaea. The sister group to DPANN comprises the Euryarchaeota and the TACK Archaea, including Lokiarchaeum, which our analyses suggest are monophyletic sister lineages. Metabolic reconstructions on the rooted tree suggest that early Archaea were anaerobes that may have had the ability to reduce CO2 to acetate via the Wood–Ljungdahl pathway. In contrast to proposals suggesting that genome reduction has been the predominant mode of archaeal evolution, our analyses infer a relatively small-genomed archaeal ancestor that subsequently increased in complexity via gene duplication and horizontal gene transfer.

A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species

PeerJ ◽ 10.7717/peerj.3058 ◽ 2017 ◽ Vol 5 ◽ pp. e3058 ◽ Cited By ~ 29

Author(s): Benjamin D. Redelings ◽ Mark T. Holder

Keyword(s): Construction Method ◽ Tree Of Life ◽ Free Software ◽ Software Pipeline ◽ Rapid Estimation ◽ Life Project ◽ Taxonomic Information ◽ Supertree Method

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support or conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all versions of the project’s “synthetic tree” starting at version 5. The software pipeline is called “propinquity”. It relies heavily on “otcetera”, a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub.
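
The displayed and conflicting split counts reported above can be illustrated with a small sketch. It assumes rooted trees encoded as nested tuples and uses one common reading of the terms: an input clade is displayed if some supertree clade, restricted to the input tree's taxa, equals it, and it conflicts if some restricted supertree clade overlaps it without either containing the other. These definitions, the encoding, and the function names are assumptions for the example and may differ from the criteria implemented in propinquity and otcetera.

```python
def leaf_set(tree):
    """Leaf set of a rooted tree given as a nested tuple of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Leaf sets of all internal nodes except the root."""
    everything = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        if clade != everything:
            found.add(clade)
        return clade

    visit(tree)
    return found


def overlaps_without_nesting(a, b):
    """True when two clades share taxa but neither contains the other."""
    return bool(a & b) and bool(a - b) and bool(b - a)


def count_displayed_and_conflicting(supertree, input_tree):
    """Return (displayed, conflicting) counts for the clades of one input tree."""
    taxa = leaf_set(input_tree)
    # Restrict every supertree clade to the taxa the input tree knows about.
    restricted = {clade & taxa for clade in clades(supertree)}
    displayed = conflicting = 0
    for clade in clades(input_tree):
        if clade in restricted:
            displayed += 1
        elif any(overlaps_without_nesting(clade, other) for other in restricted):
            conflicting += 1
    return displayed, conflicting
```

An input clade that is neither displayed nor conflicting is simply unresolved in the supertree, for instance when the supertree contains a polytomy at that position.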

A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species

10.7287/peerj.preprints.2538v1 (also listed as 10.7287/peerj.preprints.2538) ◽ 2016

Author(s): Benjamin D Redelings ◽ Mark T Holder

Keyword(s): Construction Method ◽ Tree Of Life ◽ Free Software ◽ Software Pipeline ◽ Rapid Estimation ◽ Life Project ◽ Taxonomic Information ◽ Supertree Method

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support or conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all versions of the project's "synthetic tree" starting at version 5. The software pipeline is called "propinquity". It relies heavily on "otcetera", a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub.

Horizontal gene flow from Eubacteria to Archaebacteria and what it means for our understanding of eukaryogenesis

Philosophical Transactions of the Royal Society B Biological Sciences ◽ 10.1098/rstb.2014.0337 ◽ 2015 ◽ Vol 370 (1678) ◽ pp. 20140337 ◽ Cited By ~ 16

Author(s): Wasiu A. Akanni ◽ Karen Siu-Ting ◽ Christopher J. Creevey ◽ James O. McInerney ◽ Mark Wilkinson ◽ ...

Keyword(s): Large Scale ◽ Eukaryotic Cell ◽ Current Evidence ◽ Evolutionary Transitions ◽ Genomic Dataset ◽ Large Numbers ◽ Supertree Method ◽ Eukaryotic Origin ◽ History Of ◽ Host Nucleus

The origin of the eukaryotic cell is considered one of the major evolutionary transitions in the history of life. Current evidence strongly supports a scenario of eukaryotic origin in which two prokaryotes, an archaebacterial host and an α-proteobacterium (the free-living ancestor of the mitochondrion), entered a stable symbiotic relationship. The establishment of this relationship was associated with a process of chimerization, whereby a large number of genes from the α-proteobacterial symbiont were transferred to the host nucleus. A general framework allowing the conceptualization of eukaryogenesis from a genomic perspective has long been lacking. Recent studies suggest that the origins of several archaebacterial phyla were coincident with massive imports of eubacterial genes. Although this does not indicate that these phyla originated through the same process that led to the origin of Eukaryota, it suggests that Archaebacteria might have had a general propensity to integrate into their genomes large amounts of eubacterial DNA. We suggest that this propensity provides a framework in which eukaryogenesis can be understood and studied in the light of archaebacterial ecology. We applied a recently developed supertree method to a genomic dataset composed of 392 eubacterial and 51 archaebacterial genera to test whether large numbers of genes flowing from Eubacteria are indeed coincident with the origin of major archaebacterial clades. In addition, we identified two potential large-scale transfers of uncertain directionality at the base of the archaebacterial tree. Our results are consistent with previous findings and seem to indicate that eubacterial gene imports (particularly from δ-Proteobacteria, Clostridia and Actinobacteria) were an important factor in archaebacterial history. Archaebacteria seem to have long relied on Eubacteria as a source of genetic diversity, and while the precise mechanism that allowed these imports is unknown, we suggest that our results support the view that processes comparable to those through which eukaryotes emerged might have been common in archaebacterial history.

Polynomial Supertree Methods Revisited

Advances in Bioinformatics ◽ 10.1155/2011/524182 ◽ 2011 ◽ Vol 2011 ◽ pp. 1-21 ◽ Cited By ~ 6

Author(s): Malte Brinkmeyer ◽ Thasso Griebel ◽ Sebastian Böcker

Keyword(s): Simulation Study ◽ Phylogenetic Trees ◽ Optimization Problem ◽ Matrix Representation ◽ Distance Matrix ◽ Extensive Simulation ◽ Supertree Method ◽ Pros And Cons ◽ To Come ◽ Supertree Methods

Supertree methods make it possible to reconstruct large phylogenetic trees by combining smaller trees with overlapping leaf sets into one more comprehensive supertree. The most commonly used supertree method, matrix representation with parsimony (MRP), produces accurate supertrees but is rather slow because the underlying optimization problem is computationally hard. In this paper, we present an extensive simulation study comparing the performance of MRP and the polynomial supertree methods MinCut Supertree, Modified MinCut Supertree, Build-with-distances, PhySIC, PhySIC_IST, and super distance matrix. We consider both quality and resolution of the reconstructed supertrees. Our findings illustrate the tradeoff between accuracy and running time in supertree construction, as well as the pros and cons of voting- and veto-based supertree approaches. Based on our results, we make some general suggestions for supertree methods yet to come.
