Advancing Divide-and-Conquer Phylogeny Estimation using Robinson-Foulds Supertrees

Abstract: One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on GitHub at github.com/yuxilin51/GreedyRFS.
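The RFS criterion scores a candidate supertree by summing, over the input trees, the Robinson-Foulds (RF) distance between each input tree and the supertree restricted to that input's leaf set. The Python sketch below only evaluates that score; it is not Exact-RFS-2 or GreedyRFS. Several details are assumptions made for illustration: trees are written as nested tuples of string leaf labels and read as unrooted trees, and RF distance is taken as the unnormalized number of nontrivial bipartitions that appear in exactly one of the two trees.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple, Union

# Hypothetical representation: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets below every node, in post-order (the last entry is the full leaf set)."""
    out: List[FrozenSet[str]] = []

    def walk(t: Tree) -> FrozenSet[str]:
        if isinstance(t, str):
            s = frozenset([t])
        else:
            s = frozenset().union(*(walk(child) for child in t))
        out.append(s)
        return s

    walk(tree)
    return out


def splits(tree: Tree, restrict_to: Optional[Iterable[str]] = None) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions of the unrooted tree, optionally restricted to a leaf subset.

    Each bipartition is stored as the side that does NOT contain the smallest kept label,
    so equal bipartitions compare equal. Assumes at least one leaf is kept.
    """
    cl = clusters(tree)
    keep = cl[-1] if restrict_to is None else cl[-1] & frozenset(restrict_to)
    ref = min(keep)
    result: Set[FrozenSet[str]] = set()
    for c in cl:
        side = c & keep
        if ref in side:
            side = keep - side
        # Nontrivial: both sides of the bipartition contain at least two leaves.
        if 2 <= len(side) <= len(keep) - 2:
            result.add(side)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """RF distance after restricting both trees to their shared leaves."""
    common = clusters(t1)[-1] & clusters(t2)[-1]
    return len(splits(t1, common) ^ splits(t2, common))


def rfs_score(supertree: Tree, inputs: Iterable[Tree]) -> int:
    """RFS criterion: total RF distance between each input and the restricted supertree."""
    return sum(rf_distance(supertree, t) for t in inputs)
```

For example, with the supertree `((("a", "b"), "c"), ("d", "e"))`, the input `(("a", "b"), ("c", "d"))` contributes 0 and the input `(("a", "c"), ("b", "e"))` contributes 2, so the RFS score is 2. An RFS method searches for the supertree that minimizes this total.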

Using Robinson-Foulds Supertrees in Divide-and-Conquer Phylogeny Estimation

DOI: 10.21203/rs.3.rs-174421/v1 (2021)

Author(s): Xilin Yu, Thien Le, Sarah A. Christensen, Erin K. Molloy, Tandy Warnow

Keyword(s): Optimization Problems; Polynomial Time Algorithm; Time Algorithm; Tree Of Life; Divide And Conquer; Greedy Heuristic; MCMC Methods; NP-Hard; Phylogeny Estimation; Source Form

Abstract: One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on GitHub at github.com/yuxilin51/GreedyRFS.
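The greedy strategy described for GreedyRFS (merge two trees at a time with an exact two-tree method until one tree remains) can be sketched as a generic loop. This is a hedged illustration, not the published implementation: `merge_two` is a hypothetical placeholder for an exact two-tree supertree solver such as Exact-RFS-2, and choosing the pair that shares the most leaves is an assumption made here for concreteness.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, TypeVar

T = TypeVar("T")


def greedy_pairwise_merge(
    trees: List[T],
    leafset: Callable[[T], FrozenSet[str]],
    merge_two: Callable[[T, T], T],
) -> T:
    """Merge trees two at a time until a single tree remains.

    `merge_two` is a hypothetical stand-in for an exact two-tree supertree solver.
    Picking the pair with the most shared leaves is an illustrative assumption.
    """
    pool = list(trees)
    if not pool:
        raise ValueError("need at least one tree")
    while len(pool) > 1:
        # Choose the pair of trees with the largest leaf overlap (first one on ties).
        i, j = max(
            combinations(range(len(pool)), 2),
            key=lambda p: len(leafset(pool[p[0]]) & leafset(pool[p[1]])),
        )
        merged = merge_two(pool[i], pool[j])
        # Replace the two inputs by their merged supertree.
        pool = [t for k, t in enumerate(pool) if k not in (i, j)] + [merged]
    return pool[0]
```

With k input trees the loop calls `merge_two` exactly k - 1 times, so the overall cost is dominated by the two-tree solver, which is why a polynomial-time exact method for two trees is useful as the building block.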

Using Robinson-Foulds supertrees in divide-and-conquer phylogeny estimation

Algorithms for Molecular Biology, Vol 16 (1) (2021). DOI: 10.1186/s13015-021-00189-2

Author(s): Xilin Yu, Thien Le, Sarah A. Christensen, Erin K. Molloy, Tandy Warnow

Keyword(s): Optimization Problems; Polynomial Time Algorithm; Time Algorithm; Tree Of Life; Divide And Conquer; MCMC Methods; Supertree Method; Phylogeny Estimation; Source Form; Life On Earth

Abstract: One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on GitHub at https://github.com/yuxilin51/GreedyRFS.

Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge

DOI: 10.1101/469130 (2018). Cited by ~3.

Author(s): Erin K. Molloy, Tandy Warnow

Keyword(s): Large Scale; Optimization Problems; Distance Matrix; Divide And Conquer; Estimation Methods; Supertree Method; Base Method; Time Extension; Phylogeny Estimation; Computational Resources

Abstract

Background: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches.

Results: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and “concatenation” using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases.

Conclusions: Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on GitHub (http://github.com/ekmolloy/njmerge).
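NJMerge is described as a polynomial-time extension of Neighbor Joining, so it may help to see the plain NJ step it builds on. The sketch below is classic NJ only (topology, no branch lengths); it does not implement NJMerge's use of the subset trees, and the input format (a dict keyed by `frozenset({x, y})`) is an assumption for illustration. At each step NJ joins the pair minimizing $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$ and replaces it by a new node $u$ with $d(u,k) = \tfrac{1}{2}\,(d(i,k) + d(j,k) - d(i,j))$.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple


def neighbor_joining(labels: Sequence[Hashable], dist: Dict[FrozenSet, float]):
    """Classic Neighbor Joining topology search (branch lengths omitted).

    `dist` maps frozenset({x, y}) to the distance between taxa x and y.
    Returns the three nodes left at the end (an unrooted star) and the joins in order;
    each new internal node is represented as the tuple (i, j) of the nodes it joins.
    """
    nodes: List[Hashable] = list(labels)
    d = dict(dist)

    def D(x, y):
        return d[frozenset((x, y))]

    joins: List[Tuple] = []
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Pick the pair minimizing the NJ criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * D(p[0], p[1]) - r[p[0]] - r[p[1]],
        )
        u = (i, j)
        for k in nodes:
            if k != i and k != j:
                # Distance from the new internal node u to each remaining node.
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k != i and k != j] + [u]
        joins.append(u)
    return nodes, joins
```

On the five-taxon matrix used in the test below, $Q(a,b) = 3 \cdot 5 - 31 - 34 = -50$ is the unique minimum, so `("a", "b")` is the first join. Each step scans all pairs, giving the familiar cubic running time of naive NJ.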

Definable inapproximability: new challenges for duplicator

Journal of Logic and Computation, Vol 29 (8), pp. 1185-1210 (2019). DOI: 10.1093/logcom/exz022. Cited by ~1.

Author(s): Albert Atserias, Anuj Dawar

Keyword(s): Approximate Solution; Lower Bounds; Optimization Problems; Polynomial Time Algorithm; Time Algorithm; Constant Factor; Point Of View; Hardness Of Approximation; Fixed Constant; Algorithmic Techniques

Abstract: We consider the hardness of approximation of optimization problems from the point of view of definability. For many $\textrm{NP}$-hard optimization problems it is known that, unless $\textrm{P} = \textrm{NP}$, no polynomial-time algorithm can give an approximate solution guaranteed to be within a fixed constant factor of the optimum. We show, in several such instances and without any complexity theoretic assumption, that no algorithm that is expressible in fixed-point logic with counting (FPC) can compute an approximate solution. Since important algorithmic techniques for approximation algorithms (such as linear or semidefinite programming) are expressible in FPC, this yields lower bounds on what can be achieved by such methods. The results are established by showing lower bounds on the number of variables required in first-order logic with counting to separate instances with a high optimum from those with a low optimum for fixed-size instances.

An overview on polynomial approximation of NP-hard problems

Yugoslav Journal of Operations Research, Vol 19 (1), pp. 3-40 (2009). DOI: 10.2298/yjor0901003p. Cited by ~6.

Author(s): Vangelis Paschos

Keyword(s): Approximation Algorithms; Polynomial Approximation; Optimization Problems; Polynomial Time Algorithm; Time Algorithm; Computational Time; NP-Hard; Hard Problems; Feasible Solutions; NP-Hard Problems

The fact that polynomial-time algorithms are very unlikely to be devised for solving NP-hard problems optimally strongly motivates both researchers and practitioners to try to solve such problems heuristically, trading off computational time against solution quality. In other words, heuristic computation consists of trying to find not the best solution but one solution that is 'close to' the optimal one in reasonable time. Among the classes of heuristic methods for NP-hard problems, polynomial approximation algorithms aim at solving a given NP-hard problem in polynomial time by computing feasible solutions that are, under some predefined criterion, as near to the optimal ones as possible. Polynomial approximation theory deals with the study of such algorithms. This survey first presents and analyzes polynomial-time approximation algorithms for some classical examples of NP-hard problems. Secondly, it shows how classical notions and tools of complexity theory, such as polynomial reductions, can be matched with polynomial approximation in order to devise structural results for NP-hard optimization problems. Finally, it gives a brief description of what are commonly called inapproximability results. Such results provide limits on the approximability of the problems tackled.
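A standard textbook instance of a polynomial-time approximation algorithm with a guaranteed ratio (chosen here for illustration, not taken from the survey) is the maximal-matching 2-approximation for Minimum Vertex Cover. The sketch also includes a brute-force exact solver, usable only on tiny graphs, to check the ratio.

```python
from itertools import combinations
from typing import Hashable, Iterable, Set, Tuple


def vertex_cover_2approx(edges: Iterable[Tuple[Hashable, Hashable]]) -> Set[Hashable]:
    """Take both endpoints of every edge that is still uncovered.

    The chosen edges form a matching M; any vertex cover needs a distinct vertex
    for each edge of M, so OPT >= |M| and the returned cover has size 2|M| <= 2 * OPT.
    """
    cover: Set[Hashable] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def min_vertex_cover_size(vertices, edges) -> int:
    """Exact optimum by brute force (exponential time; only for tiny sanity checks)."""
    vertices = list(vertices)
    edges = list(edges)
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            chosen = set(cand)
            if all(u in chosen or v in chosen for u, v in edges):
                return k
    return len(vertices)
```

On the path 1-2-3-4 the greedy cover is {1, 2, 3, 4} while the optimum {2, 3} has size 2, so the factor-2 bound is tight on this input. The approximation runs in linear time; only the checker is exponential.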

The Power of Human–Algorithm Collaboration in Solving Combinatorial Optimization Problems

Algorithms, Vol 14 (9), pp. 253 (2021). DOI: 10.3390/a14090253

Author(s): Tapani Toivonen, Markku Tukiainen

Keyword(s): Combinatorial Optimization; Optimization Problems; Maximum Clique; Polynomial Time Algorithm; Time Algorithm; Combinatorial Problems; Combinatorial Optimization Problems; Multiplicative Factor; Intractable Problems; Polynomial Factor

Many combinatorial optimization problems are considered intractable to solve either exactly or approximately. An example of such a problem is maximum clique, which, under standard assumptions in complexity theory, cannot be solved in sub-exponential time or efficiently approximated within a polynomial factor. However, we show that if a polynomial-time algorithm can query informative Gaussian priors from an expert poly(n) times, then a class of combinatorial optimization problems can be solved efficiently up to a multiplicative factor ϵ, where ϵ is an arbitrary constant. In this paper, we present proofs of our claims and numerical results that support them. Our methods can cast new light on how to approach optimization problems in domains where even the approximation of the problem is not feasible. Furthermore, the results can help researchers to understand the structures of these problems (or whether these problems have any structure at all!). While the proposed methods can be used to approximate combinatorial problems in NPO, we note that the scope of the problems solvable might well include problems that are provably intractable (problems in EXPTIME).

Determining the Hausdorff Distance Between Trees in Polynomial Time

Discrete Mathematics & Theoretical Computer Science, vol. 23, no. 3 (Discrete Algorithms) (2021). DOI: 10.46298/dmtcs.6952

Author(s): Aleksander Kelenc

Keyword(s): Polynomial Time; Hausdorff Distance; Special Kind; Graph Algorithm; Polynomial Time Algorithm; Time Algorithm; Divide And Conquer; Bipartite Matching; The Common; Maximum Bipartite Matching

The Hausdorff distance is a relatively new measure of similarity of graphs. The notion of the Hausdorff distance considers a special kind of common subgraph of the compared graphs and depends on the structural properties outside of the common subgraph. There was no known efficient algorithm for determining the Hausdorff distance between two trees; in this paper, we present a polynomial-time algorithm for it. The algorithm is recursive and uses the divide-and-conquer technique. As a subtask, it also uses a procedure based on the well-known algorithm for finding a maximum bipartite matching.
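The maximum bipartite matching subroutine mentioned above can be computed with the classic augmenting-path method (Kuhn's algorithm). The sketch below is that generic algorithm, not the paper's specific procedure; the adjacency-list input format is an assumption for illustration.

```python
from typing import List, Sequence, Tuple


def max_bipartite_matching(adj: Sequence[Sequence[int]], n_right: int) -> Tuple[int, List[int]]:
    """Augmenting-path (Kuhn) maximum matching.

    adj[u] lists the right-side vertices adjacent to left vertex u.
    Returns the matching size and match_right[v] (the left partner of v, or -1).
    """
    match_right = [-1] * n_right

    def try_augment(u: int, seen: List[bool]) -> bool:
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or its current partner can be re-matched elsewhere.
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, [False] * n_right):
            size += 1
    return size, match_right
```

Each left vertex triggers one depth-first search over the edges, so the running time is O(V·E), which is polynomial, consistent with using it inside a polynomial-time algorithm.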

Learning Importance of Preferences

DOI: 10.29007/v68w (2018)

Author(s): Ying Zhu, Mirek Truszczynski

Keyword(s): Polynomial Time; Polynomial Time Algorithm; Time Algorithm; Scoring Rules; NP-Hard; Individual Preferences

We study the problem of learning the importance of preferences in preference profiles in two important cases: when individual preferences are aggregated by the ranked Pareto rule, and when they are aggregated by positional scoring rules. For the ranked Pareto rule, we provide a polynomial-time algorithm that finds a ranking of preferences such that the ranked profile correctly decides all the examples, whenever such a ranking exists. We also show that the problem to learn a ranking maximizing the number of correctly decided examples (also under the ranked Pareto rule) is NP-hard. We obtain similar results for the case of weighted profiles when positional scoring rules are used for aggregation.
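For readers unfamiliar with positional scoring rules, the sketch below shows how a weighted profile of rankings is aggregated by one (Borda scores in the example). It is a generic illustration under stated assumptions: every ranking is a complete order over the same candidates, and an "example" is modeled as a pair of candidates to be decided. It does not implement the paper's learning algorithms or the ranked Pareto rule.

```python
from typing import Dict, Optional, Sequence


def weighted_positional_scores(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    score_vector: Sequence[float],
) -> Dict[str, float]:
    """Each ranking awards score_vector[k] to its k-th candidate, scaled by its weight."""
    scores: Dict[str, float] = {}
    for ranking, w in zip(rankings, weights):
        for pos, cand in enumerate(ranking):
            scores[cand] = scores.get(cand, 0.0) + w * score_vector[pos]
    return scores


def decide(rankings, weights, score_vector, a: str, b: str) -> Optional[str]:
    """Return the candidate the weighted profile prefers, or None on a tie."""
    s = weighted_positional_scores(rankings, weights, score_vector)
    if s[a] > s[b]:
        return a
    if s[b] > s[a]:
        return b
    return None
```

Changing only the weights can flip a decision: with Borda scores (2, 1, 0) and the rankings x>y>z, y>z>x, x>z>y, equal weights prefer x over y (4 vs 3), while weights (1, 3, 1) prefer y (7 vs 4). This is the kind of sensitivity that makes learning the weights nontrivial.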

Metric Dimension Parameterized By Treewidth

Algorithmica (2021). DOI: 10.1007/s00453-021-00808-9

Author(s): Édouard Bonnet, Nidhi Purohit

Keyword(s): Computable Function; Polynomial Time Algorithm; Time Algorithm; Minimum Size; Metric Dimension; Resolving Set; Input Graph; Outerplanar Graphs; Bounded Treewidth; FPT Algorithm

Abstract: A resolving set $S$ of a graph $G$ is a subset of its vertices such that no two vertices of $G$ have the same distance vector to $S$. The Metric Dimension problem asks for a resolving set of minimum size, and in its decision form, a resolving set of size at most some specified integer. This problem is NP-complete, and remains so in very restricted classes of graphs. It is also W[2]-complete with respect to the size of the solution. Metric Dimension has proven elusive on graphs of bounded treewidth. On the algorithmic side, a polynomial-time algorithm is known for trees, and even for outerplanar graphs, but the general case of treewidth at most two is open. On the complexity side, no parameterized hardness is known. This has led several papers on the topic to ask for the parameterized complexity of Metric Dimension with respect to treewidth. We provide a first answer to the question. We show that Metric Dimension parameterized by the treewidth of the input graph is W[1]-hard. More precisely, we prove that, unless the Exponential Time Hypothesis fails, there is no algorithm solving Metric Dimension in time $f(\text{pw})\,n^{o(\text{pw})}$ on $n$-vertex graphs of constant degree, with $\text{pw}$ the pathwidth of the input graph, and $f$ any computable function. This is in stark contrast with an FPT algorithm of Belmonte et al. (SIAM J Discrete Math 31(2):1217–1243, 2017) with respect to the combined parameter $\text{tl}+\Delta$, where $\text{tl}$ is the tree-length and $\Delta$ the maximum degree of the input graph.
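The definitions of a resolving set and of Metric Dimension can be made concrete with a brute-force checker. This sketch assumes a connected, unweighted graph given as an adjacency dict; the exhaustive search is exponential and is meant only to illustrate the definitions, not the paper's hardness results.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def bfs_distances(adj: Dict[Hashable, List[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Unweighted shortest-path distances from `source` by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_resolving(adj: Dict[Hashable, List[Hashable]], S: Sequence[Hashable]) -> bool:
    """True if no two vertices share the same distance vector to S (graph assumed connected)."""
    tables = [bfs_distances(adj, s) for s in S]
    seen = set()
    for v in adj:
        vec = tuple(t[v] for t in tables)
        if vec in seen:
            return False
        seen.add(vec)
    return True


def metric_dimension(adj: Dict[Hashable, List[Hashable]]) -> Tuple[int, Tuple[Hashable, ...]]:
    """Smallest resolving set by exhaustive search (exponential; tiny graphs only)."""
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        for S in combinations(vertices, k):
            if is_resolving(adj, S):
                return k, S
    raise ValueError("unreachable for a connected graph")
```

A path has metric dimension 1 (one endpoint gives every vertex a distinct distance), while the 4-cycle needs two adjacent vertices, since any single vertex sees its two neighbors at the same distance.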

A Polynomial-Time Algorithm for Minimizing the Deep Coalescence Cost for Level-1 Species Networks

IEEE/ACM Transactions on Computational Biology and Bioinformatics, pp. 1-1 (2021). DOI: 10.1109/tcbb.2021.3105922

Author(s): Matthew Lemay, Ran Libeskind-Hadas, Yi-Chieh Wu

Keyword(s): Polynomial Time; Polynomial Time Algorithm; Time Algorithm; Deep Coalescence; Level 1
