mathematical phylogenetics Latest Research Papers

Abstract Background: The supertree problem, i.e., the task of finding a common refinement of a set of rooted trees, is an important topic in mathematical phylogenetics. The special case of a common leaf set L is known to be solvable in linear time. Existing approaches refine one input tree using information from the others and then test whether the results are isomorphic. Results: An O(k|L|) algorithm for constructing the common refinement T of k input trees with a common leaf set L is proposed that explicitly computes the parent function of T in a bottom-up approach. Conclusion: The proposed algorithm is simpler to implement than other asymptotically optimal algorithms for the problem and outperforms the alternatives in empirical comparisons. Availability: An implementation in Python is freely available at https://github.com/david-schaller/tralda.
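
As a rough illustration of what a common refinement on a shared leaf set is, the Python sketch below treats each rooted tree as its set of clusters (the leaf sets below each vertex). The input trees have a common refinement exactly when the union of their clusters is a hierarchy, i.e., any two clusters are nested or disjoint; that union then describes the coarsest common refinement. This is a naive pairwise check, not the O(k|L|) algorithm of the abstract and not the tralda API. The nested-tuple tree encoding and the names `clusters` and `common_refinement_clusters` are assumptions made for this example.

```python
from itertools import combinations


def clusters(tree):
    """Return the clusters (frozensets of leaves) of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, tuple):
            # Inner vertex: its cluster is the union of its children's clusters.
            leaves = frozenset().union(*(visit(child) for child in node))
        else:
            # Leaf: a singleton cluster.
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def common_refinement_clusters(trees):
    """Return the cluster set of the coarsest common refinement, or None."""
    union = set().union(*(clusters(t) for t in trees))
    for a, b in combinations(union, 2):
        # Two clusters that overlap without being nested cannot share a tree.
        if a & b and not (a <= b or b <= a):
            return None
    return union
```

For example, `(("a", "b"), "c", "d")` and `("a", "b", ("c", "d"))` are both refined by the tree with clusters {a, b} and {c, d}, whereas `(("a", "b"), "c", "d")` and `(("a", "c"), "b", "d")` admit no common refinement.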

Heuristic algorithms for best match graph editing

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00196-3 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

David Schaller ◽

Manuela Geiß ◽

Marc Hellmuth ◽

Peter F. Stadler

Keyword(s):

Heuristic Algorithms ◽

Sequence Data ◽

Similarity Measures ◽

Set Partitioning ◽

Attractive Alternative ◽

Biological Sequence ◽

Detection Algorithms ◽

Empirical Estimates ◽

Mathematical Phylogenetics ◽

Multiple Species

Abstract Background: Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Results: Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Conclusion: Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.
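
The abstract compares its recursive heuristics to Aho's supertree algorithm, so a minimal sketch of that classical algorithm (often called BUILD) may help: it decides whether a set of rooted triples is consistent and, if so, returns a tree displaying them. This is only the textbook triple-consistency test, not the BMG editing heuristics of the paper. A triple `(a, b, c)` is read as ab|c (a and b are closer to each other than to c); the function name `build` and the nested-tuple output are assumptions made for this example.

```python
def build(triples, leaves):
    """Return a nested-tuple tree displaying all triples, or None if impossible."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Keep only triples whose three leaves are all still present.
    local = [(a, b, c) for a, b, c in triples if {a, b, c} <= leaves]

    # Union-find over the Aho graph: join a and b for every triple ab|c.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in local:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    if len(components) == 1:
        return None  # connected Aho graph on >1 leaves: triples are inconsistent

    # Each connected component becomes one child subtree of the root.
    subtrees = []
    for component in components.values():
        subtree = build(local, component)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)
```

With the triples ab|c and cd|a on leaves a, b, c, d, the Aho graph splits into {a, b} and {c, d}, so the result is a root with those two cherries as children (in unspecified order). The contradictory pair ab|c and ac|b yields `None`.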

Extremal Values of the Sackin Tree Balance Index

Annals of Combinatorics ◽

10.1007/s00026-021-00539-2 ◽

2021 ◽

Author(s):

Mareike Fischer

Keyword(s):

Phylogenetic Trees ◽

Theoretical Computer Science ◽

Search Trees ◽

Formal Proofs ◽

Birth Process ◽

Ordered Trees ◽

Research Areas ◽

New Findings ◽

Mathematical Phylogenetics ◽

Extremal Values

Abstract Tree balance plays an important role in different research areas like theoretical computer science and mathematical phylogenetics. For example, it has long been known that under the Yule model, a pure birth process, imbalanced trees are more likely than balanced ones. Also, concerning ordered search trees, more balanced ones allow for more efficient data structuring than imbalanced ones. Therefore, different methods to measure the balance of trees were introduced. The Sackin index is one of the most frequently used measures for this purpose. In many contexts, statements about the minimal and maximal values of this index have been discussed, but formal proofs have only been provided for some of them, and only in the context of ordered binary (search) trees, not for general rooted trees. Moreover, while the number of trees with maximal Sackin index as well as the number of trees with minimal Sackin index when the number of leaves is a power of 2 are relatively easy to understand, the number of trees with minimal Sackin index for all other numbers of leaves has been completely unknown. In this manuscript, we extend the findings on trees with minimal and maximal Sackin indices from the literature on ordered trees and subsequently use our results to provide formulas to explicitly calculate the numbers of such trees. We also extend previous studies by analyzing the case when the underlying trees need not be binary. Finally, we use the new findings on Sackin minimal and maximal trees to contribute to both the phylogenetic and the computer science literature: we derive formulas for the number of minimal and maximal phylogenetic trees as well as minimal and maximal ordered trees, in both the binary and non-binary settings. All our results have been implemented in the Mathematica package SackinMinimizer, which has been made publicly available.
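
For readers unfamiliar with the measure, the Sackin index of a rooted tree is the sum of the depths of its leaves, where the depth of a leaf is the number of edges on the path from the root. A minimal Python sketch follows; the nested-tuple tree encoding and the name `sackin_index` are assumptions made for this example and are unrelated to the SackinMinimizer package.

```python
def sackin_index(tree, depth=0):
    """Sum of leaf depths of a nested-tuple tree (leaves are non-tuple values)."""
    if not isinstance(tree, tuple):
        return depth  # a leaf contributes its own depth
    # An inner vertex passes depth + 1 down to each of its children.
    return sum(sackin_index(child, depth + 1) for child in tree)


caterpillar = ((("a", "b"), "c"), "d")   # maximally imbalanced binary tree
balanced = (("a", "b"), ("c", "d"))      # fully balanced binary tree
star = ("a", "b", "c", "d")              # non-binary star tree

print(sackin_index(caterpillar))  # 9
print(sackin_index(balanced))     # 8
print(sackin_index(star))         # 4
```

On four leaves the caterpillar attains the larger value (9) and the fully balanced tree the smaller one (8) among binary trees, while the non-binary star tree scores lower still (4), which is why the non-binary setting has to be treated separately.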

Arc-Completion of 2-Colored Best Match Graphs to Binary-Explainable Best Match Graphs

Algorithms ◽

10.3390/a14040110 ◽

2021 ◽

Vol 14 (4) ◽

pp. 110

Author(s):

David Schaller ◽

Manuela Geiß ◽

Marc Hellmuth ◽

Peter F. Stadler

Keyword(s):

Phylogenetic Tree ◽

Polynomial Time ◽

Binary Tree ◽

A Priori ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Minimum Cardinality ◽

Mathematical Phylogenetics ◽

Special Case

Best match graphs (BMGs) are vertex-colored digraphs that naturally arise in mathematical phylogenetics to formalize the notion of evolutionarily closest genes w.r.t. an a priori unknown phylogenetic tree. BMGs are explained by unique least resolved trees. We prove that the property of a rooted, leaf-colored tree to be least resolved for some BMG is preserved by the contraction of inner edges. For the special case of two-colored BMGs, this leads to a characterization of the least resolved trees (LRTs) of binary-explainable trees and a simple, polynomial-time algorithm for the minimum cardinality completion of the arc set of a BMG to reach a BMG that can be explained by a binary tree.
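
To make the notion of a tree explaining a BMG concrete, the following Python sketch derives the best match arcs from a given leaf-colored tree: y is a best match of x if, among all leaves with the color of y, none has a last common ancestor with x that lies strictly below the last common ancestor of x and y. This only illustrates the definition; it is not the arc-completion algorithm of the paper. The nested-tuple tree encoding, the `color` dictionary, and the name `best_match_graph` are assumptions made for this example.

```python
def best_match_graph(tree, color):
    """Return the set of arcs (x, y) such that y is a best match of x."""
    paths = {}

    def visit(node, path):
        # Record, for every leaf, the child indices on its path from the root.
        if isinstance(node, tuple):
            for i, child in enumerate(node):
                visit(child, path + (i,))
        else:
            paths[node] = path

    visit(tree, ())

    def lca_depth(x, y):
        # Depth of the last common ancestor = length of the common path prefix.
        depth = 0
        for i, j in zip(paths[x], paths[y]):
            if i != j:
                break
            depth += 1
        return depth

    arcs = set()
    for x in paths:
        # Deepest last common ancestor reachable per foreign color.
        best = {}
        for y in paths:
            if color[y] != color[x]:
                best[color[y]] = max(best.get(color[y], -1), lca_depth(x, y))
        for y in paths:
            if color[y] != color[x] and lca_depth(x, y) == best[color[y]]:
                arcs.add((x, y))
    return arcs
```

For the tree `(("a1", "b1"), "b2")` with a1 colored A and b1, b2 colored B, the arcs are (a1, b1), (b1, a1) and (b2, a1): b2 is not a best match of a1 because b1 is more closely related to it.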

Tanglegrams: A Reduction Tool for Mathematical Phylogenetics

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2016.2613040 ◽

2018 ◽

Vol 15 (1) ◽

pp. 343-349 ◽

Cited By ~ 3

Author(s):

Frederick A. Matsen ◽

Sara C. Billey ◽

Arnold Kas ◽

Matjaz Konvalinka

Keyword(s):

Mathematical Phylogenetics

Phylogenomics with paralogs

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1412770112 ◽

2015 ◽

Vol 112 (7) ◽

pp. 2058-2063 ◽

Cited By ~ 44

Author(s):

Marc Hellmuth ◽

Nicolas Wieseke ◽

Marcus Lechner ◽

Hans-Peter Lenhof ◽

Martin Middendorf ◽

...

Keyword(s):

Phylogenetic Trees ◽

Sequence Data ◽

Gene Families ◽

Data Sets ◽

Gene Trees ◽

Species Trees ◽

Individual Gene ◽

Genome Wide Data ◽

Degree Of Certainty ◽

Mathematical Phylogenetics

Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.

mathematical phylogenetics
Recently Published Documents

A simpler linear-time algorithm for the common refinement of rooted phylogenetic trees on a common leaf set

Heuristic algorithms for best match graph editing

Extremal Values of the Sackin Tree Balance Index

Arc-Completion of 2-Colored Best Match Graphs to Binary-Explainable Best Match Graphs

Tanglegrams: A Reduction Tool for Mathematical Phylogenetics

Phylogenomics with paralogs
