mathematical phylogenetics
Recently Published Documents


TOTAL DOCUMENTS

6
(FIVE YEARS 4)

H-INDEX

2
(FIVE YEARS 0)

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
David Schaller ◽  
Marc Hellmuth ◽  
Peter F. Stadler

Abstract Background The supertree problem, i.e., the task of finding a common refinement of a set of rooted trees is an important topic in mathematical phylogenetics. The special case of a common leaf set L is known to be solvable in linear time. Existing approaches refine one input tree using information of the others and then test whether the results are isomorphic. Results An O(k|L|) algorithm, , for constructing the common refinement T of k input trees with a common leaf set L is proposed that explicitly computes the parent function of T in a bottom-up approach. Conclusion is simpler to implement than other asymptotically optimal algorithms for the problem and outperforms the alternatives in empirical comparisons. Availability An implementation of in Python is freely available at https://github.com/david-schaller/tralda.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
David Schaller ◽  
Manuela Geiß ◽  
Marc Hellmuth ◽  
Peter F. Stadler

Abstract Background Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Results Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Conclusion Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.


Author(s):  
Mareike Fischer

AbstractTree balance plays an important role in different research areas like theoretical computer science and mathematical phylogenetics. For example, it has long been known that under the Yule model, a pure birth process, imbalanced trees are more likely than balanced ones. Also, concerning ordered search trees, more balanced ones allow for more efficient data structuring than imbalanced ones. Therefore, different methods to measure the balance of trees were introduced. The Sackin index is one of the most frequently used measures for this purpose. In many contexts, statements about the minimal and maximal values of this index have been discussed, but formal proofs have only been provided for some of them, and only in the context of ordered binary (search) trees, not for general rooted trees. Moreover, while the number of trees with maximal Sackin index as well as the number of trees with minimal Sackin index when the number of leaves is a power of 2 are relatively easy to understand, the number of trees with minimal Sackin index for all other numbers of leaves has been completely unknown. In this manuscript, we extend the findings on trees with minimal and maximal Sackin indices from the literature on ordered trees and subsequently use our results to provide formulas to explicitly calculate the numbers of such trees. We also extend previous studies by analyzing the case when the underlying trees need not be binary. Finally, we use our results to contribute both to the phylogenetic as well as the computer scientific literature using the new findings on Sackin minimal and maximal trees to derive formulas to calculate the number of both minimal and maximal phylogenetic trees as well as minimal and maximal ordered trees both in the binary and non-binary settings. All our results have been implemented in the Mathematica package SackinMinimizer, which has been made publicly available.


Algorithms ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 110
Author(s):  
David Schaller ◽  
Manuela Geiß ◽  
Marc Hellmuth ◽  
Peter F. Stadler

Best match graphs (BMGs) are vertex-colored digraphs that naturally arise in mathematical phylogenetics to formalize the notion of evolutionary closest genes w.r.t. an a priori unknown phylogenetic tree. BMGs are explained by unique least resolved trees. We prove that the property of a rooted, leaf-colored tree to be least resolved for some BMG is preserved by the contraction of inner edges. For the special case of two-colored BMGs, this leads to a characterization of the least resolved trees (LRTs) of binary-explainable trees and a simple, polynomial-time algorithm for the minimum cardinality completion of the arc set of a BMG to reach a BMG that can be explained by a binary tree.


Author(s):  
Frederick A. Matsen ◽  
Sara C. Billey ◽  
Arnold Kas ◽  
Matjaz Konvalinka

2015 ◽  
Vol 112 (7) ◽  
pp. 2058-2063 ◽  
Author(s):  
Marc Hellmuth ◽  
Nicolas Wieseke ◽  
Marcus Lechner ◽  
Hans-Peter Lenhof ◽  
Martin Middendorf ◽  
...  

Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologos. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.


Sign in / Sign up

Export Citation Format

Share Document