perfect phylogeny
Recently Published Documents


TOTAL DOCUMENTS

92
(FIVE YEARS 10)

H-INDEX

16
(FIVE YEARS 1)

Author(s):  
Giulia Bernardini ◽  
Paola Bonizzoni ◽  
Paweł Gawrychowski

2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i169-i176 ◽  
Author(s):  
Erfan Sadeqi Azer ◽  
Farid Rashidi Mehrabadi ◽  
Salem Malikić ◽  
Xuan Cindy Li ◽  
Osnat Bartok ◽  
...  

Abstract Motivation Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program, or a constraint satisfaction program, which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Monte Carlo Markov Chain or alternative heuristics not only offer no such guarantee, but also are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. Results We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10–100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, which returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture on the clonal evolution of the tumor. Availability and implementation https://github.com/algo-cancer/PhISCS-BnB. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 34 (02) ◽  
pp. 1544-1551
Author(s):  
Tuukka Korhonen ◽  
Matti J„ärvisalo

The reconstruction of the evolutionary tree of a set of species based on qualitative attributes is a central problem in phylogenetics. In the NP-hard perfect phylogeny problem the input is a set of taxa (species) and characters (attributes) on them, and the task is to find an evolutionary tree that describes the evolution of the taxa so that each character state evolves only once. However, in practical situations a perfect phylogeny rarely exists, motivating the maximum compatibility problem of finding the largest subset of characters admitting a perfect phylogeny. Various declarative approaches, based on applying integer programming (IP), answer set programming (ASP) and pseudo-Boolean optimization (PBO) solvers, have been proposed for maximum compatibility. In this work we develop a new hybrid approach to solving maximum compatibility for multi-state characters, making use of both declarative optimization techniques (specifically maximum satisfiability, MaxSAT) and an adaptation of the Bouchitt'e-Todinca approach to triangulation-based graph optimization problems. Empirically our approach outperforms in scalability the earlier proposed approaches w.r.t. various parameters underlying the problem.


Author(s):  
Erfan Sadeqi Azer ◽  
Mohammad Haghir Ebrahimabadi ◽  
Salem Malikić ◽  
Roni Khardon ◽  
S. Cenk Sahinalp

SummaryPrincipled computational approaches for tumor phylogeny reconstruction via single-cell sequencing typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix - which represents genotype calls of single-cells. This problem is NP-hard, and as a result, existing approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. As expected, even when the goal is to infer basic topological features of the tumor phylogeny - rather than reconstructing the topology entirely, these approaches could be prohibitively slow. In this paper, we introduce fast deep-learning solutions to the problems of inferring whether the most likely tree has a linear (chain) or branching topology and whether a perfect phylogeny is feasible from a given genotype matrix. We also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny. This preliminary work demonstrates that data-driven approaches can reconstruct key features of tumor evolution.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Charith B. Karunarathna ◽  
Jinko Graham

Abstract Background A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype. Results We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.


2019 ◽  
Author(s):  
Yufeng Wu

Abstract Motivation Cells in an organism share a common evolutionary history, called cell lineage tree. Cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling based and can be very slow for large data. Results In this article, we propose a new method called ScisTree, which infers cell lineage tree and calls genotypes from noisy single cell genotype data. Different from most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets. Availability and implementation The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Yufeng Wu

AbstractCells in an organism share a common evolutionary history, called cell lineage tree. Cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling-based and can be very slow for large data.In this paper, we propose a new method called ScisTree, which infers cell lineage tree and calls genotypes from noisy single cell genotype data. Different from most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets.AvailabilityThe program ScisTree is available for download at: https://github.com/yufengwudcs/[email protected]


2019 ◽  
Author(s):  
Charith Bhagya Karunarathna ◽  
Jinko Graham

AbstractBackgroundA perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype.ResultsWe present an R packageperfectphyloRto reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package.ConclusionTheperfectphyloRpackage should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.


Algorithms ◽  
2019 ◽  
Vol 12 (3) ◽  
pp. 53
Author(s):  
David Fernández-Baca ◽  
Lei Liu

We study two problems in computational phylogenetics. The first is tree compatibility. The input is a collection of phylogenetic trees over different partially-overlapping sets of species. The goal is to find a single phylogenetic tree that displays all the evolutionary relationships implied by . The second problem is incomplete directed perfect phylogeny (IDPP). The input is a data matrix describing a collection of species by a set of characters, where some of the information is missing. The question is whether there exists a way to fill in the missing information so that the resulting matrix can be explained by a phylogenetic tree satisfying certain conditions. We explain the connection between tree compatibility and IDPP and show that a recent tree compatibility algorithm is effectively a generalization of an earlier IDPP algorithm. Both algorithms rely heavily on maintaining the connected components of a graph under a sequence of edge and vertex deletions, for which they use the dynamic connectivity data structure of Holm et al., known as HDT. We present a computational study of algorithms for tree compatibility and IDPP. We show experimentally that substituting HDT by a much simpler data structure—essentially, a single-level version of HDT—improves the performance of both of these algorithm in practice. We give partial empirical and theoretical justifications for this observation.


2019 ◽  
Vol 68 (5) ◽  
pp. 814-827 ◽  
Author(s):  
Leo Van Iersel ◽  
Mark Jones ◽  
Steven Kelk

Abstract Perfect phylogenies are fundamental in the study of evolutionary trees because they capture the situation when each evolutionary trait emerges only once in history; if such events are believed to be rare, then by Occam’s Razor such parsimonious trees are preferable as a hypothesis of evolution. A classical result states that 2-state characters permit a perfect phylogeny precisely if each subset of 2 characters permits one. More recently, it was shown that for 3-state characters the same property holds but for size-3 subsets. A long-standing open problem asked whether such a constant exists for each number of states. More precisely, it has been conjectured that for any fixed number of states $r$ there exists a constant $f(r)$ such that a set of $r$-state characters $C$ has a perfect phylogeny if and only if every subset of at most $f(r)$ characters has a perfect phylogeny. Informally, the conjecture states that checking fixed-size subsets of characters is enough to correctly determine whether input data permits a perfect phylogeny, irrespective of the number of characters in the input. In this article, we show that this conjecture is false. In particular, we show that for any constant $t$, there exists a set $C$ of $8$-state characters such that $C$ has no perfect phylogeny, but there exists a perfect phylogeny for every subset of at most $t$ characters. Moreover, there already exists a perfect phylogeny when ignoring just one of the characters, independent of which character you ignore. This negative result complements the two negative results (“strikes”) of Bodlaender et al. (1992,2000). We reflect on the consequences of this third strike, pointing out that while it does close off some routes for efficient algorithm development, many others remain open.


Sign in / Sign up

Export Citation Format

Share Document