PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem

2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i169-i176 ◽  
Author(s):  
Erfan Sadeqi Azer ◽  
Farid Rashidi Mehrabadi ◽  
Salem Malikić ◽  
Xuan Cindy Li ◽  
Osnat Bartok ◽  
...  

Abstract

Motivation: Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program or a constraint satisfaction program. These guarantee convergence to the most likely solution but are very slow. Other approaches, based on Markov chain Monte Carlo or alternative heuristics, offer no such guarantee and are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology.

Results: We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10–100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB to a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, for which it returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture of the clonal evolution of the tumor.

Availability and implementation: https://github.com/algo-cancer/PhISCS-BnB.

Supplementary information: Supplementary data are available at Bioinformatics online.
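To make the notion of a perfect phylogeny on a genotype matrix concrete, here is a minimal sketch of the classical three-gametes test: a binary matrix (rows are cells, columns are mutations, with an all-zero root) admits a perfect phylogeny exactly when no pair of columns shows all three patterns (1,1), (1,0) and (0,1). The function name `is_conflict_free` and the list-of-lists input format are assumptions for illustration only; this is not code from the PhISCS-BnB repository, and it only checks feasibility rather than searching for the most likely corrected matrix.

```python
from itertools import combinations


def is_conflict_free(matrix):
    """Return True if the binary matrix passes the three-gametes test.

    `matrix` is a list of rows (cells); each row is a list of 0/1 calls,
    one per mutation. Hypothetical helper, not part of PhISCS-BnB.
    """
    if not matrix:
        return True
    n_cols = len(matrix[0])
    forbidden = {(1, 1), (1, 0), (0, 1)}
    for p, q in combinations(range(n_cols), 2):
        # Collect every (column p, column q) pattern observed across cells.
        seen = {(row[p], row[q]) for row in matrix}
        if forbidden <= seen:
            return False  # columns p and q are in conflict
    return True


# A matrix whose mutation sets are nested or disjoint is conflict-free.
tree_like = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
# Two columns that overlap without nesting form a conflict.
conflicting = [[1, 1], [1, 0], [0, 1]]
print(is_conflict_free(tree_like), is_conflict_free(conflicting))
```

A noisy SCS matrix typically fails this test, which is why the methods described here search for the most likely set of corrections that makes it pass.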

2020 ◽  
Author(s):  
Erfan Sadeqi Azer ◽  
Farid Rashidi Mehrabadi ◽  
Xuan Cindy Li ◽  
Salem Malikić ◽  
Alejandro A. Schäffer ◽  
...  

Abstract

Motivation: Recent advances in single cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program (ILP) or a constraint satisfaction program (CSP). These guarantee convergence to the most likely solution but are very slow. Other approaches, based on Markov chain Monte Carlo (MCMC) or alternative heuristics, offer no such guarantee and are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology.

Results: We introduce PhISCS-BnB, a Branch and Bound algorithm to compute the most likely perfect phylogeny (PP) on an input genotype matrix extracted from an SCS data set. PhISCS-BnB not only offers an optimality guarantee, but is also 10 to 100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB to a large melanoma data set derived from the sub-lineages of a cell line involving 24 clones with 3574 mutations, for which it returned the optimal tumor phylogeny in less than 2 hours. The resulting phylogeny also agrees with bulk exome sequencing data obtained from in vivo tumors growing out from the same cell line.

Availability: https://github.com/algo-cancer/PhISCS-BnB


Author(s):  
Erfan Sadeqi Azer ◽  
Mohammad Haghir Ebrahimabadi ◽  
Salem Malikić ◽  
Roni Khardon ◽  
S. Cenk Sahinalp

Summary

Principled computational approaches for tumor phylogeny reconstruction via single-cell sequencing typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix, which represents the genotype calls of single cells. This problem is NP-hard, and as a result, existing approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. As expected, even when the goal is to infer basic topological features of the tumor phylogeny rather than to reconstruct the topology entirely, these approaches can be prohibitively slow. In this paper, we introduce fast deep-learning solutions to the problems of inferring whether the most likely tree has a linear (chain) or branching topology and whether a perfect phylogeny is feasible from a given genotype matrix. We also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny. This preliminary work demonstrates that data-driven approaches can reconstruct key features of tumor evolution.
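For intuition about the linear-versus-branching question, the following sketch shows how it can be answered directly when the genotype matrix is already noise-free and conflict-free. It assumes that "linear" means all mutations lie on a single root-to-leaf path, which holds when the sets of cells carrying each mutation are totally ordered by inclusion. The function name `is_linear_topology` is hypothetical, and this deterministic check is not the deep-learning method of the paper, which targets noisy matrices where such a direct test does not apply.

```python
from itertools import combinations


def is_linear_topology(matrix):
    """Return True if a conflict-free binary matrix implies a chain-shaped tree.

    Assumes `matrix` (rows = cells, columns = mutations) already admits a
    perfect phylogeny. Hypothetical helper for illustration only.
    """
    n_cols = len(matrix[0]) if matrix else 0
    # For each mutation, the set of cells that carry it.
    cell_sets = [
        frozenset(i for i, row in enumerate(matrix) if row[j] == 1)
        for j in range(n_cols)
    ]
    # Mutations observed in no cell carry no topological information here.
    cell_sets = [s for s in cell_sets if s]
    # A chain requires every pair of mutation cell sets to be nested.
    return all(a <= b or b <= a for a, b in combinations(cell_sets, 2))


chain = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
branching = [[1, 1, 0], [1, 0, 1]]
print(is_linear_topology(chain), is_linear_topology(branching))
```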


2018 ◽  
Author(s):  
Salem Malikic ◽  
Simone Ciccolella ◽  
Farid Rashidi Mehrabadi ◽  
Camir Ricketts ◽  
Khaledur Rahman ◽  
...  

Abstract

Recent technological advances in single cell sequencing (SCS) provide high resolution data for studying intra-tumor heterogeneity and tumor evolution. Available computational methods for tumor phylogeny inference via SCS typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, limitations of SCS technologies, such as frequent allele dropout or highly variable sequence coverage, commonly result in mutational call errors and prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to loss of heterozygosity, deletions and convergent evolution. To address these limitations, we introduce, for the first time, a new combinatorial formulation that integrates single cell sequencing data with matching bulk sequencing data. Its objective is to minimize a linear combination of (i) potential false negatives (due to e.g. allele dropout or variance in sequence coverage) and (ii) potential false positives (due to e.g. read errors) among mutation calls, as well as (iii) the number of mutations that violate ISA, in order to define the optimal sub-perfect phylogeny. Our formulation ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and, for the first time in the context of tumor phylogeny reconstruction, a Boolean constraint satisfaction problem (CSP), and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data under the finite sites model. Using several simulated and real SCS data sets, we demonstrate that PhISCS is not only more general but also more accurate than the alternative tumor phylogeny inference tools. PhISCS is very fast, especially when its CSP-based variant is used, and returns the optimal solution, except in rare instances for which it provides an optimality gap. PhISCS is available at https://github.com/haghshenas/PhISCS.
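The three-term objective described above can be illustrated with a small scoring function. This is a sketch under stated assumptions: the name `correction_cost`, the weights `w_fn`, `w_fp` and `w_isa`, and the choice to ignore flips in columns of mutations marked as ISA-violating are all hypothetical, and the bulk-derived VAF constraints are not modeled. It scores a given candidate correction; it does not perform the ILP or CSP search.

```python
def correction_cost(observed, corrected, removed_mutations,
                    w_fn=1.0, w_fp=1.0, w_isa=1.0):
    """Weighted cost of turning `observed` into `corrected`.

    observed, corrected: lists of rows of 0/1 calls (same shape).
    removed_mutations: set of column indices treated as ISA-violating.
    All names and weights are illustrative assumptions.
    """
    false_negatives = 0  # observed 0 corrected to 1
    false_positives = 0  # observed 1 corrected to 0
    for obs_row, cor_row in zip(observed, corrected):
        for j, (obs, cor) in enumerate(zip(obs_row, cor_row)):
            if j in removed_mutations:
                continue  # assumption: removed columns incur only the ISA term
            if obs == 0 and cor == 1:
                false_negatives += 1
            elif obs == 1 and cor == 0:
                false_positives += 1
    return (w_fn * false_negatives
            + w_fp * false_positives
            + w_isa * len(removed_mutations))


observed = [[0, 1], [1, 1]]
corrected = [[1, 1], [1, 0]]
print(correction_cost(observed, corrected, set(), w_fn=2, w_fp=3, w_isa=5))
```

An optimizer in this setting would search over `corrected` and `removed_mutations` for the lowest cost subject to the remaining columns being conflict-free.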


Author(s):  
Bishaljit Paul ◽  
Sushovan Goswami ◽  
Dipu Mistry ◽  
Chandan Kumar Chanda

Author(s):  
Jan-Lucas Gade ◽  
Carl-Johan Thore ◽  
Jonas Stålhand

Abstract

In this study, we consider identification of parameters in a non-linear continuum-mechanical model of arteries by fitting the model's response to clinical data. The fitting of the model is formulated as a constrained non-linear, non-convex least-squares minimization problem. The model parameters are directly related to the underlying physiology of arteries and, if correctly identified, can be of great clinical value. However, the non-convexity of the minimization problem implies that incorrect parameter values, corresponding to local minima or stationary points, may be found. Therefore, we investigate the feasibility of using a branch-and-bound algorithm to identify the parameters to global optimality. The algorithm is tested on three clinical data sets, in each case using four increasingly large regions around a candidate global solution in the parameter space. In all cases, the candidate global solution is found already in the initialization phase when solving the original non-convex minimization problem from multiple starting points, and the remaining time is spent on increasing the lower bound on the optimal value. Although the branch-and-bound algorithm is parallelized, the overall procedure is in general very time-consuming.
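The interplay between an incumbent solution and a rising lower bound can be seen in a toy one-dimensional branch and bound. This sketch swaps in a simple Lipschitz-constant bound, which is not the bounding scheme of the study; the function `branch_and_bound_1d`, the objective and the constant 3.5 are assumptions chosen for illustration.

```python
import heapq
import math


def branch_and_bound_1d(f, lo, hi, lipschitz, tol=1e-4):
    """Minimize f on [lo, hi] to within tol, assuming
    |f(x) - f(y)| <= lipschitz * |x - y| on the interval."""
    mid = 0.5 * (lo + hi)
    best_x, best_val = mid, f(mid)
    # Heap entries: (lower bound on f over the interval, left end, right end).
    heap = [(best_val - lipschitz * (hi - lo) / 2, lo, hi)]
    while heap:
        bound, a, b = heapq.heappop(heap)
        if bound > best_val - tol:
            break  # smallest remaining bound cannot beat the incumbent by tol
        m = 0.5 * (a + b)
        for left, right in ((a, m), (m, b)):
            center = 0.5 * (left + right)
            val = f(center)
            if val < best_val:
                best_x, best_val = center, val  # new incumbent
            child_bound = val - lipschitz * (right - left) / 2
            if child_bound < best_val - tol:
                heapq.heappush(heap, (child_bound, left, right))
    return best_x, best_val


# Non-convex toy objective; |f'(x)| <= 3.5 everywhere.
def objective(x):
    return math.sin(3 * x) + 0.5 * x


x_star, f_star = branch_and_bound_1d(objective, -3.0, 3.0, lipschitz=3.5)
print(x_star, f_star)
```

As in the abstract, a good incumbent tends to appear early, and most of the work goes into subdividing intervals until the lower bound certifies it.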

