PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data

2019 ◽  
Vol 29 (11) ◽  
pp. 1860-1877 ◽  
Author(s):  
Salem Malikic ◽  
Farid Rashidi Mehrabadi ◽  
Simone Ciccolella ◽  
Md. Khaledur Rahman ◽  
Camir Ricketts ◽  
...  
2020 ◽  
Vol 21 (S1) ◽  
Author(s):  
Simone Ciccolella ◽  
Mauricio Soto Gomez ◽  
Murray D. Patterson ◽  
Gianluca Della Vedova ◽  
Iman Hajirasouliha ◽  
...  

Abstract Background Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. Results We present a new tool, gpps, that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gpps. Conclusions gpps provides new insights into the analysis of intra-tumor heterogeneity by proposing a new progression model to the field of cancer phylogeny reconstruction on Single Cell data.


2021 ◽  
Author(s):  
Farid Rashidi Mehrabadi ◽  
Kerrie L. Marie ◽  
Eva Perez-Guijarro ◽  
Salem Malikic ◽  
Erfan Sadeqi Azer ◽  
...  

Advances in single cell RNA sequencing (scRNAseq) technologies uncovered an unexpected complexity in solid tumors, underlining the relevance of intratumor heterogeneity for cancer progression and therapeutic resistance. Heterogeneity in the mutational composition of cancer cells is well captured by tumor phylogenies, which demonstrate how distinct cell populations evolve, and, e.g. develop metastatic potential or resistance to specific treatments. Unfortunately, because of their low read coverage per cell, mutation calls that can be made from scRNAseq data are very sparse and noisy. Additionally, available tumor phylogeny reconstruction methods cannot computationally handle a large number of cells and mutations present in typical scRNAseq datasets. Finally, there are no principled methods to assess distinct subclones observed in inferred tumor phylogenies and the genomic alterations that seed them. Here we present Trisicell, a computational toolkit for scalable tumor phylogeny reconstruction and evaluation from scRNAseq as well as single cell genome or exome sequencing data. Trisicell allows the identification of reliable subtrees of a tumor phylogeny, offering the ability to focus on the most important subclones and the genomic alterations that are associated with subclonal proliferation. We comprehensively assessed Trisicell on a melanoma model by comparing the phylogeny it builds using scRNAseq data, to those using matching bulk whole exome (bWES) and transcriptome (bWTS) sequencing data from clonal sublines derived from single cells. Our results demonstrate that tumor phylogenies based on mutation calls from scRNAseq data can be robustly inferred and evaluated by Trisicell. We also applied Trisicell to reconstruct and evaluate the phylogeny it builds using scRNAseq data from melanomas of the same mouse model after treatment with immune checkpoint blockade (ICB). 
After integratively analyzing our cell-specific mutation calls with their expression profiles, we observed that each subclone with a distinct set of novel somatic mutations is strongly associated with a distinct developmental status. Moreover, each subclone had developed a specific ICB-resistance mechanism. These results demonstrate that Trisicell can robustly utilize scRNAseq data to delineate intratumoral heterogeneity and tumor evolution.


2018 ◽  
Author(s):  
Salem Malikic ◽  
Simone Ciccolella ◽  
Farid Rashidi Mehrabadi ◽  
Camir Ricketts ◽  
Khaledur Rahman ◽  
...  

Abstract Recent technological advances in single cell sequencing (SCS) provide high resolution data for studying intra-tumor heterogeneity and tumor evolution. Available computational methods for tumor phylogeny inference via SCS typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, limitations of SCS technologies, such as frequent allele dropout or highly variable sequence coverage, commonly result in mutational call errors and prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions and convergent evolution. In order to address such limitations, we, for the first time, introduce a new combinatorial formulation that integrates single cell sequencing data with matching bulk sequencing data, with the objective of minimizing a linear combination of (i) potential false negatives (due to e.g. allele dropout or variance in sequence coverage) and (ii) potential false positives (due to e.g. read errors) among mutation calls, as well as (iii) the number of mutations that violate ISA, to define the optimal sub-perfect phylogeny. Our formulation ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and (for the first time in the context of tumor phylogeny reconstruction) a boolean constraint satisfaction problem (CSP), and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data under the finite sites model. Using several simulated and real SCS data sets, we demonstrate that PhISCS is not only more general but also more accurate than the alternative tumor phylogeny inference tools. 
PhISCS is very fast, especially when its CSP-based variant is used, and returns the optimal solution except in rare instances, for which it provides an optimality gap. PhISCS is available at https://github.com/haghshenas/PhISCS.
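
As a rough illustration of the objective described in this abstract, the Python sketch below checks whether a binary genotype matrix (rows are cells, columns are mutations) admits a perfect phylogeny using the standard three-gamete test, and scores a candidate corrected matrix by a weighted count of flipped entries. The function names and the weights `alpha` and `beta` are assumptions made for this example. It is not the PhISCS implementation: it performs no optimization and omits missing values, mutation elimination and the VAF-derived constraints.

```python
from itertools import combinations


def is_conflict_free(matrix):
    """Three-gamete test on a binary cells-by-mutations matrix.

    Returns True when no pair of columns shows all three patterns
    (1,1), (1,0) and (0,1), i.e. when a perfect phylogeny exists.
    """
    if not matrix:
        return True
    n_cols = len(matrix[0])
    for p, q in combinations(range(n_cols), 2):
        seen = {(row[p], row[q]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= seen:
            return False
    return True


def flip_cost(observed, corrected, alpha=1.0, beta=10.0):
    """Weighted number of entries changed between two matrices.

    alpha (hypothetical weight) is charged per assumed false negative
    (0 in observed, 1 in corrected); beta per assumed false positive
    (1 in observed, 0 in corrected).
    """
    false_negatives = 0
    false_positives = 0
    for obs_row, cor_row in zip(observed, corrected):
        for o, c in zip(obs_row, cor_row):
            if o == 0 and c == 1:
                false_negatives += 1
            elif o == 1 and c == 0:
                false_positives += 1
    return alpha * false_negatives + beta * false_positives


# Toy data: mutations 0 and 1 conflict in the observed calls.
observed = [[1, 1, 0],
            [1, 0, 0],
            [0, 1, 0]]
# One assumed false negative (cell 2, mutation 0) is corrected.
corrected = [[1, 1, 0],
             [1, 0, 0],
             [1, 1, 0]]
```

Here `is_conflict_free(observed)` is false while `is_conflict_free(corrected)` is true, and `flip_cost(observed, corrected)` charges a single false-negative correction. An ILP or CSP solver would search over all corrected matrices for the cheapest conflict-free one instead of taking a candidate as input.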


2020 ◽  
Author(s):  
Leah Weber ◽  
Nuraini Aguse ◽  
Nicholas Chia ◽  
Mohammed El-Kebir

Abstract The combination of bulk and single-cell DNA sequencing data of the same tumor enables the inference of high-fidelity phylogenies that form the input to many important downstream analyses in cancer genomics. While many studies simultaneously perform bulk and single-cell sequencing, some studies have analyzed initial bulk data to identify which mutations to target in a follow-up single-cell sequencing experiment, thereby decreasing cost. Bulk data provide an additional untapped source of valuable information, composed of candidate phylogenies and associated clonal prevalence. Here, we introduce PhyDOSE, a method that uses this information to strategically optimize the design of follow-up single cell experiments. Underpinning our method is the observation that only a small number of clones uniquely distinguish one candidate tree from all other trees. We incorporate distinguishing features into a probabilistic model that infers the number of cells to sequence so as to confidently reconstruct the phylogeny of the tumor. We validate PhyDOSE using simulations and a retrospective analysis of a leukemia patient, concluding that PhyDOSE’s computed number of cells resolves tree ambiguity even in the presence of typical single-cell sequencing errors. We also conduct a retrospective analysis on an acute myeloid leukemia cohort, demonstrating the potential to achieve similar results with a significant reduction in the number of cells sequenced. In a prospective analysis, we demonstrate that only a small number of cells suffice to disambiguate the solution space of trees in a recent lung cancer cohort. 
In summary, PhyDOSE proposes cost-efficient single-cell sequencing experiments that yield high-fidelity phylogenies, which will improve downstream analyses aimed at deepening our understanding of cancer biology. Author summary Cancer development in a patient can be explained using a phylogeny, a tree that describes the evolutionary history of a tumor and has therapeutic implications. A tumor phylogeny is constructed from sequencing data, commonly obtained using either bulk or single-cell DNA sequencing technology. The accuracy of tumor phylogeny inference increases when both types of data are used, but single-cell sequencing may become prohibitively costly with increasing number of cells. Here, we propose a method that uses bulk sequencing data to guide the design of a follow-up single-cell sequencing experiment. Our results suggest that PhyDOSE provides a significant decrease in the number of cells to sequence compared to the number of cells sequenced in existing studies. The ability to make informed decisions based on prior data can help reduce the cost of follow-up single cell sequencing experiments of tumors, improving accuracy of tumor phylogeny inference and ultimately getting us closer to understanding and treating cancer.
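
To give a feel for how clonal prevalence translates into a cell count, the Python sketch below works through a deliberately simplified case: a single distinguishing clone with prevalence `f`, cells sampled independently, and no sequencing errors. Under those assumptions the chance of seeing at least one cell of the clone among k cells is 1 - (1 - f)^k, and the sketch finds the smallest k that reaches a target confidence. The function name and parameters are hypothetical; the model described in the abstract handles several distinguishing features jointly and accounts for sequencing errors, which this sketch does not attempt.

```python
def cells_needed(prevalence, confidence=0.95):
    """Smallest k such that 1 - (1 - prevalence)**k >= confidence.

    Simplifying assumptions (not taken from the source): one clone,
    independent sampling of cells, no false negatives or positives.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    miss_one = 1 - prevalence   # chance that one cell misses the clone
    miss_all = miss_one         # chance that all k cells miss it
    k = 1
    while 1 - miss_all < confidence:
        k += 1
        miss_all *= miss_one
    return k
```

For example, `cells_needed(0.1, 0.95)` returns 29 under these assumptions, since 28 cells give a detection probability just below 0.95. Rarer clones drive the count up quickly, which is why knowing the prevalences from bulk data before designing the single-cell experiment is useful.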


2020 ◽  
Author(s):  
Erfan Sadeqi Azer ◽  
Farid Rashidi Mehrabadi ◽  
Xuan Cindy Li ◽  
Salem Malikić ◽  
Alejandro A. Schäffer ◽  
...  

Abstract Motivation Recent advances in single cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program (ILP), or a constraint satisfaction program (CSP), which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Monte Carlo Markov Chain (MCMC) or alternative heuristics not only offer no such guarantee, but also are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. Results We introduce PhISCS-BnB, a Branch and Bound algorithm to compute the most likely perfect phylogeny (PP) on an input genotype matrix extracted from an SCS data set. PhISCS-BnB not only offers an optimality guarantee, but is also 10 to 100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a large melanoma data set derived from the sub-lineages of a cell line involving 24 clones with 3574 mutations, which returned the optimal tumor phylogeny in less than 2 hours. The resulting phylogeny also agrees with bulk exome sequencing data obtained from in vivo tumors growing out from the same cell line. Availability https://github.com/algo-cancer/PhISCS-BnB


2022 ◽  
Author(s):  
Etienne Sollier ◽  
Jack Kuipers ◽  
Niko Beerenwinkel ◽  
Koichi Takahashi ◽  
Katharina Jahn

Reconstructing the history of somatic DNA alterations that occurred in a tumour can help understand its evolution and predict its resistance to treatment. Single-cell DNA sequencing (scDNAseq) can be used to investigate clonal heterogeneity and to inform phylogeny reconstruction. However, existing phylogenetic methods for scDNAseq data are designed either for point mutations or for large copy number variations, but not for both types of events simultaneously. Here, we develop COMPASS, a computational method for inferring the joint phylogeny of mutations and copy number alterations from targeted scDNAseq data. We evaluate COMPASS on simulated data and show that it outperforms existing methods. We apply COMPASS to a large cohort of 123 patients with acute myeloid leukemia (AML) and detect copy number alterations, including subclonal ones, which are in agreement with current knowledge of AML development. We further used bulk SNP array data to orthogonally validate our findings.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Leah L. Weber ◽  
Mohammed El-Kebir

Abstract Background Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, which we prove to be NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, outperforming a competing machine learning approach. Conclusion Phyolin is an accurate, easy to use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
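
The linear-versus-branched distinction can be made concrete for error-free data. In the Python sketch below, a binary cells-by-mutations matrix is treated as consistent with linear evolution when the sets of cells carrying each mutation are nested, so that the mutations can be ordered along a single chain. The function name is hypothetical and the check assumes noise-free input; it does not solve the LPPF problem, which additionally asks which entries to flip when the observed data contain errors.

```python
def is_linear(matrix):
    """True if the mutation columns of a binary matrix form a nested chain.

    Rows are cells, columns are mutations. Assumes error-free calls
    (an assumption of this sketch, not of the method in the abstract).
    """
    if not matrix:
        return True
    n_cols = len(matrix[0])
    # For each mutation, collect the set of cells that carry it.
    cell_sets = [
        frozenset(i for i, row in enumerate(matrix) if row[j])
        for j in range(n_cols)
    ]
    # Largest set first; a chain requires each set to contain the next.
    ordered = sorted(cell_sets, key=len, reverse=True)
    return all(later <= earlier for earlier, later in zip(ordered, ordered[1:]))


# Mutations acquired one after another along a single lineage.
linear_example = [[1, 0, 0],
                  [1, 1, 0],
                  [1, 1, 1]]
# Two cells share mutation 0 but then diverge.
branched_example = [[1, 1, 0],
                    [1, 0, 1]]
```

With these toy inputs `is_linear(linear_example)` is true and `is_linear(branched_example)` is false. Sorting by set size means only adjacent sets need comparing, because set containment is transitive.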


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii110-ii110
Author(s):  
Christina Jackson ◽  
Christopher Cherry ◽  
Sadhana Bom ◽  
Hao Zhang ◽  
John Choi ◽  
...  

Abstract BACKGROUND Glioma associated myeloid cells (GAMs) can be induced to adopt an immunosuppressive phenotype that can lead to inhibition of anti-tumor responses in glioblastoma (GBM). Understanding the composition and phenotypes of GAMs is essential to modulating the myeloid compartment as a therapeutic adjunct to improve anti-tumor immune response. METHODS We performed single-cell RNA-sequencing (sc-RNAseq) of 435,400 myeloid and tumor cells to identify transcriptomic and phenotypic differences in GAMs across glioma grades. We further correlated the heterogeneity of the GAM landscape with tumor cell transcriptomics to investigate interactions between GAMs and tumor cells. RESULTS sc-RNAseq revealed a diverse landscape of myeloid-lineage cells in gliomas with an increase in preponderance of bone marrow derived myeloid cells (BMDMs) with increasing tumor grade. We identified two populations of BMDMs unique to GBMs: Mac-1 and Mac-2. Mac-1 demonstrates upregulation of immature myeloid gene signature and altered metabolic pathways. Mac-2 is characterized by expression of scavenger receptor MARCO. Pseudotime and RNA velocity analysis revealed the ability of Mac-1 to transition and differentiate to Mac-2 and other GAM subtypes. We further found that the presence of these two populations of BMDMs is associated with the presence of tumor cells with stem cell and mesenchymal features. Bulk RNA-sequencing data demonstrate that gene signatures of these populations are associated with worse survival in GBM. CONCLUSION We used sc-RNAseq to identify a novel population of immature BMDMs that is associated with higher glioma grades. This population exhibited altered metabolic pathways and stem-like potential to differentiate into other GAM populations, including GAMs with upregulation of immunosuppressive pathways. Our results elucidate unique interactions between BMDMs and GBM tumor cells that potentially drive GBM progression and the more aggressive mesenchymal subtype. 
Our discovery of these novel BMDMs has implications for new therapeutic targets to improve the efficacy of immune-based therapies in GBM.

