Single‐cell sequencing and tumorigenesis: improved understanding of tumor evolution and metastasis

The infinite sites assumption, which states that every genomic position mutates at most once over the lifetime of a tumor, is central to current approaches for reconstructing mutation histories of tumors, but has never been tested explicitly. We developed a rigorous statistical framework to test the assumption with single-cell sequencing data. The framework accounts for the high noise and contamination present in such data. We found strong evidence for recurrent mutations at the same site in 8 out of 9 single-cell sequencing datasets from human tumors. Six cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large scale genomic deletions. Two cases exhibited parallel mutation, including the dataset with the strongest evidence of recurrence. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity.
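For intuition about what the infinite sites assumption implies in the error-free case, a minimal sketch follows. It is not the statistical framework described in the abstract above. It is the classical three-gametes check, and it assumes a noise-free binary cells-by-mutations matrix and an unmutated root. The function name `violates_isa` and the toy matrices are hypothetical.

```python
from itertools import combinations


def violates_isa(matrix):
    """Return the column pairs that cannot both fit a perfect phylogeny.

    matrix: list of rows (cells), each a list of 0/1 mutation calls.
    With an unmutated root and every site mutating at most once, no pair of
    mutations may show all three patterns (1,0), (0,1) and (1,1) across cells.
    """
    n_mut = len(matrix[0]) if matrix else 0
    conflicts = []
    for i, j in combinations(range(n_mut), 2):
        patterns = {(row[i], row[j]) for row in matrix}
        if {(1, 0), (0, 1), (1, 1)} <= patterns:
            conflicts.append((i, j))
    return conflicts


# Hypothetical toy data: nested mutations are compatible with the assumption.
nested = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
# Hypothetical toy data: the two mutations occur apart and together.
conflicting = [[1, 0], [0, 1], [1, 1]]

print(violates_isa(nested))
print(violates_isa(conflicting))
```

On real single-cell data a conflicting pair may simply reflect sequencing noise, which is why a statistical test rather than this deterministic check is needed there.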

Studying the History of Tumor Evolution from Single-Cell Sequencing Data by Exploring the Space of Binary Matrices

Journal of Computational Biology ◽

10.1089/cmb.2020.0595 ◽

2021 ◽

Author(s):

Salem Malikić ◽

Farid Rashidi Mehrabadi ◽

Erfan Sadeqi Azer ◽

Mohammad Haghir Ebrahimabadi ◽

Suleyman Cenk Sahinalp

Keyword(s):

Single Cell ◽

Tumor Evolution ◽

Sequencing Data ◽

Single Cell Sequencing ◽

History Of ◽

Binary Matrices

PhISCS - A Combinatorial Approach for Sub-perfect Tumor Phylogeny Reconstruction via Integrative use of Single Cell and Bulk Sequencing Data

10.1101/376996 ◽

2018 ◽

Cited By ~ 9

Author(s):

Salem Malikic ◽

Simone Ciccolella ◽

Farid Rashidi Mehrabadi ◽

Camir Ricketts ◽

Khaledur Rahman ◽

...

Keyword(s):

Single Cell ◽

Phylogeny Reconstruction ◽

Tumor Evolution ◽

Sequencing Data ◽

Perfect Phylogeny ◽

Sequence Coverage ◽

Allele Dropout ◽

Single Cell Sequencing ◽

First Time ◽

Tumor Phylogeny

Abstract: Recent technological advances in single cell sequencing (SCS) provide high resolution data for studying intra-tumor heterogeneity and tumor evolution. Available computational methods for tumor phylogeny inference via SCS typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, limitations of SCS technologies, such as frequent allele dropout or highly variable sequence coverage, commonly result in mutation call errors and prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to loss of heterozygosity, deletions and convergent evolution. To address these limitations, we introduce, for the first time, a new combinatorial formulation that integrates single cell sequencing data with matching bulk sequencing data, with the objective of minimizing a linear combination of (i) potential false negatives (due to, e.g., allele dropout or variance in sequence coverage) and (ii) potential false positives (due to, e.g., read errors) among mutation calls, as well as (iii) the number of mutations that violate ISA, to define the optimal sub-perfect phylogeny. Our formulation ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and, for the first time in the context of tumor phylogeny reconstruction, a boolean constraint satisfaction problem (CSP), and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data under the finite sites model. Using several simulated and real SCS data sets, we demonstrate that PhISCS is not only more general but also more accurate than the alternative tumor phylogeny inference tools. PhISCS is very fast, especially when its CSP based variant is used, and returns the optimal solution, except in rare instances for which it provides an optimality gap. PhISCS is available at https://github.com/haghshenas/PhISCS.
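The weighted objective described in this abstract can be illustrated on a toy input. The sketch below is not PhISCS and uses no ILP or CSP solver: it swaps in an exhaustive search over all binary matrices, which is only feasible for tiny inputs. It also omits the bulk-data VAF constraints and the ISA-violation term (iii). The function names and the weights `fn_weight` and `fp_weight` are assumptions made for illustration.

```python
from itertools import combinations, product


def is_conflict_free(matrix):
    """True if no two columns show all of the patterns (1,0), (0,1), (1,1)."""
    n_mut = len(matrix[0])
    for i, j in combinations(range(n_mut), 2):
        patterns = {(row[i], row[j]) for row in matrix}
        if {(1, 0), (0, 1), (1, 1)} <= patterns:
            return False
    return True


def best_correction(noisy, fn_weight=1.0, fp_weight=10.0):
    """Find the cheapest conflict-free matrix by brute force.

    A 0 -> 1 change is charged as a presumed false negative and a 1 -> 0
    change as a presumed false positive. The weights are hypothetical.
    """
    n_rows, n_cols = len(noisy), len(noisy[0])
    best_cost, best_matrix = None, None
    for bits in product((0, 1), repeat=n_rows * n_cols):
        cand = [list(bits[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
        if not is_conflict_free(cand):
            continue
        cost = 0.0
        for r in range(n_rows):
            for c in range(n_cols):
                if noisy[r][c] == 0 and cand[r][c] == 1:
                    cost += fn_weight  # observed 0 treated as a false negative
                elif noisy[r][c] == 1 and cand[r][c] == 0:
                    cost += fp_weight  # observed 1 treated as a false positive
        if best_cost is None or cost < best_cost:
            best_cost, best_matrix = cost, cand
    return best_cost, best_matrix


# Hypothetical noisy calls for three cells and two mutations.
noisy = [[1, 0], [0, 1], [1, 1]]
cost, corrected = best_correction(noisy)
print(cost, corrected)
```

Because the search space doubles with every matrix entry, this approach stops being practical almost immediately, which is the motivation for expressing the same objective as an ILP or CSP instance.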

Studying the history of tumor evolution from single-cell sequencing data by exploring the space of binary matrices

10.1101/2020.07.15.204081 ◽

2020 ◽

Cited By ~ 1

Author(s):

Salem Malikić ◽

Farid Rashidi Mehrabadi ◽

Erfan Sadeqi Azer ◽

Mohammad Haghir Ebrahimabadi ◽

S. Cenk Sahinalp

Keyword(s):

Single Cell ◽

Evolutionary History ◽

Tumor Evolution ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Linear Programming Formulation ◽

History Of ◽

Binary Matrices ◽

Constraint Satisfaction Programming ◽

Integer Linear Programming Formulation

Abstract: Single-cell sequencing data has great potential in reconstructing the evolutionary history of tumors. Rapid advances in single-cell sequencing technology in the past decade were followed by the design of various computational methods for inferring trees of tumor evolution. Some of the earliest of these methods were based on a direct search in the space of trees. However, it can be shown that instead of this tree search strategy we can perform a search in the space of binary matrices and obtain the most likely tree directly from the most likely among the candidate binary matrices. The search in the space of binary matrices can be expressed as an instance of integer linear or constraint satisfaction programming and solved by some of the available solvers, which typically provide a guarantee of optimality of the reported solution. In this review, we first describe one convenient tree representation of tumor evolutionary history and present the tree scoring model that is most commonly used in the available methods. We then provide a proof showing that the most likely tree of tumor evolution can be obtained directly from the most likely matrix from the space of candidate binary matrices. Next, we provide an integer linear programming formulation to search for such a matrix and summarize the existing methods based on this formulation or its extensions. Lastly, we present one use case which illustrates how binary matrices can be used as a basis for developing a fast deep learning method for inferring some topological properties of the most likely tree of tumor evolution.
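The link between a binary matrix and a tree can be sketched as follows. This is an illustration under stated assumptions, not the construction or proof given in the review. It assumes the matrix is already conflict-free (no two columns show all of the patterns (1,0), (0,1) and (1,1)) and that every mutation is present in at least one cell. The function name `mutation_tree` is hypothetical.

```python
def mutation_tree(matrix):
    """Map each mutation (column index) to its parent mutation.

    A parent of None means the mutation hangs directly off the unmutated root.
    Assumes a conflict-free binary cells-by-mutations matrix in which every
    mutation occurs in at least one cell.
    """
    n_mut = len(matrix[0])
    # Set of cells (row indices) carrying each mutation.
    carriers = [
        frozenset(r for r, row in enumerate(matrix) if row[j])
        for j in range(n_mut)
    ]
    parent = {}
    for j in range(n_mut):
        # Ancestors are mutations whose carrier set strictly contains j's.
        ancestors = [
            k for k in range(n_mut) if k != j and carriers[j] < carriers[k]
        ]
        # The parent is the closest ancestor: the one with the fewest carriers.
        if ancestors:
            parent[j] = min(ancestors, key=lambda k: len(carriers[k]))
        else:
            parent[j] = None
    return parent


# Hypothetical toy matrices: a linear chain and a branching history.
chain = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
branching = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]

print(mutation_tree(chain))
print(mutation_tree(branching))
```

Mutations with identical columns receive the same parent in this sketch; in a tree drawing they would share a single edge.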

Tumor Copy Number Deconvolution Integrating Bulk and Single-Cell Sequencing Data

10.1101/519892 ◽

2019 ◽

Cited By ~ 2

Author(s):

Haoyun Lei ◽

Bochuan Lyu ◽

E. Michael Gertz ◽

Alejandro A. Schäffer ◽

Xulian Shi ◽

...

Keyword(s):

Single Cell ◽

Copy Number ◽

Simulated Data ◽

Mixed Integer ◽

Intratumor Heterogeneity ◽

Tumor Evolution ◽

Sequencing Data ◽

Minimum Evolution ◽

Promising Alternative ◽

Single Cell Sequencing

Abstract: Characterizing intratumor heterogeneity (ITH) is crucial to understanding cancer development, but it is hampered by limits of available data sources. Bulk DNA sequencing is the most common technology to assess ITH, but mixes many genetically distinct cells in each sample, which must then be computationally deconvolved. Single-cell sequencing (SCS) is a promising alternative, but its limitations (e.g., high noise, difficulty scaling to large populations, technical artifacts, and large data sets) have so far made it impractical for studying cohorts of sufficient size to identify statistically robust features of tumor evolution. We have developed strategies for deconvolution and tumor phylogenetics combining limited amounts of bulk and single-cell data to gain some advantages of single-cell resolution with much lower cost, with specific focus on deconvolving genomic copy number data. We developed a mixed membership model for clonal deconvolution via non-negative matrix factorization (NMF) balancing deconvolution quality with similarity to single-cell samples via an associated efficient coordinate descent algorithm. We then improve on that algorithm by integrating deconvolution with clonal phylogeny inference, using a mixed integer linear programming (MILP) model to incorporate a minimum evolution phylogenetic tree cost in the problem objective. We demonstrate the effectiveness of these methods on semi-simulated data of known ground truth, showing improved deconvolution accuracy relative to bulk data alone.
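The idea of deconvolving bulk copy number data by NMF can be sketched in a few lines. This is not the authors' method: it swaps in plain Lee-Seung multiplicative updates instead of their coordinate descent algorithm, and it has no single-cell similarity term and no phylogeny cost. It assumes a bulk matrix (loci by samples) that is approximately the product of a clone copy-number matrix (loci by clones) and a mixture-fraction matrix (clones by samples). All names and numbers are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of v - w @ h."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor v ~ w @ h with non-negative w, h using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical ground truth: 4 loci, 2 clones, 3 bulk samples.
clones = [[2, 4], [2, 1], [3, 2], [2, 2]]
fractions = [[0.7, 0.2, 0.5], [0.3, 0.8, 0.5]]
bulk = matmul(clones, fractions)

w, h = nmf(bulk, rank=2)
print(frob_error(bulk, w, h))
```

The recovered factors are determined only up to scaling and permutation of the clones, which is one reason additional information such as single-cell samples helps to pin down a meaningful solution.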

Computational enhancement of single-cell sequences for inferring tumor evolution

10.1101/341743 ◽

2018 ◽

Author(s):

Sayaka Miura ◽

Louise A Huuki ◽

Tiffany Buturla ◽

Tracy Vu ◽

Karen Gomez ◽

...

Keyword(s):

Single Cell ◽

Computational Methods ◽

False Negative ◽

Evolutionary Information ◽

Tumor Evolution ◽

Single Cell Sequencing ◽

Sequencing Technologies ◽

Molecular Phylogenetic ◽

Phylogenetic Framework

Abstract: Motivation: Tumor sequencing has entered an exciting phase with the advent of single-cell techniques that are revolutionizing the assessment of single nucleotide variation (SNV) at the highest cellular resolution. However, state-of-the-art single-cell sequencing technologies produce data with many missing bases (MBs) and incorrect base designations that lead to false-positive (FP) and false-negative (FN) detection of somatic mutations. While computational methods are available to make biological inferences in the presence of these errors, the accuracy of the imputed MBs and corrected FPs and FNs remains unknown. Results: Using computer simulated datasets, we assessed the robustness performance of four existing methods (OncoNEM, SCG, SCITE, and SiFit) and one new method (BEAM). BEAM is a Bayesian evolution-aware method that improves the quality of single-cell sequences by using the intrinsic evolutionary information in the single-cell data in a molecular phylogenetic framework. Overall, BEAM and SCITE performed the best. Most of the methods imputed MBs with high accuracy, but effective detection and correction of FPs and FNs require sampling a large number of SNVs. Analysis of an empirical dataset shows that computational methods can improve both the quality of tumor single-cell sequences and their utility for biological inference. Conclusions: Tumor cells descend from pre-existing cells, which creates evolutionary continuity in single-cell sequencing datasets. This information enables BEAM and other methods to correctly impute missing data and incorrect base assignments, but correction of FPs and FNs remains challenging when the number of SNVs sampled is small relative to the number of cells sequenced. Availability: BEAM is available on the web at https://github.com/SayakaMiura/BEAM. Contact: [email protected]

OncoNEM: inferring tumor evolution from single-cell sequencing data

Genome Biology ◽

10.1186/s13059-016-0929-9 ◽

2016 ◽

Vol 17 (1) ◽

Cited By ~ 121

Author(s):

Edith M. Ross ◽

Florian Markowetz

Keyword(s):

Single Cell ◽

Tumor Evolution ◽

Sequencing Data ◽

Single Cell Sequencing

Single-cell tumor phylogeny inference with copy-number constrained mutation losses

10.1101/840355 ◽

2019 ◽

Cited By ~ 1

Author(s):

Gryte Satas ◽

Simone Zaccaria ◽

Geoffrey Mon ◽

Benjamin J. Raphael

Keyword(s):

Single Cell ◽

Copy Number ◽

Phylogenetic Trees ◽

Colorectal Cancer Patient ◽

Simulated Data ◽

Cell Tumor ◽

Tumor Evolution ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Cell Sequencing

Abstract: Motivation: Single-cell DNA sequencing enables the measurement of somatic mutations in individual tumor cells, and provides data to reconstruct the evolutionary history of the tumor. Nearly all existing methods to construct phylogenetic trees from single-cell sequencing data use single-nucleotide variants (SNVs) as markers. However, most solid tumors contain copy-number aberrations (CNAs) which can overlap loci containing SNVs. Particularly problematic are CNAs that delete an SNV, thus returning the SNV locus to the unmutated state. Such mutation losses are allowed in some models of SNV evolution, but these models are generally too permissive, allowing mutation losses without evidence of a CNA overlapping the locus. Results: We introduce a novel loss-supported evolutionary model, a generalization of the infinite sites and Dollo models, that constrains mutation losses to loci with evidence of a decrease in copy number. We design a new algorithm, Single-Cell Algorithm for Reconstructing the Loss-supported Evolution of Tumors (Scarlet), that infers phylogenies from single-cell tumor sequencing data using the loss-supported model and a probabilistic model of sequencing errors and allele dropout. On simulated data, we show that Scarlet outperforms current single-cell phylogeny methods, recovering more accurate trees and correcting errors in SNV data. On single-cell sequencing data from a metastatic colorectal cancer patient, Scarlet constructs a phylogeny that is more consistent with the observed copy-number data and reveals a simpler monoclonal seeding of the metastasis, contrasting with published reports of polyclonal seeding in this patient. Scarlet substantially improves single-cell phylogeny inference in tumors with CNAs, yielding new insights into the analysis of tumor evolution. Availability: Software is available at github.com/raphael-group/[email protected]
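The constraint at the heart of a loss-supported model can be illustrated with a small validity check. This is a simplified sketch, not Scarlet: it ignores sequencing errors, allele dropout and the copy-number profiles themselves, and it assumes the set of losses supported by a copy-number decrease is supplied directly. It also assumes an unmutated root. The function name `is_loss_supported` and the toy tree are hypothetical.

```python
def is_loss_supported(parent, present, supported_losses):
    """Check a labelled tree against a simplified loss-supported rule.

    parent: dict mapping each node to its parent (the root maps to None).
    present: dict mapping each node to the set of mutations it carries.
    supported_losses: set of (node, mutation) pairs; losing `mutation` on the
        edge entering `node` is allowed only if the pair is listed here.
    """
    gains = {}
    for node, par in parent.items():
        if par is None:
            # The root is assumed to carry no mutations.
            if present[node]:
                return False
            continue
        gained = present[node] - present[par]
        lost = present[par] - present[node]
        for mut in gained:
            gains[mut] = gains.get(mut, 0) + 1
        for mut in lost:
            if (node, mut) not in supported_losses:
                return False  # loss without copy-number support
    # As in the Dollo model, each mutation may be gained at most once.
    return all(count <= 1 for count in gains.values())


# Hypothetical tree: m1 is gained at A and lost again on the edge into C.
parent = {"root": None, "A": "root", "B": "A", "C": "A"}
present = {"root": set(), "A": {"m1"}, "B": {"m1", "m2"}, "C": set()}

print(is_loss_supported(parent, present, {("C", "m1")}))
print(is_loss_supported(parent, present, set()))
```

With an empty supported set the check reduces to the infinite sites case in which no losses are allowed, and with every pair listed it behaves like an unconstrained Dollo-style model, which matches the abstract's description of the model as a generalization of both.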
