SCuPhr: A Probabilistic Framework for Cell Lineage Tree Reconstruction

2018 ◽  
Author(s):  
Hazal Koptagel ◽  
Seong-Hwan Jun ◽  
Jens Lagergren

Abstract Reconstruction of cell lineage trees from single-cell DNA sequencing data has the potential to become a fundamental tool in the study of disease development, in particular cancer. For cells without copy number alterations that have not been exposed to specific marking techniques, that is, normal cells, lineage tracing is naturally based on somatic point mutations. Current single-cell sequencing techniques applicable to such cells require an amplification step, which introduces errors, and they often still suffer from so-called allelic dropout. We present a detailed model of current technologies for the purpose of estimating the distance between cells without copy number changes, based on single-cell DNA sequencing data. The model is well suited both for full Bayesian analysis, by introducing prior probabilities for key parameters, and for maximum a posteriori estimation using an expectation-maximization algorithm. Our model outputs the distance between two cells while simultaneously taking all the other cells into account. In particular, the model contains variables associated with pairs of loci, of which one is homozygous and the other heterozygous, and has the capacity to perform Bayesian probabilistic read phasing. By applying a fast distance-based method, such as FNJ, to the estimated distances, a cell lineage tree can be obtained. In contrast to MCMC-based methods, FNJ can easily handle data sets with tens of thousands of taxa. The high accuracy of the resulting method, called SCuPhr, is shown in studies of several synthetic data sets.
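The final step above turns a matrix of estimated pairwise cell distances into a tree with a distance-based method. As a minimal sketch of that step, the code below implements plain neighbor-joining, not FNJ (a faster variant) and not SCuPhr's distance model; the function name `neighbor_joining` is hypothetical, and the input is assumed to be a symmetric distance matrix with at least two cells.

```python
from typing import List


def neighbor_joining(dist: List[List[float]], names: List[str]) -> str:
    """Plain O(n^3) neighbor-joining (hypothetical helper, not FNJ).
    Returns an unrooted tree as a Newick string."""
    d = [row[:] for row in dist]
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]  # row sums used by the Q-criterion
        # Find the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the new internal node to the two joined nodes.
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:g},{nodes[bj]}:{lj:g})"
        # Distances from the new internal node to every remaining node.
        keep = [k for k in range(n) if k not in (bi, bj)]
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # The last two nodes are connected by a single edge.
    return f"({nodes[0]}:{d[0][1]:g},{nodes[1]}:0);"
```

On the five-taxon matrix often used to illustrate neighbor-joining (rows a to e: `[0,5,9,9,8]`, `[5,0,10,10,9]`, `[9,10,0,8,7]`, `[9,10,8,0,3]`, `[8,9,7,3,0]`), the first join pairs a and b with branch lengths 2 and 3. Each iteration scans all pairs, which is why faster variants matter at the tens-of-thousands-of-cells scale mentioned above.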

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fang Wang ◽  
Qihan Wang ◽  
Vakul Mohanty ◽  
Shaoheng Liang ◽  
Jinzhuang Dou ◽  
...  

Abstract We present a Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm that infers the evolution history of a cell population based on single-cell copy number (SCCN) profiles, and a statistical routine named lineage speciation analysis (LSA), which facilitates discovery of fitness-associated alterations and genes from SCCN lineage trees. MEDALT appears more accurate than phylogenetic approaches in reconstructing copy number lineage. Applied to data from 20 triple-negative breast cancer patients, our approaches effectively prioritize genes that are essential for breast cancer cell fitness and are predictive of patient survival, including those implicating convergent evolution. The source code of our study is available at https://github.com/KChen-lab/MEDALT.
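To make the idea of an event distance between copy number profiles concrete, here is a minimal sketch, assuming each event gains or loses exactly one copy of a contiguous run of bins on a single chromosome. It is a simplification, not MEDALT's implementation: the published distance may impose further constraints (for example, treating a copy number of zero as unrecoverable), which this sketch ignores. The function names are hypothetical.

```python
from typing import Dict, Sequence


def segment_event_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Minimum number of single-copy gains/losses of contiguous bins that turn
    profile `a` into profile `b` on one chromosome (no zero-copy constraint)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same bins")
    diff = [y - x for x, y in zip(a, b)]
    padded = [0] + diff + [0]
    # Each segmental event adds +1 at one boundary of the step array and -1 at
    # another, so the minimum event count is the total positive step height.
    steps = [padded[i + 1] - padded[i] for i in range(len(padded) - 1)]
    return sum(s for s in steps if s > 0)


def genome_event_distance(a: Dict[str, Sequence[int]], b: Dict[str, Sequence[int]]) -> int:
    """Sum per-chromosome distances; events are assumed not to span chromosomes."""
    return sum(segment_event_distance(a[chrom], b[chrom]) for chrom in a)
```

For example, turning `[2, 2, 2, 2]` into `[3, 3, 2, 2]` takes one gain, whereas `[3, 1, 3, 2]` takes three events, because the step heights of the difference profile must be paid for one unit at a time.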


2020 ◽  
Vol 16 (7) ◽  
pp. e1008012 ◽  
Author(s):  
Xian F. Mallory ◽  
Mohammadamin Edrisi ◽  
Nicholas Navin ◽  
Luay Nakhleh

Author(s):  
Jack Kuipers ◽  
Mustafa Anıl Tuncel ◽  
Pedro Ferreira ◽  
Katharina Jahn ◽  
Niko Beerenwinkel

Copy number alterations are driving forces of tumour development and the emergence of intra-tumour heterogeneity. A comprehensive picture of these genomic aberrations is therefore essential for the development of personalised and precise cancer diagnostics and therapies. Single-cell sequencing offers the highest resolution for copy number profiling down to the level of individual cells. Recent high-throughput protocols allow for the processing of hundreds of cells through shallow whole-genome DNA sequencing. The resulting low read-depth data poses substantial statistical and computational challenges to the identification of copy number alterations. We developed SCICoNE, a statistical model and MCMC algorithm tailored to single-cell copy number profiling from shallow whole-genome DNA sequencing data. SCICoNE reconstructs the history of copy number events in the tumour and uses these evolutionary relationships to identify the copy number profiles of the individual cells. We show the accuracy of this approach in evaluations on simulated data and demonstrate its practicability in applications to a xenograft breast cancer sample.
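The statistical difficulty of shallow sequencing can be seen in a toy model where each bin's read count is Poisson with mean proportional to its copy number. The sketch below calls a maximum-likelihood integer copy number per bin, assuming a known `reads_per_copy` scale and a uniform prior (both hypothetical); it is not SCICoNE's model, which additionally detects breakpoints, models an event tree, and samples with MCMC.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads when the expected count is lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def map_copy_numbers(counts: Sequence[int], reads_per_copy: float,
                     max_cn: int = 8, eps: float = 0.05) -> List[int]:
    """Per-bin MAP integer copy number under a Poisson read-count model with a
    uniform prior over 0..max_cn (hypothetical toy, bins treated independently)."""
    calls = []
    for k in counts:
        best_cn, best_ll = 0, -math.inf
        for cn in range(max_cn + 1):
            lam = max(cn, eps) * reads_per_copy  # eps keeps the mean > 0 for copy number 0
            ll = poisson_logpmf(k, lam)
            if ll > best_ll:
                best_cn, best_ll = cn, ll
        calls.append(best_cn)
    return calls
```

With about 10 reads per copy, counts of 20, 21 and 30 are called as 2, 2 and 3; at 1 read per copy the Poisson spread is comparable to the gap between neighboring copy numbers, which is why methods borrow strength across bins and cells rather than calling each bin alone.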


2019 ◽  
Author(s):  
Xian Fan ◽  
Mohammadamin Edrisi ◽  
Nicholas Navin ◽  
Luay Nakhleh

Abstract Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations that are unique to each of these methods is very important for obtaining accurate copy number profiles from single-cell DNA sequencing data. Here we review the major steps that are followed by these methods when analyzing such data, and then review the strengths and limitations of the methods individually. In terms of segmenting the genome into regions of different copy numbers, we categorize the methods into three groups, select a representative method from each group that has been commonly used in this context, and benchmark them on simulated as well as real datasets. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, with the large datasets being generated, the methods must be computationally efficient.
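Segmentation, the step the review uses to group methods, splits a normalized read-depth signal into runs of constant copy number. A minimal sketch of one of the simplest strategies, plain recursive binary segmentation with a squared-error criterion, is shown below; it is related to but not the same as circular binary segmentation, it is not one of the benchmarked methods, and `penalty` and `min_size` are hypothetical tuning parameters.

```python
from typing import List, Sequence


def _sse(x: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x)


def binary_segmentation(x: Sequence[float], penalty: float, min_size: int = 2) -> List[int]:
    """Return sorted breakpoints (index where each new segment starts).
    A split is kept only if it lowers the squared error by more than `penalty`."""
    breakpoints: List[int] = []

    def split(lo: int, hi: int) -> None:
        if hi - lo < 2 * min_size:
            return
        base = _sse(x[lo:hi])
        best_gain, best_k = 0.0, None
        for k in range(lo + min_size, hi - min_size + 1):
            gain = base - _sse(x[lo:k]) - _sse(x[k:hi])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > penalty:
            breakpoints.append(best_k)
            split(lo, best_k)  # recurse into both halves
            split(best_k, hi)

    split(0, len(x))
    return sorted(breakpoints)
```

On a signal with a one-copy gain in the middle, such as five bins at 2.0, five at 3.0 and five at 2.0, the sketch recovers breakpoints at 5 and 10. Real single-cell data are far noisier, and the choice of penalty trades missed breakpoints against spurious ones.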


2021 ◽  
Author(s):  
Sanjana Rajan ◽  
Simone Zaccaria ◽  
Matthew V. Cannon ◽  
Maren Cam ◽  
Amy C. Gross ◽  
...  

Abstract Osteosarcoma is an aggressive malignancy characterized by high genomic complexity. The identification of few recurrent mutations in protein-coding genes suggests that somatic copy-number aberrations (SCNAs) are the genetic drivers of disease. Models of genomic instability conflict: it is unclear whether osteosarcomas result from pervasive ongoing clonal evolution with continuous optimization of the fitness landscape or from an early catastrophic event followed by stable maintenance of an abnormal genome. We address this question by investigating SCNAs in 12,019 tumor cells obtained from expanded patient tissues using single-cell DNA sequencing, in ways that were previously impossible with bulk sequencing. Using the CHISEL algorithm, we inferred allele- and haplotype-specific SCNAs from whole-genome single-cell DNA sequencing data. Surprisingly, we found that, despite extensive genomic aberrations, cells within each tumor exhibit remarkably homogeneous SCNA profiles with little sub-clonal diversification. Longitudinal analysis of two pairs of patient samples obtained at distant time points (early detection and relapse) demonstrated remarkable conservation of SCNA profiles over tumor evolution. Phylogenetic analysis suggests that the bulk of SCNAs was acquired early in the oncogenic process, with few new events arising in response to therapy or during adaptation to growth in distant tissues. These data suggest that early catastrophic events, rather than sustained genomic instability, drive the formation of these extensively aberrant genomes. Overall, we demonstrate the power of combining single-cell DNA sequencing with an allele- and haplotype-specific SCNA inference algorithm to resolve longstanding questions regarding the genetics of tumor initiation and progression, and we question the underlying assumptions of genomic instability inferred from bulk tumor data.
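Allele-specific copy numbers split a total copy number between the two homologous alleles. The sketch below only illustrates the basic relationship between an integer total copy number and a B-allele frequency (BAF) estimate, including how loss of heterozygosity shows up; CHISEL's actual inference across cells and haplotypes is considerably more involved and is not reproduced here. Both function names are hypothetical.

```python
from typing import Tuple


def allele_specific_cn(total_cn: int, baf: float) -> Tuple[int, int]:
    """Split an integer total copy number into (minor, major) allele copies.
    BAF is mirrored because a BAF of 0.3 and of 0.7 imply the same split."""
    if total_cn < 0 or not 0.0 <= baf <= 1.0:
        raise ValueError("total_cn must be >= 0 and baf must lie in [0, 1]")
    minor = round(min(baf, 1.0 - baf) * total_cn)
    return minor, total_cn - minor


def is_loh(total_cn: int, baf: float) -> bool:
    """Loss of heterozygosity: one allele has no copies left."""
    return total_cn > 0 and allele_specific_cn(total_cn, baf)[0] == 0
```

For instance, a total copy number of 4 with a BAF near 0.25 splits as one minor and three major copies, while a total of 2 with a BAF near 0 is copy-neutral LOH, an event that total copy number alone cannot reveal.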


2019 ◽  
Author(s):  
Hamim Zafar ◽  
Chieh Lin ◽  
Ziv Bar-Joseph

Abstract Recent studies combine two novel technologies, single-cell RNA-sequencing and CRISPR-Cas9 barcode editing, for elucidating developmental lineages at the whole-organism level. While these studies provided several insights, they face several computational challenges. First, lineages are reconstructed based on noisy and often saturated random mutation data. Additionally, due to the randomness of the mutations, lineages from multiple experiments cannot be combined to reconstruct a consensus lineage tree. To address these issues, we developed a novel method, LinTIMaT, which reconstructs cell lineages using a maximum-likelihood framework by integrating mutation and expression data. Our analysis shows that expression data helps resolve the ambiguities that arise when lineages are inferred based on mutations alone, while also enabling the integration of different individual lineages for the reconstruction of a consensus lineage tree. LinTIMaT lineages have better cell type coherence, improve the functional significance of gene sets and provide new insights on progenitors and differentiation pathways.
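Why expression data can break ties left by barcode mutations is easy to see with distances. The sketch below, which is not LinTIMaT's maximum-likelihood model, compares two cells by the fraction of jointly recovered CRISPR target sites whose edit states differ, then mixes in a Euclidean expression distance. The encoding (one string per target site, `None` for an unrecovered site), the function names, and the `weight` parameter are all hypothetical.

```python
import math
from typing import Optional, Sequence


def barcode_distance(x: Sequence[Optional[str]], y: Sequence[Optional[str]]) -> float:
    """Fraction of jointly observed target sites whose edit states differ.
    None marks a site that was not recovered in that cell."""
    observed = [(s, t) for s, t in zip(x, y) if s is not None and t is not None]
    if not observed:
        raise ValueError("no jointly observed target sites")
    return sum(s != t for s, t in observed) / len(observed)


def combined_distance(bx: Sequence[Optional[str]], by: Sequence[Optional[str]],
                      ex: Sequence[float], ey: Sequence[float], weight: float = 0.5) -> float:
    """Weighted mix of barcode distance and Euclidean expression distance.
    `weight` is a hypothetical knob; the two terms live on different scales."""
    return (1 - weight) * barcode_distance(bx, by) + weight * math.dist(ex, ey)
```

Two cells whose barcodes are identical (or saturated with the same edits) have barcode distance 0 and cannot be ordered by mutations alone; their expression profiles can still separate them. In practice the expression term would need scaling before it is combined with a fraction.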


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Hamim Zafar ◽  
Chieh Lin ◽  
Ziv Bar-Joseph

Abstract Recent studies combine two novel technologies, single-cell RNA-sequencing and CRISPR-Cas9 barcode editing, for elucidating developmental lineages at the whole-organism level. While these studies provided several insights, they face several computational challenges. First, lineages are reconstructed based on noisy and often saturated random mutation data. Additionally, due to the randomness of the mutations, lineages from multiple experiments cannot be combined to reconstruct a species-invariant lineage tree. To address these issues, we developed a statistical method, LinTIMaT, which reconstructs cell lineages using a maximum-likelihood framework by integrating mutation and expression data. Our analysis shows that expression data helps resolve the ambiguities that arise when lineages are inferred based on mutations alone, while also enabling the integration of different individual lineages for the reconstruction of an invariant lineage tree. LinTIMaT lineages have better cell type coherence, improve the functional significance of gene sets and provide new insights on progenitors and differentiation pathways.


2020 ◽  
Author(s):  
Fang Wang ◽  
Qihan Wang ◽  
Vakul Mohanty ◽  
Shaoheng Liang ◽  
Jinzhuang Dou ◽  
...  

Abstract Aneuploidy plays critical roles in genome evolution. Alleles whose dosages affect the fitness of an ancestor will have altered frequencies in the descendant populations upon perturbation. Single-cell sequencing enables comprehensive genome-wide copy number profiling of thousands of cells at various evolutionary stages and lineages. This makes it possible to discover dosage effects invisible at the tissue level, provided that the cell lineages can be accurately reconstructed. Here, we present a Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm that infers the evolution history of a cell population based on single-cell copy number (SCCN) profiles. We also present a statistical routine named lineage speciation analysis (LSA), which facilitates discovery of fitness-associated alterations and genes from SCCN lineage trees. We assessed our approaches using a variety of single-cell datasets. Overall, MEDALT appeared more accurate than phylogenetic approaches in reconstructing copy number lineage. From the single-cell DNA-sequencing data of 20 triple-negative breast cancer patients, our approaches effectively prioritized genes that are essential for breast cancer cell fitness and are predictive of patient survival, including those implicating convergent evolution. Similar benefits were observed when applying our approaches to single-cell RNA sequencing data obtained from cancer patients. The source code of our study is available at https://github.com/KChen-lab/MEDALT.
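One simple way to turn pairwise copy number distances into a lineage tree is to grow a minimum spanning tree from a diploid reference cell. The sketch below uses Prim's algorithm on a symmetric distance matrix with the hypothetical reference at index 0; it is an undirected simplification, and MEDALT's own tree construction may differ (for instance, if its event distance is asymmetric, a directed construction would be needed).

```python
from typing import List, Optional, Sequence


def prim_mst_parents(dist: Sequence[Sequence[float]], root: int = 0) -> List[Optional[int]]:
    """O(n^2) Prim's algorithm grown from `root` (e.g. a hypothetical diploid
    reference). Returns parent[i] for each node; the root's parent is None."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge connecting each node to the tree
    parent: List[Optional[int]] = [None] * n
    best[root] = 0.0
    for _ in range(n):
        # Attach the outside node with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return parent
```

When each cell differs from the next by a single event, for example distances 1 between consecutive cells and larger distances otherwise, the parent array forms a chain rooted at the reference, consistent with cells accumulating alterations one after another.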


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xian F. Mallory ◽  
Mohammadamin Edrisi ◽  
Nicholas Navin ◽  
Luay Nakhleh

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Leah L. Weber ◽  
Mohammed El-Kebir

Abstract Background: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem, which we prove to be NP-hard, as a means of testing two alternative hypotheses for the pattern of evolution. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, which outperforms a competing machine learning approach. Conclusion: Phyolin is an accurate, easy-to-use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
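The linear-versus-branched question has a simple answer when the data are error-free. Under the infinite-sites assumption with an unmutated root, a binary cell-by-mutation matrix admits a perfect phylogeny exactly when every pair of mutation columns is either nested or disjoint, and that phylogeny is linear exactly when every pair is nested. The sketch below checks only this error-free case; Phyolin addresses the harder noisy setting (the LPPF problem) with constraint programming, which is not attempted here. The function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def classify_phylogeny(matrix: Sequence[Sequence[int]]) -> str:
    """Classify an error-free binary matrix (rows = cells, columns = mutations)
    as 'linear', 'branched', or 'conflict' (no perfect phylogeny)."""
    n_cols = len(matrix[0]) if matrix else 0
    # The set of cells carrying each mutation.
    cols = [frozenset(i for i, row in enumerate(matrix) if row[j]) for j in range(n_cols)]
    cols = [c for c in cols if c]  # mutations seen in no cell carry no signal
    linear = True
    for a, b in combinations(cols, 2):
        nested = a <= b or b <= a
        if not nested and a & b:
            return "conflict"  # overlapping but not nested: violates perfect phylogeny
        if not nested:
            linear = False  # disjoint mutation sets imply two separate branches
    return "linear" if linear else "branched"
```

Noise is what makes the real problem hard: a single false negative or dropout can turn a linear matrix into an apparently branched or conflicting one, hence the flipping formulation.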

