A Combinatorial Approach for Single-cell Variant Detection via Phylogenetic Inference

2019
Author(s): Mohammadamin Edrisi, Hamim Zafar, Luay Nakhleh

Abstract Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges, including elevated error rates, allelic dropout and non-uniform coverage. A recently introduced single-cell-specific mutation detection algorithm leverages the evolutionary relationship between cells for denoising the data. However, due to its probabilistic nature, this method does not scale well with the number of cells. Here, we develop a novel combinatorial approach for utilizing the genealogical relationship of cells in detecting mutations from noisy single-cell sequencing data. Our method, called scVILP, jointly detects mutations in individual cells and reconstructs a perfect phylogeny among these cells. We employ a novel Integer Linear Program algorithm for deterministically and efficiently solving the joint inference problem. We show that scVILP achieves similar or better accuracy and a significantly better runtime than existing methods on simulated data. We also applied scVILP to an empirical human cancer dataset from a high-grade serous ovarian cancer patient.
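The perfect phylogeny constraint mentioned in this abstract can be illustrated with a small check. The sketch below is not scVILP and contains no integer linear program; it only tests whether a binary cell-by-mutation matrix is already conflict-free, assuming an unmutated (all-zero) root, using the standard pairwise column test. The function name and the toy matrices are hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(genotypes):
    """Return True if a binary cell-by-mutation matrix is conflict-free.

    Assumption: the root of the tree is unmutated (all zeros). Under that
    assumption, two mutation columns conflict when the cells together show
    all three patterns (1, 1), (1, 0) and (0, 1).
    """
    if not genotypes:
        return True
    n_mutations = len(genotypes[0])
    for a, b in combinations(range(n_mutations), 2):
        patterns = {(row[a], row[b]) for row in genotypes}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            return False
    return True


# Toy data (hypothetical): rows are cells, columns are mutations.
clean = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
noisy = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],  # a false call in column 1 conflicts with column 0
]

print(admits_perfect_phylogeny(clean))  # True
print(admits_perfect_phylogeny(noisy))  # False
```

A joint method of the kind described would go further and decide which entries to change so that the corrected matrix passes a test like this one.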

Author(s): Mastan Mannarapu, Begum Dariya, Obul Reddy Bandapalli

Abstract Pancreatic cancer (PC) is the third most lethal disease in terms of cancer-related mortality globally. This is mainly because of the aggressive nature and heterogeneity of the disease, which is typically diagnosed only at an advanced stage. Thus, it is challenging for researchers and clinicians to study the molecular mechanisms involved in the development of this aggressive disease. Single-cell sequencing technology enables researchers to study each individual cell in a single tumor. It can be used to profile the genome, transcriptome, and multi-omics of single cells. Single-cell sequencing is becoming an important tool for the biological analysis of cells, for finding evolutionary relationships between multiple cells, and for unmasking the heterogeneity present in tumor cells. Moreover, its increasing sensitivity enables the detection of rare cancer cells, circulating tumor cells, and metastatic cells, as well as the analysis of intratumor heterogeneity. Furthermore, single-cell sequencing technologies have also promoted personalized treatment strategies and next-generation sequencing to predict the disease. In this review, we focus on the applications of single-cell sequencing technology in identifying cancer-associated cells, such as cancer-associated fibroblasts, via detecting circulating tumor cells. We also cover advanced technologies involved in single-cell sequencing and their advantages. Future research may bring single-cell sequencing into the clinical arena, which could be beneficial for the diagnosis and therapy of PC patients.


2018
Author(s): Jochen Singer, Jack Kuipers, Katharina Jahn, Niko Beerenwinkel

Abstract Understanding the evolution of cancer is important for the development of appropriate cancer therapies. The task is challenging because tumors evolve as heterogeneous cell populations with an unknown number of genetically distinct subclones of varying frequencies. Conventional approaches based on bulk sequencing are limited in addressing this challenge as clones cannot be observed directly. Single-cell sequencing holds the promise of resolving the heterogeneity of tumors; however, it has its own challenges including elevated error rates, allelic dropout, and uneven coverage. Here, we develop a new approach to mutation detection in individual tumor cells by leveraging the evolutionary relationship among cells. Our method, called SCIΦ, jointly calls mutations in individual cells and estimates the tumor phylogeny among these cells. Employing a Markov Chain Monte Carlo scheme, we robustly account for the various sources of noise in single-cell sequencing data. Our approach enables us to reliably call mutations in each single cell even in experiments with high dropout rates and missing data. We show that SCIΦ outperforms existing methods on simulated data and applied it to different real-world datasets, namely a whole exome breast cancer as well as a panel acute lymphoblastic leukemia dataset. Availability: https://github.com/cbg-ethz/SCIPhI


Blood, 2018, Vol 132 (Supplement 1), pp. 1800-1800
Author(s): Masahiro Marshall Nakagawa, Ryosaku Inagaki, Yasuhito Nannya, Lanying Zhao, Yutaka Kuroda, ...

Abstract Recent advances in single-cell sequencing (sc-Seq) technologies have enabled high-throughput transcriptome analysis in thousands of cells to understand the heterogeneity among cancer populations in terms of genome-wide gene expression. However, its application to the analysis of clonal evolution of cancer populations is largely limited by the lack of an efficient sc-Seq platform that allows for accurate detection of gene mutations simultaneously with transcriptome analysis. The major challenge here is the frequent dropout of alleles that are present at just two copies per single cell, which results in an inaccurate genotype assignment for many cells, preventing identification of relevant genotype-phenotype correlations. To overcome this, we developed a novel sc-Seq platform (scMutSeq) that allows for precise determination of both genotype and genome-wide gene expression simultaneously with negligible allele dropouts, on the basis of the Fluidigm C1 Single-Cell mRNA Seq HT system, and applied it to the analysis of clonal evolution and intratumor heterogeneity of myelodysplastic syndromes (MDS), which are characterized by frequent clonal evolution to acute myeloid leukemia (AML). We first evaluated the performance of our platform using an AML-derived cell line with a heterozygous SF3B1 K700E mutation, HNT-34, for which the efficiency of detecting both the wild-type and mutant alleles, together with global gene expression, was evaluated. Among 400 cells subjected to scMutSeq analysis, a total of 125 passed QC, in which cell viability was assessed in terms of expression of mitochondrial genes. Global gene expression and the heterozygous SF3B1 mutation were successfully detected in all the QC-confirmed cells, with none of the cells showing only the wild-type allele or a homozygous SF3B1 mutation, where evaluable transcript reads (unique molecular identifier >=1) were obtained for a median of 2,753 genes, designated as nGene.
The performance was also tested for flow-sorted hematopoietic stem/progenitor cells (HSPCs) (Lin−CD34+) from an MDS patient positive for the SF3B1 K700E mutation. Gene expression was successfully analyzed in all the QC-confirmed cells (n=81) with a median nGene of 1,953. No substantial allele dropouts were suspected, because none of the cells genotyped had a homozygous SF3B1 mutation. We then applied scMutSeq to the analysis of TP53-mutated AML/MDS with complex karyotype, including del(5q) and del(7q), for which longitudinal samples were obtained for the assessment of clonal evolution. scMutSeq successfully analyzed the mutation status of TP53 and global gene expression profiles at a single-cell level, where copy number abnormalities were also evaluated on the basis of gene expression. We identified two discrete clones in the HSPC fraction, carrying both del(5q) and del(7q) and del(5q) alone, respectively, even though the analysis of bulk DNA had failed to detect the latter clone, indicating that a minor clone having a distinct genotype became detectable with scMutSeq. Moreover, the HSPCs with both del(5q) and del(7q) showed aberrant expression of erythroid and megakaryocytic genes, increased expression of inflammatory signals and decreased expression of cell cycle-related genes, exhibiting a clear genotype-phenotype correlation. Subsequent analysis of samples at later time points further disclosed evolution of clones having discrete del(5q) deletions and expression, revealing the complexity of clonal evolution in MDS. Next, to investigate the early process of MDS development, we analyzed clonal hematopoiesis found in a minor fraction (1.2-12%) of bone marrow samples from three elderly individuals undergoing hip replacement surgery, in which DNMT3A (n=1) (R882H) and TET2 (n=2) (D905fs and Q1540fs) mutations had been detected by ddPCR or targeted deep sequencing, respectively.
scMutSeq analysis of the HSPCs from these individuals revealed that mutant HSPCs showed distinct gene expression profiles, depending on the type of CHIP mutations. To summarize, our single-cell sequencing platform enables the detection of both genetic and transcriptional heterogeneity, providing a powerful clue to understanding clonal evolution and intratumor heterogeneity of MDS. Disclosures Nakagawa: Sumitomo Dainippon Pharma Co., Ltd.: Research Funding. Inagaki: Sumitomo Dainippon Pharma Co., Ltd.: Employment. Yoda: Chordia Therapeutics Inc.: Research Funding.


2019
Author(s): Ziwei Chen, Fuzhou Gong, Liang Ma, Lin Wan

Abstract Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and build phylogenetic relationships of tumor cells/clones. However, high technical errors bring much noise into the genetic data, thus limiting the application of the large reservoir of evolutionary tools. To recover the low-dimensional subspace of tumor subpopulations from error-prone SCS data in the presence of corrupted and/or missing elements, we developed an efficient computational framework, termed RobustClone, to recover the true genotypes of subclones based on the low-rank matrix factorization method of extended robust principal component analysis (RPCA) and reconstruct the subclonal evolutionary tree. RobustClone is a model-free method that is fast and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods, both in accuracy and efficiency. We further validated RobustClone on 2 single-cell SNV and 2 single-cell CNV datasets and demonstrated that RobustClone could recover the genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. RobustClone software is available at https://github.com/ucasdp/RobustClone.


2020, Vol 36 (11), pp. 3299-3306
Author(s): Ziwei Chen, Fuzhou Gong, Lin Wan, Liang Ma

Abstract Motivation Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging. Results To infer the clonal evolution in tumor from the error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on the extended robust principal component analysis, a low-rank matrix decomposition method, and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method, which can be applied to both single-cell single nucleotide variation (scSNV) and single-cell copy-number variation (scCNV) data. It is efficient and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods in large-scale data both in accuracy and efficiency. We further validated RobustClone on two scSNV and two scCNV datasets and demonstrated that RobustClone could recover genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. Availability and implementation RobustClone software is available at https://github.com/ucasdp/RobustClone. Supplementary information Supplementary data are available at Bioinformatics online.


2020, Vol 36 (19), pp. 4854-4859
Author(s): Nico Borgsmüller, Jose Bonet, Francesco Marass, Abel Gonzalez-Perez, Nuria Lopez-Bigas, ...

Abstract Motivation The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intratumor heterogeneity (ITH) by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq datasets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Results Here, we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq datasets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime and scalability. Its inferred genotypes were the most accurate, especially on highly heterogeneous data, and it was the only method able to run and produce results on datasets with 5000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by Supplementary Experimental Data. With ever growing scDNA-seq datasets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve ITH but also as a preprocessing step to reduce data size. Availability and implementation BnpC is freely available under MIT license at https://github.com/cbg-ethz/BnpC. Supplementary information Supplementary data are available at Bioinformatics online.


2020
Author(s): Shaolong Cao, Jennifer R. Wang, Shuangxi Ji, Peng Yang, Jingxiao Chen, ...

Abstract Cancers can vary greatly in their transcriptomes. In contrast to alterations in specific genes or pathways, the significance of differences in tumor cell total mRNA content is poorly understood. Studies using single-cell sequencing or model systems have suggested a role for total mRNA content in regulating cellular phenotypes. However, analytical challenges related to technical artifacts and cellular admixture have impeded examination of total mRNA expression at scale across cancers. To address this, we evaluated total mRNA expression using single-cell sequencing, and developed a computational method for quantifying tumor-specific total mRNA expression (TmS) from bulk sequencing data. We systematically estimated TmS in 5,181 patients across 15 cancer types and observed close correlations with clinicopathologic characteristics and molecular features, where high TmS generally accompanies high-risk disease. At a pan-cancer level, high TmS is associated with increased risk of disease progression and death. Moreover, TmS captures tumor type-specific effects of somatic mutations, chromosomal instability, and hypoxia, as well as aspects of intratumor heterogeneity. Taken together, our results suggest that measuring total mRNA expression offers a broader perspective of tracking cancer transcriptomes, which has important clinical and biological implications.


2019
Author(s): Haoyun Lei, Bochuan Lyu, E. Michael Gertz, Alejandro A. Schäffer, Xulian Shi, ...

Abstract Characterizing intratumor heterogeneity (ITH) is crucial to understanding cancer development, but it is hampered by limits of available data sources. Bulk DNA sequencing is the most common technology to assess ITH, but it mixes many genetically distinct cells in each sample, which must then be computationally deconvolved. Single-cell sequencing (SCS) is a promising alternative, but its limitations (e.g., high noise, difficulty scaling to large populations, technical artifacts, and large data sets) have so far made it impractical for studying cohorts of sufficient size to identify statistically robust features of tumor evolution. We have developed strategies for deconvolution and tumor phylogenetics that combine limited amounts of bulk and single-cell data to gain some advantages of single-cell resolution at much lower cost, with a specific focus on deconvolving genomic copy number data. We developed a mixed membership model for clonal deconvolution via non-negative matrix factorization (NMF), balancing deconvolution quality with similarity to single-cell samples via an associated efficient coordinate descent algorithm. We then improve on that algorithm by integrating deconvolution with clonal phylogeny inference, using a mixed integer linear programming (MILP) model to incorporate a minimum evolution phylogenetic tree cost in the problem objective. We demonstrate the effectiveness of these methods on semi-simulated data of known ground truth, showing improved deconvolution accuracy relative to bulk data alone.
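To make the idea of clonal deconvolution by NMF concrete, here is a minimal sketch. It is not the authors' model: it uses the classic multiplicative update rule instead of their coordinate descent algorithm, it omits the single-cell similarity term and the phylogeny cost, and the matrix layout (bulk samples by loci, factored into mixture fractions and clone copy-number profiles) is an assumption made for illustration. All names and numbers are hypothetical.

```python
import numpy as np


def nmf_multiplicative(B, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix B (samples x loci) as B ~ F @ C.

    F (samples x k) plays the role of mixture weights and C (k x loci) the
    role of clone copy-number profiles. Uses multiplicative updates, which
    keep both factors non-negative.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_loci = B.shape
    F = rng.random((n_samples, k)) + eps
    C = rng.random((k, n_loci)) + eps
    errors = []
    for _ in range(n_iter):
        C *= (F.T @ B) / (F.T @ F @ C + eps)
        F *= (B @ C.T) / (F @ C @ C.T + eps)
        errors.append(float(np.linalg.norm(B - F @ C)))
    return F, C, errors


# Hypothetical toy data: two clones mixed in four bulk samples.
C_true = np.array([[2.0, 2.0, 3.0, 3.0, 2.0, 1.0],
                   [2.0, 4.0, 4.0, 2.0, 2.0, 2.0]])
F_true = np.array([[0.8, 0.2],
                   [0.5, 0.5],
                   [0.1, 0.9],
                   [0.3, 0.7]])
B = F_true @ C_true

F, C, errors = nmf_multiplicative(B, k=2)
print(f"reconstruction error: first iteration {errors[0]:.4f}, last {errors[-1]:.4f}")
```

NMF factors are only identifiable up to scaling and permutation, so the recovered F and C need not match F_true and C_true entry by entry; regularization toward single-cell observations, as described in the abstract, is one way to constrain that ambiguity.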


2018
Author(s): Simone Ciccolella, Mauricio Soto Gomez, Murray Patterson, Gianluca Della Vedova, Iman Hajirasouliha, ...

Abstract Motivation In recent years, the well-known Infinite Sites Assumption (ISA) has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progression where mutations are accumulated through histories. However, some recent studies leveraging Single Cell Sequencing (SCS) techniques have shown evidence of mutation losses in several tumor samples [19], making the inference problem harder. Results We present a new tool, gpps, that reconstructs a tumor phylogeny from single cell data, allowing each mutation to be lost at most a fixed number of times. Availability The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gppf.
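The constraint that each mutation is gained once and lost at most a fixed number of times can be sketched as a validity check on an annotated tree. This is an illustration only, not the gpps algorithm or its input format: the tree encoding, the function name and the example tree are all hypothetical.

```python
from collections import Counter


def leaf_genotypes(tree, root, k):
    """Walk a rooted tree and return {leaf: set of mutations present}.

    `tree` maps a node to a list of (child, events) pairs, where `events`
    is a list of ("gain" | "loss", mutation) tuples on the edge to `child`.
    Raises ValueError if a mutation is gained more than once, lost where it
    is not present, or lost more than k times overall.
    """
    gains, losses = Counter(), Counter()
    result = {}
    stack = [(root, frozenset())]
    while stack:
        node, present = stack.pop()
        children = tree.get(node, [])
        if not children:
            result[node] = set(present)
        for child, events in children:
            state = set(present)
            for kind, mutation in events:
                if kind == "gain":
                    gains[mutation] += 1
                    if gains[mutation] > 1:
                        raise ValueError(f"{mutation} gained more than once")
                    state.add(mutation)
                elif kind == "loss":
                    if mutation not in state:
                        raise ValueError(f"{mutation} lost where absent")
                    losses[mutation] += 1
                    if losses[mutation] > k:
                        raise ValueError(f"{mutation} lost more than {k} times")
                    state.remove(mutation)
                else:
                    raise ValueError(f"unknown event kind: {kind}")
            stack.append((child, frozenset(state)))
    return result


# Hypothetical tree: mutation "a" is gained near the root and lost once.
tree = {
    "root": [("n1", [("gain", "a")])],
    "n1": [("c1", [("gain", "b")]), ("n2", [("gain", "c")])],
    "n2": [("c2", [("loss", "a")]), ("c3", [])],
}

print(leaf_genotypes(tree, "root", k=1))
```

With k=0 the same tree is rejected, which corresponds to the Infinite Sites Assumption under which no mutation may be lost.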

