RobustClone: a robust PCA method for tumor clone and evolution inference from single-cell sequencing data

Ziwei Chen; Fuzhou Gong; Lin Wan; Liang Ma

doi:10.1093/bioinformatics/btaa172

Bioinformatics ◽

10.1093/bioinformatics/btaa172 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3299-3306

Author(s):

Ziwei Chen ◽

Fuzhou Gong ◽

Lin Wan ◽

Liang Ma

Keyword(s):

Single Cell ◽

Large Scale ◽

Clonal Evolution ◽

Low Rank ◽

Supplementary Information ◽

Breast Cancer Dataset ◽

Sequencing Data ◽

Cancer Dataset ◽

Single Cell Sequencing ◽

Model Free

Abstract Motivation Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging. Results To infer the clonal evolution in tumor from the error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on the extended robust principal component analysis, a low-rank matrix decomposition method, and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method, which can be applied to both single-cell single nucleotide variation (scSNV) and single-cell copy-number variation (scCNV) data. It is efficient and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods in large-scale data both in accuracy and efficiency. We further validated RobustClone on two scSNV and two scCNV datasets and demonstrated that RobustClone could recover genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. Availability and implementation RobustClone software is available at https://github.com/ucasdp/RobustClone. Supplementary information Supplementary data are available at Bioinformatics online.
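For orientation, the low-rank decomposition named in the abstract can be written in its standard textbook form (principal component pursuit with missing entries). This is a sketch of the general technique, not necessarily the exact objective RobustClone optimizes. The symbols are assumptions introduced here: D is the observed cell-by-site genotype matrix of size m × n, Ω is the set of observed (non-missing) entries, L is the low-rank matrix of clonal genotypes, S is the sparse matrix of genotyping errors, and λ is a trade-off weight.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \,\|S\|_{1}
\quad \text{subject to} \quad
P_{\Omega}(L + S) = P_{\Omega}(D),
\qquad \lambda = \frac{1}{\sqrt{\max(m, n)}} \ \text{(a common default)}
```

Here \|L\|_{*} is the nuclear norm (sum of singular values), \|S\|_{1} is the entrywise l1 norm, and P_Ω keeps the observed entries and zeroes the rest. Cells from the same subclone share a genotype, which is why the clean matrix is expected to have low rank.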

RobustClone: A robust PCA method of tumor clone and evolution inference from single-cell sequencing data

10.1101/666271 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ziwei Chen ◽

Fuzhou Gong ◽

Liang Ma ◽

Lin Wan

Keyword(s):

Single Cell ◽

Large Scale ◽

Principal Component ◽

Low Rank ◽

Breast Cancer Dataset ◽

Sequencing Data ◽

Cancer Dataset ◽

Large Reservoir ◽

Single Cell Sequencing ◽

Model Free

Abstract Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and build phylogenetic relationships of tumor cells/clones. However, high technical errors bring much noise into the genetic data, thus limiting the application of evolutionary tools in the large reservoir. To recover the low-dimensional subspace of tumor subpopulations from error-prone SCS data in the presence of corrupted and/or missing elements, we developed an efficient computational framework, termed RobustClone, to recover the true genotypes of subclones based on the low-rank matrix factorization method of extended robust principal component analysis (RPCA) and reconstruct the subclonal evolutionary tree. RobustClone is a model-free method, fast and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods, both in accuracy and efficiency. We further validated RobustClone on 2 single-cell SNV and 2 single-cell CNV datasets and demonstrated that RobustClone could recover genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. RobustClone software is available at https://github.com/ucasdp/RobustClone.

High-throughput single-cell DNA sequencing of AML tumors with droplet microfluidics

10.1101/203158 ◽

2017 ◽

Cited By ~ 2

Author(s):

Maurizio Pellegrino ◽

Adam Sciambi ◽

Sebastian Treusch ◽

Robert Durruthy-Durruthy ◽

Kaustubh Gokhale ◽

...

Keyword(s):

Single Cell ◽

Clonal Evolution ◽

Droplet Microfluidics ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Selection For ◽

Genetic Profiles ◽

Generation Sequencing

Abstract To enable the characterization of genetic heterogeneity in tumor cell populations, we developed a novel microfluidic approach that barcodes amplified genomic DNA from thousands of individual cancer cells confined to droplets. The barcodes are then used to reassemble the genetic profiles of cells from next generation sequencing data. Using this approach, we sequenced longitudinally collected AML tumor populations from two patients and genotyped up to 62 disease-relevant loci across more than 16,000 individual cells. Targeted single-cell sequencing was able to sensitively identify tumor cells during complete remission and uncovered complex clonal evolution within AML tumors that was not observable with bulk sequencing. We anticipate that this approach will make feasible the routine analysis of heterogeneity in AML, leading to improved stratification and therapy selection for the disease.

A statistical test on single-cell data reveals widespread recurrent mutations in tumor evolution

10.1101/094722 ◽

2016 ◽

Cited By ~ 3

Author(s):

Jack Kuipers ◽

Katharina Jahn ◽

Benjamin J. Raphael ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

Large Scale ◽

Tumor Evolution ◽

Sequencing Data ◽

General Validity ◽

Genomic Deletions ◽

Single Cell Sequencing ◽

Statistical Framework ◽

Recurrent Mutations ◽

Complex Models

The infinite sites assumption, which states that every genomic position mutates at most once over the lifetime of a tumor, is central to current approaches for reconstructing mutation histories of tumors, but has never been tested explicitly. We developed a rigorous statistical framework to test the assumption with single-cell sequencing data. The framework accounts for the high noise and contamination present in such data. We found strong evidence for recurrent mutations at the same site in 8 out of 9 single-cell sequencing datasets from human tumors. Six cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large scale genomic deletions. Two cases exhibited parallel mutation, including the dataset with the strongest evidence of recurrence. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity.
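To make the infinite sites assumption concrete, here is a minimal sketch of the classical noise-free check for it on a binary genotype matrix. It is not the statistical framework described in the abstract, which accounts for noise and contamination; the function name and the input layout (one row per cell, 0 taken as the unmutated ancestral state) are assumptions made for illustration.

```python
from itertools import combinations


def conflicting_site_pairs(genotypes):
    """Return pairs of site indices that cannot both have mutated exactly once.

    genotypes: list of rows (one per cell), each a list of 0/1 calls per site,
    where 0 is assumed to be the ancestral (unmutated) state.
    """
    n_sites = len(genotypes[0])
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        observed = {(cell[i], cell[j]) for cell in genotypes}
        # With an unmutated root, observing the patterns 10, 01 and 11 together
        # cannot be explained by a tree in which each site mutates only once
        # and is never lost.
        if {(1, 0), (0, 1), (1, 1)} <= observed:
            conflicts.append((i, j))
    return conflicts


cells = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
print(conflicting_site_pairs(cells))
```

On real single-cell data, false positives and allele dropout would trigger this check constantly, which is why a test that models the error process is needed before drawing conclusions about recurrent mutation.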

Computational reconstruction of clonal hierarchies from bulk sequencing data of acute myeloid leukemia samples

10.1101/2021.01.23.427897 ◽

2021 ◽

Author(s):

Thomas Stiehl ◽

Anna Marciniak-Czochra

Keyword(s):

Acute Myeloid Leukemia ◽

Single Cell ◽

Myeloid Leukemia ◽

Clonal Evolution ◽

Malignant Cell ◽

Leukemic Cells ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Computational Reconstruction ◽

Acute Myeloid

Abstract Acute myeloid leukemia is an aggressive cancer of the blood-forming system. The malignant cell population is composed of multiple clones that evolve over time. Clonal data reflect the mechanisms governing treatment response and relapse. Single cell sequencing provides the most direct insights into the clonal composition of the leukemic cells; however, it is still not routinely available in clinical practice. In this work we develop a computational algorithm that allows identifying all clonal hierarchies that are compatible with bulk variant allele frequencies measured in a patient sample. The clonal hierarchies represent descendance relations between the different clones and reveal the order in which mutations have been acquired. The proposed computational approach is tested using single cell sequencing data that allow comparing the outcome of the algorithm with the true structure of the clonal hierarchy. We investigate which problems occur during reconstruction of clonal hierarchies from bulk sequencing data. Our results suggest that in many cases only a small number of possible hierarchies fits the bulk data. This implies that bulk sequencing data can be used to obtain insights into clonal evolution.
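The idea of listing every hierarchy compatible with bulk variant allele frequencies can be illustrated with a brute-force sketch. It is not the authors' algorithm. It assumes heterozygous mutations at diploid loci, one new mutation per clone, and the usual "sum rule": the VAFs of the direct children of a clone cannot add up to more than the VAF of that clone (0.5 for the unmutated root). All names are hypothetical.

```python
from itertools import product


def _reaches_root(node, parent):
    """True if following parent links from node ends at the root (None)."""
    seen = set()
    while node is not None:
        if node in seen:
            return False  # cycle, so this is not a tree
        seen.add(node)
        node = parent[node]
    return True


def compatible_hierarchies(vaf, tol=1e-9):
    """Enumerate parent assignments (mutation -> parent or None for the root)
    that form a tree and satisfy the sum rule."""
    names = sorted(vaf, key=vaf.get, reverse=True)
    results = []
    for parents in product([None] + names, repeat=len(names)):
        parent = dict(zip(names, parents))
        if not all(_reaches_root(n, parent) for n in names):
            continue
        ok = True
        for p in [None] + names:
            capacity = 0.5 if p is None else vaf[p]
            children_total = sum(vaf[c] for c in names if parent[c] == p)
            if children_total > capacity + tol:
                ok = False
                break
        if ok:
            results.append(parent)
    return results


linear = compatible_hierarchies({"A": 0.5, "B": 0.3, "C": 0.25})
ambiguous = compatible_hierarchies({"A": 0.5, "B": 0.2, "C": 0.2})
print(linear)
print(len(ambiguous))
```

The first sample admits a single linear hierarchy, while the second admits three, which mirrors the abstract's point that bulk data often, but not always, narrow the candidates to a small set. The enumeration grows exponentially with the number of mutations, so it is only suitable for small examples.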

Scirpy: a Scanpy extension for analyzing single-cell T-cell receptor-sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa611 ◽

2020 ◽

Vol 36 (18) ◽

pp. 4817-4818 ◽

Cited By ~ 2

Author(s):

Gregor Sturm ◽

Tamas Szabo ◽

Georgios Fotakis ◽

Marlene Haider ◽

Dietmar Rieder ◽

...

Keyword(s):

T Cell ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Cell Receptor ◽

Supplementary Information ◽

Sequencing Data ◽

Seamless Integration ◽

T Cell Phenotypes ◽

Cell Phenotypes

Abstract Summary Advances in single-cell technologies have enabled the investigation of T-cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T-cell receptors. Here, we propose single-cell immune repertoires in Python (Scirpy), a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data. Availability and implementation Scirpy source code and documentation are available at https://github.com/icbi-lab/scirpy. Supplementary information Supplementary data are available at Bioinformatics online.

SCSsim: an integrated tool for simulating single-cell genome sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btz713 ◽

2019 ◽

Author(s):

Zhenhua Yu ◽

Fang Du ◽

Xuehong Sun ◽

Ao Li

Keyword(s):

Single Cell ◽

High Efficiency ◽

Detection Efficiency ◽

Comprehensive Evaluation ◽

Supplementary Information ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Technical Issues ◽

Cell Genome ◽

User Intervention

Abstract Motivation Allele dropout (ADO) and unbalanced amplification of alleles are the main technical issues of single-cell sequencing (SCS), and effectively emulating these issues is necessary for reliably benchmarking SCS-based bioinformatics tools. Unfortunately, currently available sequencing simulators do not model the whole-genome amplification step involved in SCS and are therefore not suited for generating SCS datasets. We develop a new software package (SCSsim) that can efficiently simulate SCS datasets in a parallel fashion with minimal user intervention. SCSsim first constructs the genome sequence of a single cell by mimicking a complement of genomic variations in a user-controlled manner, then amplifies the genome according to the MALBAC technique, and finally yields sequencing reads from the amplified products based on inferred sequencing profiles. Comprehensive evaluation in simulating different ADO rates, variation detection efficiency and genome coverage demonstrates that SCSsim is a very useful tool for mimicking single-cell sequencing data with high efficiency. Availability and implementation SCSsim is freely available at https://github.com/qasimyu/scssim. Supplementary information Supplementary data are available at Bioinformatics online.

Single-Cell Mutational Profiling Describes the Molecular Heterogeneity of Clonal Evolution in MDS during Therapy and Relapse

Blood ◽

10.1182/blood-2018-99-120368 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 5503-5503

Author(s):

Alexey Aleshin ◽

Robert Durruthy-Durruthy ◽

Bruno C. Medeiros ◽

Dennis J. Eastburn ◽

Peter L Greenberg

Keyword(s):

Single Cell ◽

Clonal Evolution ◽

Next Generation Sequencing Data ◽

Molecular Heterogeneity ◽

Disease Relapse ◽

Sequencing Data ◽

Employment Equity ◽

Hematopoietic Stem ◽

Single Cell Sequencing ◽

Equity Ownership

Abstract Background: Myelodysplastic syndromes (MDS) are a collection of clonal diseases of dysfunctional hematopoietic stem cells, characterized by ineffective hematopoiesis, cytopenias, and dysplasia. Increased understanding of the mutational landscape of MDS has led to initial improvements in prognostic models based on clinical and cytogenetic variables. However, bulk sequencing techniques are limited in their ability to delineate clonal complexity and identify rare drug-resistant subclones. To better understand clonal heterogeneity and clonal evolution of MDS, we applied a high-throughput single cell sequencing technique to both diagnostic and longitudinal MDS samples. Methods: Samples were examined for 5 patients with MDS at diagnosis and, when available, progression. Mutational bulk sequencing was performed by NGS panel sequencing and exon sequencing was available in select cases. Single cell processing was performed using the Tapestri (Mission Bio) platform. Briefly, individual cells were isolated using a microfluidic approach, followed by barcoding and genomic DNA amplification for individual cancer cells confined to droplets. Barcodes are then used to reassemble the genetic profiles of cells from next generation sequencing data. We applied this approach to individual MDS samples, genotyping the most clinically relevant loci across upwards of 10,000 individual cells. Results: Single-cell sequencing was performed successfully on all samples tested and recapitulated bulk sequencing data. We observed high concordance between bulk variant allele frequencies (VAFs) and sample level VAFs derived from single cell sequencing data (r² = 0.98). Additionally, single cell analysis allowed for resolution of subclonal architecture and tumor phylogenetic evolution beyond what was predicted from bulk sequencing alone.
Single-cell SNVs were able to resolve host and donor cell populations after bone marrow transplant and accurately predict chimerism and disease relapse. Furthermore, we were able to resolve the co-occurrence of molecular alterations within subclones and establish zygosity of individual mutations at a single cell level. Rare subclones associated with disease relapse could be identified in initial diagnostic samples, frequently below the limit of detection of bulk NGS. Conclusions: Our results suggest more molecular complexity in MDS tumor samples than implied by bulk sequencing methods alone and indicate the utility of single-cell sequencing for identification of resistant clones and longitudinal therapy monitoring. Disclosures Aleshin: Mission Bio, Inc.: Consultancy; Natera, Inc.: Employment. Durruthy-Durruthy: Mission Bio, Inc.: Employment, Equity Ownership. Medeiros: Genentech: Employment; Celgene: Consultancy, Research Funding. Eastburn: Mission Bio, Inc.: Employment, Equity Ownership.
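The Results compare bulk VAFs with sample-level VAFs derived from single-cell genotypes. The abstract does not say how the sample-level values were computed; one plausible way, shown here purely as an assumption, is to pool the alleles of all cells with a genotype call at a diploid locus. The function name and the 0/1/2 call encoding are hypothetical.

```python
def pseudo_bulk_vaf(calls):
    """Sample-level VAF for one locus from per-cell genotype calls.

    calls: one entry per cell; 0 = wild type, 1 = heterozygous,
    2 = homozygous mutant, None = no call (the cell is skipped).
    Assumes a diploid locus, so each called cell contributes two alleles.
    """
    called = [c for c in calls if c is not None]
    if not called:
        return None  # no cell was genotyped at this locus
    return sum(called) / (2 * len(called))


print(pseudo_bulk_vaf([0, 1, 1, 2, None]))
```

Computing this value per locus and correlating it with the bulk VAFs would give a concordance measure of the kind reported, under the stated assumptions.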

Computational Reconstruction of Clonal Hierarchies From Bulk Sequencing Data of Acute Myeloid Leukemia Samples

Frontiers in Physiology ◽

10.3389/fphys.2021.596194 ◽

2021 ◽

Vol 12 ◽

Author(s):

Thomas Stiehl ◽

Anna Marciniak-Czochra

Keyword(s):

Acute Myeloid Leukemia ◽

Single Cell ◽

Myeloid Leukemia ◽

Clonal Evolution ◽

Malignant Cell ◽

Leukemic Cells ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Computational Reconstruction ◽

Acute Myeloid

Acute myeloid leukemia is an aggressive cancer of the blood-forming system. The malignant cell population is composed of multiple clones that evolve over time. Clonal data reflect the mechanisms governing treatment response and relapse. Single cell sequencing provides the most direct insights into the clonal composition of the leukemic cells; however, it is still not routinely available in clinical practice. In this work we develop a computational algorithm that allows identifying all clonal hierarchies that are compatible with bulk variant allele frequencies measured in a patient sample. The clonal hierarchies represent descendance relations between the different clones and reveal the order in which mutations have been acquired. The proposed computational approach is tested using single cell sequencing data that allow comparing the outcome of the algorithm with the true structure of the clonal hierarchy. We investigate which problems occur during reconstruction of clonal hierarchies from bulk sequencing data. Our results suggest that in many cases only a small number of possible hierarchies fits the bulk data. This implies that bulk sequencing data can be used to obtain insights into clonal evolution.

rCASC: reproducible Classification Analysis of Single Cell sequencing data

10.1101/430967 ◽

2018 ◽

Cited By ~ 1

Author(s):

Luca Alessandrì ◽

Marco Beccuti ◽

Maddalena Arigoni ◽

Martina Olivero ◽

Greta Romano ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

R Package ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Analysis Workflow ◽

User Friendly ◽

Bioinformatics Workflows

Abstract Summary Single-cell RNA sequencing has emerged as an essential tool to investigate cellular heterogeneity and to highlight cell sub-population specific signatures. Nowadays, dedicated and user-friendly bioinformatics workflows are required to exploit the deconvolution of single-cell transcriptomes. Furthermore, there is a growing need for bioinformatics workflows granting both functional reproducibility, i.e. saving information about data and analysis parameters, and computation reproducibility, i.e. storing the real image of the computation environment. Here, we present rCASC, a modular RNAseq analysis workflow allowing data analysis from counts generation to cell sub-population signatures identification, granting both functional and computation reproducibility. Availability and implementation rCASC is part of the reproducible bioinformatics project. rCASC is a docker-based application controlled by an R package available at https://github.com/kendomaniac/rCASC. Supplementary information Supplementary data are available at the rCASC GitHub repository.

scMatch: a single-cell gene expression profile annotation tool using reference datasets

Bioinformatics ◽

10.1093/bioinformatics/btz292 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4688-4695 ◽

Cited By ~ 22

Author(s):

Rui Hou ◽

Elena Denisenko ◽

Alistair R R Forrest

Keyword(s):

Gene Expression ◽

Single Cell ◽

Large Scale ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

Supplementary Information ◽

Annotation Tool ◽

Sequencing Data ◽

Multiple Sources

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process. Results We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets. We found that scMatch can rapidly and robustly annotate single cells with comparable accuracy to another recent cell annotation tool (SingleR), but that it is quicker and can handle larger reference datasets. We demonstrate how scMatch can handle large customized reference gene expression profiles that combine data from multiple sources, thus empowering researchers to identify cell populations in any complex tissue with the desired precision. Availability and implementation scMatch (Python code) and the FANTOM5 reference dataset are freely available to the research community here https://github.com/forrest-lab/scMatch. Supplementary information Supplementary data are available at Bioinformatics online.
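The closest-match strategy described above can be sketched as a nearest-reference lookup under a rank correlation. This is an illustration of the general idea only, not scMatch's code or API; the function names, the choice of Spearman correlation as the similarity metric, and the toy expression vectors are all assumptions.

```python
def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def annotate(query, references):
    """Return the label of the reference profile most correlated with query."""
    return max(references, key=lambda label: spearman(query, references[label]))


# Toy profiles over the same four genes (hypothetical values).
references = {
    "T cell": [9, 8, 0, 1],
    "B cell": [0, 1, 9, 8],
    "Monocyte": [5, 5, 5, 5],
}
query = [7, 6, 1, 0]
print(annotate(query, references))
```

Annotating each cell independently in this way avoids the clustering step, which the abstract notes is hindered by low per-cell coverage.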
