RobustClone: a robust PCA method for tumor clone and evolution inference from single-cell sequencing data

2020 ◽  
Vol 36 (11) ◽  
pp. 3299-3306
Author(s):  
Ziwei Chen ◽  
Fuzhou Gong ◽  
Lin Wan ◽  
Liang Ma

Motivation: Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging. Results: To infer clonal evolution in tumors from error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on extended robust principal component analysis, a low-rank matrix decomposition method, and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method that can be applied to both single-cell single nucleotide variation (scSNV) and single-cell copy-number variation (scCNV) data. It is efficient and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods on large-scale data in both accuracy and efficiency. We further validated RobustClone on two scSNV and two scCNV datasets and demonstrated that RobustClone could recover the genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. Availability and implementation: RobustClone software is available at https://github.com/ucasdp/RobustClone. Supplementary information: Supplementary data are available at Bioinformatics online.
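
For orientation, robust PCA is commonly posed as the convex program below, shown in a variant that only constrains observed entries so that missing data can be handled. This is the textbook formulation and not necessarily the exact objective RobustClone optimizes; the symbols D, L, S, \Omega and \lambda are notation assumed for this sketch.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \,\|S\|_{1}
\quad \text{subject to} \quad
\mathcal{P}_{\Omega}(L + S) = \mathcal{P}_{\Omega}(D)
```

Here D is the observed cell-by-site genotype matrix, L is a low-rank matrix (cells from the same subclone share a genotype, so few distinct rows are expected), S is a sparse matrix absorbing erroneous entries, \|\cdot\|_{*} is the nuclear norm, \|\cdot\|_{1} is the entrywise \ell_1 norm, \mathcal{P}_{\Omega} keeps only the entries in the observed set \Omega, and \lambda > 0 balances the two terms.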

2019 ◽  
Author(s):  
Ziwei Chen ◽  
Fuzhou Gong ◽  
Liang Ma ◽  
Lin Wan

Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and build phylogenetic relationships of tumor cells/clones. However, high technical error rates introduce considerable noise into the genetic data, limiting the application of evolutionary tools to this large reservoir. To recover the low-dimensional subspace of tumor subpopulations from error-prone SCS data in the presence of corrupted and/or missing elements, we developed an efficient computational framework, termed RobustClone, which recovers the true genotypes of subclones based on the low-rank matrix factorization method of extended robust principal component analysis (RPCA) and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method that is fast and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods in both accuracy and efficiency. We further validated RobustClone on 2 single-cell SNV and 2 single-cell CNV datasets and demonstrated that RobustClone could recover the genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. RobustClone software is available at https://github.com/ucasdp/RobustClone.


2017 ◽  
Author(s):  
Maurizio Pellegrino ◽  
Adam Sciambi ◽  
Sebastian Treusch ◽  
Robert Durruthy-Durruthy ◽  
Kaustubh Gokhale ◽  
...  

To enable the characterization of genetic heterogeneity in tumor cell populations, we developed a novel microfluidic approach that barcodes amplified genomic DNA from thousands of individual cancer cells confined to droplets. The barcodes are then used to reassemble the genetic profiles of cells from next-generation sequencing data. Using this approach, we sequenced longitudinally collected AML tumor populations from two patients and genotyped up to 62 disease-relevant loci across more than 16,000 individual cells. Targeted single-cell sequencing was able to sensitively identify tumor cells during complete remission and uncovered complex clonal evolution within AML tumors that was not observable with bulk sequencing. We anticipate that this approach will make feasible the routine analysis of heterogeneity in AML, leading to improved stratification and therapy selection for the disease.


2016 ◽  
Author(s):  
Jack Kuipers ◽  
Katharina Jahn ◽  
Benjamin J. Raphael ◽  
Niko Beerenwinkel

The infinite sites assumption, which states that every genomic position mutates at most once over the lifetime of a tumor, is central to current approaches for reconstructing mutation histories of tumors, but has never been tested explicitly. We developed a rigorous statistical framework to test the assumption with single-cell sequencing data. The framework accounts for the high noise and contamination present in such data. We found strong evidence for recurrent mutations at the same site in 8 out of 9 single-cell sequencing datasets from human tumors. Six cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large scale genomic deletions. Two cases exhibited parallel mutation, including the dataset with the strongest evidence of recurrence. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity.
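
To make the assumption concrete: in error-free binary data with a known ancestral state, the infinite sites assumption implies that no pair of sites shows all three patterns (1,0), (0,1) and (1,1) across cells. The sketch below checks only this classical three-gamete condition. It is not the statistical framework described in the abstract, which additionally models noise and contamination; the function name and toy matrix are hypothetical.

```python
from itertools import combinations


def conflicting_site_pairs(genotypes):
    """Return pairs of site indices that violate the three-gamete condition.

    genotypes: list of cells, each a list of 0/1 calls per site (0 = ancestral).
    Assumes error-free data; real single-cell calls would need a noise model.
    """
    n_sites = len(genotypes[0]) if genotypes else 0
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct (site i, site j) patterns seen across all cells.
        patterns = {(cell[i], cell[j]) for cell in genotypes}
        # All three mutated patterns together cannot arise on a tree in which
        # each site mutates exactly once and is never lost.
        if {(1, 0), (0, 1), (1, 1)} <= patterns:
            conflicts.append((i, j))
    return conflicts


# Toy data (hypothetical): 4 cells x 3 sites.
cells = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
print(conflicting_site_pairs(cells))  # [(0, 1), (1, 2)]
```

A conflicting pair means that at least one of the two sites was mutated more than once or lost, or that the data contain errors. Separating these explanations is what requires a statistical test.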


2021 ◽  
Author(s):  
Thomas Stiehl ◽  
Anna Marciniak-Czochra

Acute myeloid leukemia is an aggressive cancer of the blood-forming system. The malignant cell population is composed of multiple clones that evolve over time. Clonal data reflect the mechanisms governing treatment response and relapse. Single-cell sequencing provides the most direct insights into the clonal composition of the leukemic cells; however, it is still not routinely available in clinical practice. In this work we develop a computational algorithm that identifies all clonal hierarchies that are compatible with bulk variant allele frequencies measured in a patient sample. The clonal hierarchies represent descendance relations between the different clones and reveal the order in which mutations have been acquired. The proposed computational approach is tested using single-cell sequencing data, which allow the outcome of the algorithm to be compared with the true structure of the clonal hierarchy. We investigate which problems occur during reconstruction of clonal hierarchies from bulk sequencing data. Our results suggest that in many cases only a small number of possible hierarchies fit the bulk data. This implies that bulk sequencing data can be used to obtain insights into clonal evolution.
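
As a rough illustration of what "compatible with bulk variant allele frequencies" can mean, the sketch below enumerates parent assignments that satisfy the commonly used sum rule: the combined frequency of a mutation's direct descendants cannot exceed its own frequency. The abstract does not state which constraints the authors' algorithm uses, so the sum rule, the function name and the example frequencies are all assumptions.

```python
from itertools import product


def compatible_hierarchies(freqs, tol=1e-9):
    """Enumerate parent assignments consistent with the sum rule.

    freqs: mutation frequencies sorted in descending order; freqs[0] is the
    founder mutation. Ties are resolved by input order.
    Returns tuples `parents` where parents[k] is the parent of mutation k + 1.
    """
    n = len(freqs)
    results = []
    # Mutation k may only descend from a more frequent (earlier) mutation,
    # which also guarantees that every assignment is a tree rooted at 0.
    for parents in product(*(range(k) for k in range(1, n))):
        child_sum = [0.0] * n
        for child, parent in enumerate(parents, start=1):
            child_sum[parent] += freqs[child]
        # Sum rule: children together cannot be more frequent than the parent.
        if all(child_sum[p] <= freqs[p] + tol for p in range(n)):
            results.append(parents)
    return results


# Hypothetical bulk frequencies for four mutations.
print(compatible_hierarchies([0.5, 0.3, 0.25, 0.1]))  # [(0, 1, 0), (0, 1, 2)]
```

In this toy case only two of the six candidate trees survive, which matches the abstract's observation that few hierarchies tend to fit the bulk data. Brute-force enumeration grows factorially with the number of mutations, so it is only practical for small inputs.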


2020 ◽  
Vol 36 (18) ◽  
pp. 4817-4818 ◽  
Author(s):  
Gregor Sturm ◽  
Tamas Szabo ◽  
Georgios Fotakis ◽  
Marlene Haider ◽  
Dietmar Rieder ◽  
...  

Summary: Advances in single-cell technologies have enabled the investigation of T-cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T-cell receptors. Here, we propose single-cell immune repertoires in Python (Scirpy), a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data. Availability and implementation: Scirpy source code and documentation are available at https://github.com/icbi-lab/scirpy. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Zhenhua Yu ◽  
Fang Du ◽  
Xuehong Sun ◽  
Ao Li

Motivation: Allele dropout (ADO) and unbalanced amplification of alleles are the main technical issues of single-cell sequencing (SCS), and effectively emulating these issues is necessary for reliably benchmarking SCS-based bioinformatics tools. Unfortunately, currently available sequencing simulators do not model the whole-genome amplification involved in SCS and are therefore not suited for generating SCS datasets. We develop a new software package (SCSsim) that can efficiently simulate SCS datasets in a parallel fashion with minimal user intervention. SCSsim first constructs the genome sequence of a single cell by mimicking a complement of genomic variations in a user-controlled manner, then amplifies the genome according to the MALBAC technique, and finally yields sequencing reads from the amplified products based on inferred sequencing profiles. Comprehensive evaluation in simulating different ADO rates, variation detection efficiency and genome coverage demonstrates that SCSsim is a very useful tool for mimicking single-cell sequencing data with high efficiency. Availability and implementation: SCSsim is freely available at https://github.com/qasimyu/scssim. Supplementary information: Supplementary data are available at Bioinformatics online.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 5503-5503
Author(s):  
Alexey Aleshin ◽  
Robert Durruthy-Durruthy ◽  
Bruno C. Medeiros ◽  
Dennis J. Eastburn ◽  
Peter L Greenberg

Background: Myelodysplastic syndromes (MDS) are a collection of clonal diseases of dysfunctional hematopoietic stem cells, characterized by ineffective hematopoiesis, cytopenias, and dysplasia. Increased understanding of the mutational landscape of MDS has led to initial improvements in prognostic models based on clinical and cytogenetic variables. However, bulk sequencing techniques are limited in their ability to delineate clonal complexity and identify rare drug-resistant subclones. To better understand clonal heterogeneity and clonal evolution of MDS, we applied a high-throughput single-cell sequencing technique to both diagnostic and longitudinal MDS samples. Methods: Samples were examined for 5 patients with MDS at diagnosis and, when available, progression. Mutational bulk sequencing was performed by NGS panel sequencing, and exon sequencing was available in select cases. Single-cell processing was performed using the Tapestri (Mission Bio) platform. Briefly, individual cells were isolated using a microfluidic approach, followed by barcoding and genomic DNA amplification for individual cancer cells confined to droplets. Barcodes are then used to reassemble the genetic profiles of cells from next-generation sequencing data. We applied this approach to individual MDS samples, genotyping the most clinically relevant loci across upwards of 10,000 individual cells. Results: Single-cell sequencing was performed successfully on all samples tested and recapitulated bulk sequencing data. We observed high concordance between bulk variant allele frequencies (VAFs) and sample-level VAFs derived from single-cell sequencing data (r² = 0.98). Additionally, single-cell analysis allowed for resolution of subclonal architecture and tumor phylogenetic evolution beyond what was predicted from bulk sequencing alone. Single-cell SNVs were able to resolve host and donor cell populations after bone marrow transplant and accurately predict chimerism and disease relapse. Furthermore, we were able to resolve the co-occurrence of molecular alterations within subclones and establish zygosity of individual mutations at a single-cell level. Rare subclones associated with disease relapse, which were frequently below the limit of detection of bulk NGS, could be identified in initial diagnostic samples. Conclusions: Our results suggest more molecular complexity in MDS tumor samples than implied by bulk sequencing methods alone and indicate the utility of single-cell sequencing for identification of resistant clones and longitudinal therapy monitoring. Disclosures: Aleshin: Mission Bio, Inc.: Consultancy; Natera, Inc.: Employment. Durruthy-Durruthy: Mission Bio, Inc.: Employment, Equity Ownership. Medeiros: Genentech: Employment; Celgene: Consultancy, Research Funding. Eastburn: Mission Bio, Inc.: Employment, Equity Ownership.


2021 ◽  
Vol 12 ◽  
Author(s):  
Thomas Stiehl ◽  
Anna Marciniak-Czochra

Acute myeloid leukemia is an aggressive cancer of the blood-forming system. The malignant cell population is composed of multiple clones that evolve over time. Clonal data reflect the mechanisms governing treatment response and relapse. Single-cell sequencing provides the most direct insights into the clonal composition of the leukemic cells; however, it is still not routinely available in clinical practice. In this work we develop a computational algorithm that identifies all clonal hierarchies that are compatible with bulk variant allele frequencies measured in a patient sample. The clonal hierarchies represent descendance relations between the different clones and reveal the order in which mutations have been acquired. The proposed computational approach is tested using single-cell sequencing data, which allow the outcome of the algorithm to be compared with the true structure of the clonal hierarchy. We investigate which problems occur during reconstruction of clonal hierarchies from bulk sequencing data. Our results suggest that in many cases only a small number of possible hierarchies fit the bulk data. This implies that bulk sequencing data can be used to obtain insights into clonal evolution.


2018 ◽  
Author(s):  
Luca Alessandrì ◽  
Marco Beccuti ◽  
Maddalena Arigoni ◽  
Martina Olivero ◽  
Greta Romano ◽  
...  

Summary: Single-cell RNA sequencing has emerged as an essential tool to investigate cellular heterogeneity and to highlight cell sub-population-specific signatures. Dedicated and user-friendly bioinformatics workflows are now required to exploit the deconvolution of single-cell transcriptomes. Furthermore, there is a growing need for bioinformatics workflows granting both functional reproducibility, i.e. saving information about data and analysis parameters, and computational reproducibility, i.e. storing the real image of the computation environment. Here, we present rCASC, a modular RNAseq analysis workflow allowing data analysis from count generation to the identification of cell sub-population signatures, granting both functional and computational reproducibility. Availability and implementation: rCASC is part of the reproducible bioinformatics project. rCASC is a Docker-based application controlled by an R package available at https://github.com/kendomaniac/rCASC. Supplementary information: Supplementary data are available at the rCASC GitHub repository.


2019 ◽  
Vol 35 (22) ◽  
pp. 4688-4695 ◽  
Author(s):  
Rui Hou ◽  
Elena Denisenko ◽  
Alistair R R Forrest

Motivation: Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process. Results: We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets. We found that scMatch can rapidly and robustly annotate single cells with comparable accuracy to another recent cell annotation tool (SingleR), but that it is quicker and can handle larger reference datasets. We demonstrate how scMatch can handle large customized reference gene expression profiles that combine data from multiple sources, thus empowering researchers to identify cell populations in any complex tissue with the desired precision. Availability and implementation: scMatch (Python code) and the FANTOM5 reference dataset are freely available to the research community at https://github.com/forrest-lab/scMatch. Supplementary information: Supplementary data are available at Bioinformatics online.
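
The closest-match idea can be sketched in a few lines: score one cell's expression vector against each labelled reference profile and keep the best-scoring label. Spearman rank correlation is used here as an assumed similarity metric, since the abstract does not say which metrics the tool supports. The helper names, reference labels and expression values are hypothetical and are not scMatch's API.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def annotate(cell, references):
    """Return (label, correlation) for the best-matching reference profile."""
    scored = ((label, spearman(cell, profile)) for label, profile in references.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical reference profiles over the same four genes.
references = {"T cell": [9, 1, 0, 5], "B cell": [0, 8, 7, 1]}
cell = [4, 0, 0, 2]
label, score = annotate(cell, references)
print(label)  # T cell
```

Because each cell is matched independently, this approach needs no clustering step, which is the contrast the abstract draws with cluster-then-annotate methods.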

