gpps: An ILP-based approach for inferring cancer progression with mutation losses from single cell data

Abstract Background Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. Results We present a new tool, gpps, that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gpps. Conclusions gpps provides new insights into the analysis of intra-tumor heterogeneity by proposing a new progression model for cancer phylogeny reconstruction from Single Cell data.
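The constraint described in the abstract (each mutation may be lost at most a fixed number of times) can be illustrated with a small check on a labeled tree. The following is only a sketch under stated assumptions, not the gpps ILP formulation: the tree encoding (`parent`, `label`) and the function names are hypothetical, and it assumes the usual Dollo convention that each mutation is gained exactly once.

```python
from collections import Counter

def genotypes(parent, label):
    """Return {node: frozenset of mutations present at that node}.

    parent[node] is the parent of node (None for the root).
    label[node] is ("gain", m) or ("loss", m) for the edge entering node.
    Both encodings are hypothetical, chosen only for this illustration.
    """
    geno = {}

    def resolve(node):
        if node in geno:
            return geno[node]
        if parent[node] is None:
            geno[node] = frozenset()  # the root carries no mutation
            return geno[node]
        present = set(resolve(parent[node]))
        kind, mut = label[node]
        if kind == "gain":
            present.add(mut)
        else:
            if mut not in present:
                raise ValueError(f"{mut} lost at {node} but not present")
            present.remove(mut)
        geno[node] = frozenset(present)
        return geno[node]

    for node in parent:
        resolve(node)
    return geno

def satisfies_dollo_k(parent, label, k):
    """True if every mutation is gained exactly once and lost at most k times."""
    gains = Counter(m for kind, m in label.values() if kind == "gain")
    losses = Counter(m for kind, m in label.values() if kind == "loss")
    try:
        genotypes(parent, label)  # rejects a loss of an absent mutation
    except ValueError:
        return False
    return all(c == 1 for c in gains.values()) and all(c <= k for c in losses.values())

# Toy tree: root -> a (gain A) -> b (gain B) -> c (loss A); a -> d (gain C)
parent = {"root": None, "a": "root", "b": "a", "c": "b", "d": "a"}
label = {"a": ("gain", "A"), "b": ("gain", "B"), "c": ("loss", "A"), "d": ("gain", "C")}
```

In this toy tree, mutation A is lost once, so the tree is admissible for k = 1 but not for k = 0, which corresponds to the model where mutations are only accumulated.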


Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses

Bioinformatics ◽

10.1093/bioinformatics/btaa722 ◽

2020 ◽

Author(s):

Simone Ciccolella ◽

Camir Ricketts ◽

Mauricio Soto Gomez ◽

Murray Patterson ◽

Dana Silverbush ◽

...

Keyword(s):

Simulated Annealing ◽

Single Cell ◽

Computational Methods ◽

Cancer Progression ◽

Evolutionary History ◽

Supplementary Information ◽

Fundamental Feature ◽

Robust Approach ◽

Single Cell Sequencing ◽

History Of

Abstract Motivation In recent years, the well-known Infinite Sites Assumption has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging single-cell sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. While there exist established computational methods that infer phylogenies with mutation losses, there remain some advancements to be made. Results We present Simulated Annealing Single-Cell inference (SASC): a new and robust approach based on simulated annealing for the inference of cancer progression from SCS datasets. In particular, we introduce an extension of the model of evolution where mutations are only accumulated, by allowing also a limited amount of mutation loss in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real datasets and in comparison with some other available methods. Availability and implementation The SASC tool is open source and available at https://github.com/sciccolella/sasc. Supplementary information Supplementary data are available at Bioinformatics online.
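For readers unfamiliar with simulated annealing, the general scheme can be sketched as follows. This is a generic illustration on a toy cost function, not SASC's implementation: the abstract does not describe the tool's objective, tree moves, or cooling schedule, so `cost`, `neighbor`, and all parameter values here are assumptions.

```python
import math
import random

def simulated_annealing(initial, cost, neighbor,
                        t_start=1.0, t_end=1e-3, cooling=0.95, seed=0):
    """Generic simulated annealing that minimizes `cost`.

    A worse candidate is accepted with probability exp(-delta / t)
    (Metropolis rule); the temperature t decays geometrically.
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost

# Toy problem (hypothetical): walk along the integers toward the minimum of (x - 7)^2.
def toy_cost(x):
    return (x - 7) ** 2

def toy_neighbor(x, rng):
    return x + rng.choice([-1, 1])

best, best_cost = simulated_annealing(0, toy_cost, toy_neighbor)
```

In a phylogeny setting, the state would be a candidate tree, the neighbor function a local tree rearrangement, and the cost a score such as a negative log-likelihood of the observed data; those choices are tool-specific and are not specified in the abstract above.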


Inferring Cancer Progression from Single-cell Sequencing while Allowing Mutation Losses

10.1101/268243 ◽

2018 ◽

Cited By ~ 11

Author(s):

Simone Ciccolella ◽

Mauricio Soto Gomez ◽

Murray Patterson ◽

Gianluca Della Vedova ◽

Iman Hajirasouliha ◽

...

Keyword(s):

Simulated Annealing ◽

Single Cell ◽

Cancer Progression ◽

Evolutionary History ◽

Real Data ◽

Simple Extension ◽

Data Sets ◽

Fundamental Feature ◽

Single Cell Sequencing ◽

History Of

Abstract Motivation In recent years, the well-known Infinite Sites Assumption (ISA) has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions seen as an accumulation of mutations. However, recent studies (Kuipers et al., 2017) leveraging Single-cell Sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. Still, established methods that can infer phylogenies with mutation losses are lacking. Results We present the SASC (Simulated Annealing Single-Cell inference) tool, which is a new and robust approach based on simulated annealing for the inference of cancer progression from SCS data. More precisely, we introduce a simple extension of the model of evolution where mutations are only accumulated, by also allowing a limited amount of back mutations in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real data sets and in comparison with some other available methods. Availability The Simulated Annealing Single-cell inference (SASC) tool is open source and available at https://github.com/sciccolella/[email protected]


484 Bioturing browser: interactively explore public single cell sequencing data

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0484 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A520-A520

Author(s):

Son Pham ◽

Tri Le ◽

Tan Phan ◽

Minh Pham ◽

Huy Nguyen ◽

...

Keyword(s):

Single Cell ◽

Immune Cell ◽

Expression Profiles ◽

Meta Analysis ◽

Cell Types ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Data Formats ◽

Cancer Types ◽

Cell Data

Background Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a particular immuno-oncology study can produce multi-omics profiles for several thousands of individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies and performing reanalysis of individual datasets and meta-analysis. Methods N/A Results We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform is currently hosting a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc. to be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching similar expression profiles, querying a target across datasets and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com. Conclusions N/A


Identification Of Gene Signature For Renal Cell Carcinoma-Associated Fibroblasts Mediating Cancer Progression And Affecting Prognosis

10.21203/rs.3.rs-49601/v1 ◽

2020 ◽

Author(s):

Bitian Liu ◽

Xiaonan Chen ◽

Yunhong Zhan ◽

Bin Wu ◽

Shen Pan

Keyword(s):

Renal Cell Carcinoma ◽

Cell Carcinoma ◽

Single Cell ◽

Clinical Significance ◽

Cell Lines ◽

Cancer Progression ◽

Renal Cell ◽

Gene Signature ◽

Pathological Grade ◽

Single Cell Sequencing

Abstract Background: Cancer-associated fibroblasts (CAFs) are most abundant in stroma and are critically involved in cancer progression. However, the specific signature of CAFs and related clinicopathological parameters in renal cell carcinoma (RCC) remain unclear. Methods: In this work, methods using recognized gene signatures were employed to roughly assess the infiltration level of the stroma and CAFs in RCC based on the data in The Cancer Genome Atlas. Weighted gene co-expression network analysis (WGCNA) was used to cluster transcriptomes and correlate with CAFs to identify specific markers. A comparison of fibroblast versus urothelial carcinoma cell lines and correlation with previously reported CAF markers were performed to demonstrate the specific expression of the gene signature. The gene signature was used to compare fibroblast infiltration of each sample through single sample gene set enrichment analysis, and the clinical significance of fibroblasts was analyzed via Cox risk assessment and the chi-square test. Finally, we used validation data to verify the clinical significance of the fibroblast gene signature in RCC. Results: Roughly calculated tumor matrix and CAF levels were significantly higher in kidney cancer than in normal tissues. More than 85% of fibroblast-specific markers identified by WGCNA were consistent with markers obtained via single-cell sequencing. These markers were more highly expressed in fibroblast cell lines and were significantly correlated with canonical CAF markers. Data validation also showed that CAFs were significantly correlated with survival and pathological grade. Conclusions: In summary, our findings indicate that the gene signature potentially serves as a biomarker of CAFs in RCC and that infiltration of fibroblasts in RCC is an independent prognostic factor associated with pathological grade and stage of tumor. The ability to recognize specific CAF markers using WGCNA is comparable to single-cell sequencing.


Computational Methods for Single-Cell Data Analysis

10.1007/978-1-4939-9057-3 ◽

2019 ◽

Keyword(s):

Data Analysis ◽

Single Cell ◽

Computational Methods ◽

Cell Data


PhyDOSE: Design of Follow-up Single-cell Sequencing Experiments of Tumors

10.1101/2020.03.30.016410 ◽

2020 ◽

Author(s):

Leah Weber ◽

Nuraini Aguse ◽

Nicholas Chia ◽

Mohammed El-Kebir

Keyword(s):

Single Cell ◽

Retrospective Analysis ◽

High Fidelity ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Bulk Data ◽

Sequencing Experiment ◽

Tumor Phylogeny ◽

Number Of Cells

Abstract The combination of bulk and single-cell DNA sequencing data of the same tumor enables the inference of high-fidelity phylogenies that form the input to many important downstream analyses in cancer genomics. While many studies simultaneously perform bulk and single-cell sequencing, some studies have analyzed initial bulk data to identify which mutations to target in a follow-up single-cell sequencing experiment, thereby decreasing cost. Bulk data provide an additional untapped source of valuable information, composed of candidate phylogenies and associated clonal prevalence. Here, we introduce PhyDOSE, a method that uses this information to strategically optimize the design of follow-up single cell experiments. Underpinning our method is the observation that only a small number of clones uniquely distinguish one candidate tree from all other trees. We incorporate distinguishing features into a probabilistic model that infers the number of cells to sequence so as to confidently reconstruct the phylogeny of the tumor. We validate PhyDOSE using simulations and a retrospective analysis of a leukemia patient, concluding that PhyDOSE’s computed number of cells resolves tree ambiguity even in the presence of typical single-cell sequencing errors. We also conduct a retrospective analysis on an acute myeloid leukemia cohort, demonstrating the potential to achieve similar results with a significant reduction in the number of cells sequenced. In a prospective analysis, we demonstrate that only a small number of cells suffice to disambiguate the solution space of trees in a recent lung cancer cohort.
In summary, PhyDOSE proposes cost-efficient single-cell sequencing experiments that yield high-fidelity phylogenies, which will improve downstream analyses aimed at deepening our understanding of cancer biology. Author summary Cancer development in a patient can be explained using a phylogeny: a tree that describes the evolutionary history of a tumor and has therapeutic implications. A tumor phylogeny is constructed from sequencing data, commonly obtained using either bulk or single-cell DNA sequencing technology. The accuracy of tumor phylogeny inference increases when both types of data are used, but single-cell sequencing may become prohibitively costly with increasing number of cells. Here, we propose a method that uses bulk sequencing data to guide the design of a follow-up single-cell sequencing experiment. Our results suggest that PhyDOSE provides a significant decrease in the number of cells to sequence compared to the number of cells sequenced in existing studies. The ability to make informed decisions based on prior data can help reduce the cost of follow-up single cell sequencing experiments of tumors, improving accuracy of tumor phylogeny inference and ultimately getting us closer to understanding and treating cancer.
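The idea of choosing a number of cells that gives a target confidence can be illustrated with a deliberately simplified calculation. This is not PhyDOSE's probabilistic model: it assumes a single distinguishing clone with known prevalence `f`, independent sampling of cells, and no sequencing errors, and the function name and parameters are hypothetical. Under those assumptions, the probability of sampling at least one cell of the clone among n cells is 1 - (1 - f)^n.

```python
import math

def cells_needed(prevalence, confidence):
    """Smallest n with 1 - (1 - prevalence)^n >= confidence.

    Simplified, hypothetical model: one clone, independent cells,
    no false negatives or other sequencing errors.
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("prevalence and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))

def detection_probability(prevalence, n):
    """Probability that at least one of n sampled cells belongs to the clone."""
    return 1.0 - (1.0 - prevalence) ** n

n = cells_needed(prevalence=0.1, confidence=0.95)
```

With a clone prevalence of 0.1 and a 0.95 confidence target, this simplified bound gives 29 cells. A realistic design also has to handle several distinguishing clones at once and the error rates of single-cell sequencing, which this sketch ignores.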


Cellsnp-lite: an efficient tool for genotyping single cells

10.1101/2020.12.31.424913 ◽

2021 ◽

Author(s):

Xianjie Huang ◽

Yuanhua Huang

Keyword(s):

Single Cell ◽

Single Cells ◽

Basic Research ◽

Substantial Improvement ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Memory Efficiency ◽

Computational Speed ◽

Cell Data

Abstract Summary Single-cell sequencing is an increasingly used technology and has promising applications in basic research and clinical translations. However, genotyping methods developed for bulk sequencing data have not been well adapted for single-cell data, in terms of both computational parallelization and simplified user interface. Here we introduce a software tool, cellsnp-lite, implemented in C/C++ and based on the well-supported package htslib, for genotyping in single-cell sequencing data for both droplet and well based platforms. On various experimental data sets, it shows substantial improvement in computational speed and memory efficiency while retaining highly concordant results compared to existing methods. Cellsnp-lite therefore lightens the genetic analysis for increasingly large single-cell data. Availability The source code is freely available at https://github.com/single-cell-genetics/[email protected]


GPPS: an ILP-based approach for inferring cancer progression with mutation losses from single cell data

2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2018.8542058 ◽

2018 ◽

Cited By ~ 2

Author(s):

Simone Ciccolella ◽

Mauricio Soto Gomez ◽

Murray Patterson ◽

Gianluca Della Vedova ◽

Iman Hajirasouliha ◽

...

Keyword(s):

Single Cell ◽

Cancer Progression ◽

Cell Data


XenoCell: classification of cellular barcodes in single cell experiments from xenograft samples

BMC Medical Genomics ◽

10.1186/s12920-021-00872-8 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Stefano Cheloni ◽

Roman Hillje ◽

Lucilla Luzi ◽

Pier Giuseppe Pelicci ◽

Elena Gatti

Keyword(s):

Single Cell ◽

Open Source ◽

Cell Line ◽

Mixed Species ◽

Single Cell Sequencing ◽

Sequencing Technologies ◽

Cell Experiment ◽

Reliable Classification ◽

Bioinformatics Workflows

Abstract Background Single-cell sequencing technologies provide unprecedented opportunities to deconvolve the genomic, transcriptomic or epigenomic heterogeneity of complex biological systems. Their application in samples from xenografts of patient-derived biopsies (PDX), however, is limited by the presence of cells originating from both the host and the graft in the analysed samples; in fact, discriminating between host and graft sequence reads obtained in a single-cell experiment is still a challenge in bioinformatics workflows. Results We have developed XenoCell, the first stand-alone pre-processing tool that performs fast and reliable classification of host and graft cellular barcodes from single-cell sequencing experiments. We show its application on a mixed species 50:50 cell line experiment from the 10× Genomics platform, and on a publicly available PDX dataset obtained by Drop-Seq. Conclusions XenoCell accurately dissects sequence reads from any host and graft combination of species as well as from a broad range of single-cell experiments and platforms. It is open source and available at https://gitlab.com/XenoCell/XenoCell.
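One simple way to picture barcode classification is to label each cellular barcode by the fraction of its reads assigned to each species. The sketch below is an illustration only and does not reproduce XenoCell's actual procedure, which the abstract does not detail: the input format, the function name, and the 0.9 thresholds are all assumptions.

```python
def classify_barcodes(read_counts, graft_min=0.9, host_min=0.9):
    """Label each barcode as 'graft', 'host', 'mixed', or 'unassigned'.

    read_counts maps barcode -> (host_reads, graft_reads).
    The thresholds are hypothetical and would need tuning per experiment.
    """
    labels = {}
    for barcode, (host, graft) in read_counts.items():
        total = host + graft
        if total == 0:
            labels[barcode] = "unassigned"  # no species-specific reads
        elif graft / total >= graft_min:
            labels[barcode] = "graft"
        elif host / total >= host_min:
            labels[barcode] = "host"
        else:
            labels[barcode] = "mixed"  # e.g. doublets or ambient contamination
    return labels

# Made-up counts for four barcodes, for illustration only.
counts = {
    "AAAC": (5, 95),
    "CCGT": (98, 2),
    "GGTA": (50, 50),
    "TTTT": (0, 0),
}
labels = classify_barcodes(counts)
```

Barcodes labeled "graft" would then be kept for downstream analysis of the tumor cells, while "mixed" barcodes are typically candidates for exclusion.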
