Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors

Abstract

Background: Cancer arises from an evolutionary process in which somatic mutations give rise to clonal expansions. Reconstructing this process is useful for treatment decision-making and for understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched, and understanding which cancer types and which patients follow each trajectory, could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations of current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched.

Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem, which we prove to be NP-hard, as a means of testing two alternative hypotheses for the pattern of evolution. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and application to real data, we demonstrate that our method outperforms a competing machine learning approach.

Conclusion: Phyolin is an accurate, easy-to-use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
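To make the linear-versus-branched distinction concrete, the sketch below checks whether a noise-free binary matrix is consistent with a linear history. It assumes the common formulation in which rows are cells, columns are mutations, and a linear perfect phylogeny means that the sets of cells carrying each mutation are nested in a single chain. The function name `is_linear_perfect_phylogeny` is hypothetical, and the sketch does not perform the entry flipping or the constraint programming that Phyolin uses to solve LPPF.

```python
from typing import Sequence


def is_linear_perfect_phylogeny(matrix: Sequence[Sequence[int]]) -> bool:
    """Return True if the mutation columns form a chain under set inclusion.

    matrix[i][j] == 1 means cell i carries mutation j (assumed error-free).
    """
    if not matrix:
        return True
    n_mutations = len(matrix[0])
    # For each mutation (column), collect the set of cells (rows) carrying it.
    carriers = [
        {i for i, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_mutations)
    ]
    # Order mutations from most to fewest carriers. In a linear history every
    # mutation's carriers must contain the carriers of each rarer mutation;
    # because inclusion is transitive, checking neighbours is sufficient.
    carriers.sort(key=len, reverse=True)
    return all(later <= earlier for earlier, later in zip(carriers, carriers[1:]))


# Three cells sampled along one lineage: mutations accumulate without splits.
linear = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]

# Two cells each gain a private mutation after a shared one: a branch.
branched = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
```

With sequencing errors, a matrix from a truly linear tumor will usually fail this exact test, which is why the problem is posed in terms of flipping entries rather than as a direct check.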

SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data

DOI: 10.1101/2020.02.03.930354; 2020

Author(s): Collin Giguere, Harsh Vardhan Dubey, Vishal Kumar Sarsani, Hachem Saddiki, Shai He, ...

Keyword(s): DNA Sequencing, Single Cell, Real Data, Data Sets, Next Generation, Sequencing Data, Next Generation DNA Sequencing, Accuracy And Precision, Downstream Analysis, Multiple Samples

Abstract

Background: Recently, it has become possible to collect next-generation DNA sequencing data sets that are composed of multiple samples from multiple biological units, where each of these samples may be from a single cell or bulk tissue. However, no tool yet exists for simulating DNA sequencing data from such a nested sampling arrangement with single-cell and bulk samples so that developers of analysis methods can assess accuracy and precision.

Results: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing of variant callers or other genomic tools.

Conclusions: The DNA sequencing data generated by our simulator is representative of real data and integrates seamlessly with standard downstream analysis tools.

484 Bioturing browser: interactively explore public single cell sequencing data

Journal for ImmunoTherapy of Cancer; DOI: 10.1136/jitc-2020-sitc2020.0484; 2020; Vol 8 (Suppl 3); pp. A520-A520

Author(s): Son Pham, Tri Le, Tan Phan, Minh Pham, Huy Nguyen, ...

Keyword(s): Single Cell, Immune Cell, Expression Profiles, Meta Analysis, Cell Types, Sequencing Data, Single Cell Sequencing, Data Formats, Cancer Types, Cell Data

Background: Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitate target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a single immuno-oncology study can produce multi-omics profiles for several thousand individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies and performing reanalysis of individual datasets as well as meta-analysis.

Methods: N/A

Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform currently hosts a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc., so that they can be instantly accessed and explored in a uniform visualization and analytics interface. Based on this curated database, BioTuring Browser supports searching for similar expression profiles, querying a target across datasets, and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com.

Conclusions: N/A

Improved Compression of DNA Sequencing Data with Cascading Bloom Filters

International Journal of Foundations of Computer Science; DOI: 10.1142/s0129054118430013; 2018; Vol 29 (08); pp. 1249-1255

Author(s): Kamil Salikhov

Keyword(s): DNA Sequencing, Sequence Data, Real Data, Compression Algorithm, Computational Experiments, Bloom Filters, DNA Fragments, Sequencing Data, Sequencing Technologies, Memory Reduction

Modern DNA sequencing technologies generate prodigious volumes of sequence data consisting of short DNA fragments (reads). Storing and transferring this data is often challenging. With this motivation, several specialized compression methods have been developed. In this paper, we present an improvement of the lossless reference-free compression algorithm, suggested by Rozov et al., based on the technique of cascading Bloom filters. Through computational experiments on real data, we demonstrate that our method results in a significant associated memory reduction in practice.
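As background for the technique named above, the following is a minimal sketch of a generic cascading Bloom filter for exact membership over a known, finite set of candidate queries. It is not the compression algorithm of Rozov et al. or the improvement described in this abstract. The class names, the `levels` and `bits_per_item` parameters, and the SHA-256 based hashing are all illustrative assumptions.

```python
import hashlib
from typing import List, Set


class BloomFilter:
    """Plain Bloom filter; one byte per bit is used for readability."""

    def __init__(self, n_bits: int, n_hashes: int = 3) -> None:
        self.n_bits = max(1, n_bits)
        self.n_hashes = n_hashes
        self.bits = bytearray(self.n_bits)

    def _positions(self, item: str) -> List[int]:
        # Derive n_hashes positions by salting SHA-256 with a small seed.
        return [
            int.from_bytes(
                hashlib.sha256(f"{seed}:{item}".encode()).digest()[:8], "big"
            ) % self.n_bits
            for seed in range(self.n_hashes)
        ]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class CascadingBloomFilter:
    """Exact membership for queries drawn from members | non_members."""

    def __init__(self, members: Set[str], non_members: Set[str],
                 levels: int = 3, bits_per_item: int = 8) -> None:
        self.filters: List[BloomFilter] = []
        targets, others = set(members), set(non_members)
        for _ in range(levels):
            bloom = BloomFilter(bits_per_item * len(targets))
            for item in targets:
                bloom.add(item)
            self.filters.append(bloom)
            # The next level stores the items from the opposite set that this
            # level wrongly accepts, so each level corrects the previous one.
            targets, others = {x for x in others if x in bloom}, targets
        # Whatever is still ambiguous after the last level is stored exactly.
        self.residual = targets

    def __contains__(self, item: str) -> bool:
        for level, bloom in enumerate(self.filters):
            if item not in bloom:
                # A rejection at index 0, 2, ... means "not a member"; a
                # rejection at index 1, 3, ... confirms the earlier hit.
                return level % 2 == 1
        in_residual = item in self.residual
        # The residual holds members after an even number of levels and
        # non-members after an odd number of levels.
        return in_residual if len(self.filters) % 2 == 0 else not in_residual


# Hypothetical k-mers present in the data and candidate k-mers that are not.
kmers = {"ACGT", "CGTA", "GTAC", "TACG"}
candidates = {"TTTT", "ACGA", "GGCC", "CCGG", "AAAA"}
cascade = CascadingBloomFilter(kmers, candidates)
```

Each level is sized for the false positives of the level before it, so later filters tend to be much smaller than the first. Answers are only guaranteed for items that were in one of the two sets supplied at construction time.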

Assessing the performance of methods for copy number aberration detection from single-cell DNA sequencing data

PLoS Computational Biology; DOI: 10.1371/journal.pcbi.1008012; 2020; Vol 16 (7); pp. e1008012; Cited By ~ 2

Author(s): Xian F. Mallory, Mohammadamin Edrisi, Nicholas Navin, Luay Nakhleh

Keyword(s): DNA Sequencing, Single Cell, Copy Number, Copy Number Aberration, Sequencing Data, Aberration Detection

scHaplotyper: haplotype construction and visualization for genetic diagnosis using single cell DNA sequencing data

BMC Bioinformatics; DOI: 10.1186/s12859-020-3381-5; 2020; Vol 21 (1); Cited By ~ 1

Author(s): Zhiqiang Yan, Xiaohui Zhu, Yuqian Wang, Yanli Nie, Shuo Guan, ...

Keyword(s): DNA Sequencing, Single Cell, Genetic Diagnosis, Sequencing Data

Linked-read analysis identifies mutations in single-cell DNA-sequencing data

Nature Genetics; DOI: 10.1038/s41588-019-0366-2; 2019; Vol 51 (4); pp. 749-754; Cited By ~ 25

Author(s): Craig L. Bohrson, Alison R. Barton, Michael A. Lodato, Rachel E. Rodin, Lovelace J. Luquette, ...

Keyword(s): DNA Sequencing, Single Cell, Sequencing Data

Splatter: simulation of single-cell RNA sequencing data

DOI: 10.1101/133173; 2017; Cited By ~ 8

Author(s): Luke Zappia, Belinda Phipson, Alicia Oshlack

Keyword(s): Single Cell, RNA Sequencing, Real Data, Cell Types, RNA Seq, Sequencing Data, Sequencing Technologies, Simulation Based, Single Cell RNA Sequencing, Multiple Cell

Abstract

As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available.

Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
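The gamma-Poisson idea mentioned above can be illustrated with a short sketch: each cell draws an expression rate from a gamma distribution, and the observed count is then Poisson-distributed around that rate, which produces overdispersed counts. This is only an illustration of the distribution, not Splatter's interface (Splatter is an R package), and the names `sample_poisson`, `simulate_counts`, `mean` and `shape` are assumptions made for the example.

```python
import math
import random
from typing import List


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson count with Knuth's multiplication method.

    Adequate for modest rates; exp(-rate) underflows for very large rates.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n_cells: int, mean: float, shape: float,
                    seed: int = 0) -> List[int]:
    """Simulate counts for one gene across n_cells from a gamma-Poisson mixture."""
    rng = random.Random(seed)
    scale = mean / shape  # gamma mean is shape * scale
    return [
        # Per-cell rate from the gamma, then a Poisson count around that rate.
        sample_poisson(rng.gammavariate(shape, scale), rng)
        for _ in range(n_cells)
    ]


counts = simulate_counts(n_cells=20000, mean=5.0, shape=2.0, seed=1)
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
```

Marginally, this mixture is a negative binomial with variance `mean + mean**2 / shape`, so smaller `shape` values give noisier genes while the mean stays fixed.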

Single-cell copy number calling and event history reconstruction

DOI: 10.1101/2020.04.28.065755; 2020; Cited By ~ 1

Author(s): Jack Kuipers, Mustafa Anıl Tuncel, Pedro Ferreira, Katharina Jahn, Niko Beerenwinkel

Keyword(s): DNA Sequencing, Single Cell, Copy Number, Driving Forces, Simulated Data, Read Depth, Cancer Diagnostics, Whole Genome, Copy Number Alterations, Sequencing Data

Copy number alterations are driving forces of tumour development and the emergence of intra-tumour heterogeneity. A comprehensive picture of these genomic aberrations is therefore essential for the development of personalised and precise cancer diagnostics and therapies. Single-cell sequencing offers the highest resolution for copy number profiling down to the level of individual cells. Recent high-throughput protocols allow for the processing of hundreds of cells through shallow whole-genome DNA sequencing. The resulting low read-depth data poses substantial statistical and computational challenges to the identification of copy number alterations. We developed SCICoNE, a statistical model and MCMC algorithm tailored to single-cell copy number profiling from shallow whole-genome DNA sequencing data. SCICoNE reconstructs the history of copy number events in the tumour and uses these evolutionary relationships to identify the copy number profiles of the individual cells. We show the accuracy of this approach in evaluations on simulated data and demonstrate its practicability in applications to a xenograft breast cancer sample.

A Bayesian Nonparametric Model for Inferring Subclonal Populations from Structured DNA Sequencing Data

DOI: 10.1101/2020.11.10.330183; 2020

Author(s): Shai He, Aaron Schein, Vishal Sarsani, Patrick Flaherty

Keyword(s): DNA Sequencing, Single Cell, Dirichlet Process, Lymphoblastic Leukemia, Nonparametric Model, Dirichlet Process Mixture, Sequencing Data, Hierarchical Dirichlet Process, Dirichlet Process Prior

There are distinguishing features or “hallmarks” of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity, as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data. Understanding the genetic composition of the tumor at the time of treatment is important in the personalized design of targeted therapeutic combinations and in monitoring for possible recurrence after treatment.

We propose a hierarchical Dirichlet process mixture model that incorporates the correlation structure induced by a structured sampling arrangement, and we show that this model improves the quality of inference. We develop a representation of the hierarchical Dirichlet process prior as a Gamma-Poisson hierarchy and use this representation to derive a fast Gibbs sampling inference algorithm using the augment-and-marginalize method. Experiments with simulated data show that our model outperforms standard numerical and statistical methods for decomposing admixed count data. Analysis of a real acute lymphoblastic leukemia sequencing dataset shows that our model improves upon state-of-the-art bioinformatic methods. An interpretation of the results of our model on this real dataset reveals co-mutated loci across samples.

MQuad enables clonal substructure discovery using single cell mitochondrial variants

DOI: 10.1101/2021.03.27.437331; 2021

Author(s): Aaron Wing Cheung Kwok, Chen Qiao, Rongting Huang, Mai-Har Sham, Joshua W. K. Ho, ...

Keyword(s): DNA Sequencing, Single Cell, Single Cells, High Sensitivity, Copy Number Variations, Sequencing Data, Single Nucleotide, Single Cell Sequencing, mtDNA Variants, Python Package

Abstract

Mitochondrial mutations are increasingly recognised as informative endogenous genetic markers that can be used to reconstruct cellular clonal structure using single-cell RNA or DNA sequencing data. However, there is a lack of effective computational methods to identify informative mtDNA variants in noisy and sparse single-cell sequencing data. Here we present MQuad, an open source computational tool that accurately calls clonally informative mtDNA variants in a population of single cells, together with an analysis suite for complete clonality inference based on single-cell RNA or DNA sequencing data. Using a variety of simulated and experimental single-cell sequencing data, we show that MQuad can identify mitochondrial variants with both high sensitivity and specificity, outperforming existing methods by a large margin. Furthermore, we demonstrate its wide applicability across different single-cell sequencing protocols, particularly in complementing single-nucleotide and copy-number variations to extract finer clonal resolution. MQuad is a Python package available via https://github.com/single-cell-genetics/MQuad.