Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

Mapping Intimacies ◽

10.1101/132183 ◽

2017 ◽

Cited By ~ 3

Author(s):

Daniele Ramazzotti ◽

Alex Graudenzi ◽

Luca De Sano ◽

Marco Antoniotti ◽

Giulio Caravagna

Keyword(s):

Single Cell ◽

Sequencing Data ◽

Data Types ◽

Tumour Heterogeneity ◽

Computational Framework ◽

Statistical Framework ◽

Somatic Alterations ◽

Multiple Samples ◽

Single Tumour ◽

Individual Tumour

Background: A large number of algorithms is being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, the same method can rarely support both data types. Results: We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions: We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.

A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples

10.1101/2021.07.10.451910 ◽

2021 ◽

Author(s):

Wenpin Hou ◽

Zhicheng Ji ◽

Zeyu Chen ◽

E John Wherry ◽

Stephanie C Hicks ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Biological Processes ◽

Rna Seq ◽

Experimental Conditions ◽

Computational Framework ◽

Statistical Framework ◽

Gene Regulatory ◽

Multiple Samples ◽

False Discoveries

Pseudotime analysis with single-cell RNA-sequencing (scRNA-seq) data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many computational methods have been developed to infer the pseudo-temporal trajectories of cells within a biological sample, methods that compare pseudo-temporal patterns with multiple samples (or replicates) across different experimental conditions are lacking. Lamian is a comprehensive and statistically-rigorous computational framework for differential multi-sample pseudotime analysis. It can be used to identify changes in a biological process associated with sample covariates, such as different biological conditions, and also to detect changes in gene expression, cell density, and topology of a pseudotemporal trajectory. Unlike existing methods that ignore sample variability, Lamian draws statistical inference after accounting for cross-sample variability and hence substantially reduces sample-specific false discoveries that are not generalizable to new samples. Using both simulations and real scRNA-seq data, including an analysis of differential immune response programs between COVID-19 patients with different disease severity levels, we demonstrate the advantages of Lamian in decoding cellular gene expression programs in continuous biological processes.

On the discovery of subpopulation-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data

10.1101/713412 ◽

2019 ◽

Cited By ~ 25

Author(s):

Helena L. Crowell ◽

Charlotte Soneson ◽

Pierre-Luc Germain ◽

Daniela Calini ◽

Ludovic Collin ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

R Package ◽

State Transitions ◽

Sequencing Data ◽

Statistical Framework ◽

Single Cell Rna Sequencing ◽

Cell Subpopulations ◽

Multiple Samples

Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to profile the transcriptomes of individual cells on a large scale. Many early analyses of differential expression have aimed at identifying differences between subpopulations, and thus are focused on finding subpopulation markers either in a single sample or across multiple samples. More generally, such methods can compare expression levels in multiple sets of cells, thus leading to cross-condition analyses. However, given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis. For example, one could investigate the condition-specific responses of cell subpopulations measured from patients from each condition; however, it is not clear which statistical framework best handles this situation. In this work, we surveyed the methods available to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated “pseudobulk” data. We developed a flexible simulation platform that mimics both single and multi-sample scRNA-seq data and provide robust tools for multi-condition analysis within the muscat R package.
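The "pseudobulk" aggregation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain cells-by-genes count matrix with one sample label and one subpopulation label per cell; the function name `pseudobulk` and the toy data are hypothetical and are not taken from the muscat package, which is written in R.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids, cluster_ids):
    """Sum per-cell count vectors within each (sample, cluster) group.

    counts      -- list of per-cell gene count lists (cells x genes)
    sample_ids  -- sample label for each cell
    cluster_ids -- subpopulation label for each cell
    Returns a dict mapping (sample, cluster) to a summed count vector.
    """
    if not (len(counts) == len(sample_ids) == len(cluster_ids)):
        raise ValueError("each cell needs one sample and one cluster label")
    n_genes = len(counts[0]) if counts else 0
    totals = defaultdict(lambda: [0] * n_genes)
    for cell, sample, cluster in zip(counts, sample_ids, cluster_ids):
        profile = totals[(sample, cluster)]
        for gene_index, value in enumerate(cell):
            profile[gene_index] += value
    return dict(totals)


# Toy data (hypothetical): 4 cells, 3 genes, 2 samples, 2 subpopulations.
counts = [[1, 0, 2], [3, 1, 0], [0, 5, 1], [2, 2, 2]]
samples = ["s1", "s1", "s2", "s2"]
clusters = ["A", "A", "A", "B"]
bulk = pseudobulk(counts, samples, clusters)
```

Each aggregated profile then acts as one observation per sample and subpopulation, so that replicates rather than individual cells become the unit of comparison across conditions.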

SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Random Projection ◽

Rna Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell Rna Sequencing ◽

Data Volume

Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory required by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold, depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
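To make the random projection step concrete, here is a minimal sketch of a generic Gaussian random projection that reduces each cell's expression vector to `k` dimensions before clustering. This is not the sscClust implementation (which is in R, and whose feature construction step the abstract does not detail); the function name, the seed parameter and the toy data are assumptions for illustration.

```python
import math
import random


def random_projection(rows, k, seed=0):
    """Project each row from d dimensions to k with a Gaussian random matrix.

    Entries of the d x k matrix are drawn from N(0, 1) and scaled by
    1/sqrt(k), the usual choice for approximately preserving pairwise
    distances between rows.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    d = len(rows[0])
    scale = 1.0 / math.sqrt(k)
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [
        [sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
        for row in rows
    ]


# Toy data (hypothetical): 3 cells with 6 expression values each.
cells = [
    [1.0, 0.0, 2.0, 0.0, 1.0, 3.0],
    [0.0, 4.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.5, 0.0, 1.0, 2.0],
]
reduced = random_projection(cells, k=2, seed=42)
```

Any standard clustering algorithm can then be run on `reduced` instead of the full matrix, which is where the speed and memory savings of such an approach would come from.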

DUBStepR: correlation-based feature selection for clustering single-cell RNA sequencing data

10.1101/2020.10.07.330563 ◽

2020 ◽

Author(s):

Bobby Ranjan ◽

Wenjie Sun ◽

Jinyu Park ◽

Ronald Xie ◽

Fatemeh Alipour ◽

...

Keyword(s):

Feature Selection ◽

Single Cell ◽

Gene Selection ◽

Marker Gene ◽

Feature Space ◽

General Purpose ◽

Selection Marker ◽

Selection Methods ◽

Sequencing Data ◽

Data Types

Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single cell clustering pipelines. However, we found that the performance of existing feature selection methods was inconsistent across benchmark datasets, and occasionally even worse than without feature selection. Moreover, existing methods ignored information contained in gene-gene correlations. We therefore developed DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. In a published scRNA-seq dataset from sorted monocytes, DUBStepR sensitively detected a rare and previously invisible population of contaminating basophils. DUBStepR is scalable to large datasets, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.

A statistical test on single-cell data reveals widespread recurrent mutations in tumor evolution

10.1101/094722 ◽

2016 ◽

Cited By ~ 3

Author(s):

Jack Kuipers ◽

Katharina Jahn ◽

Benjamin J. Raphael ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

Large Scale ◽

Tumor Evolution ◽

Sequencing Data ◽

General Validity ◽

Genomic Deletions ◽

Single Cell Sequencing ◽

Statistical Framework ◽

Recurrent Mutations ◽

Complex Models

The infinite sites assumption, which states that every genomic position mutates at most once over the lifetime of a tumor, is central to current approaches for reconstructing mutation histories of tumors, but has never been tested explicitly. We developed a rigorous statistical framework to test the assumption with single-cell sequencing data. The framework accounts for the high noise and contamination present in such data. We found strong evidence for recurrent mutations at the same site in 8 out of 9 single-cell sequencing datasets from human tumors. Six cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large scale genomic deletions. Two cases exhibited parallel mutation, including the dataset with the strongest evidence of recurrence. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity.
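As a rough illustration of what the infinite sites assumption implies, the sketch below checks a noise-free binary mutation matrix for pairs of sites that cannot both have mutated exactly once on a single tree. Assuming the unmutated state is ancestral and no mutation is lost, the sets of cells carrying any two mutations must be nested or disjoint. This is a simplified combinatorial check, not the statistical test described in the abstract, which accounts for noise and contamination; the function name and the toy matrices are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return pairs of sites whose carrier sets overlap without nesting.

    matrix -- list of rows, one per cell; entry 1 means the cell carries
              the mutation at that site, 0 means it does not.
    A pair (a, b) is reported when some cell carries both mutations, some
    cell carries only a, and some cell carries only b.
    """
    n_sites = len(matrix[0]) if matrix else 0
    carriers = [
        {cell for cell, row in enumerate(matrix) if row[site] == 1}
        for site in range(n_sites)
    ]
    conflicts = []
    for a, b in combinations(range(n_sites), 2):
        both = carriers[a] & carriers[b]
        only_a = carriers[a] - carriers[b]
        only_b = carriers[b] - carriers[a]
        if both and only_a and only_b:
            conflicts.append((a, b))
    return conflicts


# Toy data (hypothetical): rows are cells, columns are mutation sites.
compatible = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
violating = [[1, 0], [1, 1], [0, 1]]
ok_result = conflicting_pairs(compatible)
bad_result = conflicting_pairs(violating)
```

In real single-cell data, false positives and allelic dropout would produce such conflicts even when the assumption holds, which is why a noise-aware statistical framework is needed rather than this direct check.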

SMaSH: A scalable, general marker gene identification framework for single-cell RNA sequencing and Spatial Transcriptomics

10.1101/2021.04.08.438978 ◽

2021 ◽

Author(s):

Michael E Nelson ◽

Simone G Riva ◽

Ann Cvejic

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Marker Gene ◽

Marker Genes ◽

Sequencing Data ◽

Computational Framework ◽

Data Set ◽

Spatially Resolved ◽

Single Cell Rna Sequencing ◽

The Given

Spatial transcriptomics is revolutionising the study of single-cell RNA and tissue-wide cell heterogeneity, but few robust methods exist for connecting spatially resolved cells to so-called marker genes from single-cell RNA sequencing, which generate significant insight gleaned from spatial methods. Here we present SMaSH, a general computational framework for extracting key marker genes from single-cell RNA sequencing data for spatial transcriptomics approaches. SMaSH extracts robust and biologically well-motivated marker genes, which characterise the given data-set better than existing and limited computational approaches for global marker gene calculation.

Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

BMC Bioinformatics ◽

10.1186/s12859-019-2795-4 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 8

Author(s):

Daniele Ramazzotti ◽

Alex Graudenzi ◽

Luca De Sano ◽

Marco Antoniotti ◽

Giulio Caravagna

Keyword(s):

Single Cell ◽

Sequencing Data ◽

Individual Tumour

SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data

10.1101/2020.02.03.930354 ◽

2020 ◽

Author(s):

Collin Giguere ◽

Harsh Vardhan Dubey ◽

Vishal Kumar Sarsani ◽

Hachem Saddiki ◽

Shai He ◽

...

Keyword(s):

Dna Sequencing ◽

Single Cell ◽

Real Data ◽

Data Sets ◽

Next Generation ◽

Sequencing Data ◽

Next Generation Dna Sequencing ◽

Accuracy And Precision ◽

Downstream Analysis ◽

Multiple Samples

Background: Recently, it has become possible to collect next-generation DNA sequencing data sets that are composed of multiple samples from multiple biological units, where each of these samples may be from a single cell or bulk tissue. However, no tool yet exists for simulating DNA sequencing data from such a nested sampling arrangement with single-cell and bulk samples so that developers of analysis methods can assess accuracy and precision. Results: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing of variant callers or other genomic tools. Conclusions: The DNA sequencing data generated by our simulator is representative of real data and integrates seamlessly with standard downstream analysis tools.

MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data

Genome Biology ◽

10.1186/s13059-015-0844-5 ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 619

Author(s):

Greg Finak ◽

Andrew McDavid ◽

Masanao Yajima ◽

Jingyuan Deng ◽

Vivian Gersuk ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Sequencing Data ◽

Statistical Framework ◽

Single Cell Rna Sequencing ◽

Transcriptional Changes

Combinatorial prediction of marker panels from single-cell transcriptomic data

10.1101/655753 ◽

2019 ◽

Cited By ~ 1

Author(s):

Conor Delaney ◽

Alexandra Schnell ◽

Louis V. Cammarata ◽

Aaron Yao-Smith ◽

Aviv Regev ◽

...

Keyword(s):

Single Cell ◽

Single Gene ◽

Cell Populations ◽

Gene Marker ◽

Sequencing Data ◽

Functional Roles ◽

Link Type ◽

Statistical Framework ◽

Gene Panels

Single-cell transcriptomic studies are identifying novel cell populations with exciting functional roles in various in vivo contexts, but identification of succinct gene-marker panels for such populations remains a challenge. In this work we introduce COMET, a computational framework for the identification of candidate marker panels consisting of one or more genes for cell populations of interest identified with single-cell RNA-seq data. We show that COMET outperforms other methods for the identification of single-gene panels, and enables, for the first time, prediction of multi-gene marker panels ranked by relevance. Staining by flow-cytometry assay confirmed the accuracy of COMET’s predictions in identifying marker-panels for cellular subtypes, at both the single- and multi-gene levels, validating COMET’s applicability and accuracy in predicting favorable marker-panels from transcriptomic input. COMET is a general non-parametric statistical framework and can be used as-is on various high-throughput datasets in addition to single-cell RNA-sequencing data. COMET is available for use via a web interface (http://www.cometsc.com) or a standalone software package (https://github.com/MSingerlab/COMETSC).