SCOTCluster: Deep Clustering with Optimal Transport for Large-scale Single-cell RNA-seq Data

AbstractWhile understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc’s monotherapy (Pearson r>0.6) and combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc.

Download Full-text

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ning Wang ◽

Andrew E. Teschendorff

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Cell Fate ◽

Regulatory Networks ◽

Large Scale ◽

Single Cells ◽

Differential Expression Analysis ◽

Dropout Rate ◽

Rna Seq ◽

Regulatory Activity

AbstractInferring the activity of transcription factors in single cells is a key task to improve our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can be subsequently obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell-fate in tissue-development, even for cell-types that only make up 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.

Download Full-text

EpiScanpy: integrated single-cell epigenomic analysis

10.1101/648097 ◽

2019 ◽

Cited By ~ 4

Author(s):

Anna Danese ◽

Maria L. Richter ◽

David S. Fischer ◽

Fabian J. Theis ◽

Maria Colomé-Tatché

Keyword(s):

Dna Methylation ◽

Single Cell ◽

Large Scale ◽

Feature Space ◽

Rna Seq ◽

Computational Framework ◽

Learning Techniques ◽

Multiple Feature ◽

The Many ◽

Cell Data

ABSTRACTEpigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics, however single-cell-omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell brain mouse atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.

Download Full-text

scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa082 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Kaikun Xie ◽

Yu Huang ◽

Feng Zeng ◽

Zehua Liu ◽

Ting Chen

Keyword(s):

Single Cell ◽

Large Scale ◽

Developmental Trajectories ◽

Cell Types ◽

Random Projection ◽

Good Representation ◽

Rna Seq ◽

Unsupervised Deep Learning ◽

High Level ◽

Computational Resources

Abstract Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed to provide clustering and post-analysis of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE would provide a more in-depth understanding of cell development and diseases.

Download Full-text

SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2018.10.003 ◽

2019 ◽

Vol 17 (2) ◽

pp. 201-210 ◽

Cited By ~ 5

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Large Scale ◽

Rna Seq ◽

Computational Framework

Download Full-text

Abstract 5297: LCA: A robust and scalable algorithm to reveal subtle diversity in large-scale single-cell RNA-Seq data

10.1158/1538-7445.am2018-5297 ◽

2018 ◽

Author(s):

Changde Cheng ◽

John Easton ◽

Celeste Rosencrance ◽

Yan Li ◽

Bensheng Ju ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Rna Seq ◽

Scalable Algorithm

Download Full-text

An Automated Microwell Platform for Large-Scale Single Cell RNA-Seq

Scientific Reports ◽

10.1038/srep33883 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 59

Author(s):

Jinzhou Yuan ◽

Peter A. Sims

Keyword(s):

Single Cell ◽

Large Scale ◽

Rna Seq

Download Full-text

SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Random Projection ◽

Rna Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell Rna Sequencing ◽

Data Volume

ABSTRACTClustering is a prevalent analytical means to analyze single cell RNA sequencing data but the rapidly expanding data volume can make this process computational challenging. New methods for both accurate and efficient clustering are of pressing needs. Here we proposed a new clustering framework based on random projection and feature construction for large scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficacy for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method reached 20% improvements for clustering accuracy and 50-fold acceleration but only consumed 66% memory usage compared to the widely-used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold depending on the concrete dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.

Download Full-text

Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data

10.1101/2020.06.15.151910 ◽

2020 ◽

Author(s):

Van Hoan Do ◽

Francisca Rojas Ringeling ◽

Stefan Canzar

Keyword(s):

Single Cell ◽

Large Scale ◽

Linear Time ◽

Cell Types ◽

Substantial Improvement ◽

Rna Seq ◽

Sampling Step ◽

Protein Marker ◽

Cluster Ensembles ◽

Sequencing Technologies

AbstractA fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultra-large scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose method Specter that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter’s speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and that is sensitive to rare cell types. Its linear time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression we demonstrate that Specter is able to utilize multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter is open source and available at https://github.com/canzarlab/Specter.

Download Full-text

Fast Batch Alignment of Single Cell Transcriptomes Unifies Multiple Mouse Cell Atlases into an Integrated Landscape

10.1101/397042 ◽

2018 ◽

Cited By ~ 21

Author(s):

Jong-Eun Park ◽

Krzysztof Polański ◽

Kerstin Meyer ◽

Sarah A. Teichmann

Keyword(s):

Data Integration ◽

Single Cell ◽

Large Scale ◽

Integration Method ◽

Mouse Cell ◽

Rna Seq ◽

Computational Tools ◽

Integrative Study ◽

Data Explosion ◽

Computationally Intensive

AbstractIncreasing numbers of large scale single cell RNA-Seq projects are leading to a data explosion, which can only be fully exploited through data integration. Therefore, efficient computational tools for combining diverse datasets are crucial for biology in the single cell genomics era. A number of methods have been developed to assist data integration by removing technical batch effects, but most are computationally intensive. To overcome the challenge of enormous datasets, we have developed BBKNN, an extremely fast graph-based data integration method. We illustrate the power of BBKNN for dimensionalityreduced visualisation and clustering in multiple biological scenarios, including a massive integrative study over several murine atlases. BBKNN successfully connects cell populations across experimentally heterogeneous mouse scRNA-Seq datasets, which reveals global markers of cell type and organspecificity and provides the foundation for inferring the underlying transcription factor network. BBKNN is available at https://github.com/Teichlab/bbknn.

Download Full-text