SHARP: Single-cell RNA-seq Hyper-fast and Accurate Processing via Ensemble Random Projection

ABSTRACT To process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm that scales to clustering 10 million cells. Comprehensive benchmarking on 17 public scRNA-seq datasets demonstrates that SHARP outperforms existing methods in both speed and accuracy. In particular, for large datasets (>40,000 cells), SHARP runs far faster than competing methods while maintaining high clustering accuracy and robustness. To the best of our knowledge, SHARP is the only R-based tool that scales to clustering scRNA-seq data with 10 million cells.
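The general idea of ensemble random projection can be illustrated with a minimal sketch. It is not SHARP's implementation: the sparse sign matrix, the plain k-means base clusterer, the co-association consensus step, and every function name and parameter below are assumptions chosen for illustration only.

```python
import numpy as np


def sparse_random_projection(X, d, rng):
    """Project rows of X (cells x genes) down to d dimensions."""
    n_genes = X.shape[1]
    # Sparse sign matrix: +1, 0, -1 with probabilities 1/6, 2/3, 1/6.
    R = rng.choice([1.0, 0.0, -1.0], size=(n_genes, d), p=[1 / 6, 2 / 3, 1 / 6])
    # The sqrt(3 / d) factor keeps squared distances unchanged in expectation.
    return (X @ R) * np.sqrt(3.0 / d)


def kmeans(Z, k, rng, n_iter=50):
    """Plain Lloyd k-means with farthest-point initialisation (stand-in clusterer)."""
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def ensemble_rp_cluster(X, n_clusters, d=20, n_runs=5, seed=0):
    """Cluster several random projections, then combine them by consensus."""
    rng = np.random.default_rng(seed)
    n_cells = X.shape[0]
    coassoc = np.zeros((n_cells, n_cells))
    for _ in range(n_runs):
        Z = sparse_random_projection(X, d, rng)
        labels = kmeans(Z, n_clusters, rng)
        # Count how often each pair of cells lands in the same cluster.
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_runs
    # Consensus step (assumption): cluster the rows of the co-association matrix.
    return kmeans(coassoc, n_clusters, rng)


# Toy data: two well-separated groups of 30 "cells" with 500 "genes" each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 500)), rng.normal(3.0, 1.0, (30, 500))])
labels = ensemble_rp_cluster(X, n_clusters=2)
```

Each projection is cheap because the matrix is sparse and data-independent, and averaging over several projections reduces the chance that a single unlucky projection distorts the cluster structure.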


SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽ 2018 ◽ Cited By ~ 2

Author(s): Xianwen Ren ◽ Liangtao Zheng ◽ Zemin Zhang

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Large Scale ◽ Random Projection ◽ RNA Seq ◽ Sequencing Data ◽ Computational Framework ◽ Human Blood Cells ◽ Single Cell RNA Sequencing ◽ Data Volume

ABSTRACT Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves the clustering accuracy, robustness, and computational efficiency of various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold speedup while consuming only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold, depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.


Comparison of Scanpy-based algorithms to remove the batch effect from single-cell RNA-seq data

Cell Regeneration ◽ 10.1186/s13619-020-00041-9 ◽ 2020 ◽ Vol 9 (1)

Author(s): Jiaqi Li ◽ Chengxuan Yu ◽ Lifeng Ma ◽ Jingjing Wang ◽ Guoji Guo

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Large Scale ◽ Batch Effect ◽ RNA Seq ◽ Batch Effects ◽ Integration Methods ◽ Batch Correction ◽ Single Cell RNA Sequencing ◽ Algorithm Level

Abstract With the development of single-cell RNA sequencing (scRNA-seq) technology, analysts need to integrate hundreds of thousands of cells with multiple experimental batches. It is becoming increasingly difficult for users to select the best integration methods to remove batch effects. Here, we compared the advantages and limitations of four commonly used Scanpy-based batch-correction methods using two representative and large-scale scRNA-seq datasets. We quantitatively evaluated batch-correction performance and efficiency. Furthermore, we discussed the performance differences among the evaluated methods at the algorithm level.


Vacuum-Driven Micropump with Support Columns: Toward Large Scale Single-Cell RNA-Sequencing

2018 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS) ◽ 10.1109/marss.2018.8481157 ◽ 2018

Author(s): Kento Hisa ◽ Musashi Kakugawa ◽ Takayuki Shibata ◽ Moeto Nagai

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Large Scale ◽ Single Cell RNA Sequencing


Assessing the measurement transfer function of single-cell RNA sequencing

10.1101/045450 ◽ 2016

Author(s): Hannah R. Dueck ◽ Rizi Ai ◽ Adrian Camarena ◽ Bo Ding ◽ Reymundo Dominguez ◽ ...

Keyword(s): Transfer Function ◽ Single Cell ◽ RNA Sequencing ◽ Large Scale ◽ Transfer Functions ◽ Single Cells ◽ Biological Information ◽ Gene Detection ◽ Single Cell RNA Sequencing ◽ Scale Control

Abstract Recently, measurement of RNA at single-cell resolution has yielded surprising insights. Methods for single-cell RNA sequencing (scRNA-seq) have received considerable attention, but the broad reliability of single-cell methods and the factors governing their performance are still poorly understood. Here, we conducted a large-scale control experiment to assess the transfer function of three scRNA-seq methods and the factors modulating that function. All three methods detected greater than 70% of the expected number of genes and had a 50% probability of detecting genes with abundance greater than 2 to 4 molecules. Despite the small number of molecules, sequencing depth significantly affected gene detection. While biases in detection and quantification were qualitatively similar across methods, the degree of bias differed, consistent with differences in molecular protocol. Measurement reliability increased with expression level for all methods, and we conservatively estimate the measurement transfer functions to be linear above ~5-10 molecules. Based on these extensive control studies, we propose that RNA-seq of single cells has come of age, yielding quantitative biological information.
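To give a feel for what a 50% detection probability at 2 to 4 molecules would imply, here is a small worked calculation. It assumes a deliberately simple model that is not taken from the paper: each molecule of a gene is captured independently with the same probability, and the gene counts as detected if at least one molecule is captured. The function names are hypothetical.

```python
def detection_probability(n_molecules, capture_efficiency):
    """P(at least one of n independent molecules is captured)."""
    return 1.0 - (1.0 - capture_efficiency) ** n_molecules


def efficiency_for_half_detection(n_molecules):
    """Per-molecule capture efficiency at which detection probability is 50%."""
    # Solve 1 - (1 - p)^n = 0.5 for p.
    return 1.0 - 0.5 ** (1.0 / n_molecules)


# Under the independence assumption, the 2-to-4-molecule range in the abstract
# brackets the implied per-molecule capture efficiency.
eff_at_2 = efficiency_for_half_detection(2)
eff_at_4 = efficiency_for_half_detection(4)
print(f"50% detection at 2 molecules -> efficiency ~ {eff_at_2:.3f}")
print(f"50% detection at 4 molecules -> efficiency ~ {eff_at_4:.3f}")
```

Under this toy model the implied per-molecule efficiency falls roughly between 16% and 29%. Real protocols involve amplification and sequencing-depth effects that the model ignores.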


scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics ◽ 10.1093/nargab/lqaa082 ◽ 2020 ◽ Vol 2 (4)

Author(s): Kaikun Xie ◽ Yu Huang ◽ Feng Zeng ◽ Zehua Liu ◽ Ting Chen

Keyword(s): Single Cell ◽ Large Scale ◽ Developmental Trajectories ◽ Cell Types ◽ Random Projection ◽ Good Representation ◽ RNA Seq ◽ Unsupervised Deep Learning ◽ High Level ◽ Computational Resources

Abstract Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed to provide clustering and post-analysis of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE would provide a more in-depth understanding of cell development and diseases.
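Random projection hashing in general can be sketched as follows. This is a generic sign-of-projection (random hyperplane) hash, offered as an assumption about the kind of technique the abstract names; how scAIDE actually combines hashing with k-means is not described here, and the function name and parameters are hypothetical.

```python
import numpy as np


def random_projection_hash(X, n_bits, rng):
    """Assign each row of X an integer bucket code from the signs of random projections."""
    # One random hyperplane per bit; nearby points tend to fall on the same side.
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes) > 0
    # Pack the sign pattern into a single integer code per row.
    weights = 1 << np.arange(n_bits)
    return bits.astype(np.int64) @ weights


rng = np.random.default_rng(0)
embedding = rng.standard_normal((5, 16))          # hypothetical low-dimensional embedding
embedding = np.vstack([embedding, embedding[0]])  # duplicate the first row
codes = random_projection_hash(embedding, n_bits=8, rng=rng)
```

Rows with identical coordinates always share a code, and rows separated by a small angle share a code with high probability. Bucket codes of this kind are one way to restrict distance computations to candidate neighbours on very large datasets.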


Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing

Nature Neuroscience ◽ 10.1038/nn.3881 ◽ 2014 ◽ Vol 18 (1) ◽ pp. 145-153 ◽ Cited By ~ 849

Author(s): Dmitry Usoskin ◽ Alessandro Furlan ◽ Saiful Islam ◽ Hind Abdo ◽ Peter Lönnerberg ◽ ...

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Sensory Neuron ◽ Large Scale ◽ Single Cell RNA Sequencing


A map of tumor–host interactions in glioma at single-cell resolution

GigaScience ◽ 10.1093/gigascience/giaa109 ◽ 2020 ◽ Vol 9 (10) ◽ Cited By ~ 3

Author(s): Francesca Pia Caruso ◽ Luciano Garofano ◽ Fulvio D'Angelo ◽ Kai Yu ◽ Fuchou Tang ◽ ...

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Cross Talk ◽ Large Scale ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Sequencing Data ◽ Host Interaction ◽ Receptor Interactions ◽ Single Cell RNA Sequencing

ABSTRACT Background Single-cell RNA sequencing is the reference technique for characterizing the heterogeneity of the tumor microenvironment. The composition of the various cell types making up the microenvironment can significantly affect the way in which the immune system activates cancer rejection mechanisms. Understanding the cross-talk signals between immune cells and cancer cells is of fundamental importance for the identification of immuno-oncology therapeutic targets. Results We present a novel method, single-cell Tumor–Host Interaction tool (scTHI), to identify significantly activated ligand–receptor interactions across clusters of cells from single-cell RNA sequencing data. We apply our approach to uncover the ligand–receptor interactions in glioma using 6 publicly available human glioma datasets encompassing 57,060 gene expression profiles from 71 patients. By leveraging this large-scale collection we show that unexpected cross-talk partners are highly conserved across different datasets in the majority of the tumor samples. This suggests that shared cross-talk mechanisms exist in glioma. Conclusions Our results provide a complete map of the active tumor–host interaction pairs in glioma that can be therapeutically exploited to reduce the immunosuppressive action of the microenvironment in brain tumor.


A bivariate zero-inflated negative binomial model for identifying underlying dependence with application to single cell RNA sequencing data

10.1101/2020.03.06.977728 ◽ 2020

Author(s): Hunyong Cho ◽ Chuwen Liu ◽ John S. Preisser ◽ Di Wu

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Latent Variable ◽ Large Scale ◽ Negative Binomial ◽ Model Fitting ◽ Sequencing Data ◽ Excess Zeros ◽ Binomial Distributions ◽ Single Cell RNA Sequencing

Summary Measuring gene-gene dependence in single cell RNA sequencing (scRNA-seq) count data is often of interest and remains challenging, because an unidentified portion of the zero counts represent non-detected RNA due to technical reasons. Conventional statistical methods that fail to account for technical zeros incorrectly measure the dependence among genes. To address this problem, we propose a bivariate zero-inflated negative binomial (BZINB) model constructed using a bivariate Poisson-gamma mixture with dropout indicators for the technical (excess) zeros. Parameters are estimated based on the EM algorithm and are used to measure the underlying dependence by decomposing the two sources of zeros. Compared to existing models, the proposed BZINB model is specifically designed for estimating dependence and is more flexible, while preserving the marginal zero-inflated negative binomial distributions. Additionally, it has a simple latent variable framework, allowing parameters to have clear and intuitive interpretations, and its computation is feasible with large scale data. Using a recent scRNA-seq dataset, we illustrate model fitting and how the model-based measures can be different from naive measures. The inferential ability of the proposed model is evaluated in a simulation study. An R package ‘bzinb’ is available on CRAN.
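Why technical zeros distort naive dependence measures can be seen in a short simulation. The sketch below is not the BZINB model or the `bzinb` package: it assumes one simple way to build correlated Poisson-gamma counts (a shared gamma component) and, as a further simplification, independent dropout for each gene. All names and parameter values are hypothetical.

```python
import numpy as np


def simulate_gene_pair(n_cells, shape_shared, shape_own, scale, dropout, rng):
    """Simulate two correlated count vectors, before and after technical dropout."""
    # A shared gamma component induces dependence between the two Poisson rates.
    shared = rng.gamma(shape_shared, scale, size=n_cells)
    rate1 = shared + rng.gamma(shape_own, scale, size=n_cells)
    rate2 = shared + rng.gamma(shape_own, scale, size=n_cells)
    # Poisson counts with gamma-distributed rates are negative binomial marginally.
    latent = np.column_stack([rng.poisson(rate1), rng.poisson(rate2)])
    # Technical zeros: each count is independently zeroed out (assumption).
    kept = rng.random((n_cells, 2)) >= dropout
    observed = latent * kept
    return latent, observed


rng = np.random.default_rng(0)
latent, observed = simulate_gene_pair(
    n_cells=20000, shape_shared=2.0, shape_own=1.0, scale=2.0, dropout=0.4, rng=rng
)
corr_latent = np.corrcoef(latent[:, 0], latent[:, 1])[0, 1]
corr_observed = np.corrcoef(observed[:, 0], observed[:, 1])[0, 1]
print(f"correlation before dropout: {corr_latent:.2f}")
print(f"naive correlation after dropout: {corr_observed:.2f}")
```

The naive correlation computed on the observed counts is noticeably weaker than the correlation of the underlying counts, which is the kind of bias a model that separates the two sources of zeros aims to remove.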


VASC: dimension reduction and visualization of single cell RNA sequencing data by deep variational autoencoder

10.1101/199315 ◽ 2017 ◽ Cited By ~ 6

Author(s): Dongfang Wang ◽ Jin Gu

Keyword(s): Dimension Reduction ◽ Single Cell ◽ RNA Sequencing ◽ Original Data ◽ Marker Genes ◽ Single Cell Level ◽ Sequencing Data ◽ Cell Level ◽ Variational Autoencoder ◽ Single Cell RNA Sequencing

Abstract Single cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. Finding an effective low-dimensional representation and visualization of the original data is an important step in studying cell sub-populations and lineages from scRNA-seq data. scRNA-seq data are much noisier than traditional bulk RNA-Seq data: at the single-cell level, transcriptional fluctuations are much larger than in the average of a cell population, and the low amount of RNA transcripts increases the rate of technical dropout events. In this study, we propose VASC (deep Variational Autoencoder for scRNA-seq data), a deep multi-layer generative model for the unsupervised dimension reduction and visualization of scRNA-seq data. It can explicitly model the dropout events and find nonlinear hierarchical feature representations of the original data. Tested on twenty datasets, VASC shows superior performance in most cases and broader dataset compatibility compared with four state-of-the-art dimension reduction methods. In a case study of pre-implantation embryos, VASC successfully re-establishes the cell dynamics and identifies several candidate marker genes associated with early embryo development.


Ultra-high throughput single-cell RNA sequencing by combinatorial fluidic indexing

10.1101/2019.12.17.879304 ◽ 2019 ◽ Cited By ~ 4

Author(s): Paul Datlinger ◽ André F Rendeiro ◽ Thorina Boenke ◽ Thomas Krausgruber ◽ Daniele Barreca ◽ ...

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Population Genomics ◽ Cost Effective ◽ Mouse Cell ◽ Droplet Microfluidics ◽ RNA Seq ◽ Single Cell RNA Sequencing ◽ Massive Scale ◽ TCR Activation

Abstract Cell atlas projects and single-cell CRISPR screens hit the limits of current technology, as they require cost-effective profiling for millions of individual cells. To satisfy these enormous throughput requirements, we developed “single-cell combinatorial fluidic indexing” (scifi) and applied it to single-cell RNA sequencing. The resulting scifi-RNA-seq assay combines one-step combinatorial pre-indexing of single-cell transcriptomes with subsequent single-cell RNA-seq using widely available droplet microfluidics. Pre-indexing allows us to load multiple cells per droplet, which increases the throughput of droplet-based single-cell RNA-seq up to 15-fold, and it provides a straightforward way of multiplexing hundreds of samples in a single scifi-RNA-seq experiment. Compared to multi-round combinatorial indexing, scifi-RNA-seq provides an easier, faster, and more efficient workflow, thereby enabling massive-scale scRNA-seq experiments for a broad range of applications ranging from population genomics to drug screens with scRNA-seq readout. We benchmarked scifi-RNA-seq on various human and mouse cell lines, and we demonstrated its feasibility for human primary material by profiling TCR activation in T cells.
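The trade-off behind loading several pre-indexed cells into one droplet can be illustrated with a birthday-problem calculation. The sketch assumes that cells in a droplet can only be told apart by their pre-index barcode and that each cell carries a pre-index drawn uniformly and independently. Both are simplifying assumptions, and the barcode count and droplet loadings used below are hypothetical rather than taken from the paper.

```python
def collision_probability(cells_per_droplet, n_preindex_barcodes):
    """P(at least two cells in one droplet share the same pre-index barcode)."""
    p_all_distinct = 1.0
    for i in range(cells_per_droplet):
        # The (i+1)-th cell must avoid the i pre-indices already present.
        p_all_distinct *= max(0.0, 1.0 - i / n_preindex_barcodes)
    return 1.0 - p_all_distinct


# Hypothetical example: 384 pre-index barcodes, increasing droplet loading.
for k in (1, 2, 5, 10, 15):
    p = collision_probability(k, 384)
    print(f"{k:2d} cells per droplet -> collision probability {p:.4f}")
```

Under these assumptions, more pre-index barcodes permit heavier droplet loading at the same collision rate, which is the intuition behind gaining throughput from a single pre-indexing step.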
