Stability of single-cell dimension reduction after data shuffling

Abstract: Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
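Deviance-based feature selection can be illustrated with a small sketch. It uses the binomial approximation to the multinomial model: under the null, every cell draws its UMIs with the same per-gene proportion, and genes are ranked by how badly they fit that null. This is an illustration under that assumption, not the authors' code. The function names `binomial_deviance` and `select_by_deviance` are hypothetical.

```python
import numpy as np


def binomial_deviance(counts):
    """Per-gene binomial deviance for a genes x cells matrix of UMI counts.

    Null model: every cell i draws its n_i UMIs with the same gene proportion
    pi_j. Genes that fit this constant-proportion model have deviance near 0;
    genes whose proportion varies across cells get large deviance.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=0)                # total UMIs per cell, shape (cells,)
    pi = counts.sum(axis=1) / n.sum()     # pooled proportion per gene, shape (genes,)
    mu = np.outer(pi, n)                  # expected counts n_i * pi_j
    rest = n[None, :] - counts            # UMIs that went to all other genes
    rest_mu = n[None, :] - mu             # their expectation n_i * (1 - pi_j)

    # Convention 0 * log(0 / x) = 0: zero terms are masked out of the sum.
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(counts > 0, counts * np.log(counts / mu), 0.0)
        t2 = np.where(rest > 0, rest * np.log(rest / rest_mu), 0.0)
    return 2.0 * (t1 + t2).sum(axis=1)


def select_by_deviance(counts, k):
    """Indices of the k genes with the largest deviance (hypothetical helper)."""
    dev = binomial_deviance(counts)
    return np.argsort(dev)[::-1][:k]


# Toy matrix: 3 genes x 3 cells. Gene 0 keeps a constant 25% share of every
# cell's UMIs; genes 1 and 2 shift their share between cells.
toy = np.array([[5, 10, 5],
                [10, 0, 10],
                [5, 30, 5]])
print(binomial_deviance(toy).round(2))
print(select_by_deviance(toy, 2))
```

In the toy matrix, gene 0 fits the null exactly and scores zero deviance. The two genes whose share changes across cells are the ones selected. Unlike variance of log-CPM values, this ranking works directly on raw counts and needs no pseudocount.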

VASC: dimension reduction and visualization of single cell RNA sequencing data by deep variational autoencoder

DOI: 10.1101/199315; 2017; Cited By ~ 6
Author(s): Dongfang Wang, Jin Gu
Keyword(s): Dimension Reduction; Single Cell; RNA Sequencing; Original Data; Marker Genes; Single Cell Level; Sequencing Data; Cell Level; Variational Autoencoder; Single Cell RNA Sequencing

Abstract: Single-cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. Finding an effective low-dimensional representation and visualization of the original data is an important step in studying cell sub-populations and lineages from scRNA-seq data. scRNA-seq data are much noisier than traditional bulk RNA-seq data: at the single-cell level, transcriptional fluctuations are much larger than in the average of a cell population, and the low amount of RNA transcripts increases the rate of technical dropout events. In this study, we propose VASC (deep Variational Autoencoder for scRNA-seq data), a deep multi-layer generative model for the unsupervised dimension reduction and visualization of scRNA-seq data. It explicitly models dropout events and finds nonlinear hierarchical feature representations of the original data. Tested on twenty datasets, VASC outperforms four state-of-the-art dimension reduction methods in most cases and is compatible with a broader range of datasets. In a case study of pre-implantation embryos, VASC successfully re-establishes the cell dynamics and identifies several candidate marker genes associated with early embryo development.
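The core of any variational autoencoder used for dimension reduction is sketched below. An encoder maps each cell to a Gaussian over a low-dimensional latent space, a sample is drawn with the reparameterization trick, and a KL term regularizes the latent space toward N(0, I). This is a generic NumPy sketch, not VASC's architecture. VASC's explicit dropout modelling and its decoder are omitted, and the one-layer `encode` with random weights is a hypothetical stand-in for a trained network.

```python
import numpy as np


def encode(counts, W_h, W_mu, W_lv):
    """Toy one-layer encoder: log1p counts -> hidden layer -> (mu, log_var)."""
    h = np.tanh(np.log1p(counts) @ W_h)
    return h @ W_mu, h @ W_lv


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), one value per cell."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(5, 50)).astype(float)   # 5 cells x 50 genes (synthetic)
W_h = rng.normal(scale=0.1, size=(50, 16))
W_mu = rng.normal(scale=0.1, size=(16, 2))               # 2-D latent space for plotting
W_lv = rng.normal(scale=0.1, size=(16, 2))

mu, log_var = encode(counts, W_h, W_mu, W_lv)
z = reparameterize(mu, log_var, rng)     # 2-D coordinates, one row per cell
kl = kl_to_standard_normal(mu, log_var)  # regularizer term of the ELBO
```

In a trained model, the weights would be fitted by maximizing the ELBO: a reconstruction likelihood of the counts minus `kl`. The latent means `mu` are what would typically be plotted as the visualization.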

Exploring dimension-reduced embeddings with Sleepwalk

DOI: 10.1101/603589; 2019
Author(s): Svetlana Ovchinnikova, Simon Anders
Keyword(s): Big Data; Dimension Reduction; Single Cell; Single Cells; High Dimensional; RNA Seq; Mouse Cursor; Sample Data; Reduction Methods; Full Power

Abstract: Dimension-reduction methods, such as t-SNE or UMAP, are widely used when exploring high-dimensional data describing many entities, e.g., RNA-seq data for many single cells. However, dimension reduction is prone to introducing artefacts, so we need a means to see where a dimension-reduced embedding faithfully represents the local neighbourhood and where it does not. We present Sleepwalk, a simple but powerful tool that lets the user explore an embedding interactively, using colour to depict the original (or any other) distances from all points to the cell under the mouse cursor. We show how this approach not only highlights distortions but also reveals otherwise hidden characteristics of the data, and how Sleepwalk’s comparative modes help integrate multi-sample data and understand differences between embedding and preprocessing methods. Sleepwalk is a versatile and intuitive tool that unlocks the full power of dimension reduction and will be of value not only in single-cell RNA-seq but in any other area with matrix-shaped big data.
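The idea behind this kind of inspection is to colour every point by its distance, in the original space, to one focus cell, then compare neighbourhoods. It can be sketched in a few lines of Python. This is not Sleepwalk's code: Sleepwalk is an interactive tool, and the functions below, including the `neighbourhood_overlap` score and the `max_dist` cut-off, are hypothetical helpers chosen for illustration.

```python
import numpy as np


def distances_from(points, i):
    """Euclidean distance from point i to every point (rows of `points`)."""
    return np.linalg.norm(points - points[i], axis=1)


def colour_values(points, i, max_dist):
    """Distances mapped to [0, 1] for a colour scale; beyond max_dist saturates."""
    return np.clip(distances_from(points, i) / max_dist, 0.0, 1.0)


def knn(points, i, k):
    """Set of the k nearest neighbours of point i, excluding i itself."""
    d = distances_from(points, i)
    d[i] = np.inf
    return set(np.argsort(d)[:k].tolist())


def neighbourhood_overlap(original, embedding, i, k=10):
    """Fraction of i's k nearest neighbours in the original space that stay
    among its k nearest neighbours in the embedding (1.0 = fully preserved)."""
    return len(knn(original, i, k) & knn(embedding, i, k)) / k


rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 30))        # synthetic "cells x genes" data
emb = expr[:, :2]                        # a deliberately crude 2-D embedding
focus = 0                                # index of the cell "under the cursor"
colours = colour_values(expr, focus, max_dist=np.percentile(distances_from(expr, focus), 50))
print(neighbourhood_overlap(expr, emb, focus, k=10))
```

Plotting `emb` coloured by `colours` shows whether cells that are close in the original space also sit close to the focus cell in the embedding. A low overlap score flags a distorted neighbourhood.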

A robust nonlinear low-dimensional manifold for single cell RNA-seq data

DOI: 10.1101/443044; 2018; Cited By ~ 5
Author(s): Archit Verma, Barbara E. Engelhardt
Keyword(s): Gaussian Process; Dimension Reduction; Single Cell; Latent Variable; Developmental Trajectories; Latent Variable Model; Variable Model; Sequencing Technologies; Heavy Tailed; Low Dimensional

Abstract: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden understanding of cell heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data, but methods have yet to be developed for unfiltered and unnormalized count data. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. Gene expression in a single cell is modeled as a noisy draw from a Gaussian process that maps low-dimensional latent positions to the high-dimensional expression space; this model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student’s t-distribution to estimate a manifold that is robust to technical and biological noise. We compare our approach with common dimension reduction tools to highlight its ability to enable important downstream tasks, including clustering and inferring cell developmental trajectories, on available experimental data. We show that our robust nonlinear manifold is well suited for visualizing and exploring cell states from raw, unfiltered gene counts produced by high-throughput sequencing technologies.
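The robustness argument for a heavy-tailed error model can be made concrete by comparing the log-density of residuals under a Gaussian and under a Student's t. This sketch covers only the noise model. It does not implement the GPLVM, its kernel, or its inference. The degrees of freedom `nu=4` and the residual values are arbitrary choices for illustration.

```python
import math


def gaussian_logpdf(r, scale=1.0):
    """Log density of a residual r under N(0, scale^2)."""
    z = r / scale
    return -0.5 * math.log(2.0 * math.pi) - math.log(scale) - 0.5 * z * z


def student_t_logpdf(r, nu=4.0, scale=1.0):
    """Log density of r under a Student's t with nu degrees of freedom and given scale."""
    z = r / scale
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


# Residuals of one gene across cells, with a single outlying cell.
residuals = [0.1, -0.3, 0.2, 0.0, 8.0]
gauss_terms = [gaussian_logpdf(r) for r in residuals]
t_terms = [student_t_logpdf(r) for r in residuals]
# Share of the total log-likelihood contributed by the outlier under each model.
print(gauss_terms[-1] / sum(gauss_terms), t_terms[-1] / sum(t_terms))
```

The Gaussian penalty grows quadratically in the residual, while the t penalty grows only logarithmically. The single outlier therefore dominates the Gaussian likelihood far more than the t likelihood, and under the t model it distorts the fitted manifold less.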

Dimension Reduction and Clustering of Single Cell Calcium Spiking: Comparison of t-SNE and UMAP

DOI: 10.1109/ncc52529.2021.9530128; 2021
Author(s): Suman Gare, Soumita Chel, Manohar Kuruba, Soumya Jana, Lopamudra Giri
Keyword(s): Dimension Reduction; Single Cell; Cell Calcium; Calcium Spiking

VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder

Genomics Proteomics & Bioinformatics; DOI: 10.1016/j.gpb.2018.08.003; 2018; Vol 16 (5), pp. 320-331; Cited By ~ 46
Author(s): Dongfang Wang, Jin Gu
Keyword(s): Dimension Reduction; Single Cell; RNA Seq; Variational Autoencoder

Comparative Research of Different Dimension Reduction Methods Combined with RWR Network Smoothing in Single Cell RNA-seq Data

IOP Conference Series: Earth and Environmental Science; DOI: 10.1088/1755-1315/495/1/012043; 2020; Vol 495, pp. 012043
Author(s): Xuesong Xiao, Pengchao Ye, Wenbin Ye, Guoli Ji
Keyword(s): Dimension Reduction; Single Cell; Comparative Research; RNA Seq; Reduction Methods

Processing single-cell RNA-seq data for dimension reduction-based analyses using open-source tools

STAR Protocols; DOI: 10.1016/j.xpro.2021.100450; 2021; Vol 2 (2), pp. 100450
Author(s): Bob Chen, Marisol A. Ramirez-Solano, Cody N. Heiser, Qi Liu, Ken S. Lau
Keyword(s): Dimension Reduction; Single Cell; Open Source; RNA Seq

SHARP: Single-cell RNA-seq Hyper-fast and Accurate Processing via Ensemble Random Projection

DOI: 10.1101/461640; 2018; Cited By ~ 2
Author(s): Shibiao Wan, Junil Kim, Kyoung Jae Won
Keyword(s): Dimension Reduction; Single Cell; RNA Sequencing; Large Scale; Random Projection; RNA Seq; Running Speed; Large Size; Single Cell RNA Sequencing; Speed and Accuracy

Abstract: To process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm that scales to clustering 10 million cells. Comprehensive benchmarking on 17 public scRNA-seq datasets demonstrates that SHARP outperforms existing methods in both speed and accuracy. In particular, for large datasets (>40,000 cells), SHARP runs far faster than competing methods while maintaining high clustering accuracy and robustness. To the best of our knowledge, SHARP is the only R-based tool that scales to clustering scRNA-seq data with 10 million cells.
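Random projection is fast because it replaces a fitted transform with a single multiplication by a random matrix. By the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved. The sketch below builds a small ensemble of independent Gaussian projections and checks how well each one preserves distances. It is a Python illustration of the projection step only. SHARP itself is an R tool, and its per-projection clustering and ensemble combination are not reproduced. `projection_ensemble` and the sizes used are hypothetical.

```python
import numpy as np


def random_projection(X, k, rng):
    """Project rows of X (cells x genes) to k dimensions with a Gaussian matrix.

    Entries are N(0, 1/k), so squared Euclidean distances are preserved in
    expectation (Johnson-Lindenstrauss style)."""
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R


def projection_ensemble(X, k, n_members, seed=0):
    """Independent random projections of the same data (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return [random_projection(X, k, rng) for _ in range(n_members)]


def pairwise_distances(Y):
    """Euclidean distance matrix between the rows of Y."""
    sq = np.sum(Y ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Y @ Y.T)
    return np.sqrt(np.maximum(d2, 0.0))


rng = np.random.default_rng(42)
X = rng.normal(size=(30, 1000))                    # 30 synthetic cells x 1000 genes
members = projection_ensemble(X, k=300, n_members=5)

off_diag = ~np.eye(X.shape[0], dtype=bool)
D = pairwise_distances(X)[off_diag]
for Y in members:
    ratio = pairwise_distances(Y)[off_diag] / D    # ~1 means distances are preserved
    print(ratio.min().round(3), ratio.max().round(3))
```

Each member's distance ratios cluster near 1, but the members differ from one another. An ensemble method can exploit that by clustering each projection separately and combining the results, which averages out the distortion of any single projection.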
