scholarly journals VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder

2018 ◽  
Vol 16 (5) ◽  
pp. 320-331 ◽  
Author(s):  
Dongfang Wang ◽  
Jin Gu
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
F. William Townes ◽  
Stephanie C. Hicks ◽  
Martin J. Aryee ◽  
Rafael A. Irizarry

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
F. William Townes ◽  
Stephanie C. Hicks ◽  
Martin J. Aryee ◽  
Rafael A. Irizarry

AbstractSingle-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.


2019 ◽  
Author(s):  
Valentine Svensson ◽  
Lior Pachter

Single cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be of interest for other applications.


2017 ◽  
Author(s):  
Dongfang Wang ◽  
Jin Gu

AbstractSingle cell RNA sequencing (scRNA-seq) is a powerful technique to analyze the transcriptomic heterogeneities in single cell level. It is an important step for studying cell sub-populations and lineages based on scRNA-seq data by finding an effective low-dimensional representation and visualization of the original data. The scRNA-seq data are much noiser than traditional bulk RNA-Seq: in the single cell level, the transcriptional fluctuations are much larger than the average of a cell population and the low amount of RNA transcripts will increase the rate of technical dropout events. In this study, we proposed VASC (deep Variational Autoencoder for scRNA-seq data), a deep multi-layer generative model, for the unsupervised dimension reduction and visualization of scRNA-seq data. It can explicitly model the dropout events and find the nonlinear hierarchical feature representations of the original data. Tested on twenty datasets, VASC shows superior performances in most cases and broader dataset compatibility compared with four state-of-the-art dimension reduction methods. Then, for a case study of pre-implantation embryos, VASC successfully re-establishes the cell dynamics and identifies several candidate marker genes associated with the early embryo development.


2019 ◽  
Author(s):  
Svetlana Ovchinnikova ◽  
Simon Anders

AbstractDimension-reduction methods, such as t-SNE or UMAP, are widely used when exploring high-dimensional data describing many entities, e.g., RNA-seq data for many single cells. However, dimension reduction is commonly prone to introducing artefacts, and we hence need means to see where a dimension-reduced embedding is a faithful representation of the local neighbourhood and where it is not.We present Sleepwalk, a simple but powerful tool that allows the user to interactively explore an embedding, using colour to depict original or any other distances from all points to the cell under the mouse cursor. We show how this approach not only highlights distortions, but also reveals otherwise hidden characteristics of the data, and how Sleep-walk’s comparative modes help integrate multi-sample data and understand differences between embedding and preprocessing methods. Sleepwalk is a versatile and intuitive tool that unlocks the full power of dimension reduction and will be of value not only in single-cell RNA-seq but also in any other area with matrix-shaped big data.


2020 ◽  
Vol 36 (11) ◽  
pp. 3418-3421 ◽  
Author(s):  
Valentine Svensson ◽  
Adam Gayoso ◽  
Nir Yosef ◽  
Lior Pachter

Abstract Motivation Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. Results We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications. Availability and implementation The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 2 (2) ◽  
pp. 100450
Author(s):  
Bob Chen ◽  
Marisol A. Ramirez-Solano ◽  
Cody N. Heiser ◽  
Qi Liu ◽  
Ken S. Lau

2018 ◽  
Author(s):  
Shibiao Wan ◽  
Junil Kim ◽  
Kyoung Jae Won

ABSTRACTTo process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm which is scalable to clustering 10 million cells. Comprehensive benchmarking tests on 17 public scRNA-seq datasets demonstrate that SHARP outperforms existing methods in terms of speed and accuracy. Particularly, for large-size datasets (>40,000 cells), SHARP’s running speed far excels other competitors while maintaining high clustering accuracy and robustness. To the best of our knowledge, SHARP is the only R-based tool that is scalable to clustering scRNA-seq data with 10 million cells.


2019 ◽  
Author(s):  
F. William Townes ◽  
Stephanie C. Hicks ◽  
Martin J. Aryee ◽  
Rafael A. Irizarry

AbstractSingle cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero-inflation. Current normalization pro-cedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We pro-pose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.


Sign in / Sign up

Export Citation Format

Share Document