Author Correction: Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model

Abstract: Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.

Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model

10.1101/574574 ◽

2019 ◽

Cited By ~ 29

Author(s):

F. William Townes ◽

Stephanie C. Hicks ◽

Martin J. Aryee ◽

Rafael A. Irizarry

Keyword(s):

Feature Selection ◽

Dimension Reduction ◽

Single Cell ◽

Current Practice ◽

Principal Component ◽

Ground Truth ◽

Rna Seq ◽

Normal Distributions ◽

Multinomial Sampling ◽

Negative Controls

Abstract: Single cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero-inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.
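
The abstract names feature selection using deviance but does not give a formula. The following is a minimal sketch, assuming a binomial approximation to the multinomial model and a null in which each gene has the same relative abundance in every cell; the function name `binomial_deviance` and the toy matrix are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def binomial_deviance(counts):
    """Per-gene binomial deviance against a constant-abundance null.

    counts: array of shape (cells, genes) with non-negative UMI counts.
    Returns one deviance value per gene; larger means more variable.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1, keepdims=True)   # total UMIs per cell
    pi = counts.sum(axis=0) / n.sum()       # pooled relative abundance per gene
    mu = n * pi                             # expected counts under the null

    def y_log_ratio(y, m):
        # y * log(y / m), using the convention 0 * log(0) = 0
        out = np.zeros_like(y)
        pos = y > 0
        out[pos] = y[pos] * np.log(y[pos] / m[pos])
        return out

    dev = y_log_ratio(counts, mu) + y_log_ratio(n - counts, n - mu)
    return 2.0 * dev.sum(axis=0)

# Toy data (assumed): gene 0 has the same proportion in every cell,
# genes 1 and 2 switch on and off between cells.
toy = np.array([[5, 5, 0],
                [5, 0, 5],
                [5, 5, 0],
                [5, 0, 5]])
deviance = binomial_deviance(toy)
ranked_genes = np.argsort(deviance)[::-1]   # most informative genes first
```

In this sketch a gene whose share of each cell's counts never changes scores zero, so ranking by deviance pushes such genes to the bottom of the list.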

FEATS: Feature selection based clustering of single-cell RNA-seq data

10.1101/2020.07.13.200485 ◽

2020 ◽

Author(s):

Edwin Vans ◽

Ashwini Patil ◽

Alok Sharma

Keyword(s):

Feature Selection ◽

Single Cell ◽

Outlier Detection ◽

Cell Types ◽

Superior Performance ◽

Adjusted Rand Index ◽

Marker Genes ◽

Rna Seq ◽

Number Of Clusters ◽

Computational Performance

Abstract: Advances in next-generation sequencing (NGS) have made it possible to carry out transcriptomic studies at single-cell resolution and generate vast amounts of single-cell RNA-seq data rapidly. Thus, tools to analyze this data need to evolve as well to improve accuracy and efficiency. We present FEATS, a Python software package that performs clustering on single-cell RNA-seq data. FEATS is capable of performing multiple tasks such as estimating the number of clusters, conducting outlier detection, and integrating data from various experiments. We develop a univariate feature selection based approach for clustering, which involves the selection of top informative features to improve clustering performance. This is motivated by the fact that cell types are often manually determined using the expression of only a few known marker genes. On a variety of single-cell RNA-seq datasets, FEATS gives superior performance compared to the current tools, in terms of adjusted Rand index (ARI) and estimating the number of clusters. In addition to cluster estimation, FEATS also performs outlier detection and data integration while giving excellent computational performance. Thus, FEATS is a comprehensive clustering tool capable of addressing the challenges that arise during the clustering of single-cell RNA-seq data. The installation instructions and documentation for FEATS are available at https://edwinv87.github.io/feats/.

Exploring dimension-reduced embeddings with Sleepwalk

10.1101/603589 ◽

2019 ◽

Author(s):

Svetlana Ovchinnikova ◽

Simon Anders

Keyword(s):

Big Data ◽

Dimension Reduction ◽

Single Cell ◽

Single Cells ◽

High Dimensional ◽

Rna Seq ◽

Mouse Cursor ◽

Sample Data ◽

Reduction Methods ◽

Full Power

Abstract: Dimension-reduction methods, such as t-SNE or UMAP, are widely used when exploring high-dimensional data describing many entities, e.g., RNA-seq data for many single cells. However, dimension reduction is commonly prone to introducing artefacts, and we hence need means to see where a dimension-reduced embedding is a faithful representation of the local neighbourhood and where it is not. We present Sleepwalk, a simple but powerful tool that allows the user to interactively explore an embedding, using colour to depict original or any other distances from all points to the cell under the mouse cursor. We show how this approach not only highlights distortions, but also reveals otherwise hidden characteristics of the data, and how Sleepwalk’s comparative modes help integrate multi-sample data and understand differences between embedding and preprocessing methods. Sleepwalk is a versatile and intuitive tool that unlocks the full power of dimension reduction and will be of value not only in single-cell RNA-seq but also in any other area with matrix-shaped big data.

An Interpretable Framework for Clustering Single-Cell RNA-Seq Datasets

10.1101/191254 ◽

2017 ◽

Author(s):

Jesse M. Zhang ◽

Jue Fan ◽

H. Christina Fan ◽

David Rosenfeld ◽

David N. Tse

Keyword(s):

Feature Selection ◽

Single Cell ◽

Computational Efficiency ◽

Software Package ◽

Rna Seq ◽

Cell Type ◽

Clustering Problem ◽

Unsupervised Analysis ◽

Multiple Levels ◽

Definition Of

Abstract: Background: With the recent proliferation of single-cell RNA-Seq experiments, several methods have been developed for unsupervised analysis of the resulting datasets. These methods often rely on unintuitive hyperparameters and do not explicitly address the subjectivity associated with clustering. Results: In this work, we present DendroSplit, an interpretable framework for analyzing single-cell RNA-Seq datasets that addresses both the clustering interpretability and clustering subjectivity issues. DendroSplit offers a novel perspective on the single-cell RNA-Seq clustering problem motivated by the definition of “cell type,” allowing us to cluster using feature selection to uncover multiple levels of biologically meaningful populations in the data. We analyze several landmark single-cell datasets, demonstrating both the method’s efficacy and computational efficiency. Conclusion: DendroSplit offers a clustering framework that is comparable to existing methods in terms of accuracy and speed but is novel in its emphasis on interpretability. We provide the full DendroSplit software package at https://github.com/jessemzhang/dendrosplit.

VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2018.08.003 ◽

2018 ◽

Vol 16 (5) ◽

pp. 320-331 ◽

Cited By ~ 46

Author(s):

Dongfang Wang ◽

Jin Gu

Keyword(s):

Dimension Reduction ◽

Single Cell ◽

Rna Seq ◽

Variational Autoencoder

Comparative Research of Different Dimension Reduction Methods Combined with RWR Network Smoothing in Single Cell RNA-seq Data

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/495/1/012043 ◽

2020 ◽

Vol 495 ◽

pp. 012043

Author(s):

Xuesong Xiao ◽

Pengchao Ye ◽

Wenbin Ye ◽

Guoli Ji

Keyword(s):

Dimension Reduction ◽

Single Cell ◽

Comparative Research ◽

Rna Seq ◽

Reduction Methods

Processing single-cell RNA-seq data for dimension reduction-based analyses using open-source tools

STAR Protocols ◽

10.1016/j.xpro.2021.100450 ◽

2021 ◽

Vol 2 (2) ◽

pp. 100450

Author(s):

Bob Chen ◽

Marisol A. Ramirez-Solano ◽

Cody N. Heiser ◽

Qi Liu ◽

Ken S. Lau

Keyword(s):

Dimension Reduction ◽

Single Cell ◽

Open Source ◽

Rna Seq

SHARP: Single-cell RNA-seq Hyper-fast and Accurate Processing via Ensemble Random Projection

10.1101/461640 ◽

2018 ◽

Cited By ~ 2

Author(s):

Shibiao Wan ◽

Junil Kim ◽

Kyoung Jae Won

Keyword(s):

Dimension Reduction ◽

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Random Projection ◽

Rna Seq ◽

Running Speed ◽

Large Size ◽

Single Cell Rna Sequencing ◽

Speed And Accuracy

Abstract: To process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm which is scalable to clustering 10 million cells. Comprehensive benchmarking tests on 17 public scRNA-seq datasets demonstrate that SHARP outperforms existing methods in terms of speed and accuracy. In particular, for large datasets (>40,000 cells), SHARP’s running speed far exceeds that of its competitors while maintaining high clustering accuracy and robustness. To the best of our knowledge, SHARP is the only R-based tool that is scalable to clustering scRNA-seq data with 10 million cells.
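
The abstract describes SHARP only as an ensemble random projection-based algorithm, so the following is a generic illustration of the underlying idea rather than SHARP's implementation (which the abstract says is R-based). It assumes dense Gaussian projection matrices; the function names `random_project` and `ensemble_random_project` and all parameters are hypothetical.

```python
import numpy as np

def random_project(X, k, seed=0):
    """Project the rows of X (cells x genes) onto k random Gaussian directions."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Entries are drawn from N(0, 1/k), so squared Euclidean distances
    # between rows are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

def ensemble_random_project(X, k, n_projections=5, seed=0):
    """Return several independent random projections of the same matrix."""
    return [random_project(X, k, seed=seed + i) for i in range(n_projections)]

# Toy usage (assumed data): 6 cells, 10 genes, reduced to 4 dimensions.
X = np.arange(60, dtype=float).reshape(6, 10)
projections = ensemble_random_project(X, k=4, n_projections=3)
```

Each projection could then be clustered separately and the results combined; how SHARP itself merges the ensemble is not stated in the abstract and is not attempted here.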

Analysis of Single-Cell RNA-seq Data by Clustering Approaches

Current Bioinformatics ◽

10.2174/1574893614666181120095038 ◽

2019 ◽

Vol 14 (4) ◽

pp. 314-322 ◽

Cited By ~ 3

Author(s):

Xiaoshu Zhu ◽

Hong-Dong Li ◽

Lilu Guo ◽

Fang-Xiang Wu ◽

Jianxin Wang

Keyword(s):

Feature Selection ◽

Single Cell ◽

Cell Types ◽

Semisupervised Learning ◽

Similarity Measurement ◽

Marker Genes ◽

Rna Seq ◽

Selection Methods ◽

Clustering Methods ◽

Similarity Calculation

Background: The recently developed single-cell RNA sequencing (scRNA-seq) has attracted a great amount of attention due to its capability to interrogate expression of individual cells, which is superior to traditional bulk cell sequencing that can only measure mean gene expression of a population of cells. scRNA-seq has been successfully applied in finding new cell subtypes. New computational challenges exist in the analysis of scRNA-seq data. Objective: We provide an overview of the features of different similarity calculation and clustering methods, in order to help users select methods that are suitable for their scRNA-seq data. We would also like to show that feature selection methods are important for improving clustering performance. Results: We first described similarity measurement methods, followed by reviewing some new clustering methods, as well as their algorithmic details. This analysis revealed several new questions, including how to automatically estimate the number of clustering categories, how to discover novel subpopulations, and how to search for new marker genes by using feature selection methods. Conclusion: Without prior knowledge about the number of cell types, clustering or semisupervised learning methods are important tools for exploratory analysis of scRNA-seq data.
