Ultra-fast scalable estimation of single-cell differentiation potency from scRNA-Seq data

Bioinformatics ◽ 10.1093/bioinformatics/btaa987 ◽ 2020

Author(s): Andrew E Teschendorff ◽ Alok K Maity ◽ Xue Hu ◽ Chen Weiyan ◽ Matthias Lechner

Keyword(s): Single Cell ◽ State Of The Art ◽ Computational Cost ◽ R Package ◽ Supplementary Information ◽ Supplementary Data ◽ Rna Seq ◽ Current State ◽ Multipotent Cells ◽ Comparable Accuracy

Abstract

Motivation: An important task in the analysis of single-cell RNA-Seq data is the estimation of differentiation potency, as this can help identify stem or multipotent cells in non-temporal studies or in tissues where differentiation hierarchies are not well established. A key challenge in the estimation of single-cell potency is the need for a fast and accurate algorithm that scales to large scRNA-Seq studies profiling millions of cells.

Results: Here, we present a single-cell potency measure, called Correlation of Connectome and Transcriptome (CCAT), which can return accurate single-cell potency estimates for a million cells in minutes, a 100-fold improvement over current state-of-the-art methods. We benchmark CCAT against 8 other single-cell potency models across 28 scRNA-Seq studies, encompassing over 2 million cells, demonstrating accuracy comparable to the current state of the art, at a significantly reduced computational cost and with increased robustness to dropouts.

Availability and implementation: CCAT is part of the SCENT R package, freely available from https://github.com/aet21/SCENT.

Supplementary information: Supplementary data are available at Bioinformatics online.
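Going only by the name, one plausible reading of CCAT is a per-cell Pearson correlation between each gene's expression level (the transcriptome) and that gene's degree in a protein interaction network (the connectome). The sketch below illustrates that reading under stated assumptions: the function names `pearson` and `potency_score`, the toy degrees, and the toy expression values are hypothetical and are not taken from the SCENT package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def potency_score(expression, degree):
    """Hypothetical per-cell score: correlation of expression with network degree.

    Assumption: both lists are ordered by the same genes.
    """
    return pearson(expression, degree)


# Toy data (assumed): four genes with their interaction-network degrees.
degree = [50, 20, 5, 1]
cell_a = [9.0, 6.0, 2.0, 1.0]  # highly connected genes are highly expressed
cell_b = [1.0, 2.0, 6.0, 9.0]  # highly connected genes are lowly expressed

score_a = potency_score(cell_a, degree)
score_b = potency_score(cell_b, degree)
```

Under this reading, `cell_a` scores higher than `cell_b`. Each cell needs only a single pass over its genes, which fits the emphasis on scalability, although the package's actual implementation may differ.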

DEsingle for detecting three types of differential expression in single-cell RNA-seq data

10.1101/173997 ◽ 2017 ◽ Cited By ~ 1

Author(s): Zhun Miao ◽ Ke Deng ◽ Xiaowo Wang ◽ Xuegong Zhang

Keyword(s): Single Cell ◽ Differential Expression ◽ Negative Binomial ◽ Single Cells ◽ R Package ◽ Supplementary Information ◽ Binomial Model ◽ Supplementary Data ◽ Rna Seq ◽ Real Zeros

Abstract

Summary: The excess zeros in single-cell RNA-seq data include “real” zeros, due to the on-off nature of gene transcription in single cells, and “dropout” zeros, due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a zero-inflated negative binomial model to estimate the proportion of real and dropout zeros and to define and detect three types of DE genes in single-cell RNA-seq data with higher accuracy.

Availability and implementation: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration.

Supplementary information: Supplementary data are available at bioRxiv online.
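To make the zero-inflated negative binomial idea concrete, here is a minimal sketch of the textbook mixture: with probability `theta` a count is a structural zero, otherwise it is drawn from a negative binomial with mean `mu` and dispersion `size`. The function names and parameter values are assumptions for illustration, not DEsingle's API, and the sketch does not settle which of the two zero components the package treats as "real" and which as "dropout".

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial pmf with mean mu > 0 and dispersion size > 0."""
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, theta, mu, size):
    """Zero-inflated NB: extra point mass theta at zero, NB otherwise."""
    base = (1.0 - theta) * nb_pmf(y, mu, size)
    return theta + base if y == 0 else base


def zero_split(theta, mu, size):
    """Fractions of all zeros coming from the point mass and from the NB part."""
    inflated = theta
    sampled = (1.0 - theta) * nb_pmf(0, mu, size)
    total = inflated + sampled
    return inflated / total, sampled / total


# Assumed toy parameters for a single gene in one group of cells.
theta, mu, size = 0.3, 2.0, 1.5
frac_inflated, frac_sampled = zero_split(theta, mu, size)
```

Because the two components put different amounts of mass at zero, two groups of cells can differ in the zero-inflation parameter, in the negative binomial part, or in both, which is one way to arrive at several distinct kinds of differential expression.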

2DImpute: imputation in single-cell RNA-seq data from correlations in two dimensions

Bioinformatics ◽ 10.1093/bioinformatics/btaa148 ◽ 2020 ◽ Vol 36 (11) ◽ pp. 3588-3589 ◽ Cited By ~ 1

Author(s): Kaiyi Zhu ◽ Dimitris Anastassiou

Keyword(s): Single Cell ◽ R Package ◽ Two Dimensions ◽ Imputation Method ◽ Supplementary Information ◽ Supplementary Data ◽ Rna Seq ◽ Imputation Methods ◽ Single Cell Rna Sequencing ◽ Expression Matrix

Abstract

Summary: We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It prevents excessive correction by first predicting which zeros are false and then imputing their values using the interrelationships between both genes and cells in the expression matrix. We showed that 2DImpute outperforms several leading imputation methods by applying it to datasets from various scRNA-seq protocols.

Availability and implementation: The R package of 2DImpute is freely available at GitHub (https://github.com/zky0708/2DImpute).

Supplementary information: Supplementary data are available at Bioinformatics online.

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽ 10.1093/bioinformatics/btab179 ◽ 2021

Author(s): Irzam Sarfraz ◽ Muhammad Asif ◽ Joshua D Campbell

Keyword(s): Single Cell ◽ R Package ◽ Poor Quality ◽ Data Matrix ◽ Supplementary Information ◽ Data Provenance ◽ Rna Seq ◽ Efficient Management ◽ The Matrix ◽ The Relationship

Abstract

Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient, as it requires the creation of an additional object containing a copy of the original assay, and it leads to challenges with data provenance.

Results: To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets.

Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset.

Supplementary information: Supplementary data are available at Bioinformatics online.
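The storage idea described above, keeping row and column indices plus a pointer to the parent assay instead of a second copy of the matrix, can be sketched in a few lines. The class below is a hypothetical Python illustration of that data structure only; the class name, its methods, and the toy matrix are assumptions and do not reflect the ExperimentSubset R API.

```python
class SubsetStore:
    """Holds full assays once; subsets are stored as (parent, rows, cols)."""

    def __init__(self, assays):
        self.assays = dict(assays)  # assay name -> matrix as a list of rows
        self.subsets = {}           # subset name -> (parent name, rows, cols)

    def add_subset(self, name, parent, rows, cols):
        """Register a subset by index only; no matrix data is copied."""
        if parent not in self.assays and parent not in self.subsets:
            raise KeyError(f"unknown parent: {parent}")
        self.subsets[name] = (parent, list(rows), list(cols))

    def get(self, name):
        """Materialize an assay or subset on demand, resolving parents first."""
        if name in self.assays:
            return self.assays[name]
        parent, rows, cols = self.subsets[name]
        matrix = self.get(parent)
        return [[matrix[r][c] for c in cols] for r in rows]

    def lineage(self, name):
        """Provenance chain from a subset back to its root assay."""
        chain = [name]
        while name in self.subsets:
            name = self.subsets[name][0]
            chain.append(name)
        return chain


# Toy counts matrix (assumed): 3 genes (rows) by 3 cells (columns).
store = SubsetStore({"counts": [[1, 0, 3], [0, 5, 0], [2, 2, 2]]})
store.add_subset("good_cells", "counts", rows=[0, 1, 2], cols=[0, 2])
store.add_subset("top_genes", "good_cells", rows=[0, 2], cols=[0, 1])
```

Here `store.lineage("top_genes")` walks back through `good_cells` to `counts`, which is the provenance relationship the abstract refers to; indices in a nested subset are relative to its immediate parent.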

DECENT: differential expression with capture efficiency adjustmeNT for single-cell RNA-seq data

Bioinformatics ◽ 10.1093/bioinformatics/btz453 ◽ 2019 ◽ Vol 35 (24) ◽ pp. 5155-5162 ◽ Cited By ~ 10

Author(s): Chengzhong Ye ◽ Terence P Speed ◽ Agus Salim

Keyword(s): Single Cell ◽ Differential Expression ◽ Type I Error ◽ R Package ◽ Supplementary Information ◽ Type I ◽ Common Phenomenon ◽ Rna Seq ◽ Capture Process ◽ Technological Platforms

Abstract

Motivation: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments.

Results: We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model.

Availability and implementation: The method is implemented as a publicly available R package, available from https://github.com/cz-ye/DECENT.

Supplementary information: Supplementary data are available at Bioinformatics online.

CPS analysis: self-contained validation of biomedical data clustering

Bioinformatics ◽ 10.1093/bioinformatics/btaa165 ◽ 2020 ◽ Vol 36 (11) ◽ pp. 3516-3521 ◽ Cited By ~ 1

Author(s): Lixiang Zhang ◽ Lin Lin ◽ Jia Li

Keyword(s): Data Clustering ◽ State Of The Art ◽ R Package ◽ Research Community ◽ Supplementary Information ◽ Biomedical Data ◽ Data Generation ◽ Supplementary Data ◽ Point Set ◽ Class Labels

Abstract

Motivation: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community.

Results: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods.

Availability and implementation: The method is implemented in an R package called OTclust, available on CRAN.

Supplementary information: Supplementary data are available at Bioinformatics online.

schex avoids overplotting for large single-cell RNA-sequencing datasets

Bioinformatics ◽ 10.1093/bioinformatics/btz907 ◽ 2019 ◽ Vol 36 (7) ◽ pp. 2291-2292 ◽ Cited By ~ 1

Author(s): Saskia Freytag ◽ Ryan Lister

Keyword(s): Single Cell ◽ Rna Sequencing ◽ R Package ◽ Supplementary Information ◽ Supplementary Data ◽ Sequencing Data ◽ Single Cell Rna Sequencing

Abstract

Summary: Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots.

Availability and implementation: schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex.

Supplementary information: Supplementary data are available at Bioinformatics online.
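Hexagonal binning replaces individual points with per-hexagon counts, so a plot draws one shape per occupied cell rather than one marker per cell in the dataset. The sketch below shows one common way to assign 2D points (for example, embedding coordinates) to pointy-top hexagons using axial coordinates and cube rounding. It is a generic illustration under assumed names (`hex_cell`, `hex_bin`) and toy points, not schex's implementation.

```python
import math
from collections import Counter


def hex_cell(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon containing point (x, y).

    `size` is the hexagon's circumradius (center to corner).
    """
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Convert to cube coordinates, which always sum to zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # Recompute the coordinate with the largest rounding error so that
    # the rounded cube coordinates still sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hex_bin(points, size):
    """Count how many points fall into each hexagon."""
    return Counter(hex_cell(x, y, size) for x, y in points)


# Toy 2D coordinates (assumed): two points near the origin, one further right.
points = [(0.0, 0.0), (0.1, 0.1), (math.sqrt(3), 0.0)]
counts = hex_bin(points, size=1.0)
```

The resulting `Counter` has at most one entry per occupied hexagon, regardless of how many points were binned, which is where the speed and storage savings for the plot come from.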

Per-sample standardization and asymmetric winsorization lead to accurate clustering of RNA-seq expression profiles

Bioinformatics ◽ 10.1093/bioinformatics/btab091 ◽ 2021

Author(s): Davide Risso ◽ Stefano Maria Pagnotta

Keyword(s): Single Cell ◽ Expression Profiles ◽ Unsupervised Clustering ◽ Supplementary Information ◽ Supplementary Data ◽ Rna Seq ◽ Data Transformations ◽ The Impact

Abstract

Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear.

Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications.

Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.

Supplementary information: Supplementary data are available at Bioinformatics online.
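The abstract does not give the AWST formula, so the following is only a generic illustration of the two ingredients named in the title: standardizing each sample on its own, then winsorizing with different lower and upper limits. The function names, the use of mean and standard deviation for standardization, and the bounds `-2.0` and `4.0` are all assumptions chosen for the example; the published transformation may differ substantially.

```python
import statistics


def standardize(sample):
    """Z-score one sample's expression values using its own mean and SD."""
    mean = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    if sd == 0:
        raise ValueError("cannot standardize a constant sample")
    return [(v - mean) / sd for v in sample]


def asymmetric_winsorize(values, lower, upper):
    """Clip values to [lower, upper]; the limits need not be symmetric."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [min(max(v, lower), upper) for v in values]


def transform(sample, lower=-2.0, upper=4.0):
    """Per-sample standardization followed by asymmetric clipping.

    The default bounds are arbitrary illustration values, not AWST's.
    """
    return asymmetric_winsorize(standardize(sample), lower, upper)


# Toy example (assumed values): already-standardized scores with outliers.
clipped = asymmetric_winsorize([-5.0, -1.0, 0.0, 3.0, 10.0], -2.0, 4.0)
transformed = transform([0.0, 0.0, 1.0, 2.0, 100.0])
```

Because each sample is processed independently, adding or removing other samples does not change a given sample's transformed values, which is one reason a per-sample approach can be robust to perturbations of the dataset.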

dittoSeq: universal user-friendly single-cell and bulk RNA sequencing visualization toolkit

Bioinformatics ◽ 10.1093/bioinformatics/btaa1011 ◽ 2020

Author(s): Daniel G Bunis ◽ Jared Andrews ◽ Gabriela K Fragiadakis ◽ Trevor D Burt ◽ Marina Sirota

Keyword(s): Single Cell ◽ R Package ◽ Color Blindness ◽ Ease Of Use ◽ Supplementary Information ◽ Supplementary Data ◽ Rnaseq Data ◽ Visualization Toolkit ◽ User Friendly ◽ Publication Quality

Abstract

Summary: dittoSeq is a visualization suite for major forms of bulk and single-cell RNAseq data in R. It is color blindness-friendly by default, thoroughly documented for ease of use, and allows highly customizable generation of both daily-use and publication-quality figures.

Availability and implementation: dittoSeq is an R package available through Bioconductor via an open source MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.

M3Drop: dropout-based feature selection for scRNASeq

Bioinformatics ◽ 10.1093/bioinformatics/bty1044 ◽ 2018 ◽ Vol 35 (16) ◽ pp. 2865-2867 ◽ Cited By ~ 61

Author(s): Tallulah S Andrews ◽ Martin Hemberg

Keyword(s): Feature Selection ◽ Single Cell ◽ R Package ◽ Supplementary Information ◽ Supplementary Data ◽ Selection Methods ◽ Functional Responses ◽ Technical Noise ◽ New Methods ◽ Selection For

Abstract

Motivation: Most genomes contain thousands of genes, but for most functional responses, only a subset of those genes are relevant. To facilitate many single-cell RNASeq (scRNASeq) analyses, the set of genes is often reduced through feature selection, i.e. by removing genes only subject to technical noise.

Results: We present M3Drop, an R package that implements popular existing feature selection methods and two novel methods which take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show these new methods outperform existing methods on simulated and real datasets.

Availability and implementation: M3Drop is freely available on GitHub as an R package and is compatible with other popular scRNASeq tools: https://github.com/tallulandrews/M3Drop.

Supplementary information: Supplementary data are available at Bioinformatics online.

scRNABatchQC: multi-samples quality control for single cell RNA-seq data

Bioinformatics ◽ 10.1093/bioinformatics/btz601 ◽ 2019 ◽ Vol 35 (24) ◽ pp. 5306-5308

Author(s): Qi Liu ◽ Quanhu Sheng ◽ Jie Ping ◽ Marisol Adelina Ramirez ◽ Ken S Lau ◽ ...

Keyword(s): Single Cell ◽ R Package ◽ Supplementary Information ◽ Rna Seq ◽ Technical Artifact ◽ Multiple Sample ◽ Systematic Biases ◽ Cell Transcriptome ◽ Single Cell Transcriptome ◽ Spurious Results

Abstract

Summary: Single cell RNA sequencing is a revolutionary technique to characterize inter-cellular transcriptomic heterogeneity. However, the data are noise-prone because gene expression is often driven by both technical artifacts and genuine biological variation. Proper disentanglement of these two effects is critical to prevent spurious results. While several tools exist to detect and remove low-quality cells in a single-cell RNA-seq dataset, there is a lack of approaches for examining consistency between sample sets and detecting systematic biases, batch effects and outliers. We present scRNABatchQC, an R package to compare multiple sample sets simultaneously over numerous technical and biological features, which gives valuable hints for distinguishing technical artifacts from biological variation. scRNABatchQC helps identify and systematically characterize sources of variability in single cell transcriptome data. The examination of consistency across datasets allows visual detection of biases and outliers.

Availability and implementation: scRNABatchQC is freely available at https://github.com/liuqivandy/scRNABatchQC as an R package.

Supplementary information: Supplementary data are available at Bioinformatics online.