Differential dropout analysis captures biological variation in single-cell RNA sequencing data

Abstract Single-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We propose differential dropout analysis (DDA) as an alternative to differential expression analysis (DEA) to identify the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available datasets, we show that dropout patterns are biological in nature and can assess the relative abundance of transcripts more robustly than counts.

Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell Rna Sequencing

Differential analysis of binarized single-cell RNA sequencing data captures biological variation

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab118 ◽

2021 ◽

Vol 3 (4) ◽

Author(s):

Gerard A Bouland ◽

Ahmed Mahfouz ◽

Marcel J T Reinders

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Biological Variation ◽

Expression Data ◽

Differential Analysis ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Cell Expression ◽

Zero Counts

Abstract Single-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We propose to use binarized expression profiles to identify the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available and simulated datasets, we show that a binarized representation of single-cell expression data accurately represents biological variation and reveals the relative abundance of transcripts more robustly than counts.
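
The idea of testing binarized expression can be illustrated with a minimal sketch. It assumes a single gene measured in two groups of cells, maps each count to detected (1) or not detected (0), and compares detection rates with a two-sided Fisher exact test. The test choice and all function names are assumptions made for illustration; the abstract does not state which test the authors use.

```python
from math import comb


def binarize(counts):
    """Map each count to 1 (detected) or 0 (zero count)."""
    return [1 if c > 0 else 0 for c in counts]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x detections in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def differential_detection(group_a, group_b):
    """Return (detection rate in A, detection rate in B, p-value) for one gene."""
    bin_a, bin_b = binarize(group_a), binarize(group_b)
    det_a, det_b = sum(bin_a), sum(bin_b)
    p = fisher_exact_two_sided(det_a, len(bin_a) - det_a,
                               det_b, len(bin_b) - det_b)
    return det_a / len(bin_a), det_b / len(bin_b), p


# Made-up counts for one gene in two groups of eight cells each.
rate_a, rate_b, p_value = differential_detection(
    [0, 0, 0, 0, 0, 0, 0, 1], [3, 1, 0, 2, 5, 1, 4, 2]
)
```

Only the presence or absence of a transcript enters the test, so the magnitude of the non-zero counts has no influence on the result.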

Analysis of Single-Cell RNA-Sequencing Data: A Step-by-Step Guide

BioMedInformatics ◽

10.3390/biomedinformatics2010003 ◽

2021 ◽

Vol 2 (1) ◽

pp. 43-61

Author(s):

Aanchal Malhotra ◽

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Count Data ◽

Negative Binomial ◽

Expression Profiles ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Zero Inflation ◽

Single Cell Rna Sequencing

Single-cell RNA-sequencing (scRNA-seq) technology provides an excellent platform for measuring the expression profiles of genes in heterogeneous cell populations. Multiple tools for the analysis of scRNA-seq data have been developed over the years. The tools require complicated commands and steps to analyze the underlying data, which are not easy for genome researchers and experimental biologists to follow. Therefore, we describe a step-by-step workflow for processing and analyzing the scRNA-seq unique molecular identifier (UMI) data from Human Lung Adenocarcinoma cell lines. We demonstrate the basic analyses, including quality check, mapping and quantification of transcript abundance, through a suitable real data example to obtain UMI count data. Further, we performed basic statistical analyses, such as zero-inflation, differential expression and clustering analyses, on the obtained count data. We studied the effects of excess zero-inflation present in scRNA-seq data on the downstream analyses. Our findings indicate that the zero-inflation associated with UMI data had no or minimal role in clustering, while it had a significant effect on identifying differentially expressed genes. We also provide an insight into the comparative analysis of differential expression analysis tools based on zero-inflated negative binomial and negative binomial models on scRNA-seq data. The sensitivity analysis reinforced our findings in that the negative binomial model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines for users to handle and analyze real scRNA-seq data more easily.
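
One simple way to look for excess zeros in a gene is to compare the observed fraction of zero counts with the fraction a plain negative binomial model would predict. The sketch below assumes the parameterization variance = mu + phi * mu^2, for which P(0) = (1 + phi * mu)^(-1/phi), and uses a method-of-moments estimate of phi. The function names are hypothetical and this is not the tooling used in the workflow described above.

```python
from math import exp
from statistics import mean, variance


def observed_zero_fraction(counts):
    """Fraction of cells in which the gene has a zero count."""
    return sum(1 for c in counts if c == 0) / len(counts)


def expected_zero_fraction(counts):
    """Zero fraction predicted by a negative binomial fitted by moments.

    Assumes variance = mu + phi * mu**2. Falls back to the Poisson
    prediction exp(-mu) when the data show no over-dispersion.
    """
    mu = float(mean(counts))
    var = float(variance(counts))  # sample variance (n - 1 denominator)
    if mu == 0.0:
        return 1.0
    if var <= mu:
        return exp(-mu)
    phi = (var - mu) / mu ** 2
    return (1.0 + phi * mu) ** (-1.0 / phi)


# Made-up UMI counts for one gene across ten cells.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 14]
excess = observed_zero_fraction(counts) - expected_zero_fraction(counts)
```

A clearly positive `excess` across many genes would hint at zero inflation beyond what the negative binomial accounts for; a value near zero suggests the negative binomial already explains the zeros.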

UMI-count modeling and differential expression analysis for single-cell RNA sequencing

Genome Biology ◽

10.1186/s13059-018-1438-9 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 31

Author(s):

Wenan Chen ◽

Yan Li ◽

John Easton ◽

David Finkelstein ◽

Gang Wu ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Cell Rna Sequencing

Pooling across cells to normalize single-cell RNA sequencing data with many zero counts

Genome Biology ◽

10.1186/s13059-016-0947-7 ◽

2016 ◽

Vol 17 (1) ◽

Cited By ~ 408

Author(s):

Aaron T. L. Lun ◽

Karsten Bach ◽

John C. Marioni

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Zero Counts

A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies

Genes ◽

10.3390/genes12121947 ◽

2021 ◽

Vol 12 (12) ◽

pp. 1947

Author(s):

Samarendra Das ◽

Anil Rai ◽

Michael L. Merchant ◽

Matthew C. Cave ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Differential Expression Analysis ◽

Individual Performance ◽

Rna Seq ◽

Gene Expressions ◽

Single Cell Rna Sequencing

Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the cell level. Differential Expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practices for addressing this involved borrowing methods from bulk RNA-seq, which are based on non-zero differences in average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of different DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to identify a single globally best-performing method using any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.

bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data

10.1101/384586 ◽

2018 ◽

Cited By ~ 7

Author(s):

Wenhao Tang ◽

François Bertaux ◽

Philipp Thomas ◽

Claire Stefanelli ◽

Malika Saint ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Single Molecule ◽

Empirical Bayes ◽

Missing Values ◽

Likelihood Function ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability and high amounts of missing observations typical of scRNA-seq datasets make this task particularly challenging. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data.
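
The combination of a binomial capture likelihood with a prior on the true count can be sketched on a discrete grid. This example assumes a capture probability `beta` and, purely as a stand-in, a Poisson prior with mean `mu`; the abstract says the priors are estimated by empirical Bayes and does not specify their form here, so the prior, the parameter values, and the function names are all illustrative assumptions rather than bayNorm's implementation.

```python
from math import exp, lgamma, log


def log_binomial_pmf(x, n, beta):
    """log P(observe x | true count n, capture probability beta)."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(beta) + (n - x) * log(1.0 - beta))


def log_poisson_pmf(n, mu):
    """log of the stand-in Poisson prior on the true count n."""
    return -mu + n * log(mu) - lgamma(n + 1)


def posterior_true_count(x, beta, mu, max_n=200):
    """Posterior over true counts 0..max_n given an observed count x."""
    weights = []
    for n in range(max_n + 1):
        if n < x:
            # Cannot capture more molecules than were present.
            weights.append(0.0)
        else:
            weights.append(exp(log_binomial_pmf(x, n, beta)
                               + log_poisson_pmf(n, mu)))
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(posterior):
    """Expected true count under the grid posterior."""
    return sum(n * p for n, p in enumerate(posterior))


# Observed 2 molecules with 10% capture and a prior mean of 20 molecules.
post = posterior_true_count(x=2, beta=0.1, mu=20.0)
estimate = posterior_mean(post)
```

With a Poisson prior the uncaptured molecules are themselves Poisson distributed with mean `mu * (1 - beta)`, so the posterior mean is `x + mu * (1 - beta)`, which gives a quick sanity check on the grid computation.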

Probabilistic index models for testing differential expression in single cell RNA sequencing data

10.1101/718668 ◽

2019 ◽

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Data Analysis ◽

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Effect Size ◽

Expression Patterns ◽

Simulation Studies ◽

Sequencing Data ◽

Rank Tests ◽

Single Cell Rna Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) technologies profile gene expression patterns in individual cells. It is often of interest to test for differential expression (DE) between conditions, e.g. treatment vs control or between cell types. Simulation studies have shown that non-parametric tests, such as the Wilcoxon rank-sum test, can robustly detect significant DE, with better performance than many parametric tools specifically developed for scRNA-seq data analysis. However, these rank tests cannot be used for complex experimental designs involving multiple groups, multiple factors and confounding variables. Further, rank-based tests do not provide an interpretable measure of the effect size. We propose a semi-parametric approach based on probabilistic index models (PIM), which form a flexible class of models that generalize classical rank tests. Our method does not rely on strong distributional assumptions and it allows accounting for confounding factors. Moreover, it allows for the estimation of the effect size in terms of a probabilistic index. Real data analyses demonstrate that PIM is capable of identifying biologically meaningful DE. Our simulation studies also show that DE tests succeed well in controlling the false discovery rate at its nominal level, while maintaining good sensitivity as compared to competing methods.
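
In the simplest setting of two groups and no covariates, the probabilistic index reduces to P(X < Y) + 0.5 * P(X = Y), which equals the Mann-Whitney U statistic divided by the number of pairs. The sketch below computes only this empirical quantity; it does not fit a probabilistic index model with covariates, and the function name is hypothetical.

```python
def probabilistic_index(x, y):
    """Empirical P(X < Y) + 0.5 * P(X = Y) over all pairs (x_i, y_j)."""
    total = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1.0
            elif xi == yj:
                total += 0.5  # ties count as half, as in the Mann-Whitney U
    return total / (len(x) * len(y))


# Made-up expression values of one gene in control and treated cells.
control = [0, 0, 1, 2]
treated = [0, 3, 4, 5]
pi_hat = probabilistic_index(control, treated)
```

A value of 0.5 indicates no tendency for either group to have higher expression, while values near 0 or 1 indicate strong separation, which is what makes the index readable as an effect size.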

Bayesian gamma-negative binomial modeling of single-cell RNA sequencing data

BMC Genomics ◽

10.1186/s12864-020-06938-8 ◽

2020 ◽

Vol 21 (S9) ◽

Author(s):

Siamak Zamani Dadaneh ◽

Paul de Figueiredo ◽

Sing-Hoi Sze ◽

Mingyuan Zhou ◽

Xiaoning Qian

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Data Augmentation ◽

Negative Binomial ◽

Cell Lineage ◽

Simulated Data ◽

Molecular Heterogeneity ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Zero Counts

Abstract Background: Single-cell RNA sequencing (scRNA-seq) is a powerful profiling technique at the single-cell resolution. Appropriate analysis of scRNA-seq data can characterize molecular heterogeneity and shed light on the underlying cellular process to better understand development and disease mechanisms. The unique analytic challenge is to appropriately model highly over-dispersed scRNA-seq count data with prevalent dropouts (zero counts), making zero-inflated dimensionality reduction techniques popular for scRNA-seq data analyses. Employing zero-inflated distributions, however, may place extra emphasis on zero counts, leading to potential bias when identifying the latent structure of the data. Results: In this paper, we propose a fully generative hierarchical gamma-negative binomial (hGNB) model of scRNA-seq data, obviating the need for explicitly modeling zero inflation. At the same time, hGNB can naturally account for covariate effects at both the gene and cell levels to identify complex latent representations of scRNA-seq data, without the need for commonly adopted pre-processing steps such as normalization. Efficient Bayesian model inference is derived by exploiting conditional conjugacy via novel data augmentation techniques. Conclusion: Experimental results on both simulated data and several real-world scRNA-seq datasets suggest that hGNB is a powerful tool for cell cluster discovery as well as cell lineage inference.
