Reply to “A discriminative learning approach to differential expression analysis for single-cell RNA-seq”

Mapping Intimacies ◽

10.1101/648733 ◽

2019 ◽

Author(s):

Etienne Becht ◽

Edward Zhao ◽

Robert Amezquita ◽

Raphael Gottardo

Keyword(s):

Single Cell ◽

Differential Expression ◽

Differential Expression Analysis ◽

Discriminative Learning ◽

Learning Approach ◽

Rna Seq ◽

Multivariate Logistic Regression ◽

P Values ◽

Gene Differential Expression ◽

Better Than

AbstractMultivariate logistic regression (mLR) has been recently proposed by Ntranos et al. to perform gene differential expression analyses of single-cell RNA-sequencing (scRNAseq) data. Herein we reproduce and extend some of their findings. We notably show that while mLR performs better in simulated datasets, these simulations do not recapitulate important features of experimental datasets. Indeed, our results suggest that MAST followed by Sidak aggregation of the p-values perform better than mLR on experimental datasets. Overall, we highlight that most of the new results obtained by Ntranos et al is likely due to the quantification of scRNAseq data at the transcript or transcript compatibility classes level, rather than the use of mLR.

A discriminative learning approach to differential expression analysis for single-cell RNA-seq

Nature Methods ◽

10.1038/s41592-018-0303-9 ◽

2019 ◽

Vol 16 (2) ◽

pp. 163-166 ◽

Cited By ~ 26

Author(s):

Vasilis Ntranos ◽

Lynn Yi ◽

Páll Melsted ◽

Lior Pachter

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Discriminative Learning ◽

Learning Approach ◽

Rna Seq

Gene Set Analysis Using Spatial Statistics

Mathematics ◽

10.3390/math9050521 ◽

2021 ◽

Vol 9 (5) ◽

pp. 521

Author(s):

Angela L. Riffo-Campos ◽

Guillermo Ayala ◽

Francisco Montes

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Differential Expression Analysis ◽

Gene Set Analysis ◽

Point Pattern ◽

Rna Seq ◽

P Values ◽

Gene Set ◽

Gene Differential Expression ◽

Per Gene

Gene differential expression consists of the study of the possible association between the gene expression, evaluated using different types of data as DNA microarray or RNA-Seq technologies, and the phenotype. This can be performed marginally for each gene (differential gene expression) or using a gene set collection (gene set analysis). A previous (marginal) per-gene analysis of differential expression is usually performed in order to obtain a set of significant genes or marginal p-values used later in the study of association between phenotype and gene expression. This paper proposes the use of methods of spatial statistics for testing gene set differential expression analysis using paired samples of RNA-Seq counts. This approach is not based on a previous per-gene differential expression analysis. Instead, we compare the paired counts within each sample/control using a binomial test. Each pair per gene will produce a p-value so gene expression profile is transformed into a vector of p-values which will be considered as an event belonging to a point pattern. This would be the first component of a bivariate point pattern. The second component is generated by applying two different randomization distributions to the correspondence between samples and treatment. The self-contained null hypothesis considered in gene set analysis can be formulated in terms of the associated point pattern as a random labeling of the considered bivariate point pattern. The gene sets were defined by the Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The proposed methodology was tested in four RNA-Seq datasets of colorectal cancer (CRC) patients and the results were contrasted with those obtained using the edgeR-GOseq pipeline. The proposed methodology has proved to be consistent at the biological and statistical level, in particular using Cuzick and Edwards test with one realization of the second component and between-pair distribution.

Two-phase differential expression analysis for single cell RNA-seq

Bioinformatics ◽

10.1093/bioinformatics/bty329 ◽

2018 ◽

Vol 34 (19) ◽

pp. 3340-3348 ◽

Cited By ~ 11

Author(s):

Zhijin Wu ◽

Yi Zhang ◽

Michael L Stitzel ◽

Hao Wu

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Rna Seq ◽

Two Phase

A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies

Genes ◽

10.3390/genes12121947 ◽

2021 ◽

Vol 12 (12) ◽

pp. 1947

Author(s):

Samarendra Das ◽

Anil Rai ◽

Michael L. Merchant ◽

Matthew C. Cave ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Differential Expression Analysis ◽

Individual Performance ◽

Rna Seq ◽

Gene Expressions ◽

Single Cell Rna Sequencing

Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expressions at the cell level. Differential Expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis the in presence of noises from different sources remains a key challenge in scRNA-seq. Earlier practices for addressing this involved borrowing methods from bulk RNA-seq, which are based on non-zero differences in average expressions of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of different DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to obtain the method which will be best-performing globally through individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides proper guidelines for selecting the proper tool which performs best under particular experimental settings in the context of the scRNA-seq.

Bias, robustness and scalability in differential expression analysis of single-cell RNA-seq data

10.1101/143289 ◽

2017 ◽

Cited By ~ 16

Author(s):

Charlotte Soneson ◽

Mark D. Robinson

Keyword(s):

Single Cell ◽

Differential Expression ◽

Statistical Methods ◽

Expression Analysis ◽

Method Development ◽

Differential Expression Analysis ◽

Data Sets ◽

Rna Seq ◽

Data Set ◽

Extensive Evaluation

AbstractBackgroundAs single-cell RNA-seq (scRNA-seq) is becoming increasingly common, the amount of publicly available data grows rapidly, generating a useful resource for computational method development and extension of published results. Although processed data matrices are typically made available in public repositories, the procedure to obtain these varies widely between data sets, which may complicate reuse and cross-data set comparison. Moreover, while many statistical methods for performing differential expression analysis of scRNA-seq data are becoming available, their relative merits and the performance compared to methods developed for bulk RNA-seq data are not sufficiently well understood.ResultsWe present conquer, a collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. Each data set has count and transcripts per million (TPM) estimates for genes and transcripts, as well as quality control and exploratory analysis reports. We use a subset of the data sets available in conquer to perform an extensive evaluation of the performance and characteristics of statistical methods for differential gene expression analysis, evaluating a total of 30 statistical approaches on both experimental and simulated scRNA-seq data.ConclusionsConsiderable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.

scBatch: batch-effect correction of RNA-seq data through sample distance matrix adjustment

Bioinformatics ◽

10.1093/bioinformatics/btaa097 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3115-3123 ◽

Cited By ~ 3

Author(s):

Teng Fei ◽

Tianwei Yu

Keyword(s):

Single Cell ◽

Differential Expression Analysis ◽

Distance Matrix ◽

Real Data ◽

R Package ◽

Batch Effect ◽

Supplementary Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Differential Expression

Abstract Motivation Batch effect is a frequent challenge in deep sequencing data analysis that can lead to misleading conclusions. Existing methods do not correct batch effects satisfactorily, especially with single-cell RNA sequencing (RNA-seq) data. Results We present scBatch, a numerical algorithm for batch-effect correction on bulk and single-cell RNA-seq data with emphasis on improving both clustering and gene differential expression analysis. scBatch is not restricted by assumptions on the mechanism of batch-effect generation. As shown in simulations and real data analyses, scBatch outperforms benchmark batch-effect correction methods. Availability and implementation The R package is available at github.com/tengfei-emory/scBatch. The code to generate results and figures in this article is available at github.com/tengfei-emory/scBatch-paper-scripts. Supplementary information Supplementary data are available at Bioinformatics online.

SwarnSeq: An improved statistical approach for differential expression analysis of single-cell RNA-seq data

Genomics ◽

10.1016/j.ygeno.2021.02.014 ◽

2021 ◽

Vol 113 (3) ◽

pp. 1308-1324

Author(s):

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Statistical Approach ◽

Differential Expression Analysis ◽

Rna Seq

Valid post-clustering differential analysis for single-cell RNA-Seq

10.1101/463265 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jesse M. Zhang ◽

Govinda M. Kamath ◽

David N. Tse

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

Differential Expression Analysis ◽

Differential Analysis ◽

Rna Seq ◽

Analysis Framework ◽

Link Type ◽

False Discoveries

SummarySingle-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.

zingeR: unlocking RNA-seq tools for zero-inflation and single cell applications

10.1101/157982 ◽

2017 ◽

Cited By ~ 7

Author(s):

Koen Van den Berge ◽

Charlotte Soneson ◽

Michael I. Love ◽

Mark D. Robinson ◽

Lieven Clement

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Negative Binomial ◽

Differential Expression Analysis ◽

Negative Binomial Model ◽

Binomial Model ◽

Rna Seq ◽

Zero Inflation ◽

Zero Counts

AbstractDropout in single cell RNA-seq (scRNA-seq) applications causes many transcripts to go undetected. It induces excess zero counts, which leads to power issues in differential expression (DE) analysis and has triggered the development of bespoke scRNA-seq DE tools that cope with zero-inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce zingeR, a zero-inflated negative binomial model that identifies excess zero counts and generates observation weights to unlock bulk RNA-seq pipelines for zero-inflation, boosting performance in scRNA-seq differential expression analysis.

Sciviewer enables interactive visual interrogation of single-cell RNA-Seq data from the Python programming environment

Bioinformatics ◽

10.1093/bioinformatics/btab689 ◽

2021 ◽

Author(s):

Dylan Kotliar ◽

Andrés Colubri

Keyword(s):

Single Cell ◽

Differential Expression ◽

Differential Expression Analysis ◽

Visual Exploration ◽

Programming Environment ◽

Rna Seq ◽

Programming Environments ◽

Interactive Tools ◽

Novel Method ◽

Python Programming

Abstract Motivation Visualizing two-dimensional embeddings (such as UMAP or tSNE) is a useful step in interrogating single-cell RNA sequencing (scRNA-Seq) data. Subsequently, users typically iterate between programmatic analyses (including clustering and differential expression) and visual exploration (e.g. coloring cells by interesting features) to uncover biological signals in the data. Interactive tools exist to facilitate visual exploration of embeddings such as performing differential expression on user-selected cells. However, the practical utility of these tools is limited because they don’t support rapid movement of data and results to and from the programming environments where most of the data analysis takes place, interrupting the iterative process. Results Here, we present the Single-cell Interactive Viewer (Sciviewer), a tool that overcomes this limitation by allowing interactive visual interrogation of embeddings from within Python. Beyond differential expression analysis of user-selected cells, Sciviewer implements a novel method to identify genes varying locally along any user-specified direction on the embedding. Sciviewer enables rapid and flexible iteration between interactive and programmatic modes of scRNA-Seq exploration, illustrating a useful approach for analyzing high-dimensional data. Availability and implementation Code and examples are provided at https://github.com/colabobio/sciviewer.