A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies

Single-cell RNA sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the single-cell level. Differential expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practice addressed this by borrowing methods from bulk RNA-seq, which are based on non-zero differences in the average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to identify a single globally best-performing method using any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.


Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell Rna Sequencing


UMI-count modeling and differential expression analysis for single-cell RNA sequencing

Genome Biology ◽

10.1186/s13059-018-1438-9 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 31

Author(s):

Wenan Chen ◽

Yan Li ◽

John Easton ◽

David Finkelstein ◽

Gang Wu ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Cell Rna Sequencing


A systematic evaluation of single cell RNA-seq analysis pipelines

Nature Communications ◽

10.1038/s41467-019-12266-7 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 47

Author(s):

Beate Vieth ◽

Swati Parekh ◽

Christoph Ziegenhain ◽

Wolfgang Enard ◽

Ines Hellmann

Keyword(s):

Best Practices ◽

Sample Size ◽

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Systematic Evaluation ◽

Library Preparation ◽

Rna Seq ◽

Rapid Spread ◽

Single Cell Rna Sequencing

Abstract The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.


Differential dropout analysis captures biological variation in single-cell RNA sequencing data

10.1101/2021.02.01.429187 ◽

2021 ◽

Author(s):

Gerard A. Bouland ◽

Ahmed Mahfouz ◽

Marcel J.T. Reinders

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Relative Abundance ◽

Differential Expression Analysis ◽

Biological Variation ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Technical Artifacts ◽

Zero Counts

Abstract Single-cell RNA sequencing data are characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We propose differential dropout analysis (DDA) as an alternative to differential expression analysis (DEA) for identifying the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available datasets, we show that dropout patterns are biological in nature and can assess the relative abundance of transcripts more robustly than counts.
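The idea of comparing dropout patterns rather than counts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`dropout_rate`, `differential_dropout`), the toy counts, and the choice of a two-proportion z-test are all assumptions made for illustration. For a single gene, the sketch reduces each cell to "zero" or "non-zero" and tests whether the zero fraction differs between two groups of cells.

```python
import math


def dropout_rate(counts):
    """Fraction of cells in which the gene has a zero count."""
    return sum(1 for c in counts if c == 0) / len(counts)


def differential_dropout(counts_a, counts_b):
    """Two-proportion z-test on the zero fractions of two cell groups.

    Illustrative stand-in for a differential dropout test; returns
    (rate_a, rate_b, z, two-sided p-value).
    """
    n_a, n_b = len(counts_a), len(counts_b)
    p_a, p_b = dropout_rate(counts_a), dropout_rate(counts_b)
    # Pooled zero fraction under the null hypothesis of equal dropout rates.
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All cells are zero, or none are: no evidence of a difference.
        return p_a, p_b, 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Hypothetical counts for one gene in two groups of ten cells each.
group_a = [0, 0, 0, 3, 0, 0, 1, 0, 0, 0]
group_b = [2, 5, 0, 1, 4, 3, 0, 2, 6, 1]
rate_a, rate_b, z, p = differential_dropout(group_a, group_b)
print(f"dropout A={rate_a:.2f}, B={rate_b:.2f}, z={z:.2f}, p={p:.4f}")
```

With only ten cells per group the normal approximation is rough. A real analysis would use many more cells, test every gene, and correct for multiple testing.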


BingleSeq: A user-friendly R package for Bulk and Single-cell RNA-Seq Data Analysis

10.1101/2020.06.16.148239 ◽

2020 ◽

Author(s):

Daniel Dimitrov ◽

Quan Gu

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Gene Annotation ◽

Differential Expression Analysis ◽

Transcriptome Profiling ◽

R Package ◽

Rna Seq ◽

The Individual ◽

User Friendly

Abstract RNA sequencing is a high-throughput sequencing technique considered an indispensable research tool in a broad range of transcriptome analysis studies. The most common application of RNA sequencing is differential expression analysis, which is used to determine genetic loci with distinct expression across different conditions. On the other hand, an emerging field called single-cell RNA sequencing is used for transcriptome profiling at the individual cell level. The standard protocols for both types of analysis include the processing of sequencing libraries and result in the generation of count matrices. An obstacle to these analyses and to the acquisition of meaningful results is that both require programming expertise. BingleSeq was developed as an intuitive application that provides a user-friendly solution for the analysis of count matrices produced by both bulk and single-cell RNA-Seq experiments. This was achieved by building an interactive dashboard-like user interface and incorporating three state-of-the-art software packages for each of the two types of analysis, alongside additional features such as key visualisation techniques, functional gene annotation analysis and rank-based consensus for differential gene analysis results, among others. As a result, BingleSeq puts the best and most widely used packages and tools for RNA-Seq analyses at the fingertips of biologists with no programming experience.


Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods

Frontiers in Genetics ◽

10.3389/fgene.2017.00062 ◽

2017 ◽

Vol 8 ◽

Cited By ~ 42

Author(s):

Alessandra Dal Molin ◽

Giacomo Baruzzo ◽

Barbara Di Camillo

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Analysis Methods ◽

Single Cell Rna Sequencing


BingleSeq: a user-friendly R package for bulk and single-cell RNA-Seq data analysis

PeerJ ◽

10.7717/peerj.10469 ◽

2020 ◽

Vol 8 ◽

pp. e10469

Author(s):

Daniel Dimitrov ◽

Quan Gu

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Differential Expression Analysis ◽

Transcriptome Profiling ◽

R Package ◽

Rna Seq ◽

Single Cell Rna Sequencing ◽

The Individual ◽

Visualization Techniques ◽

User Friendly

Background RNA sequencing is an indispensable research tool used in a broad range of transcriptome analysis studies. The most common application of RNA sequencing is differential expression analysis, which is used to determine genetic loci with distinct expression across different conditions. An emerging field called single-cell RNA sequencing is used for transcriptome profiling at the individual cell level. The standard protocols for both of these approaches include the processing of sequencing libraries and result in the generation of count matrices. An obstacle to these analyses and to the acquisition of meaningful results is that they require programming expertise. Although some effort has been directed toward the development of user-friendly RNA-Seq analysis tools, few have the flexibility to explore both bulk and single-cell RNA sequencing. Implementation BingleSeq was developed as an intuitive application that provides a user-friendly solution for the analysis of count matrices produced by both bulk and single-cell RNA-Seq experiments. This was achieved by building an interactive dashboard-like user interface which incorporates three state-of-the-art software packages for each of the two types of analysis. Furthermore, BingleSeq includes additional features such as visualization techniques, extensive functional annotation analysis and rank-based consensus for differential gene analysis results. As a result, BingleSeq puts some of the best reviewed and most widely used packages and tools for RNA-Seq analyses at the fingertips of biologists with no programming experience. Availability BingleSeq is available as an easy-to-install R package on GitHub at https://github.com/dbdimitrov/BingleSeq/.
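To make "rank-based consensus for differential gene analysis results" concrete, the following Python sketch shows one simple way such a consensus could be formed: rank genes by p-value within each method, then order genes by their mean rank across methods. This is an assumption for illustration only and is not taken from BingleSeq's code. The function names, method labels and p-values are all hypothetical.

```python
def rank_genes(p_values):
    """Map gene -> rank (1 = smallest p-value); ties broken by gene name."""
    ordered = sorted(p_values, key=lambda g: (p_values[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def consensus_ranking(results):
    """Order genes by mean rank across methods.

    `results` maps method name -> {gene: p-value}. Only genes reported
    by every method are ranked, so ranks are comparable across methods.
    """
    shared = set.intersection(*(set(r) for r in results.values()))
    ranks = {
        method: rank_genes({g: pvals[g] for g in shared})
        for method, pvals in results.items()
    }
    mean_rank = {
        g: sum(ranks[m][g] for m in ranks) / len(ranks) for g in shared
    }
    return sorted(shared, key=lambda g: (mean_rank[g], g))


# Hypothetical p-values from three DE methods for three genes.
results = {
    "method_a": {"G1": 0.001, "G2": 0.04, "G3": 0.20},
    "method_b": {"G1": 0.010, "G2": 0.30, "G3": 0.02},
    "method_c": {"G1": 0.002, "G2": 0.03, "G3": 0.50},
}
consensus = consensus_ranking(results)
print(consensus)
```

Averaging ranks rather than p-values keeps the consensus insensitive to differences in how each method calibrates its p-values, which is one plausible reason to prefer a rank-based scheme.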


A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing

Computational and Structural Biotechnology Journal ◽

10.1016/j.csbj.2020.03.026 ◽

2020 ◽

Vol 18 ◽

pp. 861-873 ◽

Cited By ~ 2

Author(s):

Wenan Chen ◽

Silu Zhang ◽

Justin Williams ◽

Bensheng Ju ◽

Bridget Shaner ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Comparison Of Methods ◽

Batch Effects ◽

Single Cell Rna Sequencing


Analysis of Single-Cell RNA-Sequencing Data: A Step-by-Step Guide

BioMedInformatics ◽

10.3390/biomedinformatics2010003 ◽

2021 ◽

Vol 2 (1) ◽

pp. 43-61

Author(s):

Aanchal Malhotra ◽

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Count Data ◽

Negative Binomial ◽

Expression Profiles ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Zero Inflation ◽

Single Cell Rna Sequencing

Single-cell RNA-sequencing (scRNA-seq) technology provides an excellent platform for measuring the expression profiles of genes in heterogeneous cell populations. Multiple tools for the analysis of scRNA-seq data have been developed over the years. These tools require complicated commands and steps to analyze the underlying data, which are not easy for genome researchers and experimental biologists to follow. Therefore, we describe a step-by-step workflow for processing and analyzing scRNA-seq unique molecular identifier (UMI) data from human lung adenocarcinoma cell lines. We demonstrate the basic analyses, including quality checking, mapping and quantification of transcript abundance, through a suitable real-data example to obtain UMI count data. Further, we performed basic statistical analyses, such as zero-inflation, differential expression and clustering analyses, on the obtained count data. We studied the effects of the excess zeros present in scRNA-seq data on the downstream analyses. Our findings indicate that the zero-inflation associated with UMI data had no or minimal role in clustering, while it had a significant effect on identifying differentially expressed genes. We also provide insight into the comparative analysis of differential expression analysis tools based on zero-inflated negative binomial and negative binomial models on scRNA-seq data. The sensitivity analysis reinforced our findings in that the negative binomial model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines for users to handle and analyze real scRNA-seq data more easily.
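As a rough illustration of what "zero-inflation" means relative to a negative binomial (NB) model, the Python sketch below compares the observed fraction of zeros for one gene with the zero probability implied by an NB distribution fitted by the method of moments. It is a crude diagnostic under stated assumptions, not the workflow described in the paper. The function names and toy counts are hypothetical, and the moment estimates are themselves distorted by any excess zeros.

```python
import math
import statistics


def nb_expected_zero_fraction(counts):
    """Zero probability of a method-of-moments negative binomial fit.

    Uses var = mu + mu^2 / size, so size = mu^2 / (var - mu), and
    P(X = 0) = (size / (size + mu)) ** size. Falls back to the Poisson
    zero probability exp(-mu) when the data are not overdispersed.
    """
    mu = statistics.fmean(counts)
    var = statistics.pvariance(counts)
    if mu == 0:
        return 1.0
    if var <= mu:
        return math.exp(-mu)
    size = mu * mu / (var - mu)
    return (size / (size + mu)) ** size


def zero_excess(counts):
    """Observed zero fraction minus the NB-expected zero fraction."""
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed - nb_expected_zero_fraction(counts)


# Hypothetical UMI counts for one gene: many zeros plus a few high counts.
counts = [0, 0, 0, 0, 0, 0, 8, 10, 12, 10]
excess = zero_excess(counts)
print(f"excess zero fraction: {excess:.3f}")
```

A clearly positive excess suggests more zeros than the fitted NB predicts, which is the situation a zero-inflated negative binomial model is meant to capture.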


A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines

10.1101/583013 ◽

2019 ◽

Cited By ~ 7

Author(s):

Beate Vieth ◽

Swati Parekh ◽

Christoph Ziegenhain ◽

Wolfgang Enard ◽

Ines Hellmann

Keyword(s):

Best Practices ◽

Sample Size ◽

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Systematic Evaluation ◽

Library Preparation ◽

Rna Seq ◽

Rapid Spread ◽

Single Cell Rna Sequencing

Abstract The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3,000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
