Individual Level Differential Expression Analysis for Single Cell RNA-seq data

Bulk RNA-seq data quantify the expression of a gene in an individual by one number (e.g., fragment count). In contrast, single cell RNA-seq (scRNA-seq) data provide much richer information: the distribution of gene expression across many cells. To assess differential expression across individuals using scRNA-seq data, a straightforward solution is to create "pseudo" bulk RNA-seq data by adding up the fragment counts of a gene across cells for each individual, and then apply methods designed for differential expression analysis of bulk RNA-seq data. This pseudo-bulk solution reduces the distribution of gene expression across cells to a single number and thus loses a good amount of information. We propose to assess differential expression using the gene expression distribution measured by cell level data. We find that denoising cell level data can substantially improve the power of this approach. We apply our method, named IDEAS (Individual level Differential Expression Analysis for scRNA-seq), to study the gene expression difference between autism subjects and controls. We find that neurogranin-expressing neurons harbor a high proportion of differentially expressed genes, and that ERBB signals in microglia are associated with autism.
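The pseudo-bulk step described above can be illustrated with a minimal sketch. It is not the IDEAS implementation; the function name `pseudo_bulk`, the gene names, and the toy counts are all hypothetical. The sketch sums per-cell counts within each individual and shows two individuals whose cell-level distributions differ but whose pseudo-bulk values coincide.

```python
from collections import defaultdict

def pseudo_bulk(cell_counts, cell_to_individual):
    """Sum the per-cell counts of each gene within each individual.

    cell_counts: dict mapping cell id -> {gene: count}
    cell_to_individual: dict mapping cell id -> individual id
    Returns a dict mapping individual id -> {gene: summed count}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, genes in cell_counts.items():
        individual = cell_to_individual[cell]
        for gene, count in genes.items():
            totals[individual][gene] += count
    return {ind: dict(genes) for ind, genes in totals.items()}

# Hypothetical toy data: GENE_A is 0 and 10 in the two cells of ind1,
# but 5 and 5 in the two cells of ind2.
cell_counts = {
    "c1": {"GENE_A": 0, "GENE_B": 5},
    "c2": {"GENE_A": 10, "GENE_B": 5},
    "c3": {"GENE_A": 5, "GENE_B": 4},
    "c4": {"GENE_A": 5, "GENE_B": 6},
}
cell_to_individual = {"c1": "ind1", "c2": "ind1", "c3": "ind2", "c4": "ind2"}

bulk = pseudo_bulk(cell_counts, cell_to_individual)
print(bulk)
```

In this toy case both individuals end up with the same summed GENE_A count even though the counts are spread very differently across their cells, which is the kind of information a single number per individual cannot carry.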


Two-phase differential expression analysis for single cell RNA-seq

Bioinformatics ◽

10.1093/bioinformatics/bty329 ◽

2018 ◽

Vol 34 (19) ◽

pp. 3340-3348 ◽

Cited By ~ 11

Author(s):

Zhijin Wu ◽

Yi Zhang ◽

Michael L Stitzel ◽

Hao Wu

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

RNA-Seq ◽

Two Phase


Bias, robustness and scalability in differential expression analysis of single-cell RNA-seq data

10.1101/143289 ◽

2017 ◽

Cited By ~ 16

Author(s):

Charlotte Soneson ◽

Mark D. Robinson

Keyword(s):

Single Cell ◽

Differential Expression ◽

Statistical Methods ◽

Expression Analysis ◽

Method Development ◽

Differential Expression Analysis ◽

Data Sets ◽

RNA-Seq ◽

Data Set ◽

Extensive Evaluation

Abstract

Background: As single-cell RNA-seq (scRNA-seq) is becoming increasingly common, the amount of publicly available data grows rapidly, generating a useful resource for computational method development and extension of published results. Although processed data matrices are typically made available in public repositories, the procedure to obtain these varies widely between data sets, which may complicate reuse and cross-data set comparison. Moreover, while many statistical methods for performing differential expression analysis of scRNA-seq data are becoming available, their relative merits and the performance compared to methods developed for bulk RNA-seq data are not sufficiently well understood.

Results: We present conquer, a collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. Each data set has count and transcripts per million (TPM) estimates for genes and transcripts, as well as quality control and exploratory analysis reports. We use a subset of the data sets available in conquer to perform an extensive evaluation of the performance and characteristics of statistical methods for differential gene expression analysis, evaluating a total of 30 statistical approaches on both experimental and simulated scRNA-seq data.

Conclusions: Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.


Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA–sequencing data

10.1101/220129 ◽

2017 ◽

Cited By ~ 2

Author(s):

Alemu Takele Assefa ◽

Katrijn De Paepe ◽

Celine Everaert ◽

Pieter Mestdagh ◽

Olivier Thas ◽

...

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Expression Analysis ◽

Web Application ◽

Empirical Bayes ◽

Performance Metrics ◽

Differential Expression Analysis ◽

RNA-Seq ◽

Sequencing Data ◽

Normalization Methods

Abstract

Background: Protein-coding RNAs (mRNA) have been the primary target of most transcriptome studies in the past, but in recent years, attention has expanded to include long non-coding RNAs (lncRNA). lncRNAs are typically expressed at low levels, and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 14 popular tools for testing DE in RNA-seq data along with their normalization methods is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs.

Results: Thirteen performance metrics were used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets. Non-parametric procedures were used to simulate gene expression data in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, we kept track of the results for mRNA and lncRNA separately. All statistical models exhibited inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and analyses of benchmark RNA-seq datasets. No single tool uniformly outperformed the others.

Conclusion: Overall, linear modeling with empirical Bayes moderation (limma) and the nonparametric approach (SAMSeq) showed the best performance: good control of the false discovery rate (FDR) and reasonable sensitivity. However, achieving a sensitivity of at least 50% requires more than 80 samples when studying expression levels in realistic clinical settings such as cancer research. About half of the methods showed a severe excess of false discoveries, making these methods unreliable for differential expression analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, http://statapps.ugent.be/tools/AppDGE/


Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment

10.1101/2021.04.23.441097 ◽

2021 ◽

Author(s):

Anish M.S. Shrestha ◽

Joyce Emlyn B. Guiao ◽

Kyle Christian R. Santiago

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Expression Analysis ◽

De Novo ◽

Transcriptome Assembly ◽

Differential Expression Analysis ◽

Homology Search ◽

Model Organisms ◽

RNA-Seq ◽

Protein Database

Abstract

RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. Conventional differential expression analysis for organisms without reference sequences requires performing computationally expensive and error-prone de-novo transcriptome assembly, followed by homology search against a high-confidence protein database for functional annotation. We propose a shortcut, where we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the protein database. Through experiments on simulated and real data, we show drastic reductions in run-time and memory usage, with no loss in accuracy. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar


A discriminative learning approach to differential expression analysis for single-cell RNA-seq

Nature Methods ◽

10.1038/s41592-018-0303-9 ◽

2019 ◽

Vol 16 (2) ◽

pp. 163-166 ◽

Cited By ~ 26

Author(s):

Vasilis Ntranos ◽

Lynn Yi ◽

Páll Melsted ◽

Lior Pachter

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Discriminative Learning ◽

Learning Approach ◽

RNA-Seq


SwarnSeq: An improved statistical approach for differential expression analysis of single-cell RNA-seq data

Genomics ◽

10.1016/j.ygeno.2021.02.014 ◽

2021 ◽

Vol 113 (3) ◽

pp. 1308-1324

Author(s):

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Statistical Approach ◽

Differential Expression Analysis ◽

RNA-Seq


Valid post-clustering differential analysis for single-cell RNA-Seq

10.1101/463265 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jesse M. Zhang ◽

Govinda M. Kamath ◽

David N. Tse

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

Differential Expression Analysis ◽

Differential Analysis ◽

RNA-Seq ◽

Analysis Framework ◽

Link Type ◽

False Discoveries

Summary

Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
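The problem this summary describes, clustering and testing on the same data, can be reproduced in a few lines. The sketch below is only a demonstration of the inflation; it is not the authors' correction and does not use their software. The median split stands in for a clustering step, and the sample size and seed are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

# One homogeneous population: every "cell" is drawn from the same
# distribution, so there is no true difference to discover.
expression = [random.gauss(0.0, 1.0) for _ in range(200)]

# Stand-in for clustering: split the cells at the median of the very
# measurements that will be tested afterwards.
cutoff = statistics.median(expression)
low = [x for x in expression if x <= cutoff]
high = [x for x in expression if x > cutoff]

def welch_t(a, b):
    """Welch two-sample t statistic for mean(b) - mean(a)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / se

t_stat = welch_t(low, high)
print(f"Welch t statistic after splitting on the same data: {t_stat:.1f}")
```

Because the split itself forces the two groups apart, the t statistic is very large even though all values come from a single distribution, so a naive test would report a spurious marker.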


zingeR: unlocking RNA-seq tools for zero-inflation and single cell applications

10.1101/157982 ◽

2017 ◽

Cited By ~ 7

Author(s):

Koen Van den Berge ◽

Charlotte Soneson ◽

Michael I. Love ◽

Mark D. Robinson ◽

Lieven Clement

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Negative Binomial ◽

Differential Expression Analysis ◽

Negative Binomial Model ◽

Binomial Model ◽

RNA-Seq ◽

Zero Inflation ◽

Zero Counts

Abstract

Dropout in single cell RNA-seq (scRNA-seq) applications causes many transcripts to go undetected. It induces excess zero counts, which leads to power issues in differential expression (DE) analysis and has triggered the development of bespoke scRNA-seq DE tools that cope with zero-inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce zingeR, a zero-inflated negative binomial model that identifies excess zero counts and generates observation weights to unlock bulk RNA-seq pipelines for zero-inflation, boosting performance in scRNA-seq differential expression analysis.
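One way to read "observation weights" in a zero-inflated negative binomial (ZINB) model is as the posterior probability that a count came from the negative binomial component rather than from the excess-zero component. The sketch below computes that quantity under the assumption that the mean `mu`, the size parameter `size`, and the zero-inflation probability `pi` are already known; how zingeR estimates these parameters is not shown here, and the function names are hypothetical rather than part of the zingeR package.

```python
def nb_zero_prob(mu, size):
    """P(Y = 0) under a negative binomial with mean mu and size parameter."""
    return (size / (size + mu)) ** size

def zinb_weight(y, mu, size, pi):
    """Posterior probability that count y came from the NB component.

    pi is the assumed probability of an excess (dropout) zero.
    Positive counts can only come from the NB component, so they get weight 1.
    """
    if y > 0:
        return 1.0
    nb_part = (1.0 - pi) * nb_zero_prob(mu, size)
    return nb_part / (pi + nb_part)

# Hypothetical parameter values for illustration only.
for mu in (1.0, 10.0):
    w = zinb_weight(0, mu=mu, size=2.0, pi=0.3)
    print(f"mu={mu}: weight of a zero count = {w:.3f}")
```

A zero observed for a highly expressed gene is unlikely under the negative binomial component, so it receives a small weight, while a zero for a lowly expressed gene keeps a larger one. Weights of this kind could then be passed to a bulk RNA-seq tool that accepts per-observation weights.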


Identification of differentially distributed gene expression and distinct sets of cancer-related genes identified by changes in expression mean and variability

10.1101/2021.02.15.431343 ◽

2021 ◽

Author(s):

Aedan G. K. Roberts ◽

Daniel R. Catchpoole ◽

Paul J. Kennedy

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

The Cancer Genome Atlas ◽

Distribution Analysis ◽

Cancer Genes ◽

RNA-Seq ◽

Differential Distribution ◽

Expression Variability

Abstract

Background: Differential expression analysis of RNA-seq data has advanced rapidly since the introduction of the technology, and methods such as edgeR and DESeq2 have become standard parts of analysis pipelines. However, there is a growing body of research showing that differences in variability of gene expression or overall differences in the distribution of expression values – differential distribution – are also important both in normal biology and in diseases including cancer. Genes whose expression differs in distribution without a difference in mean expression level are ignored by differential expression methods.

Results: We have developed a Bayesian hierarchical model which improves on existing methods for identifying differential dispersion in RNA-seq data, and provides an overall test for differential distribution. We have applied these methods to investigate differential dispersion and distribution in cancer using RNA-seq datasets from The Cancer Genome Atlas. Our results show that differential dispersion and distribution are able to identify cancer-related genes. Further, we find that differential dispersion identifies cancer-related genes that are missed by differential expression analysis, and that differential expression and differential dispersion identify functionally distinct sets of genes.

Conclusion: This work highlights the importance of considering changes beyond differences in mean in the analysis of gene expression data, and suggests that analysis of expression variability may provide insights into genetic aspects of cancer that would not be revealed by differential expression analysis alone. For identification of cancer-related genes, differential distribution analysis allows the identification of genes whose expression is disrupted in terms of either mean or variability.
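The point that a gene can differ in variability without differing in mean can be made concrete with a toy calculation. The sketch below is not the Bayesian hierarchical model from the abstract; it only compares sample means and variances of two hypothetical groups of expression values.

```python
import statistics

# Hypothetical expression values of one gene in two groups of samples.
group_a = [8, 9, 10, 11, 12]
group_b = [2, 6, 10, 14, 18]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# A mean-based comparison sees no difference between the groups...
print(f"difference in means: {mean_b - mean_a}")
# ...while the spread of the values differs substantially.
print(f"variance ratio (b / a): {var_b / var_a}")
```

A method that tests only for a shift in mean would not flag this gene, whereas a comparison of dispersion would.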


SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009118 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009118

Author(s):

Jing Qi ◽

Yang Zhou ◽

Zicen Zhao ◽

Shuilin Jin

Keyword(s):

Gene Expression ◽

Single Cell ◽

Differential Expression Analysis ◽

Cell Types ◽

RNA-Seq ◽

Cell Level ◽

Gene Level ◽

Level Information ◽

Downstream Analysis ◽

Gene Expression Levels

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because of the low number of mRNA copies extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinders downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. Results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analysis, including clustering, visualization, and differential expression analysis.
