Differential dropout analysis captures biological variation in single-cell RNA sequencing data

2021 ◽  
Author(s):  
Gerard A. Bouland ◽  
Ahmed Mahfouz ◽  
Marcel J.T. Reinders

Abstract Single-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological rather than technical artifacts. We propose differential dropout analysis (DDA), as an alternative to differential expression analysis (DEA), to identify the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available datasets, we show that dropout patterns are biological in nature and can assess the relative abundance of transcripts more robustly than counts.

2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Gerard A Bouland ◽  
Ahmed Mahfouz ◽  
Marcel J T Reinders

Abstract Single-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We propose to use binarized expression profiles to identify the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available and simulated datasets, we show that a binarized representation of single-cell expression data accurately represents biological variation and reveals the relative abundance of transcripts more robustly than counts.
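
As a rough illustration of the binarized representation described above, the following sketch converts a toy count matrix to detected/not-detected values and compares per-gene detection rates between two groups of cells. The function names, the toy matrix, and the use of a plain rate difference are assumptions made for illustration; the abstract does not state which statistical test the authors apply to the binarized profiles.

```python
# Minimal sketch of binarizing scRNA-seq counts (illustrative, not the authors' code).

def binarize(counts):
    """Map each count to 1 if the transcript was detected (count > 0), else 0."""
    return [[1 if c > 0 else 0 for c in gene] for gene in counts]

def detection_rate(binary_gene, cells):
    """Fraction of the given cells in which the gene was detected."""
    return sum(binary_gene[i] for i in cells) / len(cells)

def detection_rate_difference(counts, group_a, group_b):
    """Per-gene difference in detection rate (group_a minus group_b)."""
    binary = binarize(counts)
    return [detection_rate(g, group_a) - detection_rate(g, group_b) for g in binary]

# Toy data (hypothetical values): rows are genes, columns are cells.
toy_counts = [
    [5, 3, 0, 7, 0, 0, 0, 1],  # detected mostly in the first four cells
    [0, 1, 0, 2, 1, 0, 3, 0],  # detected equally often in both groups
]
group_a = [0, 1, 2, 3]  # column indices of the first group of cells
group_b = [4, 5, 6, 7]  # column indices of the second group of cells

differences = detection_rate_difference(toy_counts, group_a, group_b)
print(differences)
```

In a real analysis the rate difference would be replaced by a proper test on the detected/not-detected table for each gene; the point here is only that the magnitude of the counts is discarded and detection alone carries the signal.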


2021 ◽  
Vol 2 (1) ◽  
pp. 43-61
Author(s):  
Aanchal Malhotra ◽  
Samarendra Das ◽  
Shesh N. Rai

Single-cell RNA-sequencing (scRNA-seq) technology provides an excellent platform for measuring the expression profiles of genes in heterogeneous cell populations. Multiple tools for the analysis of scRNA-seq data have been developed over the years. These tools require complicated commands and steps that are not easy for genome researchers and experimental biologists to follow. Therefore, we describe a step-by-step workflow for processing and analyzing scRNA-seq unique molecular identifier (UMI) data from human lung adenocarcinoma cell lines. We demonstrate the basic analyses, including quality checks, mapping, and quantification of transcript abundance, on a suitable real-data example to obtain UMI count data. Further, we performed basic statistical analyses, such as zero-inflation, differential expression, and clustering analyses, on the obtained count data. We studied the effects of the excess zeros present in scRNA-seq data on the downstream analyses. Our findings indicate that the zero-inflation associated with UMI data had little or no role in clustering, while it had a significant effect on identifying differentially expressed genes. We also provide insight into the comparative performance of differential expression analysis tools based on zero-inflated negative binomial and negative binomial models on scRNA-seq data. The sensitivity analysis reinforced our findings in that the negative binomial model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines to help users handle and analyze real scRNA-seq data more easily.
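
To make the notion of zero-inflation concrete, the sketch below compares the observed fraction of zeros for one gene with the zero probability expected under a negative binomial (NB) model, and shows how a zero-inflated NB (ZINB) adds an extra point mass at zero. It uses the standard NB parameterization by mean and size (inverse dispersion). The function names, the toy counts, and the fixed size value are assumptions for illustration; the size is not estimated from data here, and this is not the workflow's own code.

```python
# Minimal sketch: expected zeros under NB and ZINB versus observed zeros.

def nb_zero_prob(mean, size):
    """P(count = 0) for a negative binomial with the given mean and size."""
    return (size / (size + mean)) ** size

def zinb_zero_prob(mean, size, pi):
    """P(count = 0) for a zero-inflated NB with extra zero probability pi."""
    return pi + (1.0 - pi) * nb_zero_prob(mean, size)

def observed_zero_fraction(counts):
    """Fraction of cells with a zero count for this gene."""
    return sum(1 for c in counts if c == 0) / len(counts)

# Toy UMI counts for one gene across ten cells (hypothetical values).
gene_counts = [0, 0, 0, 2, 0, 5, 0, 1, 0, 0]
gene_mean = sum(gene_counts) / len(gene_counts)
assumed_size = 1.0  # hypothetical dispersion parameter, not fitted

observed = observed_zero_fraction(gene_counts)
expected_nb = nb_zero_prob(gene_mean, assumed_size)
print(f"observed zeros: {observed:.3f}, expected under NB: {expected_nb:.3f}")
```

An observed zero fraction well above the NB expectation is the kind of pattern that motivates a ZINB model, although with a fitted dispersion the gap may shrink or disappear.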


2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Wenan Chen ◽  
Yan Li ◽  
John Easton ◽  
David Finkelstein ◽  
Gang Wu ◽  
...  

Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1947
Author(s):  
Samarendra Das ◽  
Anil Rai ◽  
Michael L. Merchant ◽  
Matthew C. Cave ◽  
Shesh N. Rai

Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the cell level. Differential Expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practices for addressing this involved borrowing methods from bulk RNA-seq, which are based on non-zero differences in the average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of different DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to identify a single method that performs best globally on the basis of any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.


2018 ◽  
Author(s):  
Wenhao Tang ◽  
François Bertaux ◽  
Philipp Thomas ◽  
Claire Stefanelli ◽  
Malika Saint ◽  
...  

Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability and high amounts of missing observations typical of scRNA-seq datasets make this task particularly challenging. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data.
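
The binomial model of mRNA capture mentioned above can be sketched as follows: if a cell truly contains a given number of transcripts and each is captured independently with some probability, the observed count is binomially distributed. The names `capture_likelihood`, `dropout_prob`, and `beta`, and the numeric values, are assumptions for illustration. This shows only the likelihood term, not bayNorm's priors or its empirical Bayes estimation.

```python
# Minimal sketch of a binomial capture likelihood (not bayNorm's implementation).
from math import comb

def capture_likelihood(observed, true_count, beta):
    """P(observed | true_count, beta) when each transcript is captured with probability beta."""
    if observed < 0 or observed > true_count:
        return 0.0
    return comb(true_count, observed) * beta**observed * (1 - beta) ** (true_count - observed)

def dropout_prob(true_count, beta):
    """Probability of observing zero counts although true_count transcripts are present."""
    return capture_likelihood(0, true_count, beta)

# Hypothetical example: 10 transcripts in the cell, 10% capture efficiency.
true_count = 10
beta = 0.1
p_zero = dropout_prob(true_count, beta)
print(f"P(observed = 0 | {true_count} transcripts, beta = {beta}) = {p_zero:.4f}")
```

Under these assumed values roughly a third of cells would show a zero for a gene that is actually expressed, which is why a capture model can be used to reason about missing observations.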


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract Single-cell RNA sequencing (scRNA-seq) technologies profile gene expression patterns in individual cells. It is often of interest to test for differential expression (DE) between conditions, e.g. treatment vs control, or between cell types. Simulation studies have shown that non-parametric tests, such as the Wilcoxon rank-sum test, can robustly detect significant DE, with better performance than many parametric tools specifically developed for scRNA-seq data analysis. However, these rank tests cannot be used for complex experimental designs involving multiple groups, multiple factors and confounding variables. Further, rank-based tests do not provide an interpretable measure of the effect size. We propose a semi-parametric approach based on probabilistic index models (PIM), which form a flexible class of models that generalize classical rank tests. Our method does not rely on strong distributional assumptions and it allows accounting for confounding factors. Moreover, it allows for the estimation of the effect size in terms of a probabilistic index. Real data analyses demonstrate that PIM is capable of identifying biologically meaningful DE. Our simulation studies also show that DE tests succeed well in controlling the false discovery rate at its nominal level, while maintaining good sensitivity as compared to competing methods.
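
For intuition about the effect-size measure, the sketch below estimates a two-sample probabilistic index as the proportion of cross-group pairs in which the first group's value is lower, counting ties as one half. This is the commonly used definition and is assumed here, since the abstract does not spell it out. The function name and toy expression values are hypothetical, and the sketch does not fit a PIM regression or adjust for confounders.

```python
# Minimal sketch of a two-sample probabilistic index (not a full PIM fit).

def probabilistic_index(group_a, group_b):
    """Estimate P(A < B) + 0.5 * P(A = B) over all cross-group pairs."""
    score = 0.0
    for a in group_a:
        for b in group_b:
            if a < b:
                score += 1.0
            elif a == b:
                score += 0.5  # ties contribute one half
    return score / (len(group_a) * len(group_b))

# Toy expression values for one gene (hypothetical).
control = [0, 1, 2, 0]
treated = [3, 2, 5, 1]

pi_estimate = probabilistic_index(control, treated)
print(pi_estimate)
```

A value of 0.5 would indicate no tendency for either group to have higher expression, while values near 0 or 1 indicate strong separation; this is what makes the index interpretable as an effect size.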


BMC Genomics ◽  
2020 ◽  
Vol 21 (S9) ◽  
Author(s):  
Siamak Zamani Dadaneh ◽  
Paul de Figueiredo ◽  
Sing-Hoi Sze ◽  
Mingyuan Zhou ◽  
Xiaoning Qian

Abstract Background: Single-cell RNA sequencing (scRNA-seq) is a powerful profiling technique at single-cell resolution. Appropriate analysis of scRNA-seq data can characterize molecular heterogeneity and shed light on the underlying cellular processes to better understand development and disease mechanisms. The unique analytic challenge is to appropriately model highly over-dispersed scRNA-seq count data with prevalent dropouts (zero counts), making zero-inflated dimensionality reduction techniques popular for scRNA-seq data analyses. Employing zero-inflated distributions, however, may place extra emphasis on zero counts, leading to potential bias when identifying the latent structure of the data. Results: In this paper, we propose a fully generative hierarchical gamma-negative binomial (hGNB) model of scRNA-seq data, obviating the need for explicitly modeling zero inflation. At the same time, hGNB can naturally account for covariate effects at both the gene and cell levels to identify complex latent representations of scRNA-seq data, without the need for commonly adopted pre-processing steps such as normalization. Efficient Bayesian model inference is derived by exploiting conditional conjugacy via novel data augmentation techniques. Conclusion: Experimental results on both simulated data and several real-world scRNA-seq datasets suggest that hGNB is a powerful tool for cell cluster discovery as well as cell lineage inference.

