Noise regularization removes correlation artifacts in single-cell RNA-seq data preprocessing

Abstract With the rapid advancement of single-cell RNA-seq (scRNA-seq) technology, many data preprocessing methods have been proposed to address the numerous systematic errors and technical variabilities inherent in this technology. While these methods have been demonstrated to be effective in recovering individual gene expression, their suitability for inferring gene-gene associations and for subsequent gene network reconstruction has not been systematically investigated. In this study, we benchmarked five representative scRNA-seq normalization/imputation methods on Human Cell Atlas bone marrow data with respect to their impact on inferred gene-gene associations. Our results suggested that a considerable number of spurious correlations were introduced during the data preprocessing steps due to over-smoothing of the raw data. We proposed a model-agnostic noise regularization method that can effectively eliminate the correlation artifacts. The noise-regularized gene-gene correlations were further used to reconstruct a gene co-expression network and successfully revealed several known immune cell modules.
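
As a rough illustration of the idea in the abstract above, the sketch below adds a small amount of random noise to a preprocessed expression matrix before computing gene-gene correlations. The abstract does not specify the noise model, so the uniform noise, the `noise_scale` parameter, and the function name `noise_regularized_corr` are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def noise_regularized_corr(expr, noise_scale=0.01, seed=0):
    """Gene-gene Pearson correlation after adding small random noise.

    expr: cells x genes matrix of preprocessed expression values.
    noise_scale: hypothetical knob; noise ceiling as a fraction of each
        gene's largest absolute value (an assumption, not from the paper).
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    # Per-gene noise ceiling, floored so all-zero genes still receive noise.
    ceiling = np.maximum(noise_scale * np.abs(expr).max(axis=0, keepdims=True), 1e-8)
    noise = rng.uniform(0.0, 1.0, size=expr.shape) * ceiling
    # Columns are genes, so rowvar=False gives a genes x genes matrix.
    return np.corrcoef(expr + noise, rowvar=False)


# Toy input: genes 0 and 1 are nearly constant but share a tiny smoothed
# trend (an over-smoothing artifact); genes 2 and 3 truly co-vary.
rng = np.random.default_rng(1)
t = rng.uniform(0, 5, size=500)
expr = np.column_stack([5 + 1e-4 * t, 3 + 1e-4 * t, 2 + t, 1 + 2 * t])

raw = np.corrcoef(expr, rowvar=False)
reg = noise_regularized_corr(expr)
print(raw[0, 1], reg[0, 1], reg[2, 3])
```

In this toy input, the two nearly constant genes correlate almost perfectly before regularization. Once noise on the order of 1% of each gene's magnitude is added, that tiny shared trend is swamped, while the strongly co-varying pair stays highly correlated.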

Patterns ◽

10.1016/j.patter.2021.100211 ◽

2021 ◽

pp. 100211

Author(s):

Ruoyu Zhang ◽

Gurinder S. Atwal ◽

Wei Keat Lim

Keyword(s):

Single Cell ◽

Data Preprocessing ◽

Rna Seq ◽

Noise Regularization

Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling

BMC Genomics ◽

10.1186/s12864-020-07358-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tracy M. Yamawaki ◽

Daniel R. Lu ◽

Daniel C. Ellwanger ◽

Dev Bhatt ◽

Paolo Manzanillo ◽

...

Keyword(s):

Single Cell ◽

High Throughput ◽

Immune Cell ◽

Cell Types ◽

Data Interpretation ◽

Detection Sensitivity ◽

Rna Seq ◽

Cell Recovery

Abstract Background: Elucidation of immune populations with single-cell RNA-seq has greatly benefited the field of immunology by deepening the characterization of immune heterogeneity and leading to the discovery of new subtypes. However, single-cell methods inherently suffer from limitations in the recovery of complete transcriptomes due to the prevalence of cellular and transcriptional dropout events. This issue is often compounded by limited sample availability and limited prior knowledge of heterogeneity, which can confound data interpretation. Results: Here, we systematically benchmarked seven high-throughput single-cell RNA-seq methods. We prepared 21 libraries under identical conditions of a defined mixture of two human and two murine lymphocyte cell lines, simulating heterogeneity across immune-cell types and cell sizes. We evaluated methods by their cell recovery rate, library efficiency, sensitivity, and ability to recover expression signatures for each cell type. We observed higher mRNA detection sensitivity with the 10x Genomics 5′ v1 and 3′ v3 methods. We demonstrate that these methods have fewer dropout events, which facilitates the identification of differentially-expressed genes and improves the concordance of single-cell profiles to immune bulk RNA-seq signatures. Conclusion: Overall, our characterization of immune cell mixtures provides useful metrics, which can guide selection of a high-throughput single-cell RNA-seq method for profiling more complex immune-cell heterogeneity usually found in vivo.

Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data

BMC Bioinformatics ◽

10.1186/s12859-020-03797-8 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Chunxiang Wang ◽

Xin Gao ◽

Juntao Liu

Keyword(s):

Single Cell ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Data Preprocessing ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Preprocessing Method ◽

Cell Clustering ◽

Cell Gene Expression

Abstract Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods have quite different effects on clustering algorithms. Moreover, no single preprocessing method is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. Results: We designed a graph-based algorithm, SC3-e, specifically for identifying the best data preprocessing method for SC3, which is currently the most widely used algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. Conclusion: The SC3-e algorithm is effective in practice at identifying the best data preprocessing method, and therefore substantially enhances the cell-type clustering performance of SC3. It is expected to play a crucial role in related studies of single-cell clustering, such as studies of complex human diseases and discoveries of new cell types.

Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer

Nature Communications ◽

10.1038/ncomms15081 ◽

2017 ◽

Vol 8 (1) ◽

Cited By ~ 250

Author(s):

Woosung Chung ◽

Hye Hyeon Eum ◽

Hae-Ock Lee ◽

Kyung-Min Lee ◽

Han-Byoel Lee ◽

...

Keyword(s):

Breast Cancer ◽

Single Cell ◽

Primary Breast Cancer ◽

Immune Cell ◽

Rna Seq

scDoc: correcting drop-out events in single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa283 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4233-4239

Author(s):

Di Ran ◽

Shanshan Zhang ◽

Nicholas Lytal ◽

Lingling An

Keyword(s):

Single Cell ◽

Simulated Data ◽

Drop Out ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Cell Subpopulation ◽

Rna Seq ◽

Imputation Methods ◽

Similarity Estimation ◽

Differential Expression Detection

Abstract Motivation: Single-cell RNA-sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell (sub)types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of ‘drop-out’ events, which are usually due to extremely low mRNA input or the stochastic nature of gene expression. In this article, we present a novel single-cell RNA-seq drop-out correction (scDoc) method, which imputes drop-out events by borrowing information for the same gene from highly similar cells. Results: scDoc is the first method that directly uses drop-out information when estimating cell-to-cell similarity, a step that is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc outperforms the existing imputation methods with respect to data visualization, cell subpopulation identification, and differential expression detection in scRNA-seq data. Availability and implementation: R code is available at https://github.com/anlingUA/scDoc. Supplementary information: Supplementary data are available at Bioinformatics online.
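
The general idea of borrowing information for the same gene from highly similar cells can be sketched as follows. This is a generic nearest-neighbor illustration, not scDoc itself: the abstract states that scDoc uses drop-out information in its similarity estimate, whereas this sketch assumes a plain Pearson correlation on log-transformed counts, treats every zero as a candidate drop-out, and uses a hypothetical function name and `k` parameter.

```python
import numpy as np


def impute_from_similar_cells(counts, k=2):
    """Fill zeros in each cell with the mean of its k most similar cells.

    counts: cells x genes matrix. Treating every zero as a drop-out is a
    simplifying assumption made for this sketch.
    """
    counts = np.asarray(counts, dtype=float)
    # Cell-cell similarity: Pearson correlation of log-transformed profiles.
    sim = np.corrcoef(np.log1p(counts))
    sim = np.nan_to_num(sim, nan=-1.0)  # constant cells get lowest similarity
    np.fill_diagonal(sim, -np.inf)      # a cell is never its own neighbor
    imputed = counts.copy()
    for i in range(counts.shape[0]):
        neighbors = np.argsort(sim[i])[-k:]  # indices of the k most similar cells
        zero_genes = counts[i] == 0
        # Borrow the same genes' values from the neighbors' original counts.
        imputed[i, zero_genes] = counts[neighbors][:, zero_genes].mean(axis=0)
    return imputed


# Toy data: three cells of one type (rows 0-2) and three of another (rows 3-5).
# Row 1 has a zero for gene 1, which its two look-alike cells both express.
counts = np.array([
    [10, 8, 0, 1],
    [12, 0, 0, 1],
    [11, 9, 0, 2],
    [0, 1, 9, 10],
    [0, 1, 10, 12],
    [1, 0, 11, 9],
])
imputed = impute_from_similar_cells(counts, k=2)
print(imputed[1])
```

Because the neighbors' values are averaged per gene, a zero that is shared by all similar cells stays zero, which is one simple way to avoid inflating genes that are genuinely not expressed in a cell type.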

CHARTS: A web application for characterizing and comparing tumor subpopulations in publicly available single-cell RNA-seq datasets

10.1101/2020.09.23.310441 ◽

2020 ◽

Author(s):

Matthew N. Bernstein ◽

Zijian Ni ◽

Michael Collins ◽

Mark E. Burkard ◽

Christina Kendziorski ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Web Application ◽

Cellular Heterogeneity ◽

Rna Seq ◽

Individual Gene ◽

Genome Wide ◽

Progression Of Disease ◽

Cell Subpopulations ◽

Available Information

Abstract Background: Single-cell RNA-seq (scRNA-seq) enables the profiling of genome-wide gene expression at the single-cell level and in so doing facilitates insight into and information about cellular heterogeneity within a tissue. Perhaps nowhere is this more important than in cancer, where tumor and tumor microenvironment heterogeneity directly impact development, maintenance, and progression of disease. While publicly available scRNA-seq cancer datasets offer unprecedented opportunity to better understand the mechanisms underlying tumor progression, metastasis, drug resistance, and immune evasion, much of the available information has been underutilized, in part, due to the lack of tools available for aggregating and analyzing these data. Results: We present CHARacterizing Tumor Subpopulations (CHARTS), a computational pipeline and web application for analyzing, characterizing, and integrating publicly available scRNA-seq cancer datasets. CHARTS enables the exploration of individual gene expression, cell type, malignancy-status, differentially expressed genes, and gene set enrichment results in subpopulations of cells across multiple tumors and datasets. Conclusion: CHARTS is an easy-to-use, comprehensive platform for exploring single-cell subpopulations within tumors across the ever-growing collection of public scRNA-seq cancer datasets. CHARTS is freely available at charts.morgridge.org.

Full-Length Transcriptome: A Reliable Alternative for Single-Cell RNA-Seq Analysis in the Spleen of Teleost Without Reference Genome

Frontiers in Immunology ◽

10.3389/fimmu.2021.737332 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lixing Huang ◽

Ying Qiao ◽

Wei Xu ◽

Linfeng Gong ◽

Rongchao He ◽

...

Keyword(s):

Data Analysis ◽

B Cells ◽

Single Cell ◽

Reference Genome ◽

Immune Cell ◽

Cell Types ◽

Full Length ◽

Cell Populations ◽

Epinephelus Coioides ◽

Rna Seq

Fish are considered an excellent model for clarifying the evolution and regulatory mechanisms of vertebrate immunity. However, knowledge of distinct immune cell populations in fish is still limited, and further development of techniques for identifying fish immune cell populations and their functions is required. Single-cell RNA-seq (scRNA-seq) has provided a new approach for effective in-depth identification and characterization of cell subpopulations. Current approaches for scRNA-seq data analysis usually rely on comparison with a reference genome and hence are not suited to samples without a reference genome, a situation that is currently very common in fish research. Here, we present an alternative, i.e., scRNA-seq data analysis with a full-length transcriptome as a reference, and evaluate this approach on samples from Epinephelus coioides, a teleost without a published genome. We show that it successfully reconstructs most of the transcripts present in the scRNA-seq data, achieving a sensitivity equivalent to approaches relying on genome alignments of related species. Based on cell heterogeneity and known markers, we characterized four cell types: T cells, B cells, monocytes/macrophages (Mo/MΦ), and non-specific cytotoxic cells (NCC). Further analysis indicated the presence of two subsets of Mo/MΦ, including the M1 and M2 types, as well as four subsets of B cells, i.e., mature B cells, immature B cells, pre-B cells, and early pre-B cells. Our research provides new clues for understanding the biological characteristics, development, and function of immune cell populations in teleosts. Furthermore, our approach provides a reliable alternative for scRNA-seq data analysis in teleosts for which no reference genome is currently available.

Locality Sensitive Imputation for Single-Cell RNA-Seq Data

10.1101/291807 ◽

2018 ◽

Cited By ~ 3

Author(s):

Marmar Moussa ◽

Ion I. Măndoiu

Keyword(s):

Data Analysis ◽

Single Cell ◽

Comprehensive Assessment ◽

Sequencing Depth ◽

Drop Out ◽

Rna Seq ◽

Imputation Methods ◽

Drop Outs ◽

Random Nature ◽

Imputation Approach

Abstract One of the most notable challenges in single-cell RNA-Seq data analysis is the so-called drop-out effect, where only a fraction of the transcriptome of each cell is captured. The random nature of drop-outs, however, makes it possible to consider imputation methods as a means of correcting for drop-outs. In this paper, we study some existing scRNA-Seq imputation methods and propose a novel iterative imputation approach based on efficiently computing highly similar cells. We then present the results of a comprehensive assessment of existing and proposed methods on real scRNA-Seq datasets with varying per-cell sequencing depth.

Systematic Comparison of High-Throughput Single-Cell RNAseq Methods for Immune Cell Profiling

10.21203/rs.3.rs-72851/v1 ◽

2020 ◽

Author(s):

Chi-Ming Kevin Li ◽

Tracy M Yamawaki ◽

Daniel R Lu ◽

Daniel C Ellwanger ◽

Dev Bhatt ◽

...

Keyword(s):

Single Cell ◽

High Throughput ◽

Immune Cell ◽

Cell Types ◽

Drop Out ◽

Detection Sensitivity ◽

Rna Seq ◽

Cell Recovery

Abstract Background: Elucidation of immune populations with single-cell RNA-seq has greatly benefited the field of immunology by deepening the characterization of immune heterogeneity and leading to the discovery of new subtypes. However, single-cell methods inherently suffer from limitations in the recovery of complete transcriptomes due to the prevalence of cellular and transcriptional dropout events. This issue is often compounded by limited sample availability and limited prior knowledge of heterogeneity, which can confound data interpretation. Results: Here, we systematically benchmarked seven high-throughput single-cell RNA-seq methods. We prepared 21 libraries under identical conditions of a defined mixture of two human and two murine lymphocyte cell lines, simulating heterogeneity across immune-cell types and cell sizes. We evaluated methods by their cell recovery rate, library efficiency, sensitivity, and ability to recover expression signatures for each cell type. We observed higher mRNA detection sensitivity with the 10x Genomics 5′ v1 and 3′ v3 methods. We demonstrate that these methods have fewer drop-out events, which facilitates the identification of differentially-expressed genes and improves the concordance of single-cell profiles to immune bulk RNA-seq signatures. Conclusion: Overall, our characterization of immune cell mixtures provides useful metrics, which can guide selection of a high-throughput single-cell RNA-seq method for profiling more complex immune-cell heterogeneity usually found in vivo.

Deciphering the evolution of vertebrate immune cell types with single-cell RNA-seq

10.7287/peerj.preprints.26858 ◽

2018 ◽

Author(s):

Santiago J Carmona ◽

David Gfeller

Keyword(s):

Single Cell ◽

Immune Cell ◽

Adaptive Immune System ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Recent Developments ◽

Species Specific ◽

Development And Validation ◽

Whole Transcriptome

Single-cell RNA-seq is revolutionizing our understanding of cell type heterogeneity in many fields of biology, ranging from neuroscience to cancer to immunology. In immunology, one of the main promises of this approach is the ability to define cell types as clusters in the whole transcriptome space (i.e., without relying on specific surface markers), thereby providing an unbiased classification of immune cell types. So far, this technology has been applied mainly in mouse and human. However, it could technically be used for immune cell-type identification in any species without requiring the development and validation of species-specific antibodies for cell sorting. Here we review recent developments using single-cell RNA-seq to characterize immune cell populations in non-mammalian vertebrates, with a focus on zebrafish (Danio rerio). We argue that single-cell RNA-seq technology is likely to provide key insights into the evolution of the adaptive immune system.
