scTSSR: gene expression recovery for single-cell RNA sequencing using two-side sparse self-representation

Ke Jin; Le Ou-Yang; Xing-Ming Zhao; Hong Yan; Xiao-Fei Zhang

doi:10.1093/bioinformatics/btaa108

scTSSR: gene expression recovery for single-cell RNA sequencing using two-side sparse self-representation

Bioinformatics ◽

10.1093/bioinformatics/btaa108 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3131-3138

Author(s):

Ke Jin ◽

Le Ou-Yang ◽

Xing-Ming Zhao ◽

Hong Yan ◽

Xiao-Fei Zhang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Patterns ◽

Differential Expression Analysis ◽

Supplementary Information ◽

Expression Levels ◽

Single Cell Rna Sequencing ◽

Downstream Analysis ◽

Gene Expression Levels

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) methods make it possible to reveal gene expression patterns at single-cell resolution. Due to technical defects, dropout events in scRNA-seq add noise to the gene-cell expression matrix and hinder downstream analysis. It is therefore important to recover the true gene expression levels before carrying out downstream analysis.

Results: In this article, we develop an imputation method, called scTSSR, to recover gene expression for scRNA-seq. Unlike most existing methods, which impute dropout events by borrowing information across only genes or only cells, scTSSR simultaneously leverages information from both similar genes and similar cells using a two-side sparse self-representation model. We demonstrate that scTSSR can effectively capture the Gini coefficients of genes and gene-to-gene correlations observed in single-molecule RNA fluorescence in situ hybridization (smRNA FISH). Down-sampling experiments indicate that scTSSR performs better than existing methods in recovering the true gene expression levels. We also show that scTSSR has competitive performance in differential expression analysis, cell clustering and cell trajectory inference.

Availability and implementation: The R package is available at https://github.com/Zhangxf-ccnu/scTSSR.

Supplementary information: Supplementary data are available at Bioinformatics online.
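The scTSSR abstract above uses the Gini coefficient of a gene as one of its evaluation statistics. As a rough illustration of what that statistic measures, the sketch below computes the Gini coefficient of one gene's expression across cells using the standard sorted-data formula. It is not taken from the scTSSR package; the function name `gini` and the example counts are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means expression is spread evenly across cells; values close to 1
    mean expression is concentrated in a few cells.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i), with i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical counts for two genes across six cells (illustration only).
even_gene = [5, 5, 5, 5, 5, 5]      # same level in every cell
bursty_gene = [0, 0, 0, 0, 0, 30]   # expressed in a single cell

print(gini(even_gene))
print(gini(bursty_gene))
```

Comparing this statistic between imputed scRNA-seq values and a reference measurement such as FISH is one way to check whether imputation preserves how unevenly a gene is expressed across cells.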

PRIME: a probabilistic imputation method to reduce dropout effects in single-cell RNA sequencing

Bioinformatics ◽

10.1093/bioinformatics/btaa278 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4021-4029

Author(s):

Hyundoo Jeong ◽

Zhandong Liu

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Expression Patterns ◽

Imputation Method ◽

Supplementary Information ◽

Single Cell Sequencing ◽

Depth Analysis ◽

Single Cell Rna Sequencing

Abstract

Summary: Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data, therefore, need to be carefully processed before in-depth analysis. Here, we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local subnetwork of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single-cell sequencing), on synthetic and eight real single-cell sequencing datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise.

Availability and implementation: The source code for the proposed method is freely available at https://github.com/hyundoo/PRIME.

Supplementary information: Supplementary data are available at Bioinformatics online.

CONICS integrates scRNA-seq with DNA sequencing to map gene expression to tumor sub-clones

Bioinformatics ◽

10.1093/bioinformatics/bty316 ◽

2018 ◽

Vol 34 (18) ◽

pp. 3217-3219 ◽

Cited By ~ 21

Author(s):

Sören Müller ◽

Ara Cho ◽

Siyuan J Liu ◽

Daniel A Lim ◽

Aaron Diaz

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Analysis ◽

Copy Number ◽

Differential Expression Analysis ◽

Software Tool ◽

Supplementary Information ◽

Single Cell Rna Sequencing ◽

Robust Separation

Abstract

Motivation: Single-cell RNA-sequencing (scRNA-seq) has enabled studies of tissue composition at unprecedented resolution. However, the application of scRNA-seq to clinical cancer samples has been limited, partly due to a lack of scRNA-seq algorithms that integrate genomic mutation data.

Results: To address this, we present CONICS: COpy-Number analysis In single-Cell RNA-Sequencing. CONICS is a software tool for mapping gene expression from scRNA-seq to tumor clones and phylogenies, with routines enabling the quantitation of copy-number alterations in scRNA-seq, robust separation of neoplastic cells from tumor-infiltrating stroma, inter-clone differential-expression analysis and intra-clone co-expression analysis.

Availability and implementation: CONICS is written in Python and R, and is available from https://github.com/diazlab/CONICS.

Supplementary information: Supplementary data are available at Bioinformatics online.

bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btz726 ◽

2019 ◽

Cited By ~ 2

Author(s):

Wenhao Tang ◽

François Bertaux ◽

Philipp Thomas ◽

Claire Stefanelli ◽

Malika Saint ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Empirical Bayes ◽

Missing Values ◽

Likelihood Function ◽

Differential Expression Analysis ◽

Batch Effect ◽

Supplementary Information ◽

Single Cell Rna Sequencing

Abstract

Motivation: Normalization of single-cell RNA-sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability, high amounts of missing observations and batch effects typical of scRNA-seq datasets make this task particularly challenging. There is a need for an efficient and unified approach for normalization, imputation and batch effect correction.

Results: Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We first validate our assumptions by showing this model can reproduce different statistics observed in real scRNA-seq data. We demonstrate, using publicly available scRNA-seq datasets and simulated expression data, that bayNorm allows robust imputation of missing values, generating realistic transcript distributions that match single molecule fluorescence in situ hybridization measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared with other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalization, imputation and true count recovery of gene expression measurements from scRNA-seq data.

Availability and implementation: The R package ‘bayNorm’ is published on Bioconductor at https://bioconductor.org/packages/release/bioc/html/bayNorm.html. The code for analyzing data in this article is available at https://github.com/WT215/bayNorm_papercode.

Supplementary information: Supplementary data are available at Bioinformatics online.
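The bayNorm abstract above describes a binomial likelihood for mRNA capture combined with a prior on the true expression. The sketch below illustrates that general idea for a single gene in a single cell: the observed count is treated as a binomial draw from the unknown true count, and the posterior over the true count is computed on a grid. Several things here are assumptions made for illustration and are not stated in the abstract: the negative binomial form of the prior, the fixed prior parameters `mu` and `size` (bayNorm estimates its priors by empirical Bayes), the capture efficiency `beta`, and all function names. This is not the bayNorm package's API.

```python
import math


def nb_log_pmf(n, mu, size):
    """Log pmf of a negative binomial with mean `mu` and dispersion `size`."""
    return (math.lgamma(n + size) - math.lgamma(size) - math.lgamma(n + 1)
            + size * math.log(size / (size + mu))
            + n * math.log(mu / (size + mu)))


def binom_log_pmf(x, n, beta):
    """Log probability of capturing x of n transcripts with efficiency beta."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + x * math.log(beta) + (n - x) * math.log(1.0 - beta)


def posterior_true_count(x, beta, mu, size, n_max=500):
    """Posterior over the true count n >= x, evaluated on the grid x..n_max."""
    grid = range(x, n_max + 1)
    log_w = [binom_log_pmf(x, n, beta) + nb_log_pmf(n, mu, size) for n in grid]
    peak = max(log_w)                       # subtract the peak for stability
    weights = [math.exp(v - peak) for v in log_w]
    norm = sum(weights)
    return {n: w / norm for n, w in zip(grid, weights)}


# Hypothetical values: an observed zero, 10% capture efficiency,
# and a prior with mean 10 and dispersion 2.
post = posterior_true_count(x=0, beta=0.1, mu=10.0, size=2.0)
post_mean = sum(n * p for n, p in post.items())
print(post_mean)
```

Under these assumptions an observed zero does not imply a true count of zero: the posterior mean stays positive because a low capture efficiency makes it plausible that expressed transcripts were simply missed.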

Cryopreservation of microglia enables single-cell RNA sequencing with minimal effects on disease-related gene expression patterns

iScience ◽

10.1016/j.isci.2021.102357 ◽

2021 ◽

Vol 24 (4) ◽

pp. 102357

Author(s):

Brenda Morsey ◽

Meng Niu ◽

Shetty Ravi Dyavar ◽

Courtney V. Fletcher ◽

Benjamin G. Lamberty ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Patterns ◽

Gene Expression Patterns ◽

Related Gene ◽

Single Cell Rna Sequencing ◽

Disease Related Gene

SPsimSeq: semi-parametric simulation of bulk and single cell RNA sequencing data

10.1101/677740 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Empirical Distribution ◽

Supplementary Information ◽

Rna Seq ◽

Sequencing Data ◽

Actual Distribution ◽

Wide Range ◽

Single Cell Rna Sequencing

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts.

Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq.

Supplementary information: Supplementary data are available at bioRχiv online.

EnImpute: imputing dropout events in single-cell RNA-sequencing data via ensemble learning

Bioinformatics ◽

10.1093/bioinformatics/btz435 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4827-4829 ◽

Cited By ~ 6

Author(s):

Xiao-Fei Zhang ◽

Le Ou-Yang ◽

Shuo Yang ◽

Xing-Ming Zhao ◽

Xiaohua Hu ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Ensemble Learning ◽

R Package ◽

Supplementary Information ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

The Individual ◽

Downstream Analysis ◽

Shiny Application

Abstract

Summary: Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experimental results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting noisy scRNA-seq data before performing downstream analysis.

Availability and implementation: The R package and Shiny application are available through GitHub at https://github.com/Zhangxf-ccnu/EnImpute.

Supplementary information: Supplementary data are available at Bioinformatics online.
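The EnImpute abstract above says only that results from multiple imputation methods are combined. One simple way such a combination could work is an element-wise trimmed mean across the imputed matrices, which limits the influence of any single method that produces an outlying value. The sketch below shows that approach under this assumption; the combination rule, the function names and the example values are illustrative and are not taken from the EnImpute package.

```python
def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    xs = sorted(values)
    k = int(len(xs) * trim)
    # Fall back to the plain mean if trimming would remove everything.
    kept = xs[k:len(xs) - k] if len(xs) > 2 * k else xs
    return sum(kept) / len(kept)


def combine_imputations(matrices, trim=0.2):
    """Combine same-shaped genes-by-cells matrices entry by entry."""
    n_genes = len(matrices[0])
    n_cells = len(matrices[0][0])
    return [
        [trimmed_mean([m[g][c] for m in matrices], trim) for c in range(n_cells)]
        for g in range(n_genes)
    ]


# Hypothetical outputs of five imputation methods for one gene in two cells.
method_outputs = [
    [[1.0, 4.0]],
    [[2.0, 4.0]],
    [[3.0, 5.0]],
    [[4.0, 5.0]],
    [[100.0, 6.0]],  # one method gives an outlying value for the first cell
]

consensus = combine_imputations(method_outputs, trim=0.2)
print(consensus)
```

With five methods and `trim=0.2`, the single lowest and single highest value at each entry are discarded before averaging, so the outlying estimate of 100 does not affect the consensus.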

TSEE: an elastic embedding method to visualize the dynamic gene expression patterns of time series single-cell RNA sequencing data

BMC Genomics ◽

10.1186/s12864-019-5477-8 ◽

2019 ◽

Vol 20 (S2) ◽

Cited By ~ 5

Author(s):

Shaokun An ◽

Liang Ma ◽

Lin Wan

Keyword(s):

Gene Expression ◽

Time Series ◽

Single Cell ◽

Rna Sequencing ◽

Expression Patterns ◽

Gene Expression Patterns ◽

Sequencing Data ◽

Embedding Method ◽

Single Cell Rna Sequencing

bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data

10.1101/384586 ◽

2018 ◽

Cited By ~ 7

Author(s):

Wenhao Tang ◽

François Bertaux ◽

Philipp Thomas ◽

Claire Stefanelli ◽

Malika Saint ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Single Molecule ◽

Empirical Bayes ◽

Missing Values ◽

Likelihood Function ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability and high amounts of missing observations typical of scRNA-seq datasets make this task particularly challenging. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data.

SAVER: Gene expression recovery for UMI-based single cell RNA sequencing

10.1101/138677 ◽

2017 ◽

Cited By ~ 15

Author(s):

Mo Huang ◽

Jingshu Wang ◽

Eduardo Torre ◽

Hannah Dueck ◽

Sydney Shaffer ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Single Cell Analysis ◽

Specific Gene ◽

Recovery Method ◽

Single Cell Rna Sequencing ◽

Cell Gene Expression ◽

Single Cell Profiling ◽

Downstream Analysis

Abstract

Rapid advances in massively parallel single cell RNA sequencing (scRNA-seq) are paving the way for high-resolution single cell profiling of biological samples. In most scRNA-seq studies, only a small fraction of the transcripts present in each cell are sequenced. The efficiency, that is, the proportion of transcripts in the cell that are sequenced, can be especially low in highly parallelized experiments where the number of reads allocated for each cell is small. This leads to unreliable quantification of lowly and moderately expressed genes, resulting in extremely sparse data and hindering downstream analysis. To address this challenge, we introduce SAVER (Single-cell Analysis Via Expression Recovery), an expression recovery method for scRNA-seq that borrows information across genes and cells to impute the zeros as well as to improve the expression estimates for all genes. We show, by comparison to RNA fluorescence in situ hybridization (FISH) and by data down-sampling experiments, that SAVER reliably recovers cell-specific gene expression concentrations, cross-cell gene expression distributions, and gene-to-gene and cell-to-cell correlations. This improves the power and accuracy of any downstream analysis involving genes with low to moderate expression.
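The SAVER abstract above refers to data down-sampling experiments and to capture efficiency, the proportion of transcripts in a cell that are sequenced. A common way to set up such an experiment is binomial thinning: each transcript in a high-quality count matrix is kept independently with a chosen probability, and the thinned matrix is then handed to a recovery method whose output can be compared with the original. The sketch below shows only the thinning step. Whether the SAVER authors used exactly this scheme is not stated in the abstract; the function name, the single shared efficiency and the example matrix are assumptions for illustration.

```python
import random


def downsample_counts(matrix, efficiency, seed=0):
    """Binomially thin a genes-by-cells count matrix.

    Each of the `count` transcripts in an entry is kept independently
    with probability `efficiency`.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        [sum(1 for _ in range(count) if rng.random() < efficiency) for count in row]
        for row in matrix
    ]


# Hypothetical reference counts: two genes across three cells.
reference = [
    [10, 0, 3],
    [7, 2, 50],
]

thinned = downsample_counts(reference, efficiency=0.3, seed=1)
print(thinned)
```

Because the reference counts are known, the thinned matrix gives a ground truth against which recovered values can be scored, for example by correlation or by mean squared error per gene.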

Stably expressed genes in single-cell RNA-sequencing

10.1101/475426 ◽

2018 ◽

Cited By ~ 3

Author(s):

Julie M. Deeke ◽

Johann A. Gagnon-Bartsch

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Stable Expression ◽

Supplementary Information ◽

Isolated Cells ◽

Gene Sets ◽

Single Cell Rna Sequencing ◽

Endogenous Genes ◽

Technical Artifacts

Abstract

Motivation: In single-cell RNA-sequencing (scRNA-seq) experiments, RNA transcripts are extracted and measured from isolated cells to understand gene expression at the cellular level. Measurements from this technology are affected by many technical artifacts, including batch effects. In analogous bulk gene expression experiments, external references, e.g., synthetic gene spike-ins often from the External RNA Controls Consortium (ERCC), may be incorporated into the experimental protocol for use in adjusting measurements for technical artifacts. In scRNA-seq experiments, the use of external spike-ins is controversial due to dissimilarities with endogenous genes and uncertainty about sufficient precision of their introduction. Instead, endogenous genes with highly stable expression could be used as references within scRNA-seq to help normalize the data. First, however, a specific notion of stable expression at the single cell level needs to be formulated; genes could be stable in absolute expression, in proportion to cell volume, or in proportion to total gene expression. Different types of stable genes will be useful for different normalizations and will need different methods for discovery.

Results: We compile gene sets whose products are associated with cellular structures and record these gene sets for future reuse and analysis. We find that genes whose final products are associated with the cytosolic ribosome have expressions that are highly stable with respect to the total RNA content. Notably, these genes appear to be stable in bulk measurements as well.

Supplementary information: The Supplement is available on bioRxiv, and the gene set database is available through [email protected]
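The abstract above distinguishes several notions of stable expression, including stability in absolute expression and stability in proportion to total gene expression. The sketch below makes two of those notions concrete by computing, for each gene, the coefficient of variation of its raw counts and of its share of each cell's total counts. The coefficient of variation is used here only as a convenient illustrative measure; the abstract does not say which statistic the authors use, and the function names and toy counts are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.mean(values)
    return statistics.pstdev(values) / mean if mean > 0 else float("inf")


def stability_scores(counts):
    """Score each gene under two notions of stability (lower = more stable).

    `counts` maps gene name -> list of counts, one per cell.
    """
    n_cells = len(next(iter(counts.values())))
    totals = [sum(row[c] for row in counts.values()) for c in range(n_cells)]
    scores = {}
    for gene, row in counts.items():
        shares = [x / t for x, t in zip(row, totals)]
        scores[gene] = {
            "absolute_cv": coefficient_of_variation(row),
            "proportional_cv": coefficient_of_variation(shares),
        }
    return scores


# Hypothetical counts for three genes in three cells with totals 100, 200, 400.
toy_counts = {
    "geneA": [10, 20, 40],    # constant 10% share of each cell's total
    "geneB": [50, 50, 50],    # constant absolute count
    "geneC": [40, 130, 310],  # remainder
}

scores = stability_scores(toy_counts)
print(scores)
```

In this toy example geneA is stable only relative to total expression and geneB is stable only in absolute terms, which shows why the choice of definition determines which genes qualify as references.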
