PBLR: an accurate single cell RNA-seq data imputation tool considering cell heterogeneity and prior expression level of dropouts

2018 ◽  
Author(s):  
Lihua Zhang ◽  
Shihua Zhang

Abstract Single-cell RNA sequencing (scRNA-seq) provides a powerful tool to determine precise expression patterns of tens of thousands of individual cells and to decipher cell heterogeneity and cell subpopulations. However, scRNA-seq data analysis remains challenging due to various sources of technical noise, e.g., dropout events (i.e., excess zero counts). Taking into account cell heterogeneity and the structural effect of expression level on dropout rate, we propose a novel method named PBLR to accurately impute the dropouts of scRNA-seq data. PBLR effectively recovers dropout events on both simulated and real scRNA-seq datasets and, compared to several state-of-the-art methods, dramatically improves low-dimensional representation and the recovery of gene-gene relationships masked by dropout events. Moreover, PBLR also detects accurate and robust cell subpopulations automatically, shedding light on its flexibility and generality for scRNA-seq data analysis.
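The premise that dropout rate depends on expression level can be illustrated with a minimal Python sketch. It assumes a logistic link in which the probability of a dropout falls as expected log-expression rises, and it uses each entry's own log1p count as a stand-in for expected expression. The function names and the `midpoint` and `steepness` parameters are hypothetical illustrations, not part of PBLR.

```python
import numpy as np

def dropout_probability(mean_log_expr, midpoint=1.0, steepness=2.0):
    """Hypothetical logistic link: P(dropout) decreases as expected log-expression increases."""
    mean_log_expr = np.asarray(mean_log_expr, dtype=float)
    return 1.0 / (1.0 + np.exp(steepness * (mean_log_expr - midpoint)))

def simulate_dropouts(true_counts, rng, midpoint=1.0, steepness=2.0):
    """Zero out entries of a cells x genes count matrix with expression-dependent probability."""
    true_counts = np.asarray(true_counts, dtype=float)
    # Each entry's log1p count stands in for its expected log-expression (an assumption).
    p = dropout_probability(np.log1p(true_counts), midpoint, steepness)
    mask = rng.random(true_counts.shape) < p   # True where a dropout occurs
    observed = np.where(mask, 0.0, true_counts)
    return observed, mask

print(dropout_probability([0.0, 1.0, 5.0]))  # highest for low expression, 0.5 at the midpoint
```

Under this assumed model, lowly expressed genes are the ones most often zeroed out, which is why an imputation method may treat a zero in a highly expressed gene differently from a zero in a lowly expressed one.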

Author(s):  
Lihua Zhang ◽  
Shihua Zhang

Abstract Single-cell RNA sequencing (scRNA-seq) provides a powerful tool to determine expression patterns of thousands of individual cells. However, the analysis of scRNA-seq data remains a computational challenge due to the high technical noise such as the presence of dropout events that lead to a large proportion of zeros for expressed genes. Taking into account the cell heterogeneity and the relationship between dropout rate and expected expression level, we present a cell sub-population based bounded low-rank (PBLR) method to impute the dropouts of scRNA-seq data. Through application to both simulated and real scRNA-seq datasets, PBLR is shown to be effective in recovering dropout events, and it can dramatically improve the low-dimensional representation and the recovery of gene-gene relationships masked by dropout events compared to several state-of-the-art methods. Moreover, PBLR also detects accurate and robust cell sub-populations automatically, shedding light on its flexibility and generality for scRNA-seq data analysis.
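A rough sketch of the general idea behind sub-population based, bounded low-rank imputation follows. It is not the PBLR algorithm: it assumes nonzero entries are trusted, fills zeros with a truncated-SVD reconstruction computed separately within each cell sub-population, and clips imputed values to [0, per-gene maximum], a bound chosen here purely for illustration. All names (`bounded_lowrank_impute`, `impute_by_subpopulation`, `rank`, `upper`) are hypothetical.

```python
import numpy as np

def bounded_lowrank_impute(X, rank=2, upper=None, n_iter=50):
    """Illustrative sketch (not PBLR itself): fill the zeros of a cells x genes
    log-expression matrix with a rank-`rank` SVD reconstruction, clipped to [0, upper]."""
    X = np.asarray(X, dtype=float)
    observed = X > 0                    # assumption: nonzero entries are trusted
    if upper is None:
        upper = X.max(axis=0)           # hypothetical per-gene upper bound
    Z = X.copy()
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        filled = np.clip(low_rank, 0.0, upper)   # bound the imputed values
        Z = np.where(observed, X, filled)        # observed entries stay fixed
    return Z

def impute_by_subpopulation(X, labels, rank=2):
    """Apply the imputation separately within each cell sub-population label."""
    Z = np.array(X, dtype=float)
    labels = np.asarray(labels)
    for k in np.unique(labels):
        idx = labels == k
        Z[idx] = bounded_lowrank_impute(Z[idx], rank=rank)
    return Z
```

Imputing within sub-populations keeps cells of one type from borrowing expression structure from another. PBLR's actual bounds and its automatic sub-population detection are described in the paper and are not reproduced here.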


2019 ◽  
Author(s):  
Yiliang Zhang ◽  
Kexuan Liang ◽  
Molei Liu ◽  
Yue Li ◽  
Hao Ge ◽  
...  

Abstract Single-cell RNA sequencing (scRNA-seq) technologies have been widely used in recent years as a powerful tool for observing gene expression at single-cell resolution. Two of the major challenges in scRNA-seq data analysis are dropout events and batch effects. The degree of zero inflation (the dropout rate) varies substantially across single cells. Evidence has shown that technical noise, including batch effects, explains a notable proportion of this cell-to-cell variation. To capture biological variation, it is necessary to quantify and remove technical variation. Here, we introduce SCRIBE (Single-Cell Recovery Imputation with Batch Effects), a principled framework that imputes dropout events and corrects batch effects simultaneously. We demonstrate, through real examples, that SCRIBE outperforms existing scRNA-seq data analysis tools in recovering cell-specific gene expression patterns, removing batch effects and retaining biological variation across cells. Our software is freely available online at https://github.com/YiliangTracyZhang/SCRIBE.
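The cell-to-cell variation in zero inflation described above can be inspected with a simple diagnostic. This Python sketch, which is not part of SCRIBE, computes the fraction of zero counts per cell and averages it within each batch label; the function names and the toy batch labels are hypothetical.

```python
import numpy as np

def per_cell_dropout_rate(counts):
    """Fraction of zero entries in each cell (row) of a cells x genes count matrix."""
    counts = np.asarray(counts)
    return (counts == 0).mean(axis=1)

def dropout_rate_by_batch(counts, batches):
    """Mean per-cell zero fraction within each batch label (illustrative diagnostic)."""
    rates = per_cell_dropout_rate(counts)
    batches = np.asarray(batches)
    return {b: float(rates[batches == b].mean()) for b in np.unique(batches)}

counts = np.array([[0, 1, 0, 2],
                   [3, 0, 0, 0],
                   [1, 1, 1, 0]])
print(per_cell_dropout_rate(counts))
print(dropout_rate_by_batch(counts, ["a", "a", "b"]))
```

A large gap between batches in this summary would hint that part of the zero inflation is technical, which is the kind of variation SCRIBE is designed to model and remove.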


Blood ◽  
2017 ◽  
Vol 129 (17) ◽  
pp. 2384-2394 ◽  
Author(s):  
Rebecca Warfvinge ◽  
Linda Geironson ◽  
Mikael N. E. Sommarin ◽  
Stefan Lang ◽  
Christine Karlsson ◽  
...  

Key Points
Single-cell gene expression analysis reveals CML stem cell heterogeneity and changes imposed by TKI therapy.
A subpopulation with a primitive, quiescent signature and increased survival during therapy can be captured at high purity as CD45RA−cKIT−CD26+.


Cancers ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 144
Author(s):  
Patrícia Neuperger ◽  
József Á. Balog ◽  
László Tiszlavicz ◽  
József Furák ◽  
Nikolett Gémes ◽  
...  

Intratumoral heterogeneity (ITH) is responsible for the majority of difficulties encountered in the treatment of lung-cancer patients. Therefore, the heterogeneity of NSCLC cell lines and primary lung adenocarcinoma was investigated by single-cell mass cytometry (CyTOF). First, we studied the single-cell heterogeneity of frequently used NSCLC adenocarcinoma models, such as A549, H1975, and H1650. The intra- and inter-cell-line single-cell heterogeneity is represented in the expression patterns of 13 markers, namely GLUT1, MCT4, CA9, TMEM45A, CD66, CD274 (PD-L1), CD24, CD326 (EpCAM), pan-keratin, TRA-1-60, galectin-3, galectin-1, and EGFR. The qRT-PCR and CyTOF analyses revealed that a hypoxic microenvironment and altered metabolism may influence cell-line heterogeneity. Additionally, human primary lung adenocarcinoma and non-involved healthy lung tissue biopsies were homogenized to prepare single-cell suspensions for CyTOF analysis. CyTOF revealed the ITH of human primary lung adenocarcinoma for 14 markers; in particular, higher expression of GLUT1, MCT4, CA9, TMEM45A, and CD66 was associated with the lung-tumor tissue. Our single-cell results are the first to demonstrate TMEM45A expression in human lung adenocarcinoma, which was verified by immunohistochemistry.


2016 ◽  
Author(s):  
Vijay Ramani ◽  
Xinxian Deng ◽  
Kevin L Gunderson ◽  
Frank J Steemers ◽  
Christine M Disteche ◽  
...  

Abstract We present combinatorial single cell Hi-C, a novel method that leverages combinatorial cellular indexing to measure chromosome conformation in large numbers of single cells. In this proof-of-concept study, we generate and sequence combinatorial single cell Hi-C libraries for two mouse and four human cell types, comprising a total of 9,316 single cells across 5 experiments. We demonstrate the utility of single-cell Hi-C data in separating different cell types, identify previously uncharacterized cell-to-cell heterogeneity in the conformational properties of mammalian chromosomes, and demonstrate that combinatorial indexing is a generalizable molecular strategy for single-cell genomics.
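The core idea of combinatorial cellular indexing is that each cell is identified by the combination of barcodes it receives over successive rounds of splitting and pooling, rather than by a single barcode. The Python sketch below simulates a simplified two-round scheme under the assumption that each cell lands in a uniformly random well in each round, and reports the fraction of cells whose barcode pair is unique. The well counts and the function name are hypothetical and not taken from the paper.

```python
import numpy as np

def unique_barcode_fraction(n_cells, n_round1, n_round2, rng):
    """Simplified two-round combinatorial indexing: each cell gets a random
    round-1 well barcode and a random round-2 well barcode; a cell is
    resolvable only if no other cell shares its (round-1, round-2) pair."""
    r1 = rng.integers(0, n_round1, size=n_cells)
    r2 = rng.integers(0, n_round2, size=n_cells)
    pair_ids = r1 * n_round2 + r2                    # encode the pair as one integer
    _, inverse, counts = np.unique(pair_ids, return_inverse=True, return_counts=True)
    return float(np.mean(counts[inverse] == 1))      # cells whose pair occurs once

rng = np.random.default_rng(0)
print(unique_barcode_fraction(500, 96, 96, rng))     # 9,216 possible pairs for 500 cells
```

With N = n_round1 × n_round2 possible pairs, the expected unique fraction under this model is about (1 − 1/N)^(n_cells − 1), which suggests keeping the number of cells well below the number of barcode combinations.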


2021 ◽  
Author(s):  
Jiayi Dong ◽  
Yin Zhang ◽  
Fei Wang

Abstract Background: With the development of modern sequencing technology, hundreds of thousands of single-cell RNA-sequencing (scRNA-seq) profiles allow exploration of heterogeneity at the cellular level, but these data pose the challenges of high dimensionality and high sparsity. Dimensionality reduction is essential for downstream analysis, such as clustering to identify cell subpopulations, and is usually performed in an unsupervised manner. Results: In this paper, we introduce a semi-supervised dimensionality reduction method named scSemiAE, which is based on an autoencoder model. It transfers the information contained in available datasets with cell subpopulation labels to guide the search for better low-dimensional representations, which can ease further analysis. Conclusions: Experiments on five public datasets show that scSemiAE outperforms both unsupervised and semi-supervised baselines, whether the amount of transferred information, as reflected in the numbers of labeled cells and labeled cell subpopulations, is large or small.
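To make label-guided dimensionality reduction concrete, the Python sketch below computes a generic semi-supervised objective for a linear autoencoder: mean squared reconstruction error over all cells plus a term that pulls labeled cells toward the centroid of their subpopulation in the embedding. This is an assumed, illustrative loss, not scSemiAE's actual architecture or objective; the weight matrices, the `weight` parameter and the function name are hypothetical.

```python
import numpy as np

def semi_supervised_ae_loss(X, W_enc, W_dec, labeled_idx, labels, weight=1.0):
    """Illustrative objective for a linear autoencoder:
    reconstruction error on all cells + compactness of labeled subpopulations."""
    H = X @ W_enc                        # cells x latent embedding
    recon = H @ W_dec                    # map back to gene space
    recon_loss = np.mean((X - recon) ** 2)

    H_lab = H[labeled_idx]               # embeddings of labeled cells only
    labels = np.asarray(labels)
    sup_loss = 0.0
    for k in np.unique(labels):
        members = H_lab[labels == k]
        centroid = members.mean(axis=0)
        sup_loss += np.sum((members - centroid) ** 2)   # squared distance to class centroid
    sup_loss /= max(len(labeled_idx), 1)
    return recon_loss + weight * sup_loss, recon_loss, sup_loss
```

In practice such weights would be learned by gradient descent in a deep-learning framework; `weight` trades off fidelity to all cells against agreement with the transferred labels.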


2021 ◽  
Author(s):  
Marmar Moussa ◽  
Ion Mandoiu

Single cell RNA-Seq (scRNA-Seq) is critical for studying cellular function and phenotypic heterogeneity as well as the development of tissues and tumors. Here, we present SC1, a web-based, highly interactive scRNA-Seq data analysis tool publicly accessible at https://sc1.engr.uconn.edu. The tool presents an integrated workflow for scRNA-Seq analysis, implements a novel method of selecting informative genes based on Term-Frequency Inverse-Document-Frequency (TF-IDF) scores, and provides a broad range of methods for clustering, differential expression analysis, gene enrichment, interactive visualization, and cell cycle analysis. The tool integrates other single cell omics data modalities such as TCR-Seq and supports several single cell sequencing technologies. In just a few steps, researchers can generate a comprehensive analysis and gain powerful insights from their scRNA-Seq data.
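TF-IDF gene selection treats each cell as a document and each gene as a term, so genes expressed strongly in a few cells score highly while ubiquitously expressed genes score near zero. The Python sketch below uses one common smoothed formulation; SC1's exact definition of its TF-IDF scores may differ, and the function names are hypothetical.

```python
import numpy as np

def tfidf_gene_scores(counts):
    """Mean TF-IDF per gene, treating cells as documents and genes as terms.
    TF  = count / total counts in the cell
    IDF = log((1 + n_cells) / (1 + n_cells expressing the gene))  (smoothed; an assumption)"""
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    lib_size = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, lib_size, out=np.zeros_like(counts), where=lib_size > 0)
    n_expressing = (counts > 0).sum(axis=0)
    idf = np.log((1.0 + n_cells) / (1.0 + n_expressing))
    return (tf * idf).mean(axis=0)

def top_informative_genes(counts, gene_names, k):
    """Return the k genes with the highest mean TF-IDF score."""
    scores = tfidf_gene_scores(counts)
    order = np.argsort(scores)[::-1][:k]
    return [gene_names[i] for i in order]

counts = np.array([[10, 0, 5],
                   [10, 0, 5],
                   [10, 8, 5]])
print(top_informative_genes(counts, ["ubiq", "marker", "ubiq2"], 1))
```

Genes detected in every cell get an IDF of exactly zero under this formulation, so the selection favors genes that distinguish subsets of cells.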


2020 ◽  
Author(s):  
Tamim Abdelaal ◽  
Jeroen Eggermont ◽  
Thomas Höllt ◽  
Ahmed Mahfouz ◽  
Marcel J.T. Reinders ◽  
...  

Summary The ever-increasing number of cells analyzed in single-cell RNA sequencing (scRNA-seq) experiments imposes several challenges on data analysis. Current analysis methods lack scalability to large datasets, hampering interactive visual exploration of the data. We present Cytosplore-Transcriptomics, a framework to analyze scRNA-seq data, including data preprocessing, visualization and downstream analysis. At its core, it uses a hierarchical, manifold-preserving representation of the data that allows the inspection and annotation of scRNA-seq data at different levels of detail. Consequently, Cytosplore-Transcriptomics provides interactive analysis of the data using low-dimensional visualizations, scaling to millions of cells.
Availability: Cytosplore-Transcriptomics can be freely downloaded from [email protected]


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract Single-cell RNA sequencing (scRNA-seq) technologies profile gene expression patterns in individual cells. It is often of interest to test for differential expression (DE) between conditions, e.g., treatment vs. control, or between cell types. Simulation studies have shown that non-parametric tests, such as the Wilcoxon rank-sum test, can robustly detect significant DE, with better performance than many parametric tools specifically developed for scRNA-seq data analysis. However, these rank tests cannot be used for complex experimental designs involving multiple groups, multiple factors and confounding variables. Further, rank-based tests do not provide an interpretable measure of the effect size. We propose a semi-parametric approach based on probabilistic index models (PIM), a flexible class of models that generalizes classical rank tests. Our method does not rely on strong distributional assumptions, and it allows accounting for confounding factors. Moreover, it allows the effect size to be estimated in terms of a probabilistic index. Real data analyses demonstrate that PIM is capable of identifying biologically meaningful DE. Our simulation studies also show that the proposed DE tests control the false discovery rate at its nominal level while maintaining good sensitivity compared to competing methods.
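In its simplest two-group form, the probabilistic index used as an effect size is P(A < B) + 0.5 · P(A = B): the probability that a random observation from group A is smaller than one from group B, with ties counted as half. The Python sketch below estimates this marginal quantity over all pairs. It is the effect size underlying the Wilcoxon rank-sum (Mann-Whitney) test and does not implement the covariate-adjusted PIM regression described above; the function name is hypothetical.

```python
import numpy as np

def probabilistic_index(group_a, group_b):
    """Empirical P(A < B) + 0.5 * P(A == B) over all (a, b) pairs.
    Equals the Mann-Whitney U statistic for group B divided by n_a * n_b."""
    a = np.asarray(group_a, dtype=float)[:, None]   # column vector
    b = np.asarray(group_b, dtype=float)[None, :]   # row vector, broadcast to all pairs
    return float(np.mean((a < b) + 0.5 * (a == b)))

print(probabilistic_index([1, 2, 3], [4, 5, 6]))  # every b exceeds every a
print(probabilistic_index([1, 2, 3], [1, 2, 3]))  # no shift between groups
```

The explicit handling of ties matters for scRNA-seq data, where many cells in both groups may have zero counts for the same gene.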


2015 ◽  
Author(s):  
Keegan D. Korthauer ◽  
Li-Fang Chu ◽  
Michael A. Newton ◽  
Yuan Li ◽  
James Thomson ◽  
...  

Abstract The ability to quantify cellular heterogeneity is a major advantage of single-cell technologies. Although understanding such heterogeneity is of primary interest in a number of studies, statistical methods often treat cellular heterogeneity as a nuisance factor for convenience. We present a novel method, scDD, to characterize differences in expression in the presence of distinct expression states within and among biological conditions. Using simulated and case study data, we demonstrate that the modeling framework is able to detect differential expression patterns of interest under a wide range of settings. Compared to existing approaches, scDD has higher power to detect subtle differences in gene expression distributions that are more complex than a mean shift, and it is able to characterize those differences. The freely available R package scDD implements the approach.
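Why distributional differences can escape a mean-based test is easy to see with two simulated conditions that share the same mean, one unimodal and one bimodal (two expression states). The Python sketch below contrasts the difference in means with a hand-computed two-sample Kolmogorov-Smirnov statistic. It is only an illustration with arbitrary simulation parameters; scDD's own modeling framework is different.

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_x - F_y| over the pooled values."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size   # empirical CDF of x
    cdf_y = np.searchsorted(y, grid, side="right") / y.size   # empirical CDF of y
    return float(np.max(np.abs(cdf_x - cdf_y)))

rng = np.random.default_rng(0)
# Condition 1: one expression state around 5.
unimodal = rng.normal(5.0, 0.5, size=500)
# Condition 2: two expression states (3 and 7) with the same overall mean.
bimodal = np.concatenate([rng.normal(3.0, 0.5, size=250), rng.normal(7.0, 0.5, size=250)])

print(abs(unimodal.mean() - bimodal.mean()))   # small: a mean-shift test sees little
print(ks_statistic(unimodal, bimodal))         # large: the distributions clearly differ
```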

