IKAP—Identifying K mAjor cell Population groups in single-cell RNA-sequencing analysis

Yun-Ching Chen; Abhilash Suresh; Chingiz Underbayev; Clare Sun; Komudi Singh; Fayaz Seifuddin; Adrian Wiestner; Mehdi Pirooznia

doi:10.1093/gigascience/giz121

GigaScience ◽

10.1093/gigascience/giz121 ◽

2019 ◽

Vol 8 (10) ◽

Cited By ~ 2

Author(s):

Yun-Ching Chen ◽

Abhilash Suresh ◽

Chingiz Underbayev ◽

Clare Sun ◽

Komudi Singh ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Sequencing Analysis ◽

Sequencing Data ◽

Peripheral Blood Mononuclear ◽

Biologically Relevant ◽

Single Cell Rna Sequencing ◽

Cell Groups ◽

Cell Ontology

Abstract Background: In single-cell RNA-sequencing analysis, clustering cells into groups and differentiating cell groups by differentially expressed (DE) genes are 2 separate steps for investigating cell identity. However, the ability to differentiate between cell groups could be affected by clustering. This interdependency often creates a bottleneck in the analysis pipeline, requiring researchers to repeat these 2 steps multiple times by setting different clustering parameters to identify a set of cell groups that are more differentiated and biologically relevant. Findings: To accelerate this process, we have developed IKAP, an algorithm to identify major cell groups and improve the differentiation of cell groups by systematically tuning parameters for clustering. We demonstrate that, with default parameters, IKAP successfully identifies major cell types such as T cells, B cells, natural killer cells, and monocytes in 2 peripheral blood mononuclear cell datasets and recovers major cell types in a previously published mouse cortex dataset. These major cell groups identified by IKAP present more distinguishing DE genes compared with cell groups generated by different combinations of clustering parameters. We further show that cell subtypes can be identified by recursively applying IKAP within identified major cell types, thereby delineating cell identities in a multi-layered ontology. Conclusions: By tuning the clustering parameters to identify major cell groups, IKAP greatly improves the automation of single-cell RNA-sequencing analysis to produce distinguishing DE genes and refine cell ontology using single-cell RNA-sequencing data.

Self-assembling Manifolds in Single-cell RNA Sequencing Data

10.1101/364166 ◽

2018 ◽

Cited By ~ 3

Author(s):

Alexander J. Tarashansky ◽

Yuan Xue ◽

Pengyang Li ◽

Stephen R. Quake ◽

Bo Wang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Developmental Trajectories ◽

Cell Types ◽

Selection Strategy ◽

Sequencing Data ◽

Biologically Relevant ◽

Self Assembling ◽

Single Cell Rna Sequencing ◽

Stem Cell Populations

Abstract Single-cell RNA sequencing has spurred the development of computational methods that enable researchers to classify cell types, delineate developmental trajectories, and measure molecular responses to external perturbations. Many of these technologies rely on their ability to detect genes whose cell-to-cell variations arise from the biological processes of interest rather than transcriptional or technical noise. However, for datasets in which the biologically relevant differences between cells are subtle, identifying these genes is a challenging task. We present the self-assembling manifold (SAM) algorithm, an iterative soft feature selection strategy to quantify gene relevance and improve dimensionality reduction. We demonstrate its advantages over other state-of-the-art methods with experimental validation in identifying novel stem cell populations of Schistosoma, a prevalent parasite that infects hundreds of millions of people. Extending our analysis to a total of 56 datasets, we show that SAM is generalizable and consistently outperforms other methods in a variety of biological and quantitative benchmarks.

Self-assembling manifolds in single-cell RNA sequencing data

eLife ◽

10.7554/elife.48994 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 8

Author(s):

Alexander J Tarashansky ◽

Yuan Xue ◽

Pengyang Li ◽

Stephen R Quake ◽

Bo Wang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Developmental Trajectories ◽

Cell Types ◽

Selection Strategy ◽

Sequencing Data ◽

Biologically Relevant ◽

Self Assembling ◽

Single Cell Rna Sequencing ◽

Stem Cell Populations

Single-cell RNA sequencing has spurred the development of computational methods that enable researchers to classify cell types, delineate developmental trajectories, and measure molecular responses to external perturbations. Many of these technologies rely on their ability to detect genes whose cell-to-cell variations arise from the biological processes of interest rather than transcriptional or technical noise. However, for datasets in which the biologically relevant differences between cells are subtle, identifying these genes is challenging. We present the self-assembling manifold (SAM) algorithm, an iterative soft feature selection strategy to quantify gene relevance and improve dimensionality reduction. We demonstrate its advantages over other state-of-the-art methods with experimental validation in identifying novel stem cell populations of Schistosoma mansoni, a prevalent parasite that infects hundreds of millions of people. Extending our analysis to a total of 56 datasets, we show that SAM is generalizable and consistently outperforms other methods in a variety of biological and quantitative benchmarks.

Single-cell data clustering based on sparse optimization and low-rank matrix factorization

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab098 ◽

2021 ◽

Author(s):

Yinlei Hu ◽

Bin Li ◽

Falai Chen ◽

Kun Qu

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Matrix Factorization ◽

Data Clustering ◽

Cell Types ◽

Low Rank ◽

Sequencing Data ◽

Rank Matrix ◽

Single Cell Rna Sequencing ◽

Low Rank Matrix

Abstract Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.

Single-Cell RNA Sequencing Analysis of the Immunometabolic Rewiring and Immunopathogenesis of Coronavirus Disease 2019

Frontiers in Immunology ◽

10.3389/fimmu.2021.651656 ◽

2021 ◽

Vol 12 ◽

Author(s):

Furong Qi ◽

Wenbo Zhang ◽

Jialu Huang ◽

Lili Fu ◽

Jinfang Zhao

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Mononuclear Cells ◽

Plasma Cells ◽

Severe Disease ◽

Sequencing Analysis ◽

Sequencing Data ◽

Metabolic Remodeling ◽

Single Cell Rna Sequencing ◽

Antibody Secretion

Although immune dysfunction is a key feature of coronavirus disease 2019 (COVID-19), the metabolism-related mechanisms remain elusive. Here, by reanalyzing single-cell RNA sequencing data, we delineated metabolic remodeling in peripheral blood mononuclear cells (PBMCs) to elucidate the metabolic mechanisms that may lead to the progression of severe COVID-19. After scoring the metabolism-related biological processes and signaling pathways, we found that mono-CD14+ cells expressed higher levels of glycolysis-related genes (PKM, LDHA and PKM) and PPP-related genes (PGD and TKT) in severe patients than in mild patients. These genes may contribute to the hyperinflammation in mono-CD14+ cells of patients with severe COVID-19. The mono-CD16+ cell population in COVID-19 patients showed reduced transcription levels of genes related to lysine degradation (NSD1, KMT2E, and SETD2) and elevated transcription levels of genes involved in OXPHOS (ATP6V1B2, ATP5A1, ATP5E, and ATP5B), which may inhibit M2-like polarization. Plasma cells (PCs) also expressed higher levels of the OXPHOS gene ATP13A3 in COVID-19 patients, which was positively associated with antibody secretion and survival of PCs. Moreover, enhanced glycolysis or OXPHOS was positively associated with the differentiation of memory B cells into plasmablasts or plasma cells. This study comprehensively investigated the metabolic features of peripheral immune cells and revealed that metabolic changes exacerbated inflammation in monocytes and promoted antibody secretion and cell survival in PCs in COVID-19 patients, especially those with severe disease.

Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbz096 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1581-1595 ◽

Cited By ~ 6

Author(s):

Xinlei Zhao ◽

Shuang Wu ◽

Nan Fang ◽

Xiao Sun ◽

Jue Fan

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Reference Data ◽

Predictive Accuracy ◽

Cell Types ◽

Superior Performance ◽

Marker Genes ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
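The "simple ensemble voting" mentioned in this abstract can be illustrated with a minimal sketch. It assumes each tool returns one label per cell and that ties are resolved by marking the cell as unassigned; the function name `majority_vote`, the tie rule, and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions, unassigned="unassigned"):
    """Combine per-cell labels from several classifiers by majority vote.

    predictions: list of label lists, one list per tool, all of equal length.
    Ties for the most common label are reported as `unassigned`
    (an assumption of this sketch, not a rule stated in the abstract).
    """
    n_cells = len(predictions[0])
    if any(len(p) != n_cells for p in predictions):
        raise ValueError("every tool must label the same number of cells")

    consensus = []
    for labels in zip(*predictions):  # labels for one cell across all tools
        ranked = Counter(labels).most_common()
        top_label, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            consensus.append(unassigned)  # no single winner
        else:
            consensus.append(top_label)
    return consensus


# Hypothetical output of three tools for four cells.
tool_a = ["T cell", "B cell", "NK cell", "monocyte"]
tool_b = ["T cell", "B cell", "T cell", "B cell"]
tool_c = ["T cell", "monocyte", "NK cell", "T cell"]

consensus = majority_vote([tool_a, tool_b, tool_c])
print(consensus)  # ['T cell', 'B cell', 'NK cell', 'unassigned']
```

The last cell receives three different labels, so the sketch leaves it unassigned instead of picking one arbitrarily.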

Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell Rna Sequencing ◽

Multiple Cell

Abstract As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
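The gamma-Poisson idea named in this abstract can be sketched in a few lines: draw a mean expression level for each gene from a gamma distribution, then draw each cell's count from a Poisson distribution with that mean. This is only an illustration of the distribution, not the Splat model or the Splatter API; the function names, the `shape` and `rate` values, and the Knuth-style Poisson sampler are assumptions of the sketch.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for the small means used in this sketch; it is not meant
    for very large lam, where exp(-lam) underflows.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_genes, n_cells, shape=0.6, rate=0.3, seed=0):
    """Simulate a genes x cells count matrix from a gamma-Poisson mixture.

    shape and rate are illustrative values, not estimates from real data.
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    gene_means = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_genes)]
    counts = [
        [sample_poisson(mu, rng) for _ in range(n_cells)]
        for mu in gene_means
    ]
    return gene_means, counts


gene_means, counts = simulate_counts(n_genes=5, n_cells=4)
for mu, row in zip(gene_means, counts):
    print(f"mean={mu:.2f} counts={row}")
```

Fixing the seed makes the simulated matrix reproducible, which is the property the abstract finds lacking in many earlier simulations.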

Distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data

10.1101/234872 ◽

2018 ◽

Cited By ~ 7

Author(s):

Aaron T. L. Lun ◽

Samantha Riesenfeld ◽

Tallulah Andrews ◽

Tomas Gomes ◽

John C. Marioni ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Sequencing Data ◽

Minimum Threshold ◽

False Discovery ◽

Distinct Cell ◽

Single Cell Rna Sequencing ◽

Unique Molecular Identifier

Abstract Droplet-based single-cell RNA sequencing protocols have dramatically increased the throughput and efficiency of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Existing methods for cell calling set a minimum threshold on the total unique molecular identifier (UMI) count for each library, which indiscriminately discards cell libraries with low UMI counts. Here, we describe a new statistical method for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that our method has greater power than existing approaches for detecting cell libraries with low UMI counts, while controlling the false discovery rate among detected cells. We also apply our method to real data, where we show that the use of our method results in the retention of distinct cell types that would otherwise have been discarded.
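The baseline that this abstract criticizes, a minimum threshold on total UMI count, is easy to sketch, and doing so shows why it discards small cells. The code below is that baseline only, not the authors' statistical method; the function name, barcodes, totals, and the threshold of 500 are hypothetical.

```python
def call_cells_by_threshold(umi_totals, min_umis):
    """Split barcodes into kept and discarded by total UMI count alone."""
    kept, discarded = {}, {}
    for barcode, total in umi_totals.items():
        target = kept if total >= min_umis else discarded
        target[barcode] = total
    return kept, discarded


# Hypothetical total UMI counts per barcode.
umi_totals = {"AAAC": 5200, "AAAG": 310, "AACT": 45, "AAGT": 1800}

kept, discarded = call_cells_by_threshold(umi_totals, min_umis=500)
print(sorted(kept))       # ['AAAC', 'AAGT']
print(sorted(discarded))  # ['AAAG', 'AACT']
```

The threshold cannot tell whether barcode "AAAG" (310 UMIs) is a small real cell or an empty droplet, because it looks only at the total and ignores the expression profile.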

scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04028-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Differentially Expressed Genes ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.

FEM: mining biological meaning from cell level in single-cell RNA sequencing data

PeerJ ◽

10.7717/peerj.12570 ◽

2021 ◽

Vol 9 ◽

pp. e12570

Author(s):

Yunqing Liu ◽

Na Lu ◽

Changwei Bi ◽

Tingyu Han ◽

Guo Zhuojun ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Mononuclear Cells ◽

Differentially Expressed Gene ◽

Biological Significance ◽

Human Pancreas ◽

Sequencing Data ◽

Peripheral Blood Mononuclear ◽

Single Cell Rna Sequencing ◽

Expression Matrix

Background: One goal of expression data analysis is to discover the biological significance or function of genes that are differentially expressed. Gene Set Enrichment (GSE) analysis is one of the main tools for function mining and has been widely used. However, for single-cell RNA sequencing (scRNA-seq) data, every gene expressed in a cell carries valuable information for GSE and should not be discarded. Methods: We developed the functional expression matrix (FEM) algorithm to utilize the information from all expressed genes. The algorithm converts the gene expression matrix (GEM) into a FEM. The FEM algorithm can provide insight into the biological significance of a single cell. It can also be integrated with the GEM for downstream analysis. Results: We found that FEM performed well with cell clustering and cell-type specific function annotation in three datasets (peripheral blood mononuclear cells, human liver, and human pancreas).

Scalable full-transcript coverage single cell RNA sequencing with Smart-seq3xpress

10.1101/2021.07.10.451889 ◽

2021 ◽

Author(s):

Michael Hagemann-Jensen ◽

Christoph Ziegenhain ◽

Rickard Sandberg

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Peripheral Blood Mononuclear Cells ◽

Mononuclear Cells ◽

Human Peripheral Blood ◽

Cell Types ◽

Peripheral Blood Mononuclear ◽

Single Cell Rna Sequencing ◽

Blood Mononuclear Cells ◽

Additional Protein

Plate-based single-cell RNA-sequencing methods with full-transcript coverage typically excel at sensitivity but are more resource- and time-consuming. Here, we miniaturized and streamlined the Smart-seq3 protocol for drastically reduced cost and increased throughput. Applying Smart-seq3xpress to 16,349 human peripheral blood mononuclear cells revealed a highly granular atlas complete with both common and rare cell types whose identification previously relied on additional protein measurements or integration with a reference atlas.
