MultiK: an automated tool to determine optimal cluster numbers in single-cell RNA sequencing data

Abstract: Single-cell RNA sequencing (scRNA-seq) provides new opportunities to characterize cell populations, typically through some form of clustering analysis. Estimating the optimal number of clusters (K) is a crucial step that is often overlooked. Our approach improves on most current scRNA-seq clustering methods by providing an objective estimate of the number of groups from a multi-resolution perspective. MultiK is a tool for the objective selection of insightful Ks that achieves high robustness through a consensus clustering approach. We demonstrate that MultiK identifies reproducible groups in scRNA-seq data, thus providing an objective means of estimating the number of possible groups or cell-type populations present.
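
The abstract does not spell out the procedure, so the following is only a minimal Python sketch of the general idea behind consensus-based selection of K, not MultiK's implementation. Assumptions: plain k-means (with k-means++ seeding) stands in for the multi-resolution clustering MultiK performs; each candidate K is scored with the proportion of ambiguous clustering (PAC), a common consensus-stability statistic; the function names, the 80% subsampling fraction and the (0.1, 0.9) ambiguity window are illustrative choices. A lower PAC means that pairs of cells are more consistently either always or never co-clustered across subsamples.

```python
import numpy as np


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns labels of the best restart."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: draw each new center with probability proportional
        # to its squared distance from the nearest center chosen so far
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = dist[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def consensus_matrix(X, k, n_runs=30, frac=0.8, seed=0):
    """Fraction of subsamples in which each pair of cells lands in the same cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
    # pairs never sampled together get consensus 0
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def pac(C, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering: share of pairs with lower < consensus < upper."""
    vals = C[np.triu_indices_from(C, k=1)]
    return float(np.mean((vals > lower) & (vals < upper)))


# Toy data: three well-separated 2-D blobs of 30 "cells" each
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 17.32]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in blob_centers])
pacs = {k: pac(consensus_matrix(X, k)) for k in range(2, 6)}
print(pacs)
```

On this toy data the lowest PAC is expected at K = 3; real scRNA-seq data rarely produces such a clean minimum.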

scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

BMC Bioinformatics

10.1186/s12859-021-04028-4

2021

Vol 22 (1)

Author(s):

Bobby Ranjan

Florian Schmidt

Wenjie Sun

Jinyu Park

Mohammad Amin Honardoost

...

Keyword(s):

Single Cell

RNA Sequencing

Differentially Expressed Genes

Cell Types

Unsupervised Clustering

Differentially Expressed

Consensus Clustering

Cell Type

Sequencing Data

Single Cell RNA Sequencing

Abstract: Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
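
To make the idea of integrating two clusterings concrete, here is a simplified Python sketch that combines a supervised labeling (for example, reference-based cell-type calls) with an unsupervised one through their contingency table. It is not scConsensus's exact rule: the `min_frac` threshold, the `"label|cluster"` naming and the folding of small overlaps into the dominant supervised label are hypothetical choices made for illustration.

```python
from collections import Counter


def consensus_labels(supervised, unsupervised, min_frac=0.2):
    """Combine two per-cell labelings via their contingency table (illustrative rule)."""
    # contingency table: number of cells for each (supervised, unsupervised) pair
    table = Counter(zip(supervised, unsupervised))
    cluster_sizes = Counter(unsupervised)
    # most frequent supervised label inside each unsupervised cluster
    dominant = {}
    for (s, u), count in table.items():
        if u not in dominant or count > table[(dominant[u], u)]:
            dominant[u] = s
    consensus = []
    for s, u in zip(supervised, unsupervised):
        # keep overlaps covering at least min_frac of the unsupervised cluster;
        # fold smaller overlaps into that cluster's dominant supervised label
        label = s if table[(s, u)] / cluster_sizes[u] >= min_frac else dominant[u]
        consensus.append(f"{label}|{u}")
    return consensus


sup = ["T"] * 5 + ["B"] * 9 + ["T"] + ["NK"] * 4 + ["T"] * 4
uns = [0] * 5 + [1] * 10 + [2] * 8
print(Counter(consensus_labels(sup, uns)))
```

In this toy example, unsupervised cluster 1 shows the folding rule (a single stray T cell joins the B consensus cluster), while cluster 2 is split in two by the supervised labels.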

scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

10.1101/2020.04.22.056473

2020

Author(s):

Bobby Ranjan

Florian Schmidt

Wenjie Sun

Jinyu Park

Mohammad Amin Honardoost

...

Keyword(s):

Single Cell

RNA Sequencing

Cell Types

Unsupervised Clustering

Differentially Expressed

Consensus Clustering

Cell Type

Sequencing Data

Single Cell RNA Sequencing

Data Clusters

Clustering is a crucial step in the analysis of single-cell data. Clusters identified using unsupervised clustering are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering strategies have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. We present scConsensus, an R framework for generating a consensus clustering by (i) integrating the results from both unsupervised and supervised approaches and (ii) refining the consensus clusters using differentially expressed (DE) genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. scConsensus is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
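
The refinement step relies on differentially expressed (DE) genes. As a rough Python illustration only, the sketch below selects a DE gene set with a simple one-vs-rest log2 fold-change filter on mean expression; scConsensus's actual DE test and thresholds are not described here, and `min_log2fc`, `pseudocount` and the function name `de_genes` are assumptions. Cells could then be re-clustered using only the selected genes.

```python
import numpy as np


def de_genes(expr, labels, genes, min_log2fc=1.0, pseudocount=1.0):
    """One-vs-rest log2 fold-change filter; needs at least two clusters.

    expr: cells x genes matrix of normalised expression values.
    Returns the sorted union of genes passing the threshold in any cluster.
    """
    labels = np.asarray(labels)
    selected = set()
    for cluster in np.unique(labels):
        inside = expr[labels == cluster].mean(axis=0)
        outside = expr[labels != cluster].mean(axis=0)
        log2fc = np.log2((inside + pseudocount) / (outside + pseudocount))
        selected.update(g for g, v in zip(genes, log2fc) if abs(v) >= min_log2fc)
    return sorted(selected)


expr = np.array([[7.0, 1.0, 0.0],
                 [7.0, 1.0, 0.0],
                 [0.0, 1.0, 3.0],
                 [0.0, 1.0, 3.0]])
print(de_genes(expr, ["A", "A", "B", "B"], ["g1", "g2", "g3"]))  # → ['g1', 'g3']
```

A fold-change filter ignores within-cluster variance, so a proper statistical test would normally replace it on real data.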

Adapted single-cell consensus clustering (adaSC3)

Advances in Data Analysis and Classification

10.1007/s11634-020-00428-1

2020

Author(s):

Cornelia Fuetterer

Thomas Augustin

Christiane Fuchs

Keyword(s):

Single Cell

RNA Sequencing

Single Cells

Linear Method

Principal Component

Data Sets

Consensus Clustering

Sequencing Data

Reduction Techniques

Single Cell RNA Sequencing

Abstract: The analysis of single-cell RNA sequencing data is of great importance in health research. It challenges data scientists but has enormous potential in the context of personalized medicine. The clustering of single cells aims to detect different subgroups of cell populations within a patient in a data-driven manner. Some comparison studies identify single-cell consensus clustering (SC3), proposed by Kiselev et al. (Nat Methods 14(5):483–486, 2017), as the best method for classifying single-cell RNA sequencing data. SC3 includes Laplacian eigenmaps and a principal component analysis (PCA). Our proposal, unsupervised adapted single-cell consensus clustering (adaSC3), replaces the linear PCA with diffusion maps, a non-linear method that takes transitions between single cells into account. We investigate the accuracy of adaSC3 on the data sets used in the original SC3 publication as well as in a simulation study. A comparison of adaSC3 with SC3, and with related algorithms based on further alternative dimension reduction techniques, shows that adaSC3 performs convincingly.
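
The key change in adaSC3 is replacing PCA with a diffusion map. The following is a minimal NumPy sketch of a standard diffusion-map embedding (Gaussian kernel, Markov normalisation, leading non-trivial eigenvectors), not the adaSC3 code; the kernel width `sigma`, the diffusion time `t` and the dense all-pairs kernel are simplifying assumptions that only suit small data sets.

```python
import numpy as np


def diffusion_map(X, n_components=2, sigma=1.0, t=1):
    """Diffusion-map embedding of the rows of X (cells x features)."""
    # Gaussian affinities between every pair of cells
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    d = K.sum(axis=1)
    # The symmetric matrix D^{-1/2} K D^{-1/2} has the same eigenvalues as the
    # Markov transition matrix P = D^{-1} K, so the stable solver eigh applies
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Right eigenvectors of P are psi = D^{-1/2} v
    psi = evecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1) and weight the
    # remaining coordinates by lambda^t (diffusion time t)
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t


# Toy data: two groups of 20 "cells" in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 5)),
               rng.normal(1.5, 0.3, size=(20, 5))])
embedding = diffusion_map(X, n_components=2, sigma=1.0)
print(embedding[:, 0])
```

Unlike PCA, the embedding is built from transition probabilities of a random walk between nearby cells, which is what lets it follow non-linear structure; the first diffusion coordinate here should separate the two groups.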

Mixed Distribution Models Based on Single-Cell RNA Sequencing Data

Interdisciplinary Sciences: Computational Life Sciences

10.1007/s12539-021-00427-6

2021

Author(s):

Min Wu

Junhua Xu

Tao Ding

Jie Gao

Keyword(s):

Single Cell

RNA Sequencing

Sequencing Data

Distribution Models

Mixed Distribution

Single Cell RNA Sequencing

IMMU-27. SINGLE CELL RNA-SEQUENCING IDENTIFIES NOVEL BONE MARROW DERIVED MYELOID CELLS IN GLIOBLASTOMA ASSOCIATED WITH TUMOR AGGRESSION

Neuro-Oncology

10.1093/neuonc/noaa215.457

2020

Vol 22 (Supplement_2)

pp. ii110-ii110

Author(s):

Christina Jackson

Christopher Cherry

Sadhana Bom

Hao Zhang

John Choi

...

Keyword(s):

Bone Marrow

Single Cell

Tumor Cells

RNA Sequencing

Metabolic Pathways

Myeloid Cells

Tumor Grade

Sequencing Data

Single Cell RNA Sequencing

Two Populations

Abstract: BACKGROUND: Glioma-associated myeloid cells (GAMs) can be induced to adopt an immunosuppressive phenotype that can inhibit anti-tumor responses in glioblastoma (GBM). Understanding the composition and phenotypes of GAMs is essential for modulating the myeloid compartment as a therapeutic adjunct to improve the anti-tumor immune response. METHODS: We performed single-cell RNA sequencing (scRNA-seq) of 435,400 myeloid and tumor cells to identify transcriptomic and phenotypic differences in GAMs across glioma grades. We further correlated the heterogeneity of the GAM landscape with tumor cell transcriptomics to investigate interactions between GAMs and tumor cells. RESULTS: scRNA-seq revealed a diverse landscape of myeloid-lineage cells in gliomas, with an increasing preponderance of bone marrow-derived myeloid cells (BMDMs) at higher tumor grades. We identified two populations of BMDMs unique to GBMs: Mac-1 and Mac-2. Mac-1 shows upregulation of an immature myeloid gene signature and altered metabolic pathways. Mac-2 is characterized by expression of the scavenger receptor MARCO. Pseudotime and RNA velocity analyses indicated that Mac-1 can transition and differentiate into Mac-2 and other GAM subtypes. We further found that the presence of these two BMDM populations is associated with the presence of tumor cells with stem cell and mesenchymal features. Bulk RNA-sequencing data show that the gene signatures of these populations are associated with worse survival in GBM. CONCLUSION: We used scRNA-seq to identify a novel population of immature BMDMs that is associated with higher glioma grades. This population exhibited altered metabolic pathways and a stem-like potential to differentiate into other GAM populations, including GAMs with upregulated immunosuppressive pathways. Our results elucidate unique interactions between BMDMs and GBM tumor cells that potentially drive GBM progression and the more aggressive mesenchymal subtype. Our discovery of these novel BMDMs has implications for identifying new therapeutic targets to improve the efficacy of immune-based therapies in GBM.

Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data

Microbiology Research

10.3390/microbiolres12020022

2021

Vol 12 (2)

pp. 317-334

Author(s):

Omar Alaqeeli

Li Xing

Xuekui Zhang

Keyword(s):

Single Cell

RNA Sequencing

Classification Tree

Area Under The Curve

Data Sets

Sequencing Data

Single Cell RNA Sequencing

Tree Algorithms

R Packages

Classification trees are a widely used machine learning method. They have multiple implementations as R packages: rpart, ctree, evtree, tree and C5.0. These implementations differ in their details, and hence their performance varies from one application to another. We are interested in their performance in classifying cells using single-cell RNA-sequencing data. In this paper, we conducted a benchmark study using 22 single-cell RNA-sequencing data sets. Using cross-validation, we compared the packages' prediction performance based on precision, recall, F1-score and area under the curve (AUC). We also compared the complexity and run time of these R packages. Our study shows that rpart and evtree have the best precision; evtree is the best in recall, F1-score and AUC; C5.0 tends to produce more complex trees; and tree is consistently much faster than the others, although its complexity is often higher.
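
Precision, recall and F1-score for multi-class cell-type labels are usually computed one-vs-rest per label and then averaged. The plain-Python sketch below uses macro-averaging (an unweighted mean over labels); that averaging scheme is an assumption, since the benchmark's exact choice is not stated here, and AUC and the cross-validation loop are omitted.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """One-vs-rest precision, recall and F1 for every cell-type label."""
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / n_pred[label] if n_pred[label] else 0.0
        recall = true_pos[label] / n_true[label] if n_true[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, F1) across labels."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


y_true = ["T", "T", "B", "B", "NK"]
y_pred = ["T", "B", "B", "B", "NK"]
print(tuple(round(x, 3) for x in macro_average(per_class_scores(y_true, y_pred))))
# → (0.889, 0.833, 0.822)
```

Macro-averaging gives rare cell types the same weight as abundant ones, which matters when annotated populations are highly imbalanced.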

Modeling dynamic correlation in zero‐inflated bivariate count data with applications to single‐cell RNA sequencing data

Biometrics

10.1111/biom.13457

2021

Author(s):

Zhen Yang

Yen‐Yi Ho

Keyword(s):

Single Cell

RNA Sequencing

Count Data

Sequencing Data

Dynamic Correlation

Single Cell RNA Sequencing

Single-cell data clustering based on sparse optimization and low-rank matrix factorization

G3: Genes|Genomes|Genetics

10.1093/g3journal/jkab098

2021

Author(s):

Yinlei Hu

Bin Li

Falai Chen

Kun Qu

Keyword(s):

Single Cell

RNA Sequencing

Matrix Factorization

Data Clustering

Cell Types

Low Rank

Sequencing Data

Rank Matrix

Single Cell RNA Sequencing

Low Rank Matrix

Abstract: Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis, and it has motivated several methods for classifying cells in single-cell RNA sequencing data. However, accurately predicting cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on sparse optimization and low-rank matrix factorization (scSO). We applied scSO to multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.
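
scSO combines sparse optimization with low-rank matrix factorization. As a hedged illustration of only the low-rank factorization idea, the NumPy sketch below runs plain non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates and assigns each cell to its dominant factor. It has no sparsity term and is not the scSO algorithm; the rank (assumed equal to the number of clusters), the iteration count and the function names are illustrative.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-10):
    """Non-negative factorisation V ~ W @ H via Lee-Seung multiplicative updates
    for the squared Frobenius loss (no sparsity penalty)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    for _ in range(n_iter):
        # multiplicative updates keep W and H non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nmf_clusters(V, rank, **kwargs):
    """Assign each cell (column of V) to the factor with the largest coefficient."""
    _, H = nmf(V, rank, **kwargs)
    return H.argmax(axis=0)


# Toy genes x cells matrix: two cell types with disjoint marker genes
V = np.array([[5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
              [4.0, 4.0, 4.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 6.0, 6.0, 6.0],
              [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]])
print(nmf_clusters(V, rank=2))
```

Here the rank is fixed by hand, whereas the abstract reports that scSO predicts a cluster number close to the number of reference cell types.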
