A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data

Abstract Single-cell RNA sequencing (scRNA-seq) has enabled us to study biological questions at the single-cell level. Currently, many analysis tools are available to better utilize these relatively noisy data. In this review, we summarize the most widely used methods for critical downstream analysis steps (i.e. clustering, trajectory inference, cell-type annotation and integrating datasets). The advantages and limitations are comprehensively discussed, and we provide suggestions for choosing proper methods in different situations. We hope this paper will be useful for scRNA-seq data analysts and bioinformatics tool developers.

scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04028-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Differentially Expressed Genes ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell RNA Sequencing

Abstract Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
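
The idea of integrating an unsupervised and a supervised labeling can be illustrated with a small sketch. The Python below is not the scConsensus implementation (which is an R package); it is a simplified illustration that cross-tabulates two labelings of the same cells and gives each sufficiently large intersection its own consensus label. The function name `consensus_labels`, the `min_cells` threshold, the fallback to the unsupervised cluster, and the toy labels are all assumptions, and the refinement step based on differentially expressed genes is omitted.

```python
from collections import Counter

def consensus_labels(unsupervised, supervised, min_cells=2):
    """Combine two labelings of the same cells into one consensus labeling.

    Hypothetical sketch: each (cluster, cell type) intersection with at least
    `min_cells` cells becomes its own consensus group; smaller intersections
    fall back to the unsupervised cluster label alone.
    """
    if len(unsupervised) != len(supervised):
        raise ValueError("both labelings must cover the same cells")

    # Contingency table: number of cells in each (cluster, cell type) pair.
    overlap = Counter(zip(unsupervised, supervised))

    consensus = []
    for cluster, cell_type in zip(unsupervised, supervised):
        if overlap[(cluster, cell_type)] >= min_cells:
            consensus.append(f"{cluster}|{cell_type}")
        else:
            # Too few cells to form a group: keep the unsupervised cluster only.
            consensus.append(str(cluster))
    return consensus

# Toy data: seven cells, two unsupervised clusters, three reference cell types.
unsup = [0, 0, 0, 1, 1, 1, 1]
sup = ["T", "T", "B", "B", "B", "NK", "NK"]
labels = consensus_labels(unsup, sup)
print(labels)
```

In this toy example, cluster 1 is split into two consensus groups because the supervised labels separate B and NK cells that the unsupervised clustering merged, which is the kind of complementarity the abstract describes.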

Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data

F1000Research ◽

10.12688/f1000research.18490.2 ◽

2019 ◽

Vol 8 ◽

pp. 296 ◽

Cited By ~ 1

Author(s):

J. Javier Diaz-Mejia ◽

Elaine C. Meng ◽

Alexander R. Pico ◽

Sonya A. MacParland ◽

Troy Ketela ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Human Peripheral Blood ◽

Characteristic Curve ◽

Cell Type ◽

Sequencing Data ◽

Cell Clusters ◽

Reference Cell ◽

Mouse Tissues ◽

Single Cell RNA Sequencing

Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures. Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells. Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results. Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
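
The gap between the ROC and precision-recall results reported above is typical when correct labels are rare among the candidates. The following Python sketch, which is not taken from the benchmark's code, computes both summaries on a toy ranking with one true label among ten candidates. The function names and the toy scores are hypothetical, and average precision is used here as one common estimate of the area under the precision-recall curve; the abstract does not state which estimator the authors used.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits

# Toy case: ten candidate cell type labels, only the third-ranked one is correct.
y_true = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

roc = roc_auc(y_true, y_score)
ap = average_precision(y_true, y_score)
print(f"ROC AUC = {roc:.3f}, average precision = {ap:.3f}")
```

Here the correct label outranks seven of the nine incorrect ones, so the ROC AUC is 7/9, while the precision at the only correct label is 1/3. The same ranking therefore looks good under ROC analysis and mediocre under precision-recall analysis.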

SciBet: a portable and fast single cell type identifier

10.1101/645358 ◽

2019 ◽

Cited By ~ 2

Author(s):

Chenwei Li ◽

Baolin Liu ◽

Boxi Kang ◽

Zedao Liu ◽

Yedan Liu ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cell Type ◽

Sequencing Data ◽

Local Computation ◽

Local Data ◽

Single Cell RNA Sequencing ◽

Cross Platform ◽

Single Cell Type ◽

User Friendly

Abstract Fast, robust and technology-independent computational methods are needed for supervised cell type annotation of single-cell RNA sequencing data. We present SciBet, a Bayesian classifier that accurately predicts cell identity for newly sequenced cells or cell clusters. We enable web client deployment of SciBet for rapid local computation without uploading local data to the server. This user-friendly and cross-platform tool can be widely useful for single cell type identification.
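
To make the notion of a Bayesian classifier for cell type annotation concrete, here is a minimal sketch of a generic multinomial naive Bayes classifier over gene counts. It is not SciBet's model, whose details are not given in the abstract. The function names `train` and `predict`, the pseudocount smoothing, the uniform prior over cell types, and the toy count matrix are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def train(counts_by_cell, labels, pseudocount=1.0):
    """Estimate per-cell-type log gene probabilities from labelled reference cells."""
    n_genes = len(counts_by_cell[0])
    totals = defaultdict(lambda: [0.0] * n_genes)
    for counts, label in zip(counts_by_cell, labels):
        for gene, count in enumerate(counts):
            totals[label][gene] += count

    model = {}
    for label, gene_totals in totals.items():
        # Smoothed multinomial parameters; the pseudocount avoids log(0).
        denom = sum(gene_totals) + pseudocount * n_genes
        model[label] = [math.log((t + pseudocount) / denom) for t in gene_totals]
    return model

def predict(model, counts):
    """Return the cell type with the highest multinomial log-likelihood.

    A uniform prior over cell types is assumed, so the prior term is dropped.
    """
    def log_likelihood(label):
        return sum(c * lp for c, lp in zip(counts, model[label]))
    return max(model, key=log_likelihood)

# Toy reference: four cells, three genes, two hypothetical cell types.
reference = [[10, 0, 1], [8, 1, 0], [0, 9, 2], [1, 12, 1]]
reference_labels = ["T", "T", "B", "B"]
model = train(reference, reference_labels)

print(predict(model, [5, 0, 1]))
print(predict(model, [0, 4, 0]))
```

Because prediction only needs one log-probability vector per cell type, a model of this general kind is small enough to run entirely on the client, which is compatible with the local computation described in the abstract.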

scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

10.1101/2020.04.22.056473 ◽

2020 ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell RNA Sequencing ◽

Data Clusters

Clustering is a crucial step in the analysis of single-cell data. Clusters identified using unsupervised clustering are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering strategies have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. We present scConsensus, an R framework for generating a consensus clustering by (i) integrating the results from both unsupervised and supervised approaches and (ii) refining the consensus clusters using differentially expressed (DE) genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. scConsensus is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.

An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data

Nature Machine Intelligence ◽

10.1038/s42256-020-00244-4 ◽

2020 ◽

Vol 2 (11) ◽

pp. 693-703 ◽

Cited By ~ 1

Author(s):

Lifei Wang ◽

Rui Nie ◽

Zeyang Yu ◽

Ruyue Xin ◽

Caihong Zheng ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Single Cell ◽

RNA Sequencing ◽

Cell Type ◽

Sequencing Data ◽

Single Cell RNA Sequencing ◽

Type Gene

CellBIC: bimodality-based top-down clustering of single-cell RNA sequencing data reveals hierarchical structure of the cell type

Nucleic Acids Research ◽

10.1093/nar/gky698 ◽

2018 ◽

Vol 46 (21) ◽

pp. e124-e124 ◽

Cited By ~ 6

Author(s):

Junil Kim ◽

Diana E Stanescu ◽

Kyoung Jae Won

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Hierarchical Structure ◽

Cell Type ◽

Sequencing Data ◽

Top Down ◽

Single Cell RNA Sequencing

SCMarker: ab initio marker selection for single cell transcriptome profiling

10.1101/356634 ◽

2018 ◽

Author(s):

Fang Wang ◽

Shaoheng Liang ◽

Tapsi Kumar ◽

Nicholas Navin ◽

Ken Chen

Keyword(s):

Ab Initio ◽

Single Cell ◽

RNA Sequencing ◽

mRNA Transcript ◽

Cell Type ◽

Sequencing Data ◽

Transcript Levels ◽

Expression Levels ◽

Marker Selection ◽

Single Cell RNA Sequencing

Abstract Single-cell RNA-sequencing data generated by a variety of technologies, such as Drop-seq and SMART-seq, can simultaneously reveal the mRNA transcript levels of thousands of genes in thousands of cells. It is often important to identify informative genes or cell-type-discriminative markers to reduce dimensionality and achieve informative cell typing results. We present an ab initio method that performs unsupervised marker selection by identifying genes that have subpopulation-discriminative expression levels and are co- or mutually-exclusively expressed with other genes. Consistent improvements in cell-type classification and biologically meaningful marker selection are achieved by applying SCMarker on various datasets in multiple tissue types, followed by a variety of clustering algorithms. The source code of SCMarker is publicly available at https://github.com/KChen-lab/SCMarker. Author Summary: Single cell RNA-sequencing technology simultaneously provides the mRNA transcript levels of thousands of genes in thousands of cells. A frequent requirement of single cell expression analysis is the identification of markers which may explain complex cellular states or tissue composition. We propose a new marker selection strategy (SCMarker) to accurately delineate cell types in single cell RNA-sequencing data by identifying genes that have bi/multi-modally distributed expression levels and are co- or mutually-exclusively expressed with some other genes. Our method can determine the cell-type-discriminative markers without reference to any known transcriptomic profiles or cell ontologies, and consistently achieves accurate cell-type-discriminative marker identification in a variety of scRNA-seq datasets.

A monotonicity-based gene clustering algorithm for enhancing clarity in single-cell RNA sequencing data

10.1101/2020.12.20.423308 ◽

2020 ◽

Author(s):

Victor Wang ◽

Pietro Antonio Cicalese ◽

Chandra Mohan

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Expression Patterns ◽

Gene Clustering ◽

Sequencing Data ◽

Technical Noise ◽

Cell Clustering ◽

Single Cell RNA Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) technologies and analysis tools have allowed for meaningful insight into the roles and relationships of cells. However, high dimensionality, frequent dropout values, and technical noise remain prevalent challenges for scRNA-seq data, obscuring the already complex expression patterns. To address several shortcomings in commonly used distance metrics, we present a monotonicity-based distance metric designed to enhance the clarity of scRNA-seq data. We apply our metric in a gene clustering algorithm, which we run on several biological datasets. We compare our results to those generated by popular clustering algorithms to demonstrate that our algorithm has substantial ability to improve the accuracy of subsequent cell clustering.
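
The abstract does not define the proposed metric, so the sketch below is only a stand-in that shows what a distance based on monotonic agreement can look like. It measures the fraction of cell pairs in which two genes change in opposite directions, in the style of Kendall's rank statistics. The function name `discordance_distance`, the handling of ties, and the toy expression vectors are assumptions and should not be read as the authors' method.

```python
from itertools import combinations

def discordance_distance(x, y):
    """Fraction of cell pairs in which two genes move in opposite directions.

    Returns 0.0 for a perfectly monotone increasing relationship and 1.0 for a
    perfectly monotone decreasing one. Pairs tied in either gene (for example
    two dropout zeros) are skipped, which is an assumption of this sketch.
    """
    if len(x) != len(y):
        raise ValueError("both genes must be measured in the same cells")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    compared = concordant + discordant
    return discordant / compared if compared else 0.0

# Toy expression of three hypothetical genes across four cells.
gene_a = [0, 1, 2, 5]
gene_b = [0, 10, 20, 21]   # rises with gene_a, but not linearly
gene_c = [5, 2, 1, 0]      # falls as gene_a rises

print(discordance_distance(gene_a, gene_b))
print(discordance_distance(gene_a, gene_c))
```

A rank-based measure like this depends only on the ordering of expression values, so a nonlinear but monotone relationship between two genes still yields a distance of zero, whereas a Euclidean distance would be sensitive to the difference in scale.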

Integration for single-cell RNA sequencing data based on the shared cell type assignment

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm49941.2020.9313460 ◽

2020 ◽

Author(s):

Yin Zhang ◽

Fei Wang

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cell Type ◽

Sequencing Data ◽

Type Assignment ◽

Single Cell RNA Sequencing
