SCSA: A Cell Type Annotation Tool for Single-Cell RNA-seq Data

AbstractCurrently most methods take manual strategies to annotate cell types after clustering the single-cell RNA sequencing (scRNA-seq) data. Such methods are labor-intensive and heavily rely on user expertise, which may lead to inconsistent results. We present SCSA, an automatic tool to annotate cell types from scRNA-seq data, based on a score annotation model combining differentially expressed genes (DEGs) and confidence levels of cell markers from both known and user-defined information. Evaluation on real scRNA-seq datasets from different sources with other methods shows that SCSA is able to assign the cells into the correct types at a fully automated mode with a desirable precision.

Download Full-text

Discovering Novel Cell Types across Heterogeneous Single-cell Experiments

10.1101/2020.02.25.960302 ◽

2020 ◽

Cited By ~ 3

Author(s):

Maria Brbić ◽

Marinka Zitnik ◽

Sheng Wang ◽

Angela O. Pisco ◽

Russ B. Altman ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Learning To Learn ◽

Multiple Datasets ◽

Temporal Relationships ◽

Meta Learning ◽

A Cell ◽

Cell Type Specific

Although tremendous effort has been put into cell type annotation and classification, identification of previously uncharacterized cell types in heterogeneous single-cell RNA-seq data remains a challenge. Here we present MARS, a meta-learning approach for identifying and annotating known as well as novel cell types. MARS overcomes the heterogeneity of cell types by transferring latent cell representations across multiple datasets. MARS uses deep learning to learn a cell embedding function as well as a set of landmarks in the cell embedding space. The method annotates cells by probabilistically defining a cell type based on nearest landmarks in the embedding space. MARS has a unique ability to discover cell types that have never been seen before and annotate experiments that are yet unannotated. We apply MARS to a large aging cell atlas of 23 tissues covering the life span of a mouse. MARS accurately identifies cell types, even when it has never seen them before. Further, the method automatically generates interpretable names for novel cell types. Remarkably, MARS estimates meaningful cell-type-specific signatures of aging and visualizes them as trajectories reflecting temporal relationships of cells in a tissue.

Download Full-text

Single-cell entropy to quantify the cellular transcriptome from single-cell RNA-seq data

10.1101/678557 ◽

2019 ◽

Author(s):

Jingxin Liu ◽

You Song ◽

Jinzhi Lei

Keyword(s):

Single Cell ◽

Gaussian Mixture ◽

Rna Seq ◽

Cell Type ◽

Transcriptome Profile ◽

Biological Interpretation ◽

A Cell ◽

Transcriptional State ◽

Type Classification ◽

Cellular Transcriptome

We present the use of single-cell entropy (scEntropy) to measure the order of the cellular transcriptome profile from single-cell RNA-seq data, which leads to a method of unsupervised cell type classification through scEntropy followed by the Gaussian mixture model (scEGMM). scEntropy is straightforward in defining an intrinsic transcriptional state of a cell. scEGMM is a coherent method of cell type classification that includes no parameters and no clustering; however, it is comparable to existing machine learning-based methods in benchmarking studies and facilitates biological interpretation.

Download Full-text

CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq

Frontiers in Genetics ◽

10.3389/fgene.2021.644211 ◽

2021 ◽

Vol 12 ◽

Author(s):

Wenbo Yu ◽

Ahmed Mahfouz ◽

Marcel J. T. Reinders

Keyword(s):

Single Cell ◽

Developmental Process ◽

Mouse Cell ◽

Cell Alignment ◽

Rna Seq ◽

Cell Type ◽

Expression Of Genes ◽

A Cell ◽

Differential Expression Of Genes ◽

Cell Cell

The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters.

Download Full-text

CellMixS: quantifying and visualizing batch effects in single-cell RNA-seq data

Life Science Alliance ◽

10.26508/lsa.202001004 ◽

2021 ◽

Vol 4 (6) ◽

pp. e202001004

Author(s):

Almut Lütge ◽

Joanna Zyprych-Walczak ◽

Urszula Brykczynska Kunzmann ◽

Helena L Crowell ◽

Daniela Calini ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Cell Type Specificity ◽

Distance Distributions ◽

A Cell ◽

Cell Type Specific ◽

Synthetic Datasets

A key challenge in single-cell RNA-sequencing (scRNA-seq) data analysis is batch effects that can obscure the biological signal of interest. Although there are various tools and methods to correct for batch effects, their performance can vary. Therefore, it is important to understand how batch effects manifest to adjust for them. Here, we systematically explore batch effects across various scRNA-seq datasets according to magnitude, cell type specificity, and complexity. We developed a cell-specific mixing score (cms) that quantifies mixing of cells from multiple batches. By considering distance distributions, the score is able to detect local batch bias as well as differentiate between unbalanced batches and systematic differences between cells of the same cell type. We compare metrics in scRNA-seq data using real and synthetic datasets and whereas these metrics target the same question and are used interchangeably, we find differences in scalability, sensitivity, and ability to handle differentially abundant cell types. We find that cell-specific metrics outperform cell type–specific and global metrics and recommend them for both method benchmarks and batch exploration.

Download Full-text

Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2020.07.004 ◽

2020 ◽

Author(s):

Qianhui Huang ◽

Yu Liu ◽

Yuheng Du ◽

Lana X. Garmire

Keyword(s):

Single Cell ◽

Rna Seq ◽

Cell Type ◽

R Packages

Download Full-text

Cell type diversity in scallop adductor muscles revealed by single-cell RNA-Seq

Genomics ◽

10.1016/j.ygeno.2021.08.015 ◽

2021 ◽

Vol 113 (6) ◽

pp. 3582-3598

Author(s):

Xiujun Sun ◽

Li Li ◽

Biao Wu ◽

Jianlong Ge ◽

Yanxin Zheng ◽

...

Keyword(s):

Single Cell ◽

Rna Seq ◽

Cell Type ◽

Adductor Muscles

Download Full-text

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

AbstractSingle-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.

Download Full-text

CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

SummaryCell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthemore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieves superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.HighlightWe present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell OntologyCellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read ArchiveCellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training setCellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell OntologyGraphical Abstract

Download Full-text

ESCO: single cell expression simulation incorporating gene co-expression

10.1101/2020.10.20.347211 ◽

2020 ◽

Author(s):

Jinjin Tian ◽

Jiebiao Wang ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

R Package ◽

Brain Cell ◽

Gene Interactions ◽

Cell Type ◽

Imputation Methods ◽

Biological Interest ◽

A Cell ◽

Cell Expression ◽

Cell Data

AbstractMotivationGene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner.ResultsTherefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data.AvailabilityThe ESCO implementation is available as R package SplatterESCO (https://github.com/JINJINT/SplatterESCO)[email protected]

Download Full-text