STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data

Mapping Intimacies ◽

10.1101/2020.06.15.152306 ◽

2020 ◽

Cited By ~ 1

Author(s):

Massimo Andreatta ◽

Santiago J. Carmona

Keyword(s):

Single Cell ◽

Distance Measure ◽

Cell Types ◽

R Package ◽

Rna Seq ◽

Batch Effects ◽

Link Type ◽

Transcriptomics Data ◽

Public Repositories ◽

Cell Data

AbstractComputational tools for the integration of single-cell transcriptomics data are designed to correct batch effects between technical replicates or different technologies applied to the same population of cells. However, they have inherent limitations when applied to heterogeneous sets of data with moderate overlap in cell states or sub-types. STACAS is a package for the identification of integration anchors in the Seurat environment, optimized for the integration of datasets that share only a subset of cell types. We demonstrate that by i) correcting batch effects while preserving relevant biological variability across datasets, ii) filtering aberrant integration anchors with a quantitative distance measure, and iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. We anticipate that the algorithm will be a useful tool for the construction of comprehensive single-cell atlases by integration of the growing amount of single-cell data becoming available in public repositories.Code availabilityR package:https://github.com/carmonalab/STACASDocker image:https://hub.docker.com/repository/docker/mandrea1/stacas_demo

Download Full-text

STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa755 ◽

2020 ◽

Cited By ~ 1

Author(s):

Massimo Andreatta ◽

Santiago J Carmona

Keyword(s):

Single Cell ◽

Distance Measure ◽

Source Code ◽

Cell Types ◽

R Package ◽

Computational Method ◽

Biological Variability ◽

Rna Seq ◽

Batch Effects ◽

Guide Trees

Abstract Summary STACAS is a computational method for the identification of integration anchors in the Seurat environment, optimized for the integration of single-cell (sc) RNA-seq datasets that share only a subset of cell types. We demonstrate that by (i) correcting batch effects while preserving relevant biological variability across datasets, (ii) filtering aberrant integration anchors with a quantitative distance measure and (iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. Availability and implementation Source code and R package available at https://github.com/carmonalab/STACAS; Docker image available at https://hub.docker.com/repository/docker/mandrea1/stacas_demo.

Download Full-text

dropClust2: An R package for resource efficient analysis of large scale single cell RNA-Seq data

10.1101/596924 ◽

2019 ◽

Author(s):

Debajyoti Sinha ◽

Pradyumn Sinha ◽

Ritwik Saha ◽

Sanghamitra Bandyopadhyay ◽

Debarka Sengupta

Keyword(s):

Single Cell ◽

Programming Languages ◽

Large Scale ◽

Principal Component ◽

Cell Types ◽

R Package ◽

Locality Sensitive Hashing ◽

Rna Seq ◽

Link Type ◽

Component Selection

ABSTRACTDropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large scale single cell expression data. It makes ingenious use of structure persevering sampling and modality based principal component selection to rescue minor cell types. Existing implementation of dropClust involves interfacing with multiple programming languagesviz. R, python and C, hindering seamless installation and portability. Here we present dropClust2, a complete R package that’s not only fast but also minimally resource intensive. DropClust2 features a novel batch effect removal algorithm that allows integrative analysis of single cell RNA-seq (scRNA-seq) datasets.Availability and implementationdropClust2 is freely available athttps://debsinha.shinyapps.io/dropClust/as an online web service and athttps://github.com/debsin/dropClustas an R package.

Download Full-text

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

AbstractSingle-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.

Download Full-text

Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench

10.1101/2020.05.22.111211 ◽

2020 ◽

Author(s):

Ruben Chazarra-Gil ◽

Stijn van Dongen ◽

Vladimir Yu Kiselev ◽

Martin Hemberg

Keyword(s):

Single Cell ◽

Computational Methods ◽

Rna Seq ◽

Batch Effects ◽

Systematic Comparison ◽

Batch Correction ◽

Link Type ◽

Biological Signals ◽

The Cost

AbstractAs the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assess their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.

Download Full-text

VeloViz: RNA-velocity informed 2D embeddings for visualizing cellular trajectories

10.1101/2021.01.28.425293 ◽

2021 ◽

Author(s):

Lyla Atta ◽

Jean Fan

Keyword(s):

Single Cell ◽

Principal Components ◽

R Package ◽

Velocity Analysis ◽

Cell State ◽

Link Type ◽

Transcriptomics Data ◽

Reliable Representation ◽

State Changes

0AbstractRNA velocity analysis can predict cell state changes from single cell transcriptomics data. To interpret these cell state changes as part of underlying cellular trajectories, current approaches rely on visualization with 2D embeddings derived from principal components, t-distributed stochastic neighbor embedding, among others. However, these 2D embeddings can yield different representations of the underlying trajectories, hindering the interpretation of cell state changes. To address this challenge, we developed VeloViz to create RNA-velocity-informed 2D embeddings. We show that by taking into consideration the predicted future transcriptional states from RNA velocity analysis, VeloViz can help ensure a more reliable representation of underlying cellular trajectories. VeloViz is available as an R package at https://github.com/JEFworks-Lab/veloviz.

Download Full-text

rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database

10.1101/2021.05.28.446161 ◽

2021 ◽

Author(s):

Daniel Osorio ◽

Marieke Lydia Kuijjer ◽

James J. Cai

Keyword(s):

Single Cell ◽

Cell Types ◽

R Package ◽

Rna Seq ◽

Cell Type ◽

Sequencing Data ◽

Single Experiment ◽

Tissue Samples ◽

Molecular Phenotypes ◽

Public Datasets

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several samples preprocessing steps to filter and collect the desired cells experimentally before sequencing. Data integration of multiple public single-cell experiments stands as a solution for this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust cell type transcriptome characterization. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type, that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first fibrocytes' unbiased transcriptome profile report. We compared the transcriptomic profile of the fibrocytes against the fibroblasts collected from the same tissue samples and confirm their associated relationship with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available through the CRAN repositories https://CRAN.R-project.org/package=rPanglaoDB.

Download Full-text

Supervised Adversarial Alignment of Single-Cell RNA-seq Data

10.1101/2020.01.06.896621 ◽

2020 ◽

Author(s):

Songwei Ge ◽

Haohan Wang ◽

Amir Alavi ◽

Eric Xing ◽

Ziv Bar-Joseph

Keyword(s):

Single Cell ◽

Cell Types ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Reduced Representation ◽

Type Assignment ◽

Cell Type Specific ◽

Reduced Dimension ◽

Adversarial Model

AbstractDimensionality reduction is an important first step in the analysis of single cell RNA-seq (scRNA-seq) data. In addition to enabling the visualization of the profiled cells, such representations are used by many downstream analyses methods ranging from pseudo-time reconstruction to clustering to alignment of scRNA-seq data from different experiments, platforms, and labs. Both supervised and unsupervised methods have been proposed to reduce the dimension of scRNA-seq. However, all methods to date are sensitive to batch effects. When batches correlate with cell types, as is often the case, their impact can lead to representations that are batch rather than cell type specific. To overcome this we developed a domain adversarial neural network model for learning a reduced dimension representation of scRNA-seq data. The adversarial model tries to simultaneously optimize two objectives. The first is the accuracy of cell type assignment and the second is the inability to distinguish the batch (domain). We tested the method by using the resulting representation to align several different datasets. As we show, by overcoming batch effects our method was able to correctly separate cell types, improving on several prior methods suggested for this task. Analysis of the top features used by the network indicates that by taking the batch impact into account, the reduced representation is much better able to focus on key genes for each cell type.

Download Full-text

A descriptive marker gene approach to single-cell pseudotime inference

10.1101/060442 ◽

2016 ◽

Cited By ~ 5

Author(s):

Kieran R Campbell ◽

Christopher Yau

Keyword(s):

Single Cell ◽

Marker Gene ◽

Cell Types ◽

R Package ◽

Estimation Methods ◽

Marker Genes ◽

Peak Time ◽

Transient Behaviour ◽

Link Type ◽

Cell Gene Expression

AbstractPseudotime estimation from single-cell gene expression allows the recovery of temporal information from otherwise static profiles of individual cells. This pseudotemporal information can be used to characterise transient events in temporally evolving biological systems. Conventional algorithms typically emphasise an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. Here we introduce an orthogonal approach termed “Ouija” that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. In the following we introduce our model and demonstrate that in many instances a small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify “metastable” states - discrete cell types along the continuous trajectories - that recapitulate known cell types. Ouija therefore provides a powerful complimentary approach to existing whole transcriptome based pseudotime estimation methods. An open source implementation is available at http://www.github.com/kieranrcampbell/ouija as an R package and at http://www.github.com/kieranrcampbell/ouijaflow as a Python/TensorFlow package.

Download Full-text

Whole Animal Multiplexed Single-Cell RNA-Seq Reveals Plasticity of Clytia Medusa Cell Types

10.1101/2021.01.22.427844 ◽

2021 ◽

Author(s):

Tara Chari ◽

Brandon Weissbourd ◽

Jase Gehring ◽

Anna Ferraioli ◽

Lucas Leclère ◽

...

Keyword(s):

High Resolution ◽

Single Cell ◽

Cell Types ◽

Model Organisms ◽

Traditional Model ◽

Rna Seq ◽

Batch Effects ◽

Future Studies ◽

Component Cell ◽

Jellyfish Species

AbstractWe present an organism-wide, transcriptomic cell atlas of the hydrozoan medusa Clytia hemisphaerica, and determine how its component cell types respond to starvation. Utilizing multiplexed scRNA-seq, in which individual animals were indexed and pooled from control and perturbation conditions into a single sequencing run, we avoid artifacts from batch effects and are able to discern shifts in cell state in response to organismal perturbations. This work serves as a foundation for future studies of development, function, and plasticity in a genetically tractable jellyfish species. Moreover, we introduce a powerful workflow for high-resolution, whole animal, multiplexed single-cell genomics (WHAM-seq) that is readily adaptable to other traditional or non-traditional model organisms.

Download Full-text

Sincell: Bioconductor package for the statistical assessment of cell-state hierarchies from single-cell RNA-seq data

10.1101/014472 ◽

2015 ◽

Author(s):

Miguel Juliá ◽

Amalio Telenti ◽

Antonio Rausell

Keyword(s):

Single Cell ◽

R Package ◽

Intermediate Cell ◽

Bioconductor Package ◽

Rna Seq ◽

Cell State ◽

Describing Functions ◽

Statistical Support ◽

Cell Data

Summary: Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell-states that might be captured by single-cell RNA seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data might be formalized under a general framework composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cells-clustering step). Sincell R package implements a methodological toolbox allowing flexible workflows under such framework. Furthermore, Sincell contributes new algo-rithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA seq. Graphical representations and functional association tests are provided to interpret hierarchies. Sincell functionalities are illustrated in a real case study where its ability to discriminate noisy from stable cell-state hierarchies is demonstrated. Availability and implementation: Sincell is an open-source R/Bioconductor package available at http://bioconductor.org/packages/3.1/bioc/html/sincell.html. A detailed vignette describing functions and workflows is provided with the package.

Download Full-text