scCODA is a Bayesian model for compositional single-cell data analysis

Abstract: Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and under other stimuli. scCODA demonstrated excellent detection performance while reliably controlling for false discoveries, and identified experimentally verified cell type changes that were missed in the original analyses.
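To illustrate why compositionality complicates detection, here is a minimal Python sketch. It is a toy example with made-up counts and a hypothetical `proportions` helper, not scCODA's model or API. Only one cell type changes in absolute abundance, yet the measured proportions of all cell types shift.

```python
def proportions(counts):
    """Convert absolute cell counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}

# Hypothetical counts: only T cells expand under the stimulus.
control = {"T": 200, "B": 200, "NK": 100}
stimulated = {"T": 600, "B": 200, "NK": 100}

p_control = proportions(control)
p_stimulated = proportions(stimulated)

# Print the proportion of each cell type before and after, with the difference.
for cell_type in control:
    before = p_control[cell_type]
    after = p_stimulated[cell_type]
    print(f"{cell_type}: {before:.2f} -> {after:.2f} ({after - before:+.2f})")
```

The B and NK proportions fall even though their absolute counts are unchanged, so testing each cell type independently on proportions could flag spurious decreases. This is the kind of false discovery a compositional model is meant to guard against.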

scCODA: A Bayesian model for compositional single-cell data analysis

10.1101/2020.12.14.422688 ◽ 2020

Author(s): M. Büttner ◽ J. Ostner ◽ CL. Müller ◽ FJ. Theis ◽ B. Schubert

Keyword(s): Data Analysis ◽ Single Cell ◽ Bayesian Model ◽ Cell Types ◽ Detection Performance ◽ Biological Processes ◽ Complex Cell ◽ Cell Type ◽ Compositional Changes ◽ Cell Data

Abstract: Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and under other stimuli. scCODA demonstrated excellent detection performance and identified experimentally verified cell type changes that were missed in the original analyses.

Mapping single-cell atlases throughout Metazoa unravels cell type evolution

eLife ◽ 10.7554/elife.66747 ◽ 2021 ◽ Vol 10

Author(s): Alexander J Tarashansky ◽ Jacob M Musser ◽ Margarita Khariton ◽ Pengyang Li ◽ Detlev Arendt ◽ ...

Keyword(s): Stem Cell ◽ Single Cell ◽ Cell Types ◽ The Self ◽ Cell Type ◽ Germ Layers ◽ Animal Evolution ◽ Self Assembling ◽ Animal Phyla ◽ Cell Data

Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning mouse to sponge, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.

Ensemble learning for classifying single-cell data and projection across reference atlases

Bioinformatics ◽ 10.1093/bioinformatics/btaa137 ◽ 2020 ◽ Vol 36 (11) ◽ pp. 3585-3587

Author(s): Lin Wang ◽ Francisca Catalan ◽ Karin Shamardani ◽ Husam Babikir ◽ Aaron Diaz

Keyword(s): Single Cell ◽ Cell Types ◽ Status Quo ◽ Supplementary Information ◽ Published Data ◽ Supplementary Data ◽ Cell Type ◽ Low Sensitivity ◽ Project Data ◽ Cell Data

Summary: Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms.

Availability and implementation: https://github.com/diazlab/ELSA

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Bayesian Inference for Single-cell Clustering and Imputing

Genomics and Computational Biology ◽ 10.18547/gcb.2017.vol3.iss1.e46 ◽ 2017 ◽ Vol 3 (1) ◽ pp. 46 ◽ Cited By ~ 25

Author(s): Elham Azizi ◽ Sandhya Prabhakaran ◽ Ambrose Carr ◽ Dana Pe'er

Keyword(s): Single Cell ◽ Cell Types ◽ Superior Performance ◽ Underlying Structure ◽ Specific Information ◽ Cell Type ◽ Cell Clustering ◽ Bayesian Probabilistic Model ◽ Cell Type Specific ◽ Cell Data

Single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data are noise-prone due to experimental errors and cell type-specific biases. Current computational approaches for analyzing single-cell data involve a global normalization step, which introduces incorrect biases and spurious noise and does not resolve missing data (dropouts). This can lead to misleading conclusions in downstream analyses. Moreover, a single normalization removes important cell type-specific information. We propose a data-driven model, BISCUIT, that iteratively normalizes and clusters cells, thereby separating noise from interesting biological signals. BISCUIT is a Bayesian probabilistic model that learns cell-specific parameters to intelligently drive normalization. In both synthetic and real single-cell data, this approach outperforms previous methods that apply global normalization followed by clustering, and it allows easy interpretation and recovery of the underlying structure and cell types.

Poincaré Maps for Analyzing Complex Hierarchies in Single-Cell Data

10.1101/689547 ◽ 2019 ◽ Cited By ~ 2

Author(s): Anna Klimovskaia ◽ David Lopez-Paz ◽ Léon Bottou ◽ Maximilian Nickel

Keyword(s): Data Analysis ◽ Single Cell ◽ Hyperbolic Geometry ◽ Continuous Extension ◽ Two Dimensions ◽ Biological Processes ◽ Poincaré Maps ◽ Poincare Maps ◽ Cell Trajectories ◽ Cell Data

Abstract: The need to understand cell developmental processes has spawned a plethora of computational methods for discovering hierarchies from scRNAseq data. However, existing techniques are based on Euclidean geometry, a suboptimal choice for modeling complex cell trajectories with multiple branches. To overcome this fundamental representation issue, we propose Poincaré maps, a method that harnesses the power of hyperbolic geometry for single-cell data analysis. Often understood as a continuous extension of trees, hyperbolic geometry enables the embedding of complex hierarchical data in only two dimensions while preserving the pairwise distances between points in the hierarchy. This enables direct exploratory analysis and the use of our embeddings in a wide variety of downstream data analysis tasks, such as visualization, clustering, lineage detection and pseudo-time inference. When compared to existing methods, which are unable to address all these important tasks using a single embedding, Poincaré maps produce state-of-the-art two-dimensional representations of cell trajectories on multiple scRNAseq datasets. More specifically, we demonstrate that Poincaré maps make it straightforward to formulate new hypotheses about biological processes that were inaccessible to prior methods.

Significance statement: The discovery of hierarchies in biological processes is central to developmental biology. We propose Poincaré maps, a new method based on hyperbolic geometry to discover continuous hierarchies from pairwise similarities. We demonstrate the efficacy of our method on multiple single-cell datasets on tasks such as visualization, clustering, lineage identification, and pseudo-time inference.
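As a small illustration of the hyperbolic geometry mentioned above, the following Python sketch computes the standard distance in the Poincaré disk model, $d(u, v) = \operatorname{arcosh}\left(1 + \frac{2\lVert u - v\rVert^2}{(1 - \lVert u\rVert^2)(1 - \lVert v\rVert^2)}\right)$. The function name `poincare_distance` and the sample points are assumptions made for this example. It shows only the distance function, not the authors' embedding procedure.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))

# Hypothetical embedded points on a ray from the origin.
origin = (0.0, 0.0)
near = (0.5, 0.0)
far = (0.9, 0.0)

print(poincare_distance(origin, near))  # Euclidean step of 0.5 near the center
print(poincare_distance(near, far))     # Euclidean step of 0.4 near the boundary
```

The second step is shorter in Euclidean terms but longer in hyperbolic terms. Distances grow rapidly toward the boundary, which is what leaves room for many branches of a hierarchy in only two dimensions.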

Airpart: Interpretable statistical models for analyzing allelic imbalance in single-cell datasets

10.1101/2021.10.15.464546 ◽ 2021

Author(s): Wancen Mu ◽ Hirak Sarkar ◽ Avi Srivastava ◽ Kwangbom Choi ◽ Rob Patro ◽ ...

Keyword(s): Single Cell ◽ Allelic Imbalance ◽ Genetic Regulation ◽ Real Data ◽ Cell Types ◽ Cell Type ◽ Time Resolved ◽ Bulk Data ◽ Cell Type Specific ◽ Cell Data

Motivation: Allelic expression analysis aids in the detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial-, or time-dependent AI signals may be dampened or not detected. Results: We introduce a statistical method, airpart, for identifying differential CTS AI from single-cell RNA-sequencing (scRNA-seq) data or other spatially or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. To account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower RMSE of allelic ratio estimates than existing methods. In real data, airpart identified differential AI patterns across cell states and could be used to define trends of AI signal over spatial or time axes. Availability: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.

Cell type prioritization in single-cell data

10.1101/2019.12.20.884916 ◽ 2019 ◽ Cited By ~ 1

Author(s): Michael A. Skinnider ◽ Jordan W. Squair ◽ Claudia Kathe ◽ Mark A. Anderson ◽ Matthieu Gautier ◽ ...

Keyword(s): Single Cell ◽ Neural Circuits ◽ Cell Types ◽ Chromatin Accessibility ◽ High Dimensional ◽ Machine Learning Method ◽ Learning Method ◽ RNA Seq ◽ Cell Type ◽ Cell Data

We present a machine-learning method to prioritize the cell types most responsive to biological perturbations within high-dimensional single-cell data. We validate our method, Augur (https://github.com/neurorestore/Augur), on a compendium of single-cell RNA-seq, chromatin accessibility, and imaging transcriptomics datasets. We apply Augur to expose the neural circuits that enable walking after paralysis in response to spinal cord neurostimulation.

Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography

Communications Biology ◽ 10.1038/s42003-020-01247-y ◽ 2020 ◽ Vol 3 (1) ◽ Cited By ~ 4

Author(s): Alma Andersson ◽ Joseph Bergenstråhle ◽ Michaela Asp ◽ Ludvig Bergenstråhle ◽ Aleksandra Jurek ◽ ...

Keyword(s): Single Cell ◽ Spatial Data ◽ Expression Profiles ◽ Probabilistic Method ◽ Spatial Arrangement ◽ Cell Types ◽ Cell Type ◽ Cell Level ◽ Tissue Specimens ◽ Cell Data

Abstract: The field of spatial transcriptomics is rapidly expanding, and with it the repertoire of available technologies. However, several of the transcriptome-wide spatial assays do not operate at the single-cell level, but rather produce data composed of contributions from a potentially heterogeneous mixture of cells. Still, these techniques are attractive when examining complex tissue specimens with diverse cell populations, where complete expression profiles are required to properly capture their richness. Motivated by an interest in putting gene expression into context and delineating the spatial arrangement of cell types within a tissue, we here present a model-based probabilistic method that uses single-cell data to deconvolve the cell mixtures in spatial data. To illustrate the capacity of our method, we use data from different experimental platforms and spatially map cell types from the mouse brain and developmental heart, which arrange as expected.
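The idea of deconvolving a mixed spatial measurement with single-cell reference profiles can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the paper's model, whose details the abstract does not give. It assumes each cell type has a known gene-probability profile, treats the spot's counts as a multinomial mixture of those profiles, and estimates the mixing proportions by expectation-maximization. The function name, profiles, and counts are all hypothetical.

```python
def estimate_proportions(spot_counts, profiles, n_iter=200):
    """EM estimate of cell-type proportions for one spot, given fixed profiles.

    spot_counts: observed counts per gene at the spot.
    profiles: dict mapping cell type -> per-gene probabilities (each sums to 1).
    """
    types = list(profiles)
    weights = {t: 1.0 / len(types) for t in types}  # start from a uniform mixture
    for _ in range(n_iter):
        assigned = {t: 0.0 for t in types}
        for gene, count in enumerate(spot_counts):
            mix = sum(weights[t] * profiles[t][gene] for t in types)
            if count == 0 or mix == 0.0:
                continue
            # E-step: split this gene's counts across types by responsibility.
            for t in types:
                assigned[t] += count * weights[t] * profiles[t][gene] / mix
        # M-step: renormalise the assigned counts into proportions.
        total = sum(assigned.values())
        weights = {t: assigned[t] / total for t in types}
    return weights

# Hypothetical reference profiles over three genes.
profiles = {
    "neuron": [0.8, 0.2, 0.0],
    "glia": [0.0, 0.2, 0.8],
}
# Counts consistent with a 70% neuron / 30% glia mixture of 100 reads.
spot_counts = [56, 20, 24]

weights = estimate_proportions(spot_counts, profiles)
print(weights)
```

A realistic model would also need to handle overdispersion and platform differences between the single-cell and spatial assays. The sketch only shows how reference profiles turn a mixed signal into proportions.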

Fast and interpretable scRNA-seq data analysis

10.1101/2020.10.05.314039 ◽ 2020

Author(s): Murat Can Çobanoğlu

Keyword(s): Data Analysis ◽ Single Cell ◽ Ground Truth ◽ Cell Types ◽ Sequencing Data ◽ Learning Context ◽ Multinomial Models ◽ Proposed Model ◽ Interpretable Models ◽ Cell Data

Abstract: One of the key challenges in single-cell data analysis is the annotation of cells with their cell types. This task is divided into two sub-tasks: identifying known cell types and identifying novel cell types. In the former case, we can benefit from being able to transfer annotations from bulk RNA-seq because many more types have been profiled with that more established technology. In the latter case, we would benefit from interpretable models that can describe the reasons for grouping a number of cells together. We propose that both of these problems can be solved by generative Bayesian Dirichlet-multinomial models. In the supervised learning context, we propose a generative Bayesian Dirichlet-multinomial classifier. We show that such a classifier can effectively transfer cell labels from bulk to single-cell RNA-sequencing data. We also show that alternative well-established machine learning models have difficulty with this transition, even if they are effective within the same regime (i.e. single cell to single cell). In the unsupervised learning context, we propose a Bayesian Dirichlet-multinomial mixture model. We show that the proposed model learns meaningful clusters where the automatically learned relationships between cell types and genes overlap with ground truth associations. Furthermore, there are no density- or connectivity-based clustering assumptions in this model, which differs from almost every approach in this field. Consequently, the clustering results from the generative method can effectively represent nuanced differences among [email protected]
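To make the supervised case concrete, here is a minimal Python sketch of classification with a Dirichlet-multinomial likelihood. It is an illustration under stated assumptions, not the author's implementation. Each class is described by a concentration vector `alpha` (hypothetical values here, imagined as derived from bulk profiles plus a pseudocount), classes are given a uniform prior, and a cell is assigned to the class with the highest log-likelihood.

```python
import math

def dirichlet_multinomial_loglik(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    loglik = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        loglik += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return loglik

def classify(counts, class_alphas):
    """Return the label whose concentration vector best explains the counts."""
    return max(
        class_alphas,
        key=lambda label: dirichlet_multinomial_loglik(counts, class_alphas[label]),
    )

# Hypothetical per-class concentration parameters over three genes.
class_alphas = {
    "T cell": [10.0, 1.0, 1.0],
    "B cell": [1.0, 10.0, 1.0],
}

cell_counts = [8, 1, 0]  # sparse single-cell counts for the same three genes
print(classify(cell_counts, class_alphas))
```

Because the parameters are per-gene concentrations, the fitted classes can be inspected gene by gene, which fits the interpretability argument made in the abstract.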

Acorde: unraveling functionally-interpretable networks of isoform co-usage from single cell data

10.1101/2021.05.07.441841 ◽ 2021

Author(s): Angeles Arzalluz-Luque ◽ Pedro Salguero ◽ Sonia Tarazona ◽ Ana Conesa

Keyword(s): Single Cell ◽ Cell Types ◽ Cell Type ◽ Novel Approach ◽ Long Reads ◽ Usage Patterns ◽ Long Read ◽ Specific Alternative ◽ Post Transcriptional Regulation ◽ Cell Data

Alternative splicing (AS) is a highly regulated post-transcriptional mechanism known to modulate isoform expression within genes and contribute to cell-type identity. However, the extent to which alternative isoforms establish co-expression networks that may be relevant to cellular function has not yet been explored. Here, we present acorde, a pipeline that leverages bulk long reads and single-cell data to confidently detect alternative isoform co-expression relationships. To achieve this, we developed and validated percentile correlations, a novel approach that overcomes data sparsity and yields accurate co-expression estimates from single-cell data. Next, acorde uses correlations to cluster co-expressed isoforms into a network, unraveling cell type-specific alternative isoform usage patterns. By selecting same-gene isoforms between these clusters, we subsequently detect and characterize genes with co-differential isoform usage (coDIU) across neural cell types. Finally, we predict functional elements from long read-defined isoforms and provide insight into biological processes, motifs and domains potentially controlled by the coordination of post-transcriptional regulation.
