scCODA is a Bayesian model for compositional single-cell data analysis

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
M. Büttner ◽  
J. Ostner ◽  
C. L. Müller ◽  
F. J. Theis ◽  
B. Schubert

Abstract Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and other stimuli. scCODA demonstrated excellent detection performance while reliably controlling for false discoveries, and identified experimentally verified cell type changes that were missed in original analyses.
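The compositionality issue named in the abstract can be illustrated with a toy calculation. The sketch below is not scCODA's model; the cell-type names and counts are hypothetical and chosen only for illustration. It shows that when a single cell type grows in absolute number, the proportions of all other types shrink even though their counts are unchanged.

```python
def proportions(counts):
    """Convert absolute cell counts per type into proportions that sum to 1."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}

# Hypothetical counts, invented for illustration only.
control = {"T": 400, "B": 400, "NK": 200}
# Only the T-cell count differs in the treated sample.
treated = {"T": 1200, "B": 400, "NK": 200}

p_control = proportions(control)
p_treated = proportions(treated)

for cell_type in control:
    print(cell_type, round(p_control[cell_type], 3), "->", round(p_treated[cell_type], 3))
```

Testing each proportion independently would flag B and NK cells as decreased here, which is the kind of false discovery a compositional model is meant to avoid.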

2020 ◽  
Author(s):  
M. Büttner ◽  
J. Ostner ◽  
C. L. Müller ◽  
F. J. Theis ◽  
B. Schubert

Abstract Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and other stimuli. scCODA demonstrated excellent detection performance and identified experimentally verified cell type changes that were missed in original analyses.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Alexander J Tarashansky ◽  
Jacob M Musser ◽  
Margarita Khariton ◽  
Pengyang Li ◽  
Detlev Arendt ◽  
...  

Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning mouse to sponge, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.


2020 ◽  
Vol 36 (11) ◽  
pp. 3585-3587
Author(s):  
Lin Wang ◽  
Francisca Catalan ◽  
Karin Shamardani ◽  
Husam Babikir ◽  
Aaron Diaz

Summary: Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms. Availability and implementation: https://github.com/diazlab/ELSA Supplementary information: Supplementary data are available at Bioinformatics online.


2017 ◽  
Vol 3 (1) ◽  
pp. 46 ◽  
Author(s):  
Elham Azizi ◽  
Sandhya Prabhakaran ◽  
Ambrose Carr ◽  
Dana Pe'er

Single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is noise-prone due to experimental errors and cell type-specific biases. Current computational approaches for analyzing single-cell data involve a global normalization step which introduces incorrect biases and spurious noise and does not resolve missing data (dropouts). This can lead to misleading conclusions in downstream analyses. Moreover, a single normalization removes important cell type-specific information. We propose a data-driven model, BISCUIT, that iteratively normalizes and clusters cells, thereby separating noise from interesting biological signals. BISCUIT is a Bayesian probabilistic model that learns cell-specific parameters to intelligently drive normalization. In both synthetic and real single-cell data, this approach outperforms previous methods based on global normalization followed by clustering, and allows easy interpretation and recovery of the underlying structure and cell types.


2019 ◽  
Author(s):  
Anna Klimovskaia ◽  
David Lopez-Paz ◽  
Léon Bottou ◽  
Maximilian Nickel

Abstract The need to understand cell developmental processes has spawned a plethora of computational methods for discovering hierarchies from scRNAseq data. However, existing techniques are based on Euclidean geometry, a suboptimal choice for modeling complex cell trajectories with multiple branches. To overcome this fundamental representation issue we propose Poincaré maps, a method that harnesses the power of hyperbolic geometry for single-cell data analysis. Often understood as a continuous extension of trees, hyperbolic geometry enables the embedding of complex hierarchical data in only two dimensions while preserving the pairwise distances between points in the hierarchy. This enables direct exploratory analysis and the use of our embeddings in a wide variety of downstream data analysis tasks, such as visualization, clustering, lineage detection and pseudo-time inference. When compared to existing methods, which are unable to address all these important tasks using a single embedding, Poincaré maps produce state-of-the-art two-dimensional representations of cell trajectories on multiple scRNAseq datasets. More specifically, we demonstrate that Poincaré maps make it straightforward to formulate new hypotheses about biological processes that prior methods did not reveal. Significance statement: The discovery of hierarchies in biological processes is central to developmental biology. We propose Poincaré maps, a new method based on hyperbolic geometry to discover continuous hierarchies from pairwise similarities. We demonstrate the efficacy of our method on multiple single-cell datasets on tasks such as visualization, clustering, lineage identification, and pseudo-time inference.
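To make the hyperbolic-geometry idea concrete, here is a minimal sketch of the distance function in the standard Poincaré disk model. It assumes that model only; it does not reproduce the authors' embedding pipeline, and the function name `poincare_distance` is hypothetical.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

# The same Euclidean step of 0.05 is much longer near the boundary than near the center.
near_center = poincare_distance((0.0, 0.0), (0.05, 0.0))
near_boundary = poincare_distance((0.90, 0.0), (0.95, 0.0))
print(near_center, near_boundary)
```

Because distances grow rapidly toward the boundary, there is room to place many branches of a hierarchy far apart from one another in only two dimensions, which is the property the abstract relies on.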


2021 ◽  
Author(s):  
Wancen Mu ◽  
Hirak Sarkar ◽  
Avi Srivastava ◽  
Kwangbom Choi ◽  
Rob Patro ◽  
...  

Motivation: Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial-, or time-dependent AI signals may be dampened or not detected. Results: We introduce a statistical method airpart for identifying differential CTS AI from single-cell RNA-sequencing (scRNA-seq) data, or other spatially- or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower RMSE of allelic ratio estimates than existing methods. In real data, airpart identified differential AI patterns across cell states and could be used to define trends of AI signal over spatial or time axes. Availability: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.


Author(s):  
Michael A. Skinnider ◽  
Jordan W. Squair ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Matthieu Gautier ◽  
...  

We present a machine-learning method to prioritize the cell types most responsive to biological perturbations within high-dimensional single-cell data. We validate our method, Augur (https://github.com/neurorestore/Augur), on a compendium of single-cell RNA-seq, chromatin accessibility, and imaging transcriptomics datasets. We apply Augur to expose the neural circuits that enable walking after paralysis in response to spinal cord neurostimulation.


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Alma Andersson ◽  
Joseph Bergenstråhle ◽  
Michaela Asp ◽  
Ludvig Bergenstråhle ◽  
Aleksandra Jurek ◽  
...  

Abstract The field of spatial transcriptomics is rapidly expanding, and with it the repertoire of available technologies. However, several of the transcriptome-wide spatial assays do not operate on a single cell level, but rather produce data comprised of contributions from a – potentially heterogeneous – mixture of cells. Still, these techniques are attractive to use when examining complex tissue specimens with diverse cell populations, where complete expression profiles are required to properly capture their richness. Motivated by an interest to put gene expression into context and delineate the spatial arrangement of cell types within a tissue, we here present a model-based probabilistic method that uses single cell data to deconvolve the cell mixtures in spatial data. To illustrate the capacity of our method, we use data from different experimental platforms and spatially map cell types from the mouse brain and developmental heart, which arrange as expected.


2020 ◽  
Author(s):  
Murat Can Çobanoğlu

Abstract One of the key challenges in single-cell data analysis is the annotation of cells with their cell types. This task is divided into two different sub-tasks: identifying known cell types and identifying novel cell types. In the former case, we can benefit from being able to transfer annotations from bulk RNA-seq because there are many more types profiled with that more established technology. In the latter case, we would benefit from interpretable models that can describe the reasons for grouping a number of cells together. We propose that both of these problems can be solved by generative Bayesian Dirichlet-multinomial models. In the supervised learning context, we propose a generative Bayesian Dirichlet-multinomial classifier. We show that such a classifier can effectively transfer cell labels from bulk to single-cell RNA-sequencing data. We also show that alternative well-established machine learning models have difficulty with this transition, even if they are effective within the same regime (i.e. single cell to single cell). In the unsupervised learning context, we propose a Bayesian Dirichlet-multinomial mixture model. We show that the proposed model learns meaningful clusters where the automatically learned relationships between cell types and genes overlap with ground truth associations. Furthermore, there are no density or connectivity based clustering assumptions in this model, which differs from almost every approach in this field. Consequently the clustering results from the generative method can effectively represent nuanced differences among [email protected]
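As a rough illustration of the supervised idea, the sketch below scores a count vector under one Dirichlet-multinomial distribution per class and picks the best-scoring class. It is a generic textbook formulation with a uniform class prior, not the author's implementation; the class labels, concentration parameters, and function names are hypothetical.

```python
from math import lgamma

def dm_log_likelihood(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial with parameters alpha."""
    n = sum(counts)
    a = sum(alpha)
    ll = lgamma(a) + lgamma(n + 1) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        ll += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return ll

def classify(counts, class_alphas):
    """Return the label whose Dirichlet-multinomial gives the highest likelihood."""
    return max(class_alphas, key=lambda label: dm_log_likelihood(counts, class_alphas[label]))

# Hypothetical per-class concentration parameters over three genes.
class_alphas = {
    "type_A": [8.0, 1.0, 1.0],  # expression concentrated on gene 1
    "type_B": [1.0, 1.0, 8.0],  # expression concentrated on gene 3
}
cell = [9, 1, 0]  # invented read counts for one cell
print(classify(cell, class_alphas))
```

In a setting like the one the abstract describes, the per-class parameters would be estimated from annotated reference profiles rather than written by hand as they are here.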


2021 ◽  
Author(s):  
Angeles Arzalluz-Luque ◽  
Pedro Salguero ◽  
Sonia Tarazona ◽  
Ana Conesa

Alternative splicing (AS) is a highly-regulated post-transcriptional mechanism known to modulate isoform expression within genes and contribute to cell-type identity. However, the extent to which alternative isoforms establish co-expression networks that may be relevant in cellular function has not been explored yet. Here, we present acorde, a pipeline that successfully leverages bulk long reads and single-cell data to confidently detect alternative isoform co-expression relationships. To achieve this, we developed and validated percentile correlations, a novel approach that overcomes data sparsity and yields accurate co-expression estimates from single-cell data. Next, acorde uses correlations to cluster co-expressed isoforms into a network, unraveling cell type-specific alternative isoform usage patterns. By selecting same-gene isoforms between these clusters, we subsequently detect and characterize genes with co-differential isoform usage (coDIU) across neural cell types. Finally, we predict functional elements from long read-defined isoforms and provide insight into biological processes, motifs and domains potentially controlled by the coordination of post-transcriptional regulation.

