Acorde: unraveling functionally-interpretable networks of isoform co-usage from single cell data

Alternative splicing (AS) is a highly regulated post-transcriptional mechanism known to modulate isoform expression within genes and contribute to cell-type identity. However, the extent to which alternative isoforms establish co-expression networks that may be relevant to cellular function has not yet been explored. Here, we present acorde, a pipeline that leverages bulk long reads and single-cell data to confidently detect alternative isoform co-expression relationships. To achieve this, we developed and validated percentile correlations, a novel approach that overcomes data sparsity and yields accurate co-expression estimates from single-cell data. Next, acorde uses correlations to cluster co-expressed isoforms into a network, unraveling cell type-specific alternative isoform usage patterns. By selecting same-gene isoforms between these clusters, we subsequently detect and characterize genes with co-differential isoform usage (coDIU) across neural cell types. Finally, we predict functional elements from long read-defined isoforms and provide insight into biological processes, motifs and domains potentially controlled by the coordination of post-transcriptional regulation.
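
The abstract does not spell out how percentile correlations are computed. The sketch below is one plausible reading, offered as an assumption rather than as the acorde implementation: each isoform's expression is summarized per cell type by its deciles, and Pearson correlation is then computed between the concatenated decile profiles of two isoforms. The function names, the choice of deciles, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import quantiles


def percentile_profile(expr, cell_types, n=10):
    """Concatenate per-cell-type quantile summaries of one isoform's expression.

    Assumption: n=10 (deciles) is an illustrative choice, not a documented default.
    """
    profile = []
    for ct in sorted(set(cell_types)):
        values = [x for x, c in zip(expr, cell_types) if c == ct]
        # quantiles() returns the n-1 cut points between the n quantile groups.
        profile.extend(quantiles(values, n=n, method="inclusive"))
    return profile


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def percentile_correlation(expr_a, expr_b, cell_types):
    """Correlate two isoforms on their per-cell-type percentile profiles."""
    return pearson(
        percentile_profile(expr_a, cell_types),
        percentile_profile(expr_b, cell_types),
    )


# Toy data (hypothetical): 6 neurons followed by 6 glia, with many dropouts.
cell_types = ["neuron"] * 6 + ["glia"] * 6
iso_a = [5, 0, 7, 6, 0, 8, 0, 0, 1, 0, 0, 0]  # neuron-specific
iso_b = [0, 9, 4, 0, 6, 7, 0, 1, 0, 0, 0, 0]  # neuron-specific, different dropouts
iso_c = [0, 0, 0, 1, 0, 0, 4, 6, 0, 5, 7, 0]  # glia-specific

same_pattern = percentile_correlation(iso_a, iso_b, cell_types)
opposite_pattern = percentile_correlation(iso_a, iso_c, cell_types)
```

In this toy case the two neuron-specific isoforms drop out in different cells, so a cell-by-cell correlation would be weak, whereas the percentile profiles compare the distribution within each cell type and therefore agree.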


Single-cell isoform RNA sequencing (ScISOr-Seq) across thousands of cells reveals isoforms of cerebellar cell types

10.1101/364950 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ishaan Gupta ◽

Paul G Collier ◽

Bettina Haase ◽

Ahmed Mahfouz ◽

Anoushka Joglekar ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Cell Types ◽

Full Length ◽

Cell Of Origin ◽

Cell Type ◽

Long Reads ◽

Long Read ◽

The Individual ◽

Bulk Tissue

Full-length isoform sequencing has advanced our knowledge of isoform biology [1–11]. However, apart from applying full-length isoform sequencing to very few single cells [12,13], isoform sequencing has been limited to bulk tissue, cell lines, or sorted cells. Single splicing events have been described for <=200 single cells with great statistical success [14,15], but these methods do not describe full-length mRNAs. Single cell short-read 3’ sequencing has allowed identification of many cell sub-types [16–23], but full-length isoforms for these cell types have not been profiled. Using our new method of single-cell-isoform-RNA-sequencing (ScISOr-Seq) we determine isoform-expression in thousands of individual cells from a heterogeneous bulk tissue (cerebellum), without specific antibody-fluorescence activated cell sorting. We elucidate isoform usage in high-level cell types such as neurons, astrocytes and microglia and finer sub-types, such as Purkinje cells and Granule cells, including the combination patterns of distant splice sites [6–9,24,25], which for individual molecules requires long reads. We produce an enhanced genome annotation revealing cell-type specific expression of known and 16,872 novel (with respect to mouse Gencode version 10) isoforms (see isoformatlas.com).

ScISOr-Seq describes isoforms from >1,000 single cells from bulk tissue without cell sorting by leveraging two technologies in three steps. In step one, we employ microfluidics to produce amplified full-length cDNAs barcoded for their cell of origin. This cDNA is split into two pools: one pool for 3’ sequencing to measure gene expression (step 2) and another pool for long-read sequencing and isoform expression (step 3). In step two, short-read 3’-sequencing provides molecular counts for each gene and cell, which allows clustering cells and assigning a cell type using cell-type specific markers. In step three, an aliquot of the same cDNAs (each barcoded for the individual cell of origin) is sequenced using Pacific Biosciences (“PacBio”) [1,2,4,5,26] or Oxford Nanopore [3]. Since these long reads carry the single-cell barcodes identified in step two, one can determine the individual cell from which each long read originates. Since most single cells are assigned to a named cluster, we can also assign the cell’s cluster name (e.g. “Purkinje cell” or “astrocyte”) to the long read in question (Fig 1A), without losing the cell of origin of each long read.
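
The barcode-based assignment in step three can be sketched as a simple lookup. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each long read has already been paired with an exactly matching cell barcode, and the function name, read identifiers, and barcodes are hypothetical.

```python
def assign_reads_to_clusters(long_reads, barcode_to_cluster):
    """Attach a cluster name to each long read via its cell barcode.

    long_reads: iterable of (read_id, cell_barcode) pairs (hypothetical format).
    barcode_to_cluster: barcode -> cluster name from the short-read clustering.
    Returns (assigned, unassigned); assigned keeps the barcode, so the cell of
    origin is not lost when the cluster name is added.
    """
    assigned, unassigned = {}, []
    for read_id, barcode in long_reads:
        cluster = barcode_to_cluster.get(barcode)
        if cluster is None:
            # Barcode not seen in step two, or the cell was left unclustered.
            unassigned.append(read_id)
        else:
            assigned[read_id] = (barcode, cluster)
    return assigned, unassigned


# Hypothetical barcodes and cluster labels.
barcode_to_cluster = {"AAAC": "Purkinje cell", "GGTT": "astrocyte"}
long_reads = [("read1", "AAAC"), ("read2", "GGTT"), ("read3", "TTTT")]

assigned, unassigned = assign_reads_to_clusters(long_reads, barcode_to_cluster)
```

Real data would also need barcode error correction before the lookup, which this sketch leaves out.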


Mapping single-cell atlases throughout Metazoa unravels cell type evolution

eLife ◽

10.7554/elife.66747 ◽

2021 ◽

Vol 10 ◽

Author(s):

Alexander J Tarashansky ◽

Jacob M Musser ◽

Margarita Khariton ◽

Pengyang Li ◽

Detlev Arendt ◽

...

Keyword(s):

Stem Cell ◽

Single Cell ◽

Cell Types ◽

The Self ◽

Cell Type ◽

Germ Layers ◽

Animal Evolution ◽

Self Assembling ◽

Animal Phyla ◽

Cell Data

Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning mouse to sponge, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.


Ensemble learning for classifying single-cell data and projection across reference atlases

Bioinformatics ◽

10.1093/bioinformatics/btaa137 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3585-3587

Author(s):

Lin Wang ◽

Francisca Catalan ◽

Karin Shamardani ◽

Husam Babikir ◽

Aaron Diaz

Keyword(s):

Single Cell ◽

Cell Types ◽

Status Quo ◽

Supplementary Information ◽

Published Data ◽

Supplementary Data ◽

Cell Type ◽

Low Sensitivity ◽

Project Data ◽

Cell Data

Summary: Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms. Availability and implementation: https://github.com/diazlab/ELSA. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


Bayesian Inference for Single-cell Clustering and Imputing

Genomics and Computational Biology ◽

10.18547/gcb.2017.vol3.iss1.e46 ◽

2017 ◽

Vol 3 (1) ◽

pp. 46 ◽

Cited By ~ 25

Author(s):

Elham Azizi ◽

Sandhya Prabhakaran ◽

Ambrose Carr ◽

Dana Pe'er

Keyword(s):

Single Cell ◽

Cell Types ◽

Superior Performance ◽

Underlying Structure ◽

Specific Information ◽

Cell Type ◽

Cell Clustering ◽

Bayesian Probabilistic Model ◽

Cell Type Specific ◽

Cell Data

Single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data are noise-prone due to experimental errors and cell type-specific biases. Current computational approaches for analyzing single-cell data involve a global normalization step, which introduces incorrect biases and spurious noise and does not resolve missing data (dropouts). This can lead to misleading conclusions in downstream analyses. Moreover, a single normalization removes important cell type-specific information. We propose a data-driven model, BISCUIT, that iteratively normalizes and clusters cells, thereby separating noise from interesting biological signals. BISCUIT is a Bayesian probabilistic model that learns cell-specific parameters to intelligently drive normalization. In both synthetic and real single-cell data, this approach outperforms previous methods that apply global normalization followed by clustering, and it allows easy interpretation and recovery of the underlying structure and cell types.


A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain

Nature Communications ◽

10.1038/s41467-020-20343-5 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Anoushka Joglekar ◽

Andrey Prjibelski ◽

Ahmed Mahfouz ◽

Paul Collier ◽

Susan Lin ◽

...

Keyword(s):

Single Cell ◽

Brain Region ◽

Cell Types ◽

Brain Regions ◽

Protein Isoforms ◽

Cell Type ◽

Anatomic Structure ◽

Spatially Resolved ◽

Long Read ◽

Isoform Expression

Splicing varies across brain regions, but the single-cell resolution of regional variation is unclear. We present a single-cell investigation of differential isoform expression (DIE) between brain regions using single-cell long-read sequencing in mouse hippocampus and prefrontal cortex in 45 cell types at postnatal day 7 (www.isoformAtlas.com). Isoform tests for DIE show better performance than exon tests. We detect hundreds of DIE events traceable to cell types, often corresponding to functionally distinct protein isoforms. Mostly, one cell type is responsible for brain-region specific DIE. However, for fewer genes, multiple cell types influence DIE. Thus, regional identity can, although rarely, override cell-type specificity. Cell types indigenous to one anatomic structure display distinctive DIE, e.g. the choroid plexus epithelium manifests distinct transcription-start-site usage. Spatial transcriptomics and long-read sequencing yield a spatially resolved splicing map. Our methods quantify isoform expression with cell-type and spatial resolution and contribute to furthering our understanding of how the brain integrates molecular and cellular complexity.


Cell-type, single-cell, and spatial signatures of brain-region specific splicing in postnatal development

10.1101/2020.08.27.268730 ◽

2020 ◽

Cited By ~ 1

Author(s):

Anoushka Joglekar ◽

Andrey Prjibelski ◽

Ahmed Mahfouz ◽

Paul Collier ◽

Susan Lin ◽

...

Keyword(s):

Single Cell ◽

Brain Region ◽

Cell Types ◽

Brain Regions ◽

Specific Cell ◽

Cell Type ◽

Anatomic Structure ◽

Link Type ◽

Long Read ◽

Isoform Expression

Alternative RNA splicing varies across brain regions, but the single-cell resolution of such regional variation is unknown. Here we present the first single-cell investigation of differential isoform expression (DIE) between brain regions, by performing single-cell long-read transcriptome sequencing in the mouse hippocampus and prefrontal cortex in 45 cell types at postnatal day 7 (www.isoformAtlas.com). Using isoform tests for brain-region specific DIE, which outperform exon-based tests, we detect hundreds of brain-region specific DIE events traceable to specific cell types. Many DIE events correspond to functionally distinct protein isoforms, some with just a 6-nucleotide exon variant. In most instances, one cell type is responsible for brain-region specific DIE. Cell types indigenous to only one anatomic structure display distinctive DIE; for example, the choroid plexus epithelium manifests unique transcription start sites. However, for some genes, multiple cell types are responsible for DIE in bulk data, indicating that regional identity can, although less frequently, override cell-type specificity. We validated our findings with spatial transcriptomics and long-read sequencing, yielding the first spatially resolved splicing map in the postnatal mouse brain (www.isoformAtlas.com). Our methods are highly generalizable. They provide a robust means of quantifying isoform expression with cell-type and spatial resolution, and reveal how the brain integrates molecular and cellular complexity to serve function.


Mapping and modeling the genomic basis of differential RNA isoform expression at single-cell resolution with LR-Split-seq

Genome Biology ◽

10.1186/s13059-021-02505-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Elisabeth Rebboah ◽

Fairlie Reese ◽

Katherine Williams ◽

Gabriela Balderrama-Gutierrez ◽

Cassandra McGill ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Cell Types ◽

Transcription Start Sites ◽

Design Flexibility ◽

Long Reads ◽

Internal Exon ◽

Long Read ◽

Isoform Expression ◽

Full Length Transcript

The rise in throughput and quality of long-read sequencing should allow unambiguous identification of full-length transcript isoforms. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here we develop and characterize long-read Split-seq (LR-Split-seq), which uses combinatorial barcoding to sequence single cells with long reads. Applied to the C2C12 myogenic system, LR-Split-seq associates isoforms to cell types with relative economy and design flexibility. We find widespread evidence of changing isoform expression during differentiation including alternative transcription start sites (TSS) and/or alternative internal exon usage. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells.


scCODA is a Bayesian model for compositional single-cell data analysis

Nature Communications ◽

10.1038/s41467-021-27150-6 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

M. Büttner ◽

J. Ostner ◽

C. L. Müller ◽

F. J. Theis ◽

B. Schubert

Keyword(s):

Data Analysis ◽

Single Cell ◽

Bayesian Model ◽

Cell Types ◽

Biological Processes ◽

Complex Cell ◽

Cell Type ◽

Compositional Changes ◽

False Discoveries ◽

Cell Data

Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model addressing these issues enabling the study of complex cell type effects in disease, and other stimuli. scCODA demonstrated excellent detection performance, while reliably controlling for false discoveries, and identified experimentally verified cell type changes that were missed in original analyses.
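
The compositionality problem mentioned above can be made concrete with a small worked example. This is only an illustration of why proportions are hard to interpret, not scCODA's model or API; the cell-type names and counts are made up.

```python
def proportions(counts):
    """Convert absolute cell counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical counts: only NK cells expand under treatment.
control = {"T": 300, "B": 300, "NK": 400}
treated = {"T": 300, "B": 300, "NK": 1400}

p_control = proportions(control)  # T: 0.30, B: 0.30, NK: 0.40
p_treated = proportions(treated)  # T: 0.15, B: 0.15, NK: 0.70
```

The T and B cell counts are identical in both conditions, yet their proportions halve because the proportions must sum to one. Testing each cell type's proportion independently would report three changes where only one cell type actually changed.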


Airpart: Interpretable statistical models for analyzing allelic imbalance in single-cell datasets

10.1101/2021.10.15.464546 ◽

2021 ◽

Author(s):

Wancen Mu ◽

Hirak Sarkar ◽

Avi Srivastava ◽

Kwangbom Choi ◽

Rob Patro ◽

...

Keyword(s):

Single Cell ◽

Allelic Imbalance ◽

Genetic Regulation ◽

Real Data ◽

Cell Types ◽

Cell Type ◽

Time Resolved ◽

Bulk Data ◽

Cell Type Specific ◽

Cell Data

Motivation: Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial-, or time-dependent AI signals may be dampened or not detected. Results: We introduce a statistical method, airpart, for identifying differential CTS AI from single-cell RNA-sequencing (scRNA-seq) data, or other spatially- or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower RMSE of allelic ratio estimates than existing methods. In real data, airpart identified differential AI patterns across cell states and could be used to define trends of AI signal over spatial or time axes. Availability: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.
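
To illustrate why low counts motivate grouping cells, the sketch below pools allele counts within each cell type before taking a ratio. It is a deliberately naive stand-in and an assumption for illustration: it is not airpart's Generalized Fused Lasso or its Bayesian model, and it is written in Python rather than against the R package. The function name, input format, and counts are hypothetical.

```python
from collections import defaultdict


def pooled_allelic_ratio(cells):
    """Pool allele counts per cell type and return allele-1 / total.

    cells: iterable of (cell_type, allele1_count, total_count) tuples
    (hypothetical format). Cell types with zero total reads are omitted.
    """
    allele1 = defaultdict(int)
    total = defaultdict(int)
    for cell_type, a1, n in cells:
        allele1[cell_type] += a1
        total[cell_type] += n
    return {ct: allele1[ct] / total[ct] for ct in total if total[ct] > 0}


# Hypothetical single-gene counts; per-cell ratios such as 1/1 or 0/2 are noisy.
cells = [
    ("A", 1, 1), ("A", 0, 2), ("A", 3, 5),
    ("B", 4, 4), ("B", 3, 4),
]

ratios = pooled_allelic_ratio(cells)  # A: 4/8 = 0.5, B: 7/8 = 0.875
```

Per-cell ratios in type A range from 0 to 1 on one or two reads each, while the pooled estimate is a balanced 0.5, which shows how sharing information across similar cells stabilizes the allelic ratio.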


Cell type prioritization in single-cell data

10.1101/2019.12.20.884916 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michael A. Skinnider ◽

Jordan W. Squair ◽

Claudia Kathe ◽

Mark A. Anderson ◽

Matthieu Gautier ◽

...

Keyword(s):

Single Cell ◽

Neural Circuits ◽

Cell Types ◽

Chromatin Accessibility ◽

High Dimensional ◽

Machine Learning Method ◽

Learning Method ◽

Rna Seq ◽

Cell Type ◽

Cell Data

We present a machine-learning method to prioritize the cell types most responsive to biological perturbations within high-dimensional single-cell data. We validate our method, Augur (https://github.com/neurorestore/Augur), on a compendium of single-cell RNA-seq, chromatin accessibility, and imaging transcriptomics datasets. We apply Augur to expose the neural circuits that enable walking after paralysis in response to spinal cord neurostimulation.
