One Cell At a Time: A Unified Framework to Integrate and Analyze Single-cell RNA-seq Data

Mapping Intimacies ◽

10.1101/2021.05.12.443814 ◽

2021 ◽

Author(s):

Chloe Xueqi Wang ◽

Lin Zhang ◽

Bo Wang

Keyword(s):

Single Cell ◽

Large Scale ◽

Gene Selection ◽

De Novo ◽

Single Cells ◽

Cell Types ◽

Biological Information ◽

Rna Seq ◽

Unified Framework ◽

Gene Expressions

The surge of single-cell RNA sequencing technologies enables the accessibility to large single-cell RNA-seq datasets at the scale of hundreds of thousands of single cells. Integrative analysis of large-scale scRNA-seq datasets has the potential of revealing de novo cell types as well as aggregating biological information. However, most existing methods fail to integrate multiple large-scale scRNA-seq datasets in a computational and memory efficient way. We hereby propose OCAT, One Cell At a Time, a graph-based method that sparsely encodes single-cell gene expressions to integrate data from multiple sources without most variable gene selection or explicit batch effect correction. We demonstrate that OCAT efficiently integrates multiple scRNA-seq datasets and achieves the state-of-the-art performance in cell-type clustering, especially in challenging scenarios of non-overlapping cell types. In addition, OCAT facilitates a variety of downstream analyses, such as gene prioritization, trajectory inference, pseudotime inference and cell inference. OCAT is a unifying tool to simplify and expedite single-cell data analysis.

Download Full-text

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ning Wang ◽

Andrew E. Teschendorff

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Cell Fate ◽

Regulatory Networks ◽

Large Scale ◽

Single Cells ◽

Differential Expression Analysis ◽

Dropout Rate ◽

Rna Seq ◽

Regulatory Activity

AbstractInferring the activity of transcription factors in single cells is a key task to improve our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can be subsequently obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell-fate in tissue-development, even for cell-types that only make up 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.

Download Full-text

Assessing the measurement transfer function of single-cell RNA sequencing

10.1101/045450 ◽

2016 ◽

Author(s):

Hannah R. Dueck ◽

Rizi Ai ◽

Adrian Camarena ◽

Bo Ding ◽

Reymundo Dominguez ◽

...

Keyword(s):

Transfer Function ◽

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Transfer Functions ◽

Single Cells ◽

Biological Information ◽

Gene Detection ◽

Single Cell Rna Sequencing ◽

Scale Control

AbstractRecently, measurement of RNA at single cell resolution has yielded surprising insights. Methods for single-cell RNA sequencing (scRNA-seq) have received considerable attention, but the broad reliability of single cell methods and the factors governing their performance are still poorly known. Here, we conducted a large-scale control experiment to assess the transfer function of three scRNA-seq methods and factors modulating the function. All three methods detected greater than 70% of the expected number of genes and had a 50% probability of detecting genes with abundance greater than 2 to 4 molecules. Despite the small number of molecules, sequencing depth significantly affected gene detection. While biases in detection and quantification were qualitatively similar across methods, the degree of bias differed, consistent with differences in molecular protocol. Measurement reliability increased with expression level for all methods and we conservatively estimate the measurement transfer functions to be linear above ~5-10 molecules. Based on these extensive control studies, we propose that RNA-seq of single cells has come of age, yielding quantitative biological information.

Download Full-text

scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa082 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Kaikun Xie ◽

Yu Huang ◽

Feng Zeng ◽

Zehua Liu ◽

Ting Chen

Keyword(s):

Single Cell ◽

Large Scale ◽

Developmental Trajectories ◽

Cell Types ◽

Random Projection ◽

Good Representation ◽

Rna Seq ◽

Unsupervised Deep Learning ◽

High Level ◽

Computational Resources

Abstract Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed to provide clustering and post-analysis of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE would provide a more in-depth understanding of cell development and diseases.

Download Full-text

Computational approaches towards reducing contamination in single-cell RNA-seq data

10.1101/2020.07.15.205062 ◽

2020 ◽

Author(s):

Siamak Yousefi ◽

Hao Chen ◽

Jesse F. Ingels ◽

Melinda S. McCarty ◽

Arthur G. Centeno ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Real Life ◽

Cell Types ◽

Cell Capture ◽

Rna Seq ◽

Sequence Analyses ◽

Cell Functions ◽

Biological Interpretation ◽

Different Cell Types

SUMMARYSingle cell RNA sequencing has enabled quantification of single cells and identification of different cell types and subtypes as well as cell functions in different tissues. Single cell RNA sequence analyses assume acquired RNAs correspond to cells, however, RNAs from contamination within the input data are also captured by these assays. The sequencing of background contamination as well as unwanted cells making their way to the final assay Potentially confound the correct biological interpretation of single cell transcriptomic data. Here we demonstrate two approaches to deal with background contamination as well as profiling of unwanted cells in the assays. We use three real-life datasets of whole-cell capture and nucleotide single-cell captures generated by Fluidigm and 10x technologies and show that these methods reduce the effect of contamination, strengthen clustering of cells and improves biological interpretation.

Download Full-text

Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data

10.1101/2020.06.15.151910 ◽

2020 ◽

Author(s):

Van Hoan Do ◽

Francisca Rojas Ringeling ◽

Stefan Canzar

Keyword(s):

Single Cell ◽

Large Scale ◽

Linear Time ◽

Cell Types ◽

Substantial Improvement ◽

Rna Seq ◽

Sampling Step ◽

Protein Marker ◽

Cluster Ensembles ◽

Sequencing Technologies

AbstractA fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultra-large scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose method Specter that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter’s speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and that is sensitive to rare cell types. Its linear time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression we demonstrate that Specter is able to utilize multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter is open source and available at https://github.com/canzarlab/Specter.

Download Full-text

Single cell profiling of total RNA using Smart-seq-total

10.1101/2020.06.02.131060 ◽

2020 ◽

Author(s):

Alina Isakova ◽

Norma Neff ◽

Stephen R. Quake

Keyword(s):

Single Cell ◽

Single Cells ◽

Embryonic Stem ◽

Cell Types ◽

Embryoid Bodies ◽

High Yield ◽

Rna Seq ◽

Mcf7 Cells ◽

Total Rna ◽

Non Coding Rna

ABSTRACTThe ability to interrogate total RNA content of single cells would enable better mapping of the transcriptional logic behind emerging cell types and states. However, current RNA-seq methods are unable to simultaneously monitor both short and long, poly(A)+ and poly(A)-transcripts at the single-cell level, and thus deliver only a partial snapshot of the cellular RNAome. Here, we describe Smart-seq-total, a method capable of assaying a broad spectrum of coding and non-coding RNA from a single cell. Built upon the template-switch mechanism, Smart-seq-total bears the key feature of its predecessor, Smart-seq2, namely, the ability to capture full-length transcripts with high yield and quality. It also outperforms current poly(A)–independent total RNA-seq protocols by capturing transcripts of a broad size range, thus, allowing us to simultaneously analyze protein-coding, long non-coding, microRNA and other non-coding RNA transcripts from single cells. We used Smart-seq-total to analyze the total RNAome of human primary fibroblasts, HEK293T and MCF7 cells as well as that of induced murine embryonic stem cells differentiated into embryoid bodies. We show that simultaneous measurement of non-coding RNA and mRNA from the same cell enables elucidation of new roles of non-coding RNA throughout essential processes such as cell cycle or lineage commitment. Moreover, we show that cell types can be distinguished based on the abundance of non-coding transcripts alone.

Download Full-text

Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2113568118 ◽

2021 ◽

Vol 118 (51) ◽

pp. e2113568118

Author(s):

Alina Isakova ◽

Norma Neff ◽

Stephen R. Quake

Keyword(s):

Single Cell ◽

Noncoding Rna ◽

Single Cells ◽

Embryonic Stem ◽

Cell Types ◽

Embryoid Bodies ◽

Rna Seq ◽

Total Rna ◽

Rna Transcripts ◽

Rna Content

The ability to interrogate total RNA content of single cells would enable better mapping of the transcriptional logic behind emerging cell types and states. However, current single-cell RNA-sequencing (RNA-seq) methods are unable to simultaneously monitor all forms of RNA transcripts at the single-cell level, and thus deliver only a partial snapshot of the cellular RNAome. Here we describe Smart-seq-total, a method capable of assaying a broad spectrum of coding and noncoding RNA from a single cell. Smart-seq-total does not require splitting the RNA content of a cell and allows the incorporation of unique molecular identifiers into short and long RNA molecules for absolute quantification. It outperforms current poly(A)-independent total RNA-seq protocols by capturing transcripts of a broad size range, thus enabling simultaneous analysis of protein-coding, long-noncoding, microRNA, and other noncoding RNA transcripts from single cells. We used Smart-seq-total to analyze the total RNAome of human primary fibroblasts, HEK293T, and MCF7 cells, as well as that of induced murine embryonic stem cells differentiated into embryoid bodies. By analyzing the coexpression patterns of both noncoding RNA and mRNA from the same cell, we were able to discover new roles of noncoding RNA throughout essential processes, such as cell cycle and lineage commitment during embryonic development. Moreover, we show that independent classes of short-noncoding RNA can be used to determine cell-type identity.

Download Full-text

Phenotypic convergence in the brain: distinct transcription factors regulate common terminal neuronal characters

10.1101/243113 ◽

2018 ◽

Cited By ~ 2

Author(s):

Nikos Konstantinides ◽

Katarina Kapuralin ◽

Chaimaa Fadil ◽

Luendreo Barboza ◽

Rahul Satija ◽

...

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Deep Understanding ◽

Cell Types ◽

Marker Genes ◽

Cell Type ◽

Functional Specification ◽

Phenotypic Convergence

SummaryTranscription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.

Download Full-text

scTPA: A web tool for single-cell transcriptome analysis of pathway activation signatures

10.1101/2020.01.15.907592 ◽

2020 ◽

Cited By ~ 1

Author(s):

Yan Zhang ◽

Yaru Zhang ◽

Jun Hu ◽

Ji Zhang ◽

Fangjie Guo ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Single Cells ◽

Cell Types ◽

Biological Pathways ◽

Rna Seq ◽

Web Tool ◽

Functional Interpretation ◽

Cell Functions ◽

Pathway Activation

ABSTRACTThe most fundamental challenge in current single-cell RNA-seq data analysis is functional interpretation and annotation of cell clusters. The biological pathways in distinct cell types have different activation patterns, which facilitates understanding cell functions in single-cell transcriptomics. However, no effective web tool has been implemented for single-cell transcriptomic data analysis based on prior biological pathway knowledge. Here, we introduce scTPA (http://sctpa.bio-data.cn/sctpa), which is a web-based platform providing pathway-based analysis of single-cell RNA-seq data in human and mouse. scTPA incorporates four widely-used gene set enrichment methods to estimate the pathway activation scores of single cells based on a collection of available biological pathways with different functional and taxonomic classifications. The clustering analysis and cell-type-specific activation pathway identification were provided for the functional interpretation of cell types from pathway-oriented perspective. An intuitive interface allows users to conveniently visualize and download single-cell pathway signatures. Together, scTPA is a comprehensive tool to identify pathway activation signatures for dissecting single cell heterogeneity.

Download Full-text

Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing

10.1101/104844 ◽

2017 ◽

Cited By ~ 8

Author(s):

Junyue Cao ◽

Jonathan S. Packer ◽

Vijay Ramani ◽

Darren A. Cusanovich ◽

Chau Huynh ◽

...

Keyword(s):

Single Cell ◽

Expression Profiles ◽

Single Cells ◽

Transcriptional Profiling ◽

Cost Effective ◽

Cell Types ◽

Cell Capture ◽

Rna Seq ◽

Major Step ◽

C Elegans

AbstractConventional methods for profiling the molecular content of biological samples fail to resolve heterogeneity that is present at the level of single cells. In the past few years, single cell RNA sequencing has emerged as a powerful strategy for overcoming this challenge. However, its adoption has been limited by a paucity of methods that are at once simple to implement and cost effective to scale massively. Here, we describe a combinatorial indexing strategy to profile the transcriptomes of large numbers of single cells or single nuclei without requiring the physical isolation of each cell (Single cell Combinatorial Indexing RNA-seq or sci-RNA-seq). We show that sci-RNA-seq can be used to efficiently profile the transcriptomes of tens-of-thousands of single cells per experiment, and demonstrate that we can stratify cell types from these data. Key advantages of sci-RNA-seq over contemporary alternatives such as droplet-based single cell RNA-seq include sublinear cost scaling, a reliance on widely available reagents and equipment, the ability to concurrently process many samples within a single workflow, compatibility with methanol fixation of cells, cell capture based on DNA content rather than cell size, and the flexibility to profile either cells or nuclei. As a demonstration of sci-RNA-seq, we profile the transcriptomes of 42,035 single cells from C. elegans at the L2 stage, effectively 50-fold “shotgun cellular coverage” of the somatic cell composition of this organism at this stage. We identify 27 distinct cell types, including rare cell types such as the two distal tip cells of the developing gonad, estimate consensus expression profiles and define cell-type specific and selective genes. Given that C. elegans is the only organism with a fully mapped cellular lineage, these data represent a rich resource for future methods aimed at defining cell types and states. They will advance our understanding of developmental biology, and constitute a major step towards a comprehensive, single-cell molecular atlas of a whole animal.

Download Full-text