Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq

Abstract: In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires careful consideration of program efficiency and method selection. To reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient and do not assume that the data follow particular statistical distributions. The package is extensible and modular, which facilitates further development of functionality for future requirements with the open-source development community. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.

Cumulus: a cloud-based data analysis framework for large-scale single-cell and single-nucleus RNA-seq

10.1101/823682 ◽

2019 ◽

Cited By ~ 8

Author(s):

Bo Li ◽

Joshua Gould ◽

Yiming Yang ◽

Siranush Sarkizova ◽

Marcin Tabaka ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Bone Marrow Cells ◽

Low Cost ◽

Data Generation ◽

RNA-Seq ◽

User Friendliness ◽

Health And Disease ◽

Single Nucleus

Abstract: Massively parallel single-cell and single-nucleus RNA-seq (sc/snRNA-seq) have opened the way to systematic tissue atlases in health and disease, but as the scale of data generation grows, so does the need for computational pipelines that scale with it. Here, we developed Cumulus, a cloud-based framework for analyzing large-scale sc/snRNA-seq datasets. Cumulus combines the power of cloud computing with improvements in algorithm implementations to achieve high scalability, low cost, user-friendliness, and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.

FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants

Genome Biology ◽

10.1186/s13059-021-02288-0 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 2

Author(s):

Yanping Long ◽

Zhijian Liu ◽

Jinbu Jia ◽

Weipeng Mo ◽

Liang Fang ◽

...

Keyword(s):

Single Cell ◽

Cell Walls ◽

Large Scale ◽

Full Length ◽

Cell Level ◽

Root Cells ◽

RNA Profiling ◽

Different Types ◽

Long Read ◽

Single Nucleus

Abstract: The broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting, which requires digesting the cell walls of different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at the single-nucleus level in plants using isolated nuclei. Combining 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it uncovers alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.

Comparative analysis of antibody- and lipid-based multiplexing methods for single-cell RNA-seq

10.1101/2020.11.16.384222 ◽

2020 ◽

Author(s):

Viacheslav Mylka ◽

Jeroen Aerts ◽

Irina Matetovici ◽

Suresh Poovathingal ◽

Niels Vandamme ◽

...

Keyword(s):

Genetic Variation ◽

Comparative Analysis ◽

Single Cell ◽

Cell Lines ◽

Clinical Studies ◽

Clinical Samples ◽

RNA-Seq ◽

Batch Effects ◽

Single Cell Sequencing ◽

Single Nucleus

Abstract: Multiplexing of samples in single-cell RNA-seq studies allows significant reduction of experimental costs, straightforward identification of doublets, increased cell throughput, and reduction of sample-specific batch effects. Recently published multiplexing techniques using oligo-conjugated antibodies or lipids allow barcoding of sample-specific cells, a process called ‘hashing’. Here, we compare the hashing performance of TotalSeq-A and -C antibodies, custom synthesized lipids and MULTI-seq lipid hashes in four cell lines, both for single-cell RNA-seq and single-nucleus RNA-seq. Hashing efficiency was evaluated using the intrinsic genetic variation of the cell lines. Benchmarking of different hashing strategies and computational pipelines indicates that correct demultiplexing can be achieved with both lipid- and antibody-hashed human cells and nuclei, with MULTISeqDemux as the preferred demultiplexing function and antibody-based hashing as the most efficient protocol on cells. Antibody hashing was further evaluated on clinical samples using PBMCs from healthy and SARS-CoV-2-infected patients, where we demonstrate a more affordable approach for large single-cell sequencing clinical studies, while simultaneously reducing batch effects.
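To make the idea of hash demultiplexing concrete, the following is a minimal Python sketch of how a droplet might be assigned to a sample from its hashtag counts. It is not MULTISeqDemux and not the pipeline used in the study: the function name `demultiplex`, the fraction-based rule, and all threshold values are illustrative assumptions.

```python
def demultiplex(hash_counts, min_total=10, singlet_fraction=0.7, doublet_fraction=0.3):
    """Classify one droplet from its per-sample hashtag counts.

    hash_counts: dict mapping a sample label to its hashtag read count.
    All thresholds are arbitrary illustrative values, not published defaults.
    """
    total = sum(hash_counts.values())
    if total < min_total:
        return "negative"  # too little hashing signal to call a sample
    ranked = sorted(hash_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    if second_count / total >= doublet_fraction:
        return "doublet"  # two samples both contribute strongly
    if top_count / total >= singlet_fraction:
        return top_label  # one sample dominates
    return "unassigned"  # signal is spread over several hashtags


singlet = demultiplex({"A": 95, "B": 3, "C": 2})
doublet = demultiplex({"A": 50, "B": 45, "C": 5})
negative = demultiplex({"A": 2, "B": 1, "C": 0})
print(singlet, doublet, negative)
```

Published demultiplexing functions typically derive per-hashtag thresholds from the count distributions across all droplets rather than using fixed fractions, so this sketch only illustrates the singlet, doublet, and negative categories.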

Predicting heterogeneity in clone-specific therapeutic vulnerabilities using single-cell transcriptomic signatures

Genome Medicine ◽

10.1186/s13073-021-01000-y ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Chayaporn Suphavilai ◽

Shumei Chia ◽

Ankur Sharma ◽

Lorna Tu ◽

Rafael Peres Da Silva ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Drug Response ◽

Drug Repurposing ◽

High Accuracy ◽

Molecular Heterogeneity ◽

Precision Oncology ◽

Scale Analysis ◽

RNA-Seq ◽

Large Scale Analysis

Abstract: While understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc’s monotherapy (Pearson r>0.6) and combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as an in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc.

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ning Wang ◽

Andrew E. Teschendorff

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Cell Fate ◽

Regulatory Networks ◽

Large Scale ◽

Single Cells ◽

Differential Expression Analysis ◽

Dropout Rate ◽

RNA-Seq ◽

Regulatory Activity

Abstract: Inferring the activity of transcription factors in single cells is a key task for improving our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can subsequently be obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell fate in tissue development, even for cell types that make up only 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single-cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

10.1101/786285 ◽

2019 ◽

Cited By ~ 4

Author(s):

Marcus Alvarez ◽

Elior Rahmani ◽

Brandon Jew ◽

Kristina M. Garske ◽

Zong Miao ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Supervised Machine Learning ◽

Data Sets ◽

RNA-Seq ◽

Novel Approach ◽

Single Nucleus ◽

Downstream Analysis

Abstract: Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which, if overlooked, can lead to identification of spurious cell types in downstream clustering analyses. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
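The likelihood-based idea described in this abstract can be illustrated with a small expectation-maximization sketch. The code below fits a two-component multinomial mixture (one "debris" profile, one "nucleus" profile) to toy droplet counts. It is a simplified illustration and not the DIEM implementation: the function name `em_two_multinomials`, the pseudocount, the initialization by total count, and the toy data are all assumptions.

```python
import math


def em_two_multinomials(counts, init_resp, n_iter=50, pseudo=1e-3):
    """Fit a two-component multinomial mixture with EM.

    counts: list of per-droplet gene count lists (same gene order in each).
    init_resp: initial probability that each droplet is debris.
    Returns (resp, profiles, pi_debris); resp[i] is P(debris | droplet i).
    """
    n_genes = len(counts[0])
    resp = list(init_resp)
    for _ in range(n_iter):
        # M-step: weighted gene-frequency profile for each component.
        profiles = []
        for k in (0, 1):  # 0 = debris, 1 = nucleus
            weights = [r if k == 0 else 1.0 - r for r in resp]
            totals = [
                pseudo + sum(w * c[g] for w, c in zip(weights, counts))
                for g in range(n_genes)
            ]
            norm = sum(totals)
            profiles.append([t / norm for t in totals])
        # Mixing weight, clamped so that its logarithm stays finite.
        pi_debris = min(max(sum(resp) / len(resp), 1e-6), 1.0 - 1e-6)
        # E-step: posterior probability of debris for each droplet.
        new_resp = []
        for c in counts:
            ll0 = math.log(pi_debris) + sum(
                x * math.log(p) for x, p in zip(c, profiles[0]))
            ll1 = math.log(1.0 - pi_debris) + sum(
                x * math.log(p) for x, p in zip(c, profiles[1]))
            m = max(ll0, ll1)  # subtract the max for numerical stability
            e0, e1 = math.exp(ll0 - m), math.exp(ll1 - m)
            new_resp.append(e0 / (e0 + e1))
        resp = new_resp
    return resp, profiles, pi_debris


# Toy data (hypothetical): gene 0 stands in for extranuclear background RNA.
counts = [[8, 1, 1], [9, 0, 1], [7, 2, 1],
          [2, 20, 18], [1, 25, 14], [3, 18, 19]]
# Assumed initialization: droplets with low total counts start as likely debris.
init = [0.9 if sum(c) < 20 else 0.1 for c in counts]
resp, profiles, pi_debris = em_two_multinomials(counts, init)
print([round(r, 3) for r in resp])
```

In this toy setting, droplets whose posterior debris probability exceeds a chosen cutoff would be filtered out before clustering. A realistic model would need more than one cell-type component, which this sketch omits.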

EpiScanpy: integrated single-cell epigenomic analysis

10.1101/648097 ◽

2019 ◽

Cited By ~ 4

Author(s):

Anna Danese ◽

Maria L. Richter ◽

David S. Fischer ◽

Fabian J. Theis ◽

Maria Colomé-Tatché

Keyword(s):

DNA Methylation ◽

Single Cell ◽

Large Scale ◽

Feature Space ◽

RNA-Seq ◽

Computational Framework ◽

Learning Techniques ◽

Multiple Feature ◽

The Many ◽

Cell Data

Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels with orthogonal epigenetic information.

CellMap: Characterizing the types and composition of iPSC-derived cells from RNA-seq data

10.1101/2021.05.24.445360 ◽

2021 ◽

Author(s):

Zhengyu Ouyang ◽

Nathanael Bourgeois ◽

Eugenia Lyashenko ◽

Paige Cundiff ◽

Patrick F Cullen ◽

...

Keyword(s):

Single Cell ◽

Induced Pluripotent Stem Cell ◽

Cell Types ◽

Model Systems ◽

RNA-Seq ◽

Cell Type ◽

Fine Grained ◽

Single Nucleus ◽

Induced Pluripotent

Induced pluripotent stem cell (iPSC)-derived cell types are increasingly employed as in vitro model systems for drug discovery. For these studies to be meaningful, it is important to understand the reproducibility of the iPSC-derived cultures and their similarity to equivalent endogenous cell types. Single-cell and single-nucleus RNA sequencing (RNA-seq) are useful for gaining such understanding, but they are expensive and time-consuming, while bulk RNA-seq data can be generated more quickly and at lower cost. In silico cell type decomposition is an efficient, inexpensive, and convenient alternative that can leverage bulk RNA-seq to derive more fine-grained information about these cultures. We developed CellMap, a computational tool that derives cell type profiles from publicly available single-cell and single-nucleus datasets to infer cell types in bulk RNA-seq data from iPSC-derived cell lines.
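As a rough illustration of in silico cell type decomposition in general, the sketch below estimates cell type proportions in a bulk profile by non-negative least squares against reference signatures, solved here with simple multiplicative updates. The abstract does not describe CellMap's algorithm, so this is a generic technique and not CellMap's method; the function name `deconvolve`, the toy signature matrix, and the mixture are hypothetical.

```python
def deconvolve(signatures, bulk, n_iter=500, eps=1e-12):
    """Estimate non-negative cell type proportions for one bulk sample.

    signatures: list of rows, one per gene; each row holds that gene's
        expression in every reference cell type (all values >= 0).
    bulk: list of bulk expression values, one per gene (all values >= 0).
    Minimizes ||S w - b||^2 subject to w >= 0 with multiplicative updates,
    then rescales w to sum to 1.
    """
    n_genes = len(signatures)
    n_types = len(signatures[0])
    # Precompute S^T b and S^T S once.
    stb = [sum(signatures[g][k] * bulk[g] for g in range(n_genes))
           for k in range(n_types)]
    sts = [[sum(signatures[g][j] * signatures[g][k] for g in range(n_genes))
            for k in range(n_types)]
           for j in range(n_types)]
    w = [1.0] * n_types
    for _ in range(n_iter):
        denom = [sum(sts[k][j] * w[j] for j in range(n_types))
                 for k in range(n_types)]
        # The multiplicative step keeps every weight non-negative.
        w = [w[k] * stb[k] / (denom[k] + eps) for k in range(n_types)]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical reference: 3 genes x 2 cell types.
signatures = [[10.0, 0.0],
              [0.0, 10.0],
              [5.0, 5.0]]
# Bulk profile built as 70% of type 0 plus 30% of type 1.
bulk = [7.0, 3.0, 5.0]
proportions = deconvolve(signatures, bulk)
print([round(p, 4) for p in proportions])
```

Real reference signatures would be derived from single-cell or single-nucleus datasets and restricted to informative marker genes; how those profiles are built is outside the scope of this sketch.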

SCOTCluster: Deep Clustering with Optimal Transport for Large-scale Single-cell RNA-seq Data

10.1109/bibm52615.2021.9669800 ◽

2021 ◽

Author(s):

Faning Long ◽

Xiaojun Ding ◽

Xiaoqing Peng ◽

Jianxin Wang ◽

Xiaoshu Zhu

Keyword(s):

Single Cell ◽

Large Scale ◽

Optimal Transport ◽

RNA-Seq
