Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape

Mapping Intimacies ◽

10.1101/2021.08.13.456196 ◽

2021 ◽

Author(s):

Luke Zappia ◽

Fabian J Theis

Keyword(s):

Single Cell ◽

Open Science ◽

Software Tools ◽

Field Analysis ◽

Analysis Tool ◽

Tracking Data ◽

Rna Seq ◽

Science Practices ◽

Single Cell Rna Sequencing ◽

Multiple Samples

Recent years have seen a revolution in single-cell technologies, particularly single-cell RNA-sequencing (scRNA-seq). As the number, size and complexity of scRNA-seq datasets continue to increase, so does the number of computational methods and software tools for extracting meaning from them. Since 2016 the scRNA-tools database has catalogued software tools for analysing scRNA-seq data. With the number of tools in the database passing 1000, we take this opportunity to provide an update on the state of the project and the field. Analysis of five years of analysis tool tracking data clearly shows the evolution of the field, and that the focus of developers has moved from ordering cells on continuous trajectories to integrating multiple samples and making use of reference datasets. We also find evidence that open science practices reward developers with increased recognition and help accelerate the field.


Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape

Genome Biology ◽

10.1186/s13059-021-02519-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Luke Zappia ◽

Fabian J. Theis

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Open Science ◽

Software Tools ◽

The State ◽

Rna Seq ◽

Science Practices ◽

Analysis Methods ◽

Single Cell Rna Sequencing ◽

Multiple Samples

Abstract: Recent years have seen a revolution in single-cell RNA-sequencing (scRNA-seq) technologies, datasets, and analysis methods. Since 2016, the scRNA-tools database has cataloged software tools for analyzing scRNA-seq data. With the number of tools in the database passing 1000, we provide an update on the state of the project and the field. This data shows the evolution of the field and a change of focus from ordering cells on continuous trajectories to integrating multiple samples and making use of reference datasets. We also find that open science practices reward developers with increased recognition and help accelerate the field.


scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1820006116 ◽

2019 ◽

Vol 116 (20) ◽

pp. 9775-9784 ◽

Cited By ~ 38

Author(s):

Yingxin Lin ◽

Shila Ghazanfar ◽

Kevin Y. X. Wang ◽

Johann A. Gagnon-Bartsch ◽

Kitty K. Lo ◽

...

Keyword(s):

Factor Analysis ◽

Data Integration ◽

Single Cell ◽

Rna Seq ◽

Cell Type ◽

Large Collection ◽

Single Cell Rna Sequencing ◽

Development Trajectory ◽

Biological Discovery ◽

Public Datasets

Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.


SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Random Projection ◽

Rna Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell Rna Sequencing ◽

Data Volume

Abstract: Clustering is a prevalent analytical means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while consuming only 66% of the memory used by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
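
The abstract above names random projection as one ingredient of the framework. As a rough illustration only, the following Python sketch projects a cells-by-genes matrix onto a few random Gaussian directions, the kind of dimensionality reduction that could precede a clustering step. The function name `random_projection`, its parameters, and the toy data are assumptions made for this example; they are not taken from sscClust, whose R implementation may work quite differently.

```python
import math
import random


def random_projection(data, n_components, seed=0):
    """Project rows of `data` (cells x genes) onto random Gaussian directions.

    Hypothetical helper for illustration; not part of sscClust.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    # Scaling by 1/sqrt(n_components) keeps squared distances
    # approximately unchanged in expectation.
    scale = 1.0 / math.sqrt(n_components)
    # Random projection matrix with shape genes x components.
    projection = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]
    # Matrix product: (cells x genes) @ (genes x components).
    return [
        [
            sum(row[g] * projection[g][k] for g in range(n_features))
            for k in range(n_components)
        ]
        for row in data
    ]


if __name__ == "__main__":
    # Toy expression matrix: 3 cells, 6 genes (made-up values).
    cells = [
        [5.0, 0.0, 1.0, 0.0, 2.0, 0.0],
        [4.0, 0.0, 2.0, 0.0, 1.0, 0.0],
        [0.0, 7.0, 0.0, 3.0, 0.0, 6.0],
    ]
    for reduced in random_projection(cells, n_components=2, seed=42):
        print(reduced)
```

Any clustering algorithm could then be run on the reduced rows instead of the full matrix, which is where the savings in time and memory would come from.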


Transcriptional network analysis of transcriptomic diversity in resident tissue macrophages and dendritic cells in the mouse mononuclear phagocyte system

10.1101/2020.03.24.002816 ◽

2020 ◽

Cited By ~ 1

Author(s):

Kim M. Summers ◽

Stephen J. Bush ◽

David A. Hume

Keyword(s):

Dendritic Cells ◽

Network Analysis ◽

Single Cell ◽

Cell Types ◽

Mononuclear Phagocyte ◽

The Body ◽

Mononuclear Phagocyte System ◽

Primary Data ◽

Analysis Tool ◽

Rna Seq

Abstract: The mononuclear phagocyte system (MPS) is a family of cells including progenitors, circulating blood monocytes, resident tissue macrophages and dendritic cells (DC) present in every tissue in the body. To test the relationships between markers and transcriptomic diversity in the MPS, we collected from NCBI-GEO >500 quality RNA-seq datasets generated from mouse MPS cells isolated from multiple tissues. The primary data were randomly down-sized to a depth of 10 million reads and requantified. The resulting dataset was clustered using the network analysis tool Graphia. A sample-to-sample matrix revealed that MPS populations could be separated based upon tissue of origin. Cells identified as classical DC subsets, cDC1 and cDC2, and lacking Fcgr1 (CD64), were centrally-located within the MPS cluster and no more distinct than other MPS cell types. A gene-to-gene correlation matrix identified large generic co-expression clusters associated with MPS maturation and innate immune function. Smaller co-expression gene clusters including the transcription factors that drive them showed higher expression within defined isolated cells, including macrophages and DC from specific tissues. They include a cluster containing Lyve1 that implies a function in endothelial cell homeostasis, a cluster of transcripts enriched in intestinal macrophages and a generic cDC cluster associated with Ccr7. However, transcripts encoding many other putative MPS subset markers including Adgre1, Itgax, Itgam, Clec9a, Cd163, Mertk, Retnla and H2-a/e (class II MHC) clustered idiosyncratically and were not correlated with underlying functions. The data provide no support for the concept of markers of M2 polarization or the specific adaptation of DC to present antigen to T cells. Co-expression of immediate early genes (e.g. Egr1, Fos, Dusp1) and inflammatory cytokines and chemokines (Tnf, Il1b, Ccl3/4) indicated that all tissue disaggregation protocols activate MPS cells. Tissue-specific expression clusters indicated that all cell isolation procedures also co-purify other unrelated cell types that may interact with MPS cells in vivo. Comparative analysis of public RNA-seq and single cell RNA-seq data from the same lung cell populations showed that the extensive heterogeneity implied by the global cluster analysis may be even greater at a single cell level with few markers strongly correlated with each other. This analysis highlights the power of large datasets to identify the diversity of MPS cellular phenotypes, and the limited predictive value of surface markers to define lineages, functions or subpopulations.
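
The random down-sizing step mentioned in this abstract can be pictured as drawing a fixed number of reads without replacement from each sample. The Python sketch below does this for a small table of per-gene read counts. The function `downsample_counts`, the choice of gene names, and the numbers are hypothetical; the authors down-sized the primary data and then requantified, a pipeline that is not described in enough detail here to reproduce.

```python
import bisect
import itertools
import random
from collections import Counter


def downsample_counts(counts, depth, seed=0):
    """Randomly keep `depth` reads from a {gene: read_count} table.

    Hypothetical helper: samples reads uniformly without replacement.
    Samples already at or below `depth` are returned unchanged.
    """
    genes = list(counts)
    # Running totals let us map a read index back to its gene.
    cumulative = list(itertools.accumulate(counts[g] for g in genes))
    total = cumulative[-1] if cumulative else 0
    if total <= depth:
        return dict(counts)
    rng = random.Random(seed)
    # Choose which individual reads survive.
    kept_reads = rng.sample(range(total), depth)
    kept = Counter(
        genes[bisect.bisect_right(cumulative, read)] for read in kept_reads
    )
    return {gene: kept.get(gene, 0) for gene in genes}


if __name__ == "__main__":
    # Made-up counts for three of the marker genes named in the abstract.
    sample = {"Adgre1": 600, "Itgax": 300, "Mertk": 100}
    print(downsample_counts(sample, depth=200, seed=1))
```

Bringing every sample to the same depth in this way removes sequencing depth as a source of difference between datasets collected from many studies.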


Ultra-high throughput single-cell RNA sequencing by combinatorial fluidic indexing

10.1101/2019.12.17.879304 ◽

2019 ◽

Cited By ~ 4

Author(s):

Paul Datlinger ◽

André F Rendeiro ◽

Thorina Boenke ◽

Thomas Krausgruber ◽

Daniele Barreca ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Population Genomics ◽

Cost Effective ◽

Mouse Cell ◽

Droplet Microfluidics ◽

Rna Seq ◽

Single Cell Rna Sequencing ◽

Massive Scale ◽

Tcr Activation

Abstract: Cell atlas projects and single-cell CRISPR screens hit the limits of current technology, as they require cost-effective profiling for millions of individual cells. To satisfy these enormous throughput requirements, we developed “single-cell combinatorial fluidic indexing” (scifi) and applied it to single-cell RNA sequencing. The resulting scifi-RNA-seq assay combines one-step combinatorial pre-indexing of single-cell transcriptomes with subsequent single-cell RNA-seq using widely available droplet microfluidics. Pre-indexing allows us to load multiple cells per droplet, which increases the throughput of droplet-based single-cell RNA-seq up to 15-fold, and it provides a straightforward way of multiplexing hundreds of samples in a single scifi-RNA-seq experiment. Compared to multi-round combinatorial indexing, scifi-RNA-seq provides an easier, faster, and more efficient workflow, thereby enabling massive-scale scRNA-seq experiments for a broad range of applications ranging from population genomics to drug screens with scRNA-seq readout. We benchmarked scifi-RNA-seq on various human and mouse cell lines, and we demonstrated its feasibility for human primary material by profiling TCR activation in T cells.


A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies

Genes ◽

10.3390/genes12121947 ◽

2021 ◽

Vol 12 (12) ◽

pp. 1947

Author(s):

Samarendra Das ◽

Anil Rai ◽

Michael L. Merchant ◽

Matthew C. Cave ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Differential Expression Analysis ◽

Individual Performance ◽

Rna Seq ◽

Gene Expressions ◽

Single Cell Rna Sequencing

Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the cell level. Differential Expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practices for addressing this involved borrowing methods from bulk RNA-seq, which are based on non-zero differences in the average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of different DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to identify a method that performs best globally on the basis of any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.


Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell Rna Sequencing ◽

Multiple Cell

Abstract: As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
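
To make the gamma-Poisson idea concrete, the Python sketch below draws a per-cell expression rate from a gamma distribution and then a count from a Poisson distribution with that rate. It is a minimal illustration under assumed parameters, not the Splat model itself, which is distributed as part of the Splatter Bioconductor package. The names `sample_poisson`, `simulate_counts`, `mean_shape`, `mean_scale`, and `dispersion` are invented for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small means used here; not suited to very large lam.
    """
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n_genes, n_cells, mean_shape=2.0, mean_scale=1.5,
                    dispersion=0.5, seed=0):
    """Simulate a genes x cells count matrix from a gamma-Poisson mixture.

    All parameter names and defaults are assumptions for illustration.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_genes):
        # Each gene gets its own mean expression level.
        gene_mean = rng.gammavariate(mean_shape, mean_scale)
        row = []
        for _ in range(n_cells):
            # Gamma rate with mean gene_mean and variance
            # dispersion * gene_mean**2, then a Poisson count.
            rate = rng.gammavariate(1.0 / dispersion, gene_mean * dispersion)
            row.append(sample_poisson(rate, rng))
        counts.append(row)
    return counts


if __name__ == "__main__":
    for gene_row in simulate_counts(n_genes=4, n_cells=10, seed=7):
        print(gene_row)
```

Mixing a Poisson over gamma-distributed rates yields counts that are more variable than a plain Poisson, which is the usual motivation for this family of models in count data.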


Myeloid heterogeneity in kidney disease as revealed through single cell RNA sequencing

Kidney360 ◽

10.34067/kid.0003682021 ◽

2021 ◽

pp. 10.34067/KID.0003682021

Author(s):

Rachel M B Bell ◽

Laura Denby

Keyword(s):

Kidney Disease ◽

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Myeloid Cells ◽

Rna Seq ◽

Cellular Compartment ◽

Single Cell Rna Sequencing ◽

Health And Disease ◽

Diseased Kidney

Kidney disease represents a global health burden of increasing prevalence and is an independent risk factor for cardiovascular disease. Myeloid cells are a major cellular compartment of the immune system; they are found in the healthy kidney and in increased numbers in the damaged and/or diseased kidney, where they act as key players in the progression of injury, inflammation and fibrosis. They possess enormous plasticity and heterogeneity, adopting different phenotypic and functional characteristics in response to stimuli in the local milieu. Though this inherent complexity remains to be fully understood in the kidney, advances in single-cell genomics promise to change this. Specifically, single-cell RNA sequencing (scRNA-seq) has had a transformative effect on kidney research, enabling the profiling and analysis of the transcriptomes of single cells at unprecedented resolution and throughput, and subsequent generation of cell atlases. Moving forward, combining scRNA- and single-nuclear RNA-seq with greater resolution spatial transcriptomics will allow spatial mapping of kidney disease of varying aetiology to further reveal the patterning of immune cells and non-immune renal cells. This review summarises the roles of myeloid cells in kidney health and disease, the experimental workflow in currently available scRNA-seq technologies and published findings using scRNA-seq in the context of myeloid cells and the kidney.


DOP23 Single-cell RNA sequencing identifies an important role for class I histone-deacetylase enzymes in intestinal myofibroblasts from patients with Crohn’s Disease strictures

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjab073.062 ◽

2021 ◽

Vol 15 (Supplement_1) ◽

pp. S062-S062

Author(s):

A Lewis ◽

B Pan-Castillo ◽

G Berti ◽

C Felice ◽

H Gordon ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Line ◽

Rna Sequencing ◽

Histone Deacetylase ◽

Chromatin Accessibility ◽

Collagen I ◽

Class I ◽

Rna Seq ◽

Single Cell Rna Sequencing

Abstract. Background: Histone-deacetylase (HDAC) enzymes are a broad class of ubiquitously expressed enzymes that modulate histone acetylation, chromatin accessibility and gene expression. In models of inflammatory bowel disease (IBD), HDAC inhibitors, such as valproic acid (VPA), are proven anti-inflammatory agents, and evidence suggests that they also inhibit fibrosis in non-intestinal organs. However, the role of HDAC enzymes in stricturing Crohn’s disease (CD) has not been characterised; this is key to understanding the molecular mechanism and developing novel therapies. Methods: To evaluate HDAC expression in the intestine of patients with stricturing CD (SCD), we performed unbiased single-cell RNA sequencing (sc-RNA-seq) of over 10,000 cells isolated from full-thickness surgical resection specimens of non-SCD (NSCD; n=2) and SCD intestine (n=3). Approximately 1000 fibroblasts were identified for further analysis, including a distinct cluster of myofibroblasts. Changes in gene expression were compared between myofibroblasts and other resident intestinal fibroblasts using the sc-RNA-seq analysis pipeline in Partek. Changes in HDAC expression and markers of HDAC activity (H3K27ac) were confirmed by immunohistochemistry in FFPE tissue from patient-matched NSCD and SCD intestine (n=14 pairs). The function of HDACs in intestinal fibroblasts in the CCD-18co cell line and primary CD myofibroblast cultures (n=16 cultures) was assessed using VPA, a class I HDAC inhibitor. Cells were analysed using a variety of molecular techniques including ATAC-seq, gene expression arrays, qPCR, western blot and immunofluorescent protein analysis. Results: Class I HDAC (HDAC1, p= 2.11E-11; HDAC2, p= 4.28E-11; HDAC3, p= 1.60E-07; and HDAC8, p= 2.67E-03) expression was increased in myofibroblasts compared to other intestinal fibroblast subtypes. IHC also showed an increase in the percentage of stromal HDAC2 positive cells, coupled with a decrease in the percentage of H3K27ac positive cells, in the mucosa overlying SCD intestine relative to matched NSCD areas. In the CCD-18co cell line and primary myofibroblast cultures, VPA reduced chromatin accessibility at Collagen-I gene promoters and suppressed their transcription. VPA also inhibited TGFB-induced up-regulation of Collagen-I, in part by inhibiting TGFB1I1/SMAD4 signalling. TGFB1I1 was identified as a mesenchymal-specific target of VPA, and siRNA knockdown of TGFB1I1 was sufficient to suppress TGFB-induced up-regulation of Collagen-I. Conclusion: In SCD patients, class I HDAC expression is increased in myofibroblasts. Class I HDAC inhibitors impair TGFB signalling and inhibit Collagen-I expression. Selective targeting of TGFB1I1 offers the opportunity to increase treatment specificity by selectively targeting mesenchymal cells.


SPsimSeq: semi-parametric simulation of bulk and single cell RNA sequencing data

10.1101/677740 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Empirical Distribution ◽

Supplementary Information ◽

Rna Seq ◽

Sequencing Data ◽

Actual Distribution ◽

Wide Range ◽

Single Cell Rna Sequencing

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRχiv online.
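
The contrast with simulators that assume a particular distribution can be illustrated by sampling directly from an empirical distribution. The Python sketch below draws new values by inverting a linearly interpolated empirical CDF of the observed data. This is a deliberately simplified stand-in and not the SPsimSeq procedure; the function `sample_from_empirical` and the example values are assumptions.

```python
import random


def sample_from_empirical(observed, n, seed=0):
    """Draw `n` values from a piecewise-linear empirical distribution.

    Hypothetical helper: picks a random position along the sorted
    observations and interpolates between the two neighbouring values,
    so every draw lies within the observed range.
    """
    if not observed:
        raise ValueError("observed must contain at least one value")
    rng = random.Random(seed)
    xs = sorted(observed)
    draws = []
    for _ in range(n):
        position = rng.random() * (len(xs) - 1)
        lower = int(position)
        upper = min(lower + 1, len(xs) - 1)
        fraction = position - lower
        draws.append(xs[lower] + fraction * (xs[upper] - xs[lower]))
    return draws


if __name__ == "__main__":
    # Made-up log-expression values for one gene in a source dataset.
    source = [0.0, 0.4, 1.1, 1.3, 2.8, 3.0, 5.2]
    print(sample_from_empirical(source, n=5, seed=11))
```

No distributional family is chosen in advance here: the shape of the simulated data comes entirely from the source values.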
