Detection and removal of barcode swapping in single-cell RNA-seq data

Barcode swapping results in the mislabeling of sequencing reads between multiplexed samples on newer Illumina sequencing machines that use patterned flow cells. This may compromise the validity of numerous genomic assays, especially single-cell studies, in which many samples are routinely multiplexed together. The severity and consequences of barcode swapping for single-cell transcriptomic studies remain poorly understood. We used two statistical approaches to robustly quantify the fraction of swapped reads in each of two plate-based single-cell RNA sequencing datasets. We found that approximately 2.5% of reads were mislabeled between samples on the HiSeq 4000 machine, which is lower than previously reported. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we demonstrate that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA sequencing studies. To eliminate these artefacts, we developed an algorithm that excludes individual molecules that have swapped between samples in 10X Genomics experiments by exploiting the combinatorial complexity present in the data. This permits the continued use of cutting-edge sequencing machines for droplet-based experiments while avoiding the confounding effects of barcode swapping.
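The molecule-exclusion idea in the last part of this abstract can be illustrated with a minimal Python sketch. It assumes that reads have already been tabulated per sample for every molecule, where a molecule is a (cell barcode, UMI, gene) combination. The reasoning is that such a combination is very unlikely to arise independently in two samples, so one observed in several samples most likely reflects swapping. The function name `assign_molecules` and the `min_frac=0.8` threshold are hypothetical choices for illustration, not the authors' published implementation.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Molecule = Tuple[str, str, str]  # (cell barcode, UMI, gene)


def assign_molecules(
    read_counts: Dict[Molecule, Dict[str, int]], min_frac: float = 0.8
) -> Dict[str, Set[Molecule]]:
    """Keep each molecule only in the sample that holds at least `min_frac`
    of its reads (hypothetical threshold). Molecules spread over several
    samples without a dominant source are discarded as likely swaps."""
    kept: Dict[str, Set[Molecule]] = defaultdict(set)
    for molecule, per_sample in read_counts.items():
        total = sum(per_sample.values())
        if total == 0:
            continue
        # Sample contributing the most reads for this molecule.
        sample, reads = max(per_sample.items(), key=lambda kv: kv[1])
        if reads / total >= min_frac:
            kept[sample].add(molecule)
    return dict(kept)


if __name__ == "__main__":
    counts = {
        ("AAAC", "GGTT", "Actb"): {"s1": 95, "s2": 3},  # s2 copy is likely swapped
        ("CCGA", "TTAG", "Cd3e"): {"s1": 4, "s2": 5},   # no dominant sample
        ("GTCA", "ACGT", "Lyz2"): {"s2": 12},           # seen in one sample only
    }
    # Keeps Actb in s1 and Lyz2 in s2; drops the ambiguous Cd3e molecule.
    print(assign_molecules(counts))
```

Discarding ambiguous molecules rather than guessing their origin trades a small loss of genuine molecules for the removal of swapped ones.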

Human dermal fibroblast subpopulations are conserved across single-cell RNA sequencing studies

Journal of Investigative Dermatology ◽

10.1016/j.jid.2020.11.028 ◽

2020 ◽

Author(s):

Alex M. Ascensión ◽

Sandra Fuertes-Álvarez ◽

Olga Ibañez-Solé ◽

Ander Izeta ◽

Marcos J. Araúzo-Bravo

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Dermal Fibroblast ◽

Human Dermal Fibroblast ◽

Single Cell RNA Sequencing ◽

Sequencing Studies

The interplay between microglial states and major risk factors in Alzheimer’s disease through the eyes of single-cell RNA-sequencing: beyond black and white

Journal of Neurophysiology ◽

10.1152/jn.00395.2019 ◽

2019 ◽

Vol 122 (4) ◽

pp. 1291-1296 ◽

Cited By ~ 2

Author(s):

Djuna von Maydell ◽

Mehdi Jorfi

Keyword(s):

Risk Factors ◽

Alzheimer's Disease ◽

Single Cell ◽

RNA Sequencing ◽

Cellular Mechanisms ◽

Black And White ◽

Single Cell RNA Sequencing ◽

Sequencing Studies ◽

Integral Role

Microglia constitute ~10–20% of glial cells in the adult human brain. They are the resident phagocytic immune cells of the central nervous system and play an integral role as first responders during inflammation. Microglia are commonly classified as “HM” (homeostatic), “M1” (classically activated proinflammatory), or “M2” (alternatively activated). Multiple single-cell RNA-sequencing studies suggest that this discrete classification system does not accurately and fully capture the vast heterogeneity of microglial states in the brain. In fact, a recent single-cell RNA-sequencing study showed that microglia exist along a continuous spectrum of states. This spectrum spans heterogeneous populations of homeostatic and neuropathology-associated microglia in both healthy and Alzheimer’s disease (AD) mouse brains. Major risk factors, such as sex, age, and genes, modulate microglial states, suggesting that shifts along the trajectory might play a causal role in AD pathogenesis. This study provides important insight into the cellular mechanisms of AD and underlines the potential of novel cell-based therapies for AD.

SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Large Scale ◽

Random Projection ◽

RNA-Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell RNA Sequencing ◽

Data Volume

Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory required by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold, depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
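The first idea named in this abstract, random projection, can be sketched in standard-library Python: each cell's expression vector is multiplied by a random Gaussian matrix to obtain a much shorter vector, and plain k-means (Lloyd's algorithm) is then run in the reduced space. This is not the SSCC/sscClust implementation, which also performs feature construction and other steps; the function names, the N(0, 1/k) scaling and the choice of k-means as the downstream clusterer are assumptions for illustration.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows are cells, columns are genes


def random_projection(x: Matrix, k: int, seed: int = 0) -> Matrix:
    """Project each cell onto k random Gaussian directions."""
    rng = random.Random(seed)
    d = len(x[0])
    # Entries ~ N(0, 1/k) so that squared distances are preserved in expectation.
    r = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * r[i][j] for i in range(d)) for j in range(k)] for row in x]


def kmeans(x: Matrix, n_clusters: int, n_iter: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's algorithm with centers initialised from random cells."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(x, n_clusters)]
    labels = [0] * len(x)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in x
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(n_clusters):
            members = [p for p, lab in zip(x, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Projection reduces the cost of each distance computation from the number of genes to k; by the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved when k is large enough, which is what makes clustering in the projected space reasonable.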

Ultra-high throughput single-cell RNA sequencing by combinatorial fluidic indexing

10.1101/2019.12.17.879304 ◽

2019 ◽

Cited By ~ 4

Author(s):

Paul Datlinger ◽

André F Rendeiro ◽

Thorina Boenke ◽

Thomas Krausgruber ◽

Daniele Barreca ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Population Genomics ◽

Cost Effective ◽

Mouse Cell ◽

Droplet Microfluidics ◽

RNA-Seq ◽

Single Cell RNA Sequencing ◽

Massive Scale ◽

TCR Activation

Cell atlas projects and single-cell CRISPR screens hit the limits of current technology, as they require cost-effective profiling of millions of individual cells. To satisfy these enormous throughput requirements, we developed "single-cell combinatorial fluidic indexing" (scifi) and applied it to single-cell RNA sequencing. The resulting scifi-RNA-seq assay combines one-step combinatorial pre-indexing of single-cell transcriptomes with subsequent single-cell RNA-seq using widely available droplet microfluidics. Pre-indexing allows us to load multiple cells per droplet, which increases the throughput of droplet-based single-cell RNA-seq up to 15-fold, and it provides a straightforward way of multiplexing hundreds of samples in a single scifi-RNA-seq experiment. Compared to multi-round combinatorial indexing, scifi-RNA-seq provides an easier, faster, and more efficient workflow, thereby enabling massive-scale scRNA-seq experiments for applications ranging from population genomics to drug screens with scRNA-seq readout. We benchmarked scifi-RNA-seq on various human and mouse cell lines, and we demonstrated its feasibility for human primary material by profiling TCR activation in T cells.
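Why pre-indexing lets several cells share a droplet can be made concrete with a back-of-the-envelope calculation. In a simplified model, assumed here for illustration, each cell receives one of K pre-indexes uniformly at random and every droplet holds exactly n cells. A cell stays distinguishable unless another cell in its droplet carries the same pre-index, so the expected fraction of confounded cells is 1 - (1 - 1/K)^(n - 1). The function name and the example values (384 pre-indexes, 10 cells per droplet) are hypothetical, not parameters reported by the scifi-RNA-seq authors.

```python
def collision_fraction(cells_per_droplet: int, n_preindexes: int) -> float:
    """Expected fraction of cells sharing their pre-index with at least one
    other cell in the same droplet, assuming uniform random pre-indexing
    and a fixed number of cells per droplet (a simplification)."""
    if cells_per_droplet < 1 or n_preindexes < 1:
        raise ValueError("need at least one cell and one pre-index")
    # Probability that none of the other (n - 1) cells drew the same pre-index.
    p_unique = (1.0 - 1.0 / n_preindexes) ** (cells_per_droplet - 1)
    return 1.0 - p_unique


if __name__ == "__main__":
    print(collision_fraction(10, 384))
```

With these illustrative values roughly 2% of cells would share a pre-index with a droplet-mate. Real droplet loading is closer to Poisson-distributed, so this fixed-n model is only a rough guide.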

A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies

Genes ◽

10.3390/genes12121947 ◽

2021 ◽

Vol 12 (12) ◽

pp. 1947

Author(s):

Samarendra Das ◽

Anil Rai ◽

Michael L. Merchant ◽

Matthew C. Cave ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

RNA Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Differential Expression Analysis ◽

Individual Performance ◽

RNA-Seq ◽

Gene Expression ◽

Single Cell RNA Sequencing

Single-cell RNA sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the single-cell level. Differential expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practice for addressing this involved borrowing methods from bulk RNA-seq, which test for non-zero differences in average gene expression across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to study the performance of DE analysis methods comprehensively. Here, we review and classify DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to identify a single globally best-performing method from any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.
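The bulk-style approach mentioned above, testing for a non-zero difference in average expression between two cell populations, can be illustrated with Welch's t statistic computed per gene. This minimal Python sketch assumes normalized (for example log-transformed) expression values; it omits p-values, multiple-testing correction and any modelling of the excess zeros typical of scRNA-seq, and it is not claimed to be one of the 19 methods evaluated in the survey. The function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Return (mean difference, Welch t statistic) for one gene.

    Each group needs at least two cells so that the sample variance is defined.
    """
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # Constant values within both groups: no variability to scale by.
        return diff, (0.0 if diff == 0 else math.copysign(math.inf, diff))
    return diff, diff / se


def rank_genes(expr: Dict[str, List[float]], is_group_a: List[bool]) -> List[Tuple[str, float]]:
    """Rank genes by |t|, comparing cells flagged True against cells flagged False."""
    scores = []
    for gene, values in expr.items():
        a = [v for v, flag in zip(values, is_group_a) if flag]
        b = [v for v, flag in zip(values, is_group_a) if not flag]
        _, t = welch_t(a, b)
        scores.append((gene, t))
    return sorted(scores, key=lambda gt: abs(gt[1]), reverse=True)
```

Welch's variant is used here because it does not assume equal variances in the two populations, which is rarely justified for cell groups of different size and type.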

Controlling for confounding effects in single cell RNA sequencing studies using both control and target genes

10.1101/045070 ◽

2016 ◽

Author(s):

Mengjie Chen ◽

Xiang Zhou

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Target Genes ◽

Expectation Maximization Algorithm ◽

Data Sets ◽

Single Cell RNA Sequencing ◽

Sequencing Studies ◽

Order Of Magnitude ◽

The Rich ◽

Downstream Analysis

The single-cell RNA sequencing (scRNAseq) technique is becoming increasingly popular for unbiased, high-resolution transcriptome analysis of heterogeneous cell populations. Despite its many advantages, scRNAseq, like any other genomic sequencing technique, is susceptible to confounding effects. Controlling for confounding effects in scRNAseq data is thus a crucial step for proper data normalization and accurate downstream analysis. Several recent methodological studies have demonstrated the use of control genes to control for confounding effects in scRNAseq studies: the control genes are used to infer the confounding effects, which are then used to normalize the target genes of primary interest. However, these methods can be suboptimal because they ignore the rich information contained in the target genes. Here, we develop an alternative statistical method, which we refer to as scPLS, for more accurate inference of confounding effects. Our method is based on partial least squares and models control and target genes jointly to better infer and control for confounding effects. To accompany our method, we develop a novel expectation maximization algorithm for scalable inference. Our algorithm is an order of magnitude faster than standard ones, making scPLS applicable to hundreds of cells and hundreds of thousands of genes. With extensive simulations and comparisons with other methods, we demonstrate the effectiveness of scPLS. Finally, we apply scPLS to two scRNAseq data sets to illustrate its benefits in removing both technical confounding effects and cell cycle effects.
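For contrast with scPLS, the control-gene strategy that the abstract describes as potentially suboptimal can be sketched as follows: a single confounding factor is estimated from the control genes alone (here as their first principal component, found by power iteration) and then regressed out of each target gene. This standard-library Python sketch is not scPLS itself, which instead models control and target genes jointly with partial least squares and an EM algorithm; the single-factor assumption and all function names are illustrative.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows are genes, columns are cells


def center(row: List[float]) -> List[float]:
    """Subtract the gene's mean across cells."""
    m = sum(row) / len(row)
    return [x - m for x in row]


def top_factor(control: Matrix, n_iter: int = 100, seed: int = 0) -> List[float]:
    """Unit-norm per-cell scores of the first principal component of the
    centered control genes, computed by power iteration on C^T C."""
    rows = [center(r) for r in control]
    n_cells = len(rows[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    for _ in range(n_iter):
        u = [sum(r[j] * v[j] for j in range(n_cells)) for r in rows]  # C v
        w = [sum(u[g] * rows[g][j] for g in range(len(rows)))         # C^T (C v)
             for j in range(n_cells)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n_cells  # control genes carry no variation
        v = [x / norm for x in w]
    return v


def remove_factor(target: Matrix, factor: List[float]) -> Matrix:
    """Center each target gene and subtract its least-squares fit on the factor."""
    cleaned = []
    for row in target:
        y = center(row)
        beta = sum(a * b for a, b in zip(y, factor))  # factor has unit norm
        cleaned.append([a - beta * b for a, b in zip(y, factor)])
    return cleaned
```

Because the factor is inferred from control genes only, any variation it misses is left in the target genes; this is the limitation that modelling control and target genes jointly is meant to address.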

Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Real Data ◽

Cell Types ◽

RNA-Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell RNA Sequencing ◽

Multiple Cell

As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.
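The gamma-Poisson idea behind Splat can be illustrated with a stripped-down simulator in standard-library Python: gene means are drawn from a gamma distribution, each cell gets a log-normal library-size factor, and each count is a Poisson draw whose mean is itself gamma-distributed (a gamma-Poisson, i.e. negative binomial, mixture). This is a hypothetical sketch, not the Splatter or Splat API; it omits groups, differentiation paths, outliers and dropout, and every parameter value below is an illustrative assumption rather than a fitted default.

```python
import math
import random
from typing import List


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Exact Poisson draw via Knuth's multiplication method. The mean is split
    into chunks of at most 10 (a sum of independent Poisson draws is Poisson),
    which keeps exp(-chunk) from underflowing for large means."""
    total = 0
    while lam > 0:
        chunk = min(lam, 10.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def simulate_counts(n_genes: int, n_cells: int, mean_shape: float = 0.6,
                    mean_rate: float = 0.3, lib_sd: float = 0.2,
                    dispersion: float = 0.1, seed: int = 0) -> List[List[int]]:
    """Return a genes x cells matrix of simulated counts (illustrative model)."""
    rng = random.Random(seed)
    # Gene-level mean expression; gammavariate takes (shape, scale).
    gene_means = [rng.gammavariate(mean_shape, 1.0 / mean_rate) for _ in range(n_genes)]
    # Cell-level library-size factors scattered around 1.
    lib_factors = [rng.lognormvariate(0.0, lib_sd) for _ in range(n_cells)]
    counts = []
    for mu in gene_means:
        row = []
        for lf in lib_factors:
            m = mu * lf
            # Gamma-distributed Poisson mean with E = m and Var = dispersion * m**2.
            lam = rng.gammavariate(1.0 / dispersion, m * dispersion) if m > 0 else 0.0
            row.append(sample_poisson(rng, lam))
        counts.append(row)
    return counts
```

Drawing the Poisson mean from a gamma distribution is what adds extra-Poisson (biological) variability; setting a very small dispersion makes the counts close to pure Poisson.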

Myeloid heterogeneity in kidney disease as revealed through single cell RNA sequencing

Kidney360 ◽

10.34067/kid.0003682021 ◽

2021 ◽

Author(s):

Rachel M B Bell ◽

Laura Denby

Keyword(s):

Kidney Disease ◽

Single Cell ◽

RNA Sequencing ◽

Single Cells ◽

Myeloid Cells ◽

RNA-Seq ◽

Cellular Compartment ◽

Single Cell RNA Sequencing ◽

Health And Disease ◽

Diseased Kidney

Kidney disease represents a global health burden of increasing prevalence and is an independent risk factor for cardiovascular disease. Myeloid cells are a major cellular compartment of the immune system; they are found in the healthy kidney and in increased numbers in the damaged and/or diseased kidney, where they are key players in the progression of injury, inflammation and fibrosis. They possess enormous plasticity and heterogeneity, adopting different phenotypic and functional characteristics in response to stimuli in the local milieu. Although this inherent complexity remains to be fully understood in the kidney, advances in single-cell genomics promise to change this. Specifically, single-cell RNA sequencing (scRNA-seq) has had a transformative effect on kidney research, enabling the profiling and analysis of single-cell transcriptomes at unprecedented resolution and throughput and the subsequent generation of cell atlases. Moving forward, combining scRNA-seq and single-nucleus RNA-seq with higher-resolution spatial transcriptomics will allow spatial mapping of kidney disease of varying aetiology to further reveal the patterning of immune cells and non-immune renal cells. This review summarises the roles of myeloid cells in kidney health and disease, the experimental workflow of currently available scRNA-seq technologies, and published findings using scRNA-seq in the context of myeloid cells and the kidney.

DOP23 Single-cell RNA sequencing identifies an important role for class I histone-deacetylase enzymes in intestinal myofibroblasts from patients with Crohn’s Disease strictures

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjab073.062 ◽

2021 ◽

Vol 15 (Supplement_1) ◽

pp. S062-S062

Author(s):

A Lewis ◽

B Pan-Castillo ◽

G Berti ◽

C Felice ◽

H Gordon ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Line ◽

RNA Sequencing ◽

Histone Deacetylase ◽

Chromatin Accessibility ◽

Collagen I ◽

Class I ◽

RNA-Seq ◽

Single Cell RNA Sequencing

Background: Histone-deacetylase (HDAC) enzymes are a broad class of ubiquitously expressed enzymes that modulate histone acetylation, chromatin accessibility and gene expression. In models of inflammatory bowel disease (IBD), HDAC inhibitors such as valproic acid (VPA) are proven anti-inflammatory agents, and evidence suggests that they also inhibit fibrosis in non-intestinal organs. However, the role of HDAC enzymes in stricturing Crohn's disease (SCD) has not been characterised; this is key to understanding the molecular mechanism and developing novel therapies.

Methods: To evaluate HDAC expression in the intestine of SCD patients, we performed unbiased single-cell RNA sequencing (sc-RNA-seq) of over 10,000 cells isolated from full-thickness surgical resection specimens of non-SCD (NSCD; n=2) and SCD intestine (n=3). Approximately 1,000 fibroblasts were identified for further analysis, including a distinct cluster of myofibroblasts. Changes in gene expression were compared between myofibroblasts and other resident intestinal fibroblasts using the sc-RNA-seq analysis pipeline in Partek. Changes in HDAC expression and markers of HDAC activity (H3K27ac) were confirmed by immunohistochemistry (IHC) in FFPE tissue from patient-matched NSCD and SCD intestine (n=14 pairs). The function of HDACs in intestinal fibroblasts was assessed in the CCD-18co cell line and in primary Crohn's disease (CD) myofibroblast cultures (n=16 cultures) using VPA, a class I HDAC inhibitor. Cells were analysed using a variety of molecular techniques including ATAC-seq, gene expression arrays, qPCR, western blot and immunofluorescent protein analysis.

Results: Class I HDAC expression (HDAC1, p=2.11E-11; HDAC2, p=4.28E-11; HDAC3, p=1.60E-07; and HDAC8, p=2.67E-03) was increased in myofibroblasts compared to other intestinal fibroblast subtypes. IHC also showed an increase in the percentage of stromal HDAC2-positive cells, coupled with a decrease in the percentage of H3K27ac-positive cells, in the mucosa overlying SCD intestine relative to matched NSCD areas. In the CCD-18co cell line and primary myofibroblast cultures, VPA reduced chromatin accessibility at Collagen-I gene promoters and suppressed their transcription. VPA also inhibited TGFB-induced up-regulation of Collagen-I, in part by inhibiting TGFB1I1/SMAD4 signalling. TGFB1I1 was identified as a mesenchymal-specific target of VPA, and siRNA knockdown of TGFB1I1 was sufficient to suppress TGFB-induced up-regulation of Collagen-I.

Conclusion: In SCD patients, class I HDAC expression is increased in myofibroblasts. Class I HDAC inhibitors impair TGFB signalling and inhibit Collagen-I expression. Selective targeting of TGFB1I1 offers the opportunity to increase treatment specificity by selectively targeting mesenchymal cells.

SPsimSeq: semi-parametric simulation of bulk and single cell RNA sequencing data

10.1101/677740 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Gene Expression ◽

Single Cell ◽

RNA Sequencing ◽

Empirical Distribution ◽

Supplementary Information ◽

RNA-Seq ◽

Sequencing Data ◽

Actual Distribution ◽

Wide Range ◽

Single Cell RNA Sequencing

SPsimSeq is a semi-parametric simulation method for bulk and single-cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts.

Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq.

Supplementary information: Supplementary data are available at bioRχiv online.
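The core idea of simulating from an empirical rather than a parametric distribution can be shown for a single gene with inverse-transform sampling: sort the observed values, interpolate linearly between them to get a smoothed empirical quantile function, and evaluate it at uniform random points. This standard-library Python sketch is an illustration under that assumption, not SPsimSeq's own estimator; it treats one gene in isolation, ignoring for example correlation between genes and batch structure, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def sample_empirical(observed: Sequence[float], n: int, seed: int = 0) -> List[float]:
    """Draw n values by inverse-transform sampling from a piecewise-linear
    interpolation of the empirical quantile function of `observed`."""
    xs = sorted(observed)
    if not xs:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)
    m = len(xs)
    out = []
    for _ in range(n):
        # Uniform position along the sorted values, in [0, m - 1).
        u = rng.random() * (m - 1)
        i = int(u)
        frac = u - i
        hi = xs[min(i + 1, m - 1)]  # guard for the single-observation case
        out.append(xs[i] + frac * (hi - xs[i]))
    return out
```

Because every draw lies between the smallest and largest observed values, this simple scheme never produces values outside the observed range; for UMI counts the draws would additionally need rounding to integers.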
