Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies

Microglia constitute ~10–20% of glial cells in the adult human brain. They are the resident phagocytic immune cells of the central nervous system and play an integral role as first responders during inflammation. Microglia are commonly classified as “HM” (homeostatic), “M1” (classically activated proinflammatory), or “M2” (alternatively activated). Multiple single-cell RNA-sequencing studies suggest that this discrete classification system does not accurately and fully capture the vast heterogeneity of microglial states in the brain. In fact, a recent single-cell RNA-sequencing study showed that microglia exist along a continuous spectrum of states. This spectrum spans heterogeneous populations of homeostatic and neuropathology-associated microglia in both healthy and Alzheimer’s disease (AD) mouse brains. Major risk factors, such as sex, age, and genes, modulate microglial states, suggesting that shifts along the trajectory might play a causal role in AD pathogenesis. This study provides important insight into the cellular mechanisms of AD and underlines the potential of novel cell-based therapies for AD.

Experimental design for single-cell RNA sequencing

Briefings in Functional Genomics ◽ 10.1093/bfgp/elx035 ◽ 2017 ◽ Vol 17 (4) ◽ pp. 233-239 ◽ Cited By ~ 39

Author(s): Jeanette Baran-Gale ◽ Tamir Chandra ◽ Kristina Kirschner

Keyword(s): Experimental Design ◽ Single Cell ◽ RNA Sequencing ◽ Single Cell RNA Sequencing

Controlling for confounding effects in single cell RNA sequencing studies using both control and target genes

10.1101/045070 ◽ 2016

Author(s): Mengjie Chen ◽ Xiang Zhou

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Target Genes ◽ Expectation Maximization Algorithm ◽ Data Sets ◽ Single Cell RNA Sequencing ◽ Sequencing Studies ◽ Order Of Magnitude ◽ The Rich ◽ Downstream Analysis

Single-cell RNA sequencing (scRNAseq) is becoming increasingly popular for unbiased, high-resolution transcriptome analysis of heterogeneous cell populations. Despite its many advantages, scRNAseq, like any other genomic sequencing technique, is susceptible to confounding effects. Controlling for these effects in scRNAseq data is thus a crucial step for proper data normalization and accurate downstream analysis. Several recent methodological studies have demonstrated the use of control genes to control for confounding effects in scRNAseq studies; the control genes are used to infer the confounding effects, which are then used to normalize the target genes of primary interest. However, these methods can be suboptimal because they ignore the rich information contained in the target genes. Here, we develop an alternative statistical method, which we refer to as scPLS, for more accurate inference of confounding effects. Our method is based on partial least squares and models control and target genes jointly to better infer and control for confounding effects. To accompany our method, we develop a novel expectation maximization algorithm for scalable inference. Our algorithm is an order of magnitude faster than standard ones, making scPLS applicable to hundreds of cells and hundreds of thousands of genes. With extensive simulations and comparisons with other methods, we demonstrate the effectiveness of scPLS. Finally, we apply scPLS to two scRNAseq data sets to illustrate its benefits in removing both technical confounding effects and cell cycle effects.
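The control-gene idea described in this abstract (infer the confounding effect from control genes, then remove it from the target genes) can be illustrated with a minimal sketch. This is the simple baseline the abstract contrasts against, not scPLS itself, which models control and target genes jointly. The function name `regress_out_control_factor`, the single-factor assumption, and the use of the per-cell mean of centered control genes as the factor are all illustrative assumptions.

```python
from statistics import mean

def regress_out_control_factor(control, target):
    """Remove one control-derived confounding factor from target genes.

    control, target: lists of genes; each gene is a list of per-cell
    expression values (for example log-transformed counts).
    Hypothetical helper for illustration only; this is NOT scPLS.
    """
    n_cells = len(control[0])
    # Center each control gene so the factor reflects cell-to-cell shifts only.
    centered = [[x - mean(gene) for x in gene] for gene in control]
    # One factor value per cell: the average centered control expression.
    factor = [mean(gene[c] for gene in centered) for c in range(n_cells)]
    var = sum(f * f for f in factor)
    if var == 0:
        # Control genes show no variation across cells; nothing to remove.
        return [list(gene) for gene in target]
    corrected = []
    for gene in target:
        mu = mean(gene)
        # Ordinary least-squares slope of the target gene on the factor.
        slope = sum(f * (x - mu) for f, x in zip(factor, gene)) / var
        # Subtract the fitted confounding component; the gene mean is preserved.
        corrected.append([x - slope * f for x, f in zip(gene, factor)])
    return corrected

# Toy data: two control genes share a cell-wise shift that also drives target gene 0.
control = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
target = [[5.0, 7.0, 9.0, 11.0], [1.0, -1.0, -1.0, 1.0]]
corrected = regress_out_control_factor(control, target)
```

Because the factor is estimated from the control genes alone, any confounding signal visible only in the target genes is missed, which is the limitation the abstract says a joint model is meant to address.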

Detection and removal of barcode swapping in single-cell RNA-seq data

10.1101/177048 ◽ 2017 ◽ Cited By ~ 5

Author(s): Jonathan A. Griffiths ◽ Arianne C. Richard ◽ Karsten Bach ◽ Aaron T.L. Lun ◽ John C Marioni

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Flow Cell ◽ RNA Seq ◽ Genomic Assays ◽ Single Cell RNA Sequencing ◽ Sequencing Studies ◽ Continued Use ◽ Statistical Approaches ◽ Transcriptomic Studies

Abstract Barcode swapping results in the mislabeling of sequencing reads between multiplexed samples on the new patterned flow cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays, especially for single-cell studies where many samples are routinely multiplexed together. The severity and consequences of barcode swapping for single-cell transcriptomic studies remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in each of two plate-based single-cell RNA sequencing datasets. We found that approximately 2.5% of reads were mislabeled between samples on the HiSeq 4000 machine, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10X Genomics experiments, exploiting the combinatorial complexity present in the data. This permits the continued use of cutting-edge sequencing machines for droplet-based experiments while avoiding the confounding effects of barcode swapping.
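The exclusion step mentioned at the end of this abstract can be sketched roughly as follows. The reasoning is that the same combination of cell barcode, UMI, and gene is very unlikely to arise independently in two samples, so a combination seen in several samples is probably one molecule whose reads were mislabeled. This is a minimal sketch of that idea rather than the authors' implementation: the function name `remove_swapped`, the input layout, and the 0.8 threshold are assumptions made for illustration.

```python
from collections import defaultdict

def remove_swapped(reads_by_sample, min_frac=0.8):
    """Drop molecules that look swapped between multiplexed samples.

    reads_by_sample: {sample: {(cell_barcode, umi, gene): read_count}}
    A molecule is kept in a sample only if that sample holds at least
    min_frac of the molecule's reads across all samples (assumed threshold).
    """
    # Total reads per molecule across every sample on the flow cell.
    totals = defaultdict(int)
    for molecules in reads_by_sample.values():
        for key, n in molecules.items():
            totals[key] += n

    cleaned = {sample: {} for sample in reads_by_sample}
    for sample, molecules in reads_by_sample.items():
        for key, n in molecules.items():
            # Keep the molecule only where it clearly dominates.
            if n > 0 and n / totals[key] >= min_frac:
                cleaned[sample][key] = n
    return cleaned

# Toy example with two samples sharing two molecule keys.
reads = {
    "A": {("bc1", "u1", "g1"): 90, ("bc2", "u2", "g2"): 5},
    "B": {("bc1", "u1", "g1"): 10, ("bc2", "u2", "g2"): 5, ("bc3", "u3", "g3"): 7},
}
cleaned = remove_swapped(reads)
```

In the toy data the first molecule survives only in sample A, the second is removed from both samples because neither dominates, and the third is untouched because it appears in a single sample.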

Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design

GigaScience ◽ 10.1093/gigascience/giab062 ◽ 2021 ◽ Vol 10 (9) ◽ Cited By ~ 1

Author(s): Lukas M Weber ◽ Ariel A Hippen ◽ Peter F Hickey ◽ Kristofer C Berrett ◽ Jason Gertz ◽ ...

Keyword(s): Genetic Variation ◽ Experimental Design ◽ Single Cell ◽ RNA Sequencing ◽ In Silico ◽ Cost Savings ◽ Cancer Tissue ◽ Library Preparation ◽ Natural Genetic Variation ◽ Single Cell RNA Sequencing

Abstract Background: Pooling cells from multiple biological samples prior to library preparation within the same single-cell RNA sequencing experiment provides several advantages, including lower library preparation costs and reduced unwanted technological variation, such as batch effects. Computational demultiplexing tools based on natural genetic variation between individuals provide a simple approach to demultiplex samples, which does not require complex additional experimental procedures. However, to our knowledge these tools have not been evaluated in cancer, where somatic variants, which could differ between cells from the same sample, may obscure the signal in natural genetic variation. Results: Here, we performed in silico benchmark evaluations by combining raw sequencing reads from multiple single-cell samples in high-grade serous ovarian cancer, which has a high copy number burden, and lung adenocarcinoma, which has a high tumor mutational burden. Our results confirm that genetic demultiplexing tools can be effectively deployed on cancer tissue using a pooled experimental design, although high proportions of ambient RNA from cell debris reduce performance. Conclusions: This strategy provides significant cost savings through pooled library preparation. To facilitate similar analyses at the experimental design phase, we provide freely accessible code and a reproducible Snakemake workflow built around the best-performing tools found in our in silico benchmark evaluations, available at https://github.com/lmweber/snp-dmx-cancer.
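The principle behind genetic demultiplexing can be illustrated with a short sketch: each cell is assigned to the donor whose known genotypes best explain the reference and alternate allele reads observed in that cell. This is a simplified illustration under stated assumptions, not the tools or the Snakemake workflow evaluated in the paper. The function name `assign_donor`, the input layout, the binomial read model, and the 1% error rate are all hypothetical choices.

```python
import math

def assign_donor(cell_counts, donor_genotypes, error=0.01):
    """Assign one cell to the donor with the highest genotype log-likelihood.

    cell_counts: {snp_id: (ref_reads, alt_reads)} observed in the cell
    donor_genotypes: {donor: {snp_id: alt_dosage}} with dosage 0, 1, or 2
    error: assumed sequencing error rate, which keeps probabilities off 0 and 1
    """
    scores = {}
    for donor, genotypes in donor_genotypes.items():
        log_lik = 0.0
        for snp, (ref, alt) in cell_counts.items():
            if snp not in genotypes:
                continue  # no genotype for this donor at this site
            # Expected fraction of alternate-allele reads given the dosage.
            p_alt = min(max(genotypes[snp] / 2, error), 1 - error)
            # Binomial log-likelihood; the constant coefficient is omitted
            # because it is identical for every donor.
            log_lik += alt * math.log(p_alt) + ref * math.log(1 - p_alt)
        scores[donor] = log_lik
    best = max(scores, key=scores.get)
    return best, scores

# Toy example: two donors with opposite homozygous genotypes at two SNPs.
donors = {"d1": {"s1": 0, "s2": 2}, "d2": {"s1": 2, "s2": 0}}
cell = {"s1": (3, 0), "s2": (0, 2)}
best, scores = assign_donor(cell, donors)
```

A sketch like this treats every variant as germline. Somatic variants in tumor cells and reads from ambient RNA would both violate that assumption, which is consistent with the abstract's concern about cancer samples and its observation that ambient RNA reduces performance.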

SNV identification from single-cell RNA sequencing data

Human Molecular Genetics ◽ 10.1093/hmg/ddz207 ◽ 2019 ◽ Vol 28 (21) ◽ pp. 3569-3583 ◽ Cited By ~ 3

Author(s): Patricia M Schnepp ◽ Mengjie Chen ◽ Evan T Keller ◽ Xiang Zhou

Keyword(s): DNA Sequencing ◽ Single Cell ◽ RNA Sequencing ◽ Single Cells ◽ Specific Gene ◽ Sequencing Data ◽ Single Nucleotide Variants ◽ Single Cell RNA Sequencing ◽ Sequencing Studies ◽ Genomic Regions

Abstract Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type-specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial and complementary information, enable the detection of functional SNVs, and maximize the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in either bulk or single-cell DNA sequencing data. In both pipelines, we examined various parameter settings to determine the accuracy of the final SNV call set and provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following GATK Best Practices resulted in the highest number of SNVs identified with a high concordance. In individual single cells, Monovar produced better-quality SNVs, although none of the pipelines analyzed is capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across different functional genomic regions. Our results open doors for novel ways to leverage scRNA-seq for the future investigation of SNV function.
