PMD Uncovers Widespread Cell-State Erasure by scRNAseq Batch Correction Methods

2021 ◽  
Author(s):  
Scott R Tyler ◽  
Supinda Bunyavanich ◽  
Eric E Schadt

Single cell RNAseq (scRNAseq) batches range from technical replicates to multi-tissue atlases, thus requiring robust batch correction methods that operate effectively across this similarity spectrum. Currently, no metrics allow for full benchmarking across this spectrum, resulting in benchmarks that quantify removal of batch effects without quantifying preservation of real batch differences. Here, we address these gaps with a new statistical metric [Percent Maximum Difference (PMD)] that linearly quantifies batch similarity, and simulations generating cells from mixtures of distinct gene expression programs (cell-lineages/-types/-states). Using 690 real-world and 672 simulated integrations (7.2e6 cells total) we compared 7 batch integration approaches across the spectrum of similarity with batch-confounded gene expression. Count downsampling appeared the most robust, while others left residual batch effects or produced over-merged datasets. We further released open-source PMD and downsampling packages, with the latter capable of downsampling an organism atlas (245,389 cells) in tens of minutes on a standard computer.
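To make the idea of count downsampling concrete, here is a minimal sketch of one common way to do it: each cell's counts are subsampled without replacement to a shared total depth. This is an illustration only and is not the authors' released package; the function name `downsample_counts`, the `target` parameter, and the choice to raise an error for cells below the target depth are all assumptions made for this example.

```python
import numpy as np

def downsample_counts(counts, target, seed=0):
    """Downsample each cell (row) of an integer count matrix to `target` total counts.

    Hypothetical helper for illustration; not the API of any released package.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    out = np.empty_like(counts)
    for r, row in enumerate(counts):
        if row.sum() < target:
            # Assumption: shallow cells are rejected here; a real pipeline
            # might filter them out beforehand instead.
            raise ValueError(f"cell {r} has fewer than {target} counts")
        # Draw `target` transcripts without replacement from this cell's counts.
        out[r] = rng.multivariate_hypergeometric(row, target)
    return out

counts = np.array([[5, 3, 2], [10, 0, 10]])
equalized = downsample_counts(counts, target=6)
print(equalized.sum(axis=1))  # every cell now has the same total depth
```

Because the sampling is without replacement, no gene can gain counts and genes with zero counts stay at zero, so depth is equalized without otherwise altering the expression values.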

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mohammad M. Karimi ◽  
Ya Guo ◽  
Xiaokai Cui ◽  
Husayn A. Pallikonda ◽  
Veronika Horková ◽  
...  

CD4 and CD8 mark helper and cytotoxic T cell lineages, respectively, and serve as coreceptors for MHC-restricted TCR recognition. How coreceptor expression is matched with TCR specificity is central to understanding CD4/CD8 lineage choice, but visualising coreceptor gene activity in individual selection intermediates has been technically challenging. It therefore remains unclear whether the sequence of coreceptor gene expression in selection intermediates follows a stereotypic pattern, or is responsive to signaling. Here we use single cell RNA sequencing (scRNA-seq) to classify mouse thymocyte selection intermediates by coreceptor gene expression. In the unperturbed thymus, Cd4+Cd8a- selection intermediates appear before Cd4-Cd8a+ selection intermediates, but the timing of these subsets is flexible according to the strength of TCR signals. Our data show that selection intermediates discriminate MHC class prior to the loss of coreceptor expression and suggest a model where signal strength informs the timing of coreceptor gene activity and ultimately CD4/CD8 lineage choice.


2020 ◽  
Author(s):  
Ruben Chazarra-Gil ◽  
Stijn van Dongen ◽  
Vladimir Yu Kiselev ◽  
Martin Hemberg

As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences, and assess their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.


2021 ◽  
Vol 12 ◽  
Author(s):  
Bin Zou ◽  
Tongda Zhang ◽  
Ruilong Zhou ◽  
Xiaosen Jiang ◽  
Huanming Yang ◽  
...  

It is well recognized that batch effect in single-cell RNA sequencing (scRNA-seq) data remains a big challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effect in scRNA-seq data. We first searched mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and further applied for the removal of batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss was used to compute the distance between cells in MNN pairs in the PCA subspace, while the regularization loss was to make the output of the network similar to the input. The experiment results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets as well. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used methods of Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods of MMD-ResNet and scGen. The results demonstrated that deepMNN achieved a better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allowed for integrating scRNA-seq datasets with multiple batches in one step. Furthermore, deepMNN ran much faster than the other methods for large-scale datasets. These characteristics give deepMNN the potential to be a new choice for large-scale single-cell gene expression data analysis.
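The two ingredients described in this abstract, mutual nearest neighbor pairs and a loss made of a batch term plus a weighted regularization term, can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the deepMNN implementation: the residual network itself is omitted, and the function names, the Euclidean distance, the neighborhood size `k`, and the default `weight` are hypothetical choices that the abstract does not specify.

```python
import numpy as np

def mnn_pairs(a, b, k=2):
    """Return (i, j) pairs where a[i] and b[j] are among each other's k nearest neighbours.

    `a` and `b` are cells-by-components arrays (e.g. PCA coordinates of two batches).
    """
    # Pairwise squared Euclidean distances between the two batches.
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    nn_ab = np.argsort(d, axis=1)[:, :k]    # for each cell in a: k nearest cells in b
    nn_ba = np.argsort(d, axis=0)[:k, :].T  # for each cell in b: k nearest cells in a
    return [(i, int(j)) for i in range(len(a)) for j in nn_ab[i] if i in nn_ba[j]]

def total_loss(out_a, out_b, in_a, in_b, pairs, weight=0.1):
    """Batch loss over MNN pairs plus a weighted regularization loss (assumed forms)."""
    if pairs:
        i, j = np.array(pairs).T
        # Batch loss: mean distance between corrected cells in each MNN pair.
        batch = np.linalg.norm(out_a[i] - out_b[j], axis=1).mean()
    else:
        batch = 0.0
    # Regularization loss: keeps the network output close to its input.
    reg = (np.linalg.norm(out_a - in_a, axis=1).mean()
           + np.linalg.norm(out_b - in_b, axis=1).mean())
    return batch + weight * reg

a = np.array([[0.0, 0.0], [10.0, 10.0]])
b = np.array([[0.0, 1.0], [10.0, 11.0]])
pairs = mnn_pairs(a, b, k=1)
print(pairs, total_loss(a, b, a, b, pairs))
```

In this toy usage the uncorrected inputs are passed as the "outputs", so the regularization term is zero and the loss reduces to the mean distance between paired cells; a correction network would be trained to lower that distance without drifting far from its input.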


2018 ◽  
Author(s):  
Ken Jean-Baptiste ◽  
José L. McFaline-Figueroa ◽  
Cristina M. Alexandre ◽  
Michael W. Dorrity ◽  
Lauren Saunders ◽  
...  

Single-cell RNA-seq can yield high-resolution cell-type-specific expression signatures that reveal new cell types and the developmental trajectories of cell lineages. Here, we apply this approach to A. thaliana root cells to capture gene expression in 3,121 root cells. We analyze these data with Monocle 3, which orders single cell transcriptomes in an unsupervised manner and uses machine learning to reconstruct single-cell developmental trajectories along pseudotime. We identify hundreds of genes with cell-type-specific expression, with pseudotime analysis of several cell lineages revealing both known and novel genes that are expressed along a developmental trajectory. We identify transcription factor motifs that are enriched in early and late cells, together with the corresponding candidate transcription factors that likely drive the observed expression patterns. We assess and interpret changes in total RNA expression along developmental trajectories and show that trajectory branch points mark developmental decisions. Finally, by applying heat stress to whole seedlings, we address the longstanding question of possible heterogeneity among cell types in the response to an abiotic stress. Although the response of canonical heat shock genes dominates expression across cell types, subtle but significant differences in other genes can be detected among cell types. Taken together, our results demonstrate that single-cell transcriptomics holds promise for studying plant development and plant physiology with unprecedented resolution.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Po-Yuan Tung ◽  
John D. Blischak ◽  
Chiaowen Joyce Hsiao ◽  
David A. Knowles ◽  
Jonathan E. Burnett ◽  
...  

2020 ◽  
Author(s):  
Wanqiu Chen ◽  
Yongmei Zhao ◽  
Xin Chen ◽  
Xiaojiang Xu ◽  
Zhaowei Yang ◽  
...  

Single-cell RNA sequencing (scRNA-seq) has become a very powerful technology for biomedical research and is becoming much more affordable as methods continue to evolve, but it is unknown how reproducible different platforms are using different bioinformatics pipelines, particularly the recently developed scRNA-seq batch correction algorithms. We carried out a comprehensive multi-center cross-platform comparison on different scRNA-seq platforms using standard reference samples. We compared six pre-processing pipelines, seven bioinformatics normalization procedures, and seven batch effect correction methods including CCA, MNN, Scanorama, BBKNN, Harmony, limma and ComBat to evaluate the performance and reproducibility of 20 scRNA-seq data sets derived from four different platforms and centers. We benchmarked scRNA-seq performance across different platforms and testing sites using global gene expression profiles as well as some cell-type specific marker genes. We showed that there were large batch effects, and that the reproducibility of scRNA-seq across platforms was dictated both by the expression level of genes selected and the batch correction methods used. We found that CCA, MNN, and BBKNN all corrected the batch variations fairly well for the scRNA-seq data derived from biologically similar samples across platforms/sites. However, for the scRNA-seq data derived from or consisting of biologically distinct samples, limma and ComBat failed to correct batch effects, whereas CCA over-corrected the batch effect and misclassified the cell types and samples. In contrast, MNN, Harmony and BBKNN separated biologically different samples/cell types into correspondingly distinct dimensional subspaces; however, consistent with this algorithm’s logic, MNN required that the samples evaluated each contain a shared portion of highly similar cells.
In summary, we found a great cross-platform consistency in separating two distinct samples when an appropriate batch correction method was used. We hope this large cross-platform/site scRNA-seq data set will provide a valuable resource, and that our findings will offer useful advice for the single-cell sequencing community.


2019 ◽  
Author(s):  
Yiliang Zhang ◽  
Kexuan Liang ◽  
Molei Liu ◽  
Yue Li ◽  
Hao Ge ◽  
...  

Single-cell RNA sequencing technologies have been widely used in recent years as a powerful tool allowing the observation of gene expression at the resolution of single cells. Two of the major challenges in scRNA-seq data analysis are dropout events and batch effects. The zero inflation (dropout rate) varies substantially across single cells. Evidence has shown that technical noise, including batch effects, explains a notable proportion of this cell-to-cell variation. To capture biological variation, it is necessary to quantify and remove technical variation. Here, we introduce SCRIBE (Single-Cell Recovery Imputation with Batch Effects), a principled framework that imputes dropout events and corrects batch effects simultaneously. We demonstrate, through real examples, that SCRIBE outperforms existing scRNA-seq data analysis tools in recovering cell-specific gene expression patterns, removing batch effects and retaining biological variation across cells. Our software is freely available online at https://github.com/YiliangTracyZhang/SCRIBE.


Author(s):  
Xiaojun Yuan ◽  
Janith A. Seneviratne ◽  
Shibei Du ◽  
Ying Xu ◽  
Yijun Chen ◽  
...  

Peripheral neuroblastic tumors (PNTs) are the most common extracranial solid tumors in early childhood. They represent a spectrum of neural crest derived tumors including neuroblastoma, ganglioneuroblastoma and ganglioneuroma. PNTs exhibit heterogeneity due to interconverting malignant cell states described as adrenergic/nor-adrenergic or mesenchymal/neural crest cell in origin. The factors determining individual patient levels of tumor heterogeneity, their impact on the malignant phenotype, and the presence of other cell states are unknown. Here, single-cell RNA-sequencing analysis of 4267 cells from 7 PNTs demonstrated extensive transcriptomic heterogeneity. Trajectory modelling showed that malignant neuroblasts move between adrenergic and mesenchymal cell states via a novel state that we termed a “transitional” phenotype. Transitional cells are characterized by gene expression programs linked to sympathoadrenal development and to aggressive tumor phenotypes such as rapid proliferation and tumor dissemination. Among primary bulk tumor patient cohorts, high expression of the transitional gene signature was highly predictive of poor prognosis when compared to adrenergic and mesenchymal expression patterns. High transitional gene expression in neuroblastoma cell lines identified a similar transitional H3K27-acetylation super-enhancer landscape, supporting the concept that PNTs have phenotypic plasticity and transdifferentiation capacity. Additionally, examination of PNT microenvironments found that neuroblastomas contained low immune cell infiltration, high levels of non-inflammatory macrophages, and low cytotoxic T lymphocyte levels compared with more benign PNT subtypes. Modeling of cell-cell signaling in the tumor microenvironment predicted specific paracrine effects toward the various subtypes of malignant cells, suggesting further cell-extrinsic influences on malignant cell phenotype.
Collectively, our study reveals the presence of a previously unrecognized transitional cell state with high malignant potential and an immune cell architecture which serve both as potential biomarkers and therapeutic targets.

