iSMNN: batch effect correction for single-cell RNA-seq data via iterative supervised mutual nearest neighbor refinement

Abstract Batch effect correction is an essential step in the integrative analysis of multiple single-cell RNA-sequencing (scRNA-seq) data. One state-of-the-art strategy for batch effect correction is via unsupervised or supervised detection of mutual nearest neighbors (MNNs). However, both types of methods only detect MNNs across batches of uncorrected data, where the large batch effects may affect the MNN search. To address this issue, we presented a batch effect correction approach via iterative supervised MNN (iSMNN) refinement across data after correction. Our benchmarking on both simulation and real datasets showed the advantages of the iterative refinement of MNNs on the performance of correction. Compared to popular alternative methods, our iSMNN is able to better mix the cells of the same cell type across batches. In addition, iSMNN can also facilitate the identification of differentially expressed genes (DEGs) that are relevant to the biological function of certain cell types. These results indicated that iSMNN will be a valuable method for integrating multiple scRNA-seq datasets that can facilitate biological and medical studies at single-cell level.

iSMNN: Batch Effect Correction for Single-cell RNA-seq data via Iterative Supervised Mutual Nearest Neighbor Refinement

10.1101/2020.11.09.375659 ◽

2020 ◽

Author(s):

Yuchen Yang ◽

Gang Li ◽

Yifang Xie ◽

Li Wang ◽

Yingxi Yang ◽

...

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

Biological Function ◽

State Of The Art ◽

Cell Types ◽

Batch Effect ◽

Iterative Refinement ◽

Rna Seq ◽

Medical Studies ◽

Cell Level

Abstract Batch effect correction is an essential step in the integrative analysis of multiple single cell RNA-seq (scRNA-seq) data. One state-of-the-art strategy for batch effect correction is via unsupervised or supervised detection of mutual nearest neighbors (MNNs). However, both kinds of methods detect MNNs only across batches of uncorrected data, where the large batch effect may affect the MNN search. To address this issue, we presented iSMNN, a batch effect correction approach via iterative supervised MNN refinement across data after correction. Our benchmarking on both simulation and real datasets showed the advantages of the iterative refinement of MNNs on the performance of correction. Compared to the popular methods MNNcorrect and Seurat v3, our iSMNN is able to better mix the cells of the same cell type across batches. In addition, iSMNN can also facilitate the identification of DEGs relevant to the biological function of certain cell types. These results indicated that iSMNN will be a valuable method for integrating multiple scRNA-seq datasets that can facilitate biological and medical studies at single-cell level.

SMNN: batch effect correction for single-cell RNA-seq data via supervised mutual nearest neighbor detection

Briefings in Bioinformatics ◽

10.1093/bib/bbaa097 ◽

2020 ◽

Cited By ~ 1

Author(s):

Yuchen Yang ◽

Gang Li ◽

Huijun Qian ◽

Kirk C Wilhelmsen ◽

Yin Shen ◽

...

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

State Of The Art ◽

Cell Types ◽

Batch Effect ◽

Rna Seq ◽

Cluster Label ◽

Label Information ◽

Cell Type Specific ◽

Biological Differences

Integrated profiling of single cell epigenomic and transcriptomic landscape of Parkinson’s disease mouse brain

10.1101/2020.02.04.933259 ◽

2020 ◽

Author(s):

Jixing Zhong ◽

Gen Tang ◽

Jiacheng Zhu ◽

Xin Qiu ◽

Weiying Wu ◽

...

Keyword(s):

Parkinson's Disease ◽

Single Cell ◽

Early Stage ◽

Cell Types ◽

Cellular Heterogeneity ◽

Rna Seq ◽

Cell Level ◽

Distinct Cell ◽

Single Nucleus

Abstract Parkinson’s disease (PD) is a neurodegenerative disease that impairs the execution of movement. PD pathogenesis has been extensively investigated, but studies have been restricted either to the bulk level or to certain cell types, and so failed to capture cellular heterogeneity and the intrinsic interplay among distinct cell types. To overcome this, we applied single-nucleus RNA-seq and single cell ATAC-seq to cerebellum, midbrain and striatum of PD mouse and matched control. With 74,493 cells in total, we comprehensively depicted the dysfunctions under PD pathology, covering proteostasis, neuroinflammation, calcium homeostasis and extracellular neurotransmitter homeostasis. In addition, using a multi-omics approach, we identified putative biomarkers for the early stage of PD, based on the relationships between transcriptomic and epigenetic profiles. We located certain cell types that primarily contribute to early PD pathology, narrowing the gap between genotypes and phenotypes. Taken together, our study provides a valuable resource for dissecting the molecular mechanism of PD pathogenesis at the single cell level, which could facilitate the development of novel methods for diagnosis, monitoring and practical therapies against PD at an early stage.

scAPAdb: a comprehensive database of alternative polyadenylation at single-cell resolution

Nucleic Acids Research ◽

10.1093/nar/gkab795 ◽

2021 ◽

Author(s):

Sheng Zhu ◽

Qiwei Lian ◽

Wenbin Ye ◽

Wei Qin ◽

Zhe Wu ◽

...

Keyword(s):

Single Cell ◽

Alternative Polyadenylation ◽

Cell Types ◽

Single Cell Level ◽

Cell Heterogeneity ◽

Rna Seq ◽

Cell Level ◽

Eukaryotic Gene ◽

User Friendly ◽

Different Cell Types

Abstract Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer of eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3′-seq and/or RNA-seq data, and no existing database has provided APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from > 360 scRNA-seq experiments, covering six species, including human, mouse and several plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usage and APA-mediated gene regulation at the single-cell level across diverse cell types, tissues and species.

SMNN: Batch Effect Correction for Single-cell RNA-seq data via Supervised Mutual Nearest Neighbor Detection

10.1101/672261 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yuchen Yang ◽

Gang Li ◽

Huijun Qian ◽

Kirk C. Wilhelmsen ◽

Yin Shen ◽

...

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

State Of The Art ◽

Nearest Neighbors ◽

Cell Types ◽

Batch Effect ◽

Batch Effects ◽

Cell Type ◽

Label Information ◽

Cell Type Specific

Abstract Batch effect correction has been recognized to be indispensable when integrating single-cell RNA sequencing (scRNA-seq) data from multiple batches. State-of-the-art methods ignore single-cell cluster label information, but such information can improve the effectiveness of batch effect correction, particularly under realistic scenarios where biological differences are not orthogonal to batch effects. To address this issue, we propose SMNN for batch effect correction of scRNA-seq data via supervised mutual nearest neighbor detection. Our extensive evaluations in simulated and real datasets show that SMNN provides improved merging within the corresponding cell types across batches, leading to reduced differentiation across batches over MNN, Seurat v3, and LIGER. Furthermore, SMNN retains more cell type-specific features, partially manifested by differentially expressed genes identified between cell types after SMNN correction being biologically more relevant, with precision improving by up to 841%.

Key Points

Batch effect correction has been recognized to be critical when integrating scRNA-seq data from multiple batches due to systematic differences in time points, generating laboratory and/or handling technician(s), experimental protocol, and/or sequencing platform.

Existing batch effect correction methods that leverage information from mutual nearest neighbors across batches (for example, implemented in SC3 or Seurat) ignore cell type information and suffer from potentially mismatching single cells from different cell types across batches, which would lead to undesired correction results, especially under the scenario where variation from batch effects is non-negligible compared with biological effects.

To address this critical issue, here we present SMNN, a supervised machine learning method that first takes cluster/cell-type label information, supplied by users or inferred from scRNA-seq clustering, and then searches mutual nearest neighbors within each cell type instead of searching globally.

Our SMNN method shows clear advantages over three state-of-the-art batch effect correction methods and can better mix cells of the same cell type across batches and more effectively recover cell-type specific features, in both simulations and real datasets.
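
The difference between a global and a within-cell-type mutual nearest neighbor search, as described in the abstract above, can be illustrated with a minimal sketch. This is not the SMNN implementation: the function names, the Euclidean distance, the default `k` and the two-dimensional toy data are all assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest(query, refs, k):
    """Return the indices of the k points in refs closest to query."""
    order = sorted(range(len(refs)), key=lambda j: dist(query, refs[j]))
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k):
    """Pairs (i, j) where batch_a[i] and batch_b[j] are in each other's k nearest neighbors."""
    a_to_b = [k_nearest(p, batch_b, k) for p in batch_a]
    b_to_a = [k_nearest(q, batch_a, k) for q in batch_b]
    return [
        (i, j)
        for i, nbrs in enumerate(a_to_b)
        for j in sorted(nbrs)
        if i in b_to_a[j]
    ]


def supervised_mnn(batch_a, labels_a, batch_b, labels_b, k=1):
    """Search MNN pairs separately within each cell type present in both batches."""
    pairs = []
    for cell_type in sorted(set(labels_a) & set(labels_b)):
        idx_a = [i for i, lab in enumerate(labels_a) if lab == cell_type]
        idx_b = [j for j, lab in enumerate(labels_b) if lab == cell_type]
        sub_a = [batch_a[i] for i in idx_a]
        sub_b = [batch_b[j] for j in idx_b]
        # Map positions in the per-type subsets back to the original indices.
        for i, j in mutual_nearest_neighbors(sub_a, sub_b, k):
            pairs.append((idx_a[i], idx_b[j]))
    return pairs


# Hypothetical toy data: batch B is batch A shifted by +3 along the first axis,
# a shift comparable in size to the distance between the two cell types.
batch_a = [(0.0, 0.0), (4.0, 0.0)]
labels_a = ["T", "B"]
batch_b = [(3.0, 0.0), (7.0, 0.0)]
labels_b = ["T", "B"]

global_pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
typed_pairs = supervised_mnn(batch_a, labels_a, batch_b, labels_b, k=1)
print(global_pairs)  # the only global pair links a "B" cell in A to a "T" cell in B
print(typed_pairs)   # each pair links cells of the same type
```

In this toy case the global search returns a single pair that crosses cell types, because the batch shift moves the "T" cell of batch B next to the "B" cell of batch A. Restricting the search to one cell type at a time avoids that mismatch, which is the situation the Key Points describe.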

MarkerCount: A stable, count-based cell type identifier for single cell RNA-Seq experiments

10.21203/rs.3.rs-418249/v1 ◽

2021 ◽

Author(s):

Hanbyeol Kim ◽

Joongho Lee ◽

Keunsoo Kang ◽

Seokhyun Yoon

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Batch Effect ◽

Expression Level ◽

Rna Seq ◽

Cell Type ◽

Stable Performance ◽

Downstream Analysis

Abstract Cell type identification is a key step in the downstream analysis of single cell RNA-seq experiments. The indispensable information for this is gene expression, which is used to cluster cells, train the model and set rejection thresholds. The problem is that gene expression is subject to batch effects arising from different platforms and preprocessing. We present MarkerCount, which uses the number of markers expressed, regardless of their expression level, to initially identify cell types and then reassigns cell types on a cluster basis. MarkerCount works in both reference-based and marker-based modes: the latter uses only existing lists of markers, while the former requires a pre-annotated dataset to train the model. The performance was evaluated and compared with existing identifiers, both marker-based and reference-based, that can be customized with publicly available datasets and marker DB. The results show that MarkerCount provides stable performance compared with other reference-based and marker-based cell type identifiers.
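
The two steps named in the abstract above, counting expressed markers regardless of expression level and then reassigning labels per cluster, can be sketched roughly as follows. This is not the MarkerCount implementation: the function names, the detection rule (count greater than zero), the "unassigned" fallback, the majority-vote reassignment, the short marker lists and the toy cells are all assumptions for illustration.

```python
from collections import Counter


def marker_count_label(expression, markers):
    """Label one cell with the type that has the most markers detected (count > 0)."""
    counts = {
        cell_type: sum(1 for gene in genes if expression.get(gene, 0) > 0)
        for cell_type, genes in markers.items()
    }
    # Iterate in sorted order so that ties resolve deterministically.
    best = max(sorted(counts), key=lambda t: counts[t])
    return best if counts[best] > 0 else "unassigned"


def reassign_by_cluster(labels, clusters):
    """Replace each cell's label with the most common label in its cluster."""
    majority = {}
    for c in set(clusters):
        votes = Counter(lab for lab, k in zip(labels, clusters) if k == c)
        majority[c] = votes.most_common(1)[0][0]
    return [majority[c] for c in clusters]


# Hypothetical marker lists and cells (gene -> read count), for illustration only.
markers = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["CD79A", "MS4A1", "CD19"],
}
cells = [
    {"CD3D": 50, "CD3E": 1, "CD2": 2},
    {"CD3D": 1, "CD3E": 3, "MS4A1": 200},  # one highly expressed B marker, two T markers
    {"MS4A1": 9, "CD19": 1, "CD3D": 1},
    {"CD79A": 3, "MS4A1": 1},
]
clusters = [0, 0, 0, 1]  # assumed to come from a prior clustering step

initial = [marker_count_label(cell, markers) for cell in cells]
final = reassign_by_cluster(initial, clusters)
print(initial)
print(final)
```

The second cell shows why counting detected markers differs from summing expression: a single marker with a very high count does not outweigh two detected markers of another type. The third cell is initially labeled differently from its cluster and is then overridden by the cluster majority.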

Cell type-specific aging clocks to quantify aging and rejuvenation in regenerative regions of the brain

10.1101/2022.01.10.475747 ◽

2022 ◽

Author(s):

Matthew T Buckley ◽

Eric Sun ◽

Benson M. George ◽

Ling Liu ◽

Nicholas Schaum ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Cell Level ◽

Transcriptomic Data ◽

Precise Quantification ◽

Cell Type Specific ◽

Tissue Aging ◽

The Brain

Aging manifests as progressive dysfunction culminating in death. The diversity of cell types is a challenge to the precise quantification of aging and its reversal. Here we develop a suite of 'aging clocks' based on single cell transcriptomic data to characterize cell type-specific aging and rejuvenation strategies. The subventricular zone (SVZ) neurogenic region contains many cell types and provides an excellent system to study cell-level tissue aging and regeneration. We generated 21,458 single-cell transcriptomes from the neurogenic regions of 28 mice, tiling ages from young to old. With these data, we trained a suite of single cell-based regression models (aging clocks) to predict both chronological age (passage of time) and biological age (fitness, in this case the proliferative capacity of the neurogenic region). Both types of clocks perform well on independent cohorts of mice. Genes underlying the single cell-based aging clocks are mostly cell-type specific, but also include a few shared genes in the interferon and lipid metabolism pathways. We used these single cell-based aging clocks to measure transcriptomic rejuvenation, by generating single cell RNA-seq datasets of SVZ neurogenic regions for two interventions - heterochronic parabiosis (young blood) and exercise. Interestingly, the use of aging clocks reveals that both heterochronic parabiosis and exercise reverse transcriptomic aging in the niche, but in different ways across cell types and genes. This study represents the first development of high-resolution aging clocks from single cell transcriptomic data and demonstrates their application to quantify transcriptomic rejuvenation.

scID Uses Discriminant Analysis to Identify Transcriptionally Equivalent Cell Types across Single-Cell RNA-Seq Data with Batch Effect

iScience ◽

10.1016/j.isci.2020.100914 ◽

2020 ◽

Vol 23 (3) ◽

pp. 100914 ◽

Cited By ~ 7

Author(s):

Katerina Boufea ◽

Sohan Seth ◽

Nizar N. Batada

Keyword(s):

Discriminant Analysis ◽

Single Cell ◽

Cell Types ◽

Batch Effect ◽

Rna Seq

SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009118 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009118

Author(s):

Jing Qi ◽

Yang Zhou ◽

Zicen Zhao ◽

Shuilin Jin

Keyword(s):

Gene Expression ◽

Single Cell ◽

Differential Expression Analysis ◽

Cell Types ◽

Rna Seq ◽

Cell Level ◽

Gene Level ◽

Level Information ◽

Downstream Analysis ◽

Gene Expression Levels

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Owing to the low number of mRNA copies extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinder the downstream analysis of the scRNA-seq data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with the state-of-the-art imputation methods, SDImpute improves the accuracy of the downstream analysis, including clustering, visualization, and differential expression analysis.

The Application of Single-Cell RNA Sequencing in Mammalian Meiosis Studies

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.673642 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yiheng Peng ◽

Huanyu Qiao

Keyword(s):

Single Cell ◽

Molecular Mechanisms ◽

Cell Types ◽

Marker Genes ◽

Rna Seq ◽

Cell Level ◽

Cellular Division ◽

Division Process ◽

Complex Events ◽

Downstream Analysis

Meiosis is a cellular division process that produces gametes for sexual reproduction. Disruption of complex events throughout meiosis, such as synapsis and homologous recombination, can lead to infertility and aneuploidy. To reveal the molecular mechanisms of these events, transcriptome studies of specific substages must be conducted. However, conventional methods, such as bulk RNA-seq and RT-qPCR, are not able to detect the transcriptional variations effectively and precisely, especially when identifying cell types and stages with subtle differences. In recent years, mammalian meiotic transcriptomes have been intensively studied at the single-cell level by using single-cell RNA-seq (scRNA-seq) approaches, especially through two widely used platforms, Smart-seq2 and Drop-seq. The scRNA-seq protocols, along with their downstream analysis, enable researchers to accurately identify cell heterogeneities and investigate meiotic transcriptomes at a higher resolution. In this review, we compared bulk RNA-seq and scRNA-seq to show the advantages of scRNA-seq in meiosis studies; we also pointed out the challenges and limitations of scRNA-seq. We listed recent findings from mammalian meiosis (male and female) studies where scRNA-seq was applied. Next, we summarized the scRNA-seq analysis methods and the meiotic marker genes from spermatocytes and oocytes. Specifically, we emphasized the different features of the two scRNA-seq protocols (Smart-seq2 and Drop-seq) in the context of meiosis studies and discussed their strengths and weaknesses for different research purposes. Finally, we discussed the future applications of scRNA-seq in the meiosis field.
