deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors

Batch effects in single-cell RNA sequencing (scRNA-seq) data remain a major challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effects in scRNA-seq data. We first searched for mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and applied to remove the batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss measured the distance between the cells of each MNN pair in the PCA subspace, while the regularization loss kept the output of the network similar to its input. The experimental results showed that deepMNN successfully removed batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods MMD-ResNet and scGen. The results demonstrated that deepMNN achieved better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN integrated scRNA-seq datasets with multiple batches in one step. Furthermore, deepMNN ran much faster than the other methods on large-scale datasets. These characteristics make deepMNN a potential new choice for large-scale single-cell gene expression data analysis.
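The loss described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact distance or the weight, so the Euclidean norm, the averaging, the function name `deepmnn_style_loss`, and the parameter `reg_weight` are all assumptions made for illustration. Cells from all batches are assumed to be stacked row-wise in one matrix, and each MNN pair is a pair of row indices.

```python
import numpy as np

def deepmnn_style_loss(x, y, pairs, reg_weight=1.0):
    """Batch loss plus weighted regularization loss (illustrative only).

    x          -- network input, shape (n_cells, n_components), e.g. PCA coordinates
    y          -- network output, same shape as x
    pairs      -- sequence of (i, j) row indices forming MNN pairs
    reg_weight -- hypothetical weight on the regularization term
    """
    pairs = np.asarray(pairs)
    i, j = pairs[:, 0], pairs[:, 1]
    # Batch loss: mean Euclidean distance between the corrected cells of each MNN pair.
    batch_loss = np.linalg.norm(y[i] - y[j], axis=1).mean()
    # Regularization loss: mean Euclidean distance between each output and its input.
    reg_loss = np.linalg.norm(y - x, axis=1).mean()
    return batch_loss + reg_weight * reg_loss
```

With an identity network (`y` equal to `x`) only the batch term contributes; collapsing every cell onto one point drives the batch term to zero but is penalized by the regularization term, which is the trade-off the weight controls.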


A Comprehensive Multi-Center Cross-platform Benchmarking Study of Single-cell RNA Sequencing Using Reference Samples

10.1101/2020.03.27.010249 ◽

2020 ◽

Author(s):

Wanqiu Chen ◽

Yongmei Zhao ◽

Xin Chen ◽

Xiaojiang Xu ◽

Zhaowei Yang ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Batch Effect ◽

Batch Effects ◽

Data Set ◽

Batch Correction ◽

Single Cell Rna Sequencing ◽

Cross Platform ◽

Reference Samples

Abstract: Single-cell RNA sequencing (scRNA-seq) has become a very powerful technology for biomedical research and is becoming much more affordable as methods continue to evolve, but it is unknown how reproducible different platforms are using different bioinformatics pipelines, particularly the recently developed scRNA-seq batch correction algorithms. We carried out a comprehensive multi-center cross-platform comparison on different scRNA-seq platforms using standard reference samples. We compared six pre-processing pipelines, seven bioinformatics normalization procedures, and seven batch effect correction methods (CCA, MNN, Scanorama, BBKNN, Harmony, limma and ComBat) to evaluate the performance and reproducibility of 20 scRNA-seq data sets derived from four different platforms and centers. We benchmarked scRNA-seq performance across different platforms and testing sites using global gene expression profiles as well as some cell-type specific marker genes. We showed that there were large batch effects, and that the reproducibility of scRNA-seq across platforms was dictated both by the expression level of the genes selected and by the batch correction methods used. We found that CCA, MNN, and BBKNN all corrected the batch variations fairly well for the scRNA-seq data derived from biologically similar samples across platforms/sites. However, for the scRNA-seq data derived from or consisting of biologically distinct samples, limma and ComBat failed to correct batch effects, whereas CCA over-corrected the batch effect and misclassified the cell types and samples. In contrast, MNN, Harmony and BBKNN separated biologically different samples/cell types into correspondingly distinct dimensional subspaces; however, consistent with this algorithm’s logic, MNN required that the samples evaluated each contain a shared portion of highly similar cells.
In summary, we found high cross-platform consistency in separating two distinct samples when an appropriate batch correction method was used. We hope this large cross-platform/site scRNA-seq data set will provide a valuable resource, and that our findings will offer useful advice for the single-cell sequencing community.


Comparison of Scanpy-based algorithms to remove the batch effect from single-cell RNA-seq data

Cell Regeneration ◽

10.1186/s13619-020-00041-9 ◽

2020 ◽

Vol 9 (1) ◽

Author(s):

Jiaqi Li ◽

Chengxuan Yu ◽

Lifeng Ma ◽

Jingjing Wang ◽

Guoji Guo

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Batch Effect ◽

Rna Seq ◽

Batch Effects ◽

Integration Methods ◽

Batch Correction ◽

Single Cell Rna Sequencing ◽

Algorithm Level

Abstract: With the development of single-cell RNA sequencing (scRNA-seq) technology, analysts need to integrate hundreds of thousands of cells from multiple experimental batches. It is becoming increasingly difficult for users to select the best integration method to remove batch effects. Here, we compared the advantages and limitations of four commonly used Scanpy-based batch-correction methods using two representative and large-scale scRNA-seq datasets. We quantitatively evaluated batch-correction performance and efficiency. Furthermore, we discussed the performance differences among the evaluated methods at the algorithm level.


BERMUDA: A novel deep transfer learning method for single-cell RNA sequencing batch correction reveals hidden high-resolution cellular subtypes

10.1101/641191 ◽

2019 ◽

Cited By ~ 1

Author(s):

Tongxin Wang ◽

Travis S Johnson ◽

Wei Shao ◽

Zixiao Lu ◽

Bryan R Helm ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Transfer Learning ◽

Cell Types ◽

Batch Effect ◽

Batch Effects ◽

Combine Data ◽

Batch Correction ◽

Single Cell Rna Sequencing ◽

Bona Fide

Abstract: To fully utilize the power of single-cell RNA sequencing (scRNA-seq) technologies for cell lineation and identifying bona fide transcriptional signals, it is necessary to combine data from multiple experiments. We present BERMUDA (Batch-Effect ReMoval Using Deep Autoencoders), a novel transfer-learning-based method for batch-effect correction in scRNA-seq data. BERMUDA effectively combines different batches of scRNA-seq data with vastly different cell population compositions and amplifies biological signals by transferring information among batches. We demonstrate that BERMUDA outperforms existing methods for removing batch effects and distinguishing cell types in multiple simulated and real scRNA-seq datasets.


Correcting batch effects in single-cell RNA sequencing data by matching mutual nearest neighbours

10.1101/165118 ◽

2017 ◽

Cited By ~ 15

Author(s):

Laleh Haghverdi ◽

Aaron T. L. Lun ◽

Michael D. Morgan ◽

John C. Marioni

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Data Sets ◽

Batch Effects ◽

Sequencing Data ◽

Batch Correction ◽

Nearest Neighbours ◽

Single Cell Rna Sequencing ◽

New Strategy

Abstract: The presence of batch effects is a well-known problem in experimental data analysis, and single-cell RNA sequencing (scRNA-seq) is no exception. Large-scale scRNA-seq projects that generate data from different laboratories and at different times are rife with batch effects that can fatally compromise integration and interpretation of the data. In such cases, computational batch correction is critical for eliminating uninteresting technical factors and obtaining valid biological conclusions. However, existing methods assume that the composition of cell populations is either known or the same across batches. Here, we present a new strategy for batch correction based on the detection of mutual nearest neighbours in the high-dimensional expression space. Our approach does not rely on pre-defined or equal population compositions across batches, only requiring that a subset of the population be shared between batches. We demonstrate the superiority of our approach over existing methods on a range of simulated and real scRNA-seq data sets. We also show how our method can be applied to integrate scRNA-seq data from two separate studies of early embryonic development.
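The idea of detecting mutual nearest neighbours between two batches can be sketched as follows. This is a brute-force illustration, not the authors' implementation: the function name `mutual_nearest_neighbors`, the default `k`, and the use of plain Euclidean distance are assumptions, and any normalization applied before the search is omitted.

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k=2):
    """Return (i, j) pairs where a[i] and b[j] are among each other's k nearest cells.

    a, b -- expression matrices of two batches, shape (n_cells, n_features)
    """
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # For each cell in a: indices of its k nearest cells in b.
    nn_ab = np.argsort(d, axis=1)[:, :k]
    # For each cell in b: indices of its k nearest cells in a.
    nn_ba = np.argsort(d, axis=0)[:k, :].T
    pairs = []
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            # Keep the pair only if the neighbour relation holds in both directions.
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return pairs
```

A cell that exists in only one batch tends not to appear in any pair, because its nearest cells in the other batch have closer neighbours of their own; this is why only a shared subset of the population is needed. The full distance matrix makes the sketch quadratic in memory, so it is suitable for small examples only.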


Single-cell RNA sequencing of adult Drosophila ovary identifies transcriptional programs governing oogenesis

10.1101/802314 ◽

2019 ◽

Author(s):

Allison Jevitt ◽

Deeptiman Chatterjee ◽

Gengqiang Xie ◽

Xian-Feng Wang ◽

Taylor Otwell ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Developmental Process ◽

Expression Profiles ◽

Follicle Cell ◽

Cell Types ◽

Broad Perspective ◽

Single Cell Rna Sequencing ◽

Egg Shell

Abstract: Oogenesis is a complex developmental process that involves spatiotemporally regulated coordination between the germline and supporting, somatic cell populations. This process has been modelled extensively using the Drosophila ovary. While different ovarian cell types have been identified through traditional means, the large-scale expression profiles underlying each cell type remain unknown. Using single-cell RNA sequencing technology, we have built a transcriptomic dataset for the adult Drosophila ovary and connected tissues. This dataset captures the entire transcriptional trajectory of the developing follicle cell population over time. Our findings provide detailed insight into processes such as cell-cycle switching, migration, symmetry breaking, nurse cell engulfment, egg-shell formation, and signaling during corpus luteum formation, marking a newly identified oogenesis-to-ovulation transition. Altogether, these findings provide a broad perspective on oogenesis at a single-cell resolution while revealing new genetic markers and fate-specific transcriptional signatures to facilitate future studies.


Comparison of computational methods for imputing single-cell RNA-sequencing data

10.1101/241190 ◽

2017 ◽

Cited By ~ 10

Author(s):

Lihua Zhang ◽

Shihua Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Real Data ◽

Cell Types ◽

Biological Functions ◽

Sequencing Data ◽

Imputation Methods ◽

Future Studies ◽

Single Cell Rna Sequencing

Abstract: Single-cell RNA-sequencing (scRNA-seq) is a recent breakthrough technology that paves the way for measuring RNA levels at single-cell resolution to study precise biological functions. One of the main challenges when analyzing scRNA-seq data is the presence of zeros or dropout events, which may mislead downstream analyses. To compensate for the dropout effect, several methods have been developed to impute gene expression since the first Bayesian-based method was proposed in 2016. However, these methods have shown very diverse characteristics in terms of model hypothesis and imputation performance. Thus, a large-scale comparison and evaluation of these methods is urgently needed. To this end, we compared eight imputation methods, evaluated their power in recovering original real data, and performed broad analyses to explore their effects on clustering cell types, detecting differentially expressed genes, and reconstructing lineage trajectories in the context of both simulated and real data. Simulated datasets and case studies highlight that no single method performs best in all situations. Some defects of these methods, such as limited scalability, limited robustness, and unavailability in some situations, need to be addressed in future studies.


Flexible Experimental Designs for Valid Single-cell RNA-sequencing Experiments Allowing Batch Effects Correction

10.1101/533372 ◽

2019 ◽

Cited By ~ 1

Author(s):

Fangda Song ◽

Ga Ming Chan ◽

Yingying Wei

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Experimental Designs ◽

Batch Effects ◽

Bayesian Hierarchical ◽

Single Cell Rna Sequencing ◽

Randomized Experimental Design ◽

Chain Type

Abstract: Despite their widespread applications, single-cell RNA-sequencing (scRNA-seq) experiments are still plagued by batch effects and dropout events. Although the completely randomized experimental design has frequently been advocated to control for batch effects, it is rarely implemented in real applications due to time and budget constraints. Here, we mathematically prove that under two more flexible and realistic experimental designs (the “reference panel” and the “chain-type” designs) true biological variability can also be separated from batch effects. We develop Batch effects correction with Unknown Subtypes for scRNA-seq data (BUSseq), which is an interpretable Bayesian hierarchical model that closely follows the data-generating mechanism of scRNA-seq experiments. BUSseq can simultaneously correct batch effects, cluster cell types, impute missing data caused by dropout events, and detect differentially expressed genes without requiring a preliminary normalization step. We demonstrate that BUSseq outperforms existing methods with simulated and real data.


CellFishing.jl: an ultrafast and scalable cell search method for single-cell RNA-sequencing

10.1101/374462 ◽

2018 ◽

Cited By ~ 1

Author(s):

Kenta Sato ◽

Koki Tsuyuzaki ◽

Kentaro Shimizu ◽

Itoshi Nikaido

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

State Of The Art ◽

Cell Types ◽

Cell Search ◽

Wide Range ◽

Single Cell Rna Sequencing ◽

Comparable Accuracy ◽

Multicellular Organisms

Abstract: Recent technical improvements in single-cell RNA sequencing (scRNA-seq) have enabled massively parallel profiling of transcriptomes, thereby promoting large-scale studies encompassing a wide range of cell types of multicellular organisms. With this background, we propose CellFishing.jl, a new method for searching atlas-scale datasets for similar cells and detecting noteworthy genes of query cells with high accuracy and throughput. Using multiple scRNA-seq datasets, we validate that our method demonstrates comparable accuracy to and is markedly faster than the state-of-the-art software. Moreover, CellFishing.jl is scalable to more than one million cells, and the throughput of the search is approximately 1,600 cells per second.


iMAP: integration of multiple single-cell datasets by adversarial paired transfer networks

Genome Biology ◽

10.1186/s13059-021-02280-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Dongfang Wang ◽

Siyu Hou ◽

Lei Zhang ◽

Xiliang Wang ◽

Baolin Liu ◽

...

Keyword(s):

Tumor Microenvironment ◽

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Batch Effect ◽

Generative Adversarial Networks ◽

Multiple Sources ◽

Adversarial Networks ◽

Mixing Distributions ◽

Single Cell Rna Sequencing

Abstract: The integration of single-cell RNA-sequencing datasets from multiple sources is critical for deciphering cell-to-cell heterogeneities and interactions in complex biological systems. We present a novel unsupervised batch effect removal framework, called iMAP, based on both deep autoencoders and generative adversarial networks. Compared with current methods, iMAP shows superior, robust, and scalable performance in terms of both reliably detecting the batch-specific cells and effectively mixing distributions of the batch-shared cell types. Applying iMAP to tumor microenvironment datasets from two platforms, Smart-seq2 and 10x Genomics, we find that iMAP can leverage the powers of both platforms to discover novel cell-cell interactions.


Analysis of single-cell RNA sequencing data based on autoencoders

BMC Bioinformatics ◽

10.1186/s12859-021-04150-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Andrea Tangherloni ◽

Federico Ricciuti ◽

Daniela Besozzi ◽

Pietro Liò ◽

Ana Cvejic

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Batch Effect ◽

Learning Approaches ◽

Sequencing Data ◽

Starting Point ◽

Single Cell Rna Sequencing ◽

Downstream Analysis ◽

Low Dimensional

Abstract: Background: Single-cell RNA sequencing (scRNA-Seq) experiments are gaining ground as a way to study the molecular processes that drive normal development as well as the onset of different pathologies. Finding an effective and efficient low-dimensional representation of the data is one of the most important steps in the downstream analysis of scRNA-Seq data, as it could provide a better identification of known or putatively novel cell-types. Another step that still poses a challenge is the integration of different scRNA-Seq datasets. Though standard computational pipelines to gain knowledge from scRNA-Seq data exist, a further improvement could be achieved by means of machine learning approaches. Results: Autoencoders (AEs) have been effectively used to capture the non-linearities among gene interactions of scRNA-Seq data, so that the deployment of AE-based tools might represent the way forward in this context. We introduce here scAEspy, a unifying tool that embodies: (1) four of the most advanced AEs, (2) two novel AEs that we developed for this purpose, (3) different loss functions. We show that scAEspy can be coupled with various batch-effect removal tools to integrate data from different scRNA-Seq platforms, in order to better identify the cell-types. We benchmarked scAEspy against the most used batch-effect removal tools, showing that our AE-based strategies outperform the existing solutions. Conclusions: scAEspy is a user-friendly tool that enables using the most recent and promising AEs to analyse scRNA-Seq data by only setting up two user-defined parameters. Thanks to its modularity, scAEspy can be easily extended to accommodate new AEs to further improve the downstream analysis of scRNA-Seq data. Considering the relevant results we achieved, scAEspy can be considered as a starting point to build a more comprehensive toolkit designed to integrate multi single-cell omics.
