Fast Batch Alignment of Single Cell Transcriptomes Unifies Multiple Mouse Cell Atlases into an Integrated Landscape

AbstractIncreasing numbers of large scale single cell RNA-Seq projects are leading to a data explosion, which can only be fully exploited through data integration. Therefore, efficient computational tools for combining diverse datasets are crucial for biology in the single cell genomics era. A number of methods have been developed to assist data integration by removing technical batch effects, but most are computationally intensive. To overcome the challenge of enormous datasets, we have developed BBKNN, an extremely fast graph-based data integration method. We illustrate the power of BBKNN for dimensionalityreduced visualisation and clustering in multiple biological scenarios, including a massive integrative study over several murine atlases. BBKNN successfully connects cell populations across experimentally heterogeneous mouse scRNA-Seq datasets, which reveals global markers of cell type and organspecificity and provides the foundation for inferring the underlying transcription factor network. BBKNN is available at https://github.com/Teichlab/bbknn.

Download Full-text

BBKNN: fast batch alignment of single cell transcriptomes

Bioinformatics ◽

10.1093/bioinformatics/btz625 ◽

2019 ◽

Cited By ~ 33

Author(s):

Krzysztof Polański ◽

Matthew D Young ◽

Zhichao Miao ◽

Kerstin B Meyer ◽

Sarah A Teichmann ◽

...

Keyword(s):

Data Integration ◽

Single Cell ◽

Large Scale ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Batch Effects ◽

Integration Algorithm ◽

Data Explosion ◽

Computationally Intensive

Abstract Motivation Increasing numbers of large scale single cell RNA-Seq projects are leading to a data explosion, which can only be fully exploited through data integration. A number of methods have been developed to combine diverse datasets by removing technical batch effects, but most are computationally intensive. To overcome the challenge of enormous datasets, we have developed BBKNN, an extremely fast graph-based data integration algorithm. We illustrate the power of BBKNN on large scale mouse atlasing data, and favourably benchmark its run time against a number of competing methods. Availability and implementation BBKNN is available at https://github.com/Teichlab/bbknn, along with documentation and multiple example notebooks, and can be installed from pip. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

scTIM: seeking cell-type-indicative marker from single cell RNA-seq data by consensus optimization

Bioinformatics ◽

10.1093/bioinformatics/btz936 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2474-2485 ◽

Cited By ~ 2

Author(s):

Zhanying Feng ◽

Xianwen Ren ◽

Yuan Fang ◽

Yining Yin ◽

Chutian Huang ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Cell Types ◽

Mouse Cell ◽

Supplementary Information ◽

Rna Seq ◽

Cell Type ◽

Robust Solution ◽

Development Trajectory ◽

Consensus Optimization

Abstract Motivation Single cell RNA-seq data offers us new resource and resolution to study cell type identity and its conversion. However, data analyses are challenging in dealing with noise, sparsity and poor annotation at single cell resolution. Detecting cell-type-indicative markers is promising to help denoising, clustering and cell type annotation. Results We developed a new method, scTIM, to reveal cell-type-indicative markers. scTIM is based on a multi-objective optimization framework to simultaneously maximize gene specificity by considering gene-cell relationship, maximize gene’s ability to reconstruct cell–cell relationship and minimize gene redundancy by considering gene–gene relationship. Furthermore, consensus optimization is introduced for robust solution. Experimental results on three diverse single cell RNA-seq datasets show scTIM’s advantages in identifying cell types (clustering), annotating cell types and reconstructing cell development trajectory. Applying scTIM to the large-scale mouse cell atlas data identifies critical markers for 15 tissues as ‘mouse cell marker atlas’, which allows us to investigate identities of different tissues and subtle cell types within a tissue. scTIM will serve as a useful method for single cell RNA-seq data mining. Availability and implementation scTIM is freely available at https://github.com/Frank-Orwell/scTIM. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Predicting heterogeneity in clone-specific therapeutic vulnerabilities using single-cell transcriptomic signatures

Genome Medicine ◽

10.1186/s13073-021-01000-y ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Chayaporn Suphavilai ◽

Shumei Chia ◽

Ankur Sharma ◽

Lorna Tu ◽

Rafael Peres Da Silva ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Drug Response ◽

Drug Repurposing ◽

High Accuracy ◽

Molecular Heterogeneity ◽

Precision Oncology ◽

Scale Analysis ◽

Rna Seq ◽

Large Scale Analysis

AbstractWhile understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc’s monotherapy (Pearson r>0.6) and combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc.

Download Full-text

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ning Wang ◽

Andrew E. Teschendorff

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Cell Fate ◽

Regulatory Networks ◽

Large Scale ◽

Single Cells ◽

Differential Expression Analysis ◽

Dropout Rate ◽

Rna Seq ◽

Regulatory Activity

AbstractInferring the activity of transcription factors in single cells is a key task to improve our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can be subsequently obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell-fate in tissue-development, even for cell-types that only make up 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.

Download Full-text

EpiScanpy: integrated single-cell epigenomic analysis

10.1101/648097 ◽

2019 ◽

Cited By ~ 4

Author(s):

Anna Danese ◽

Maria L. Richter ◽

David S. Fischer ◽

Fabian J. Theis ◽

Maria Colomé-Tatché

Keyword(s):

Dna Methylation ◽

Single Cell ◽

Large Scale ◽

Feature Space ◽

Rna Seq ◽

Computational Framework ◽

Learning Techniques ◽

Multiple Feature ◽

The Many ◽

Cell Data

ABSTRACTEpigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics, however single-cell-omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell brain mouse atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.

Download Full-text

SCOTCluster: Deep Clustering with Optimal Transport for Large-scale Single-cell RNA-seq Data

10.1109/bibm52615.2021.9669800 ◽

2021 ◽

Author(s):

Faning Long ◽

Xiaojun Ding ◽

Xiaoqing Peng ◽

Jianxin Wang ◽

Xiaoshu Zhu

Keyword(s):

Single Cell ◽

Large Scale ◽

Optimal Transport ◽

Rna Seq

Download Full-text

scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa082 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Kaikun Xie ◽

Yu Huang ◽

Feng Zeng ◽

Zehua Liu ◽

Ting Chen

Keyword(s):

Single Cell ◽

Large Scale ◽

Developmental Trajectories ◽

Cell Types ◽

Random Projection ◽

Good Representation ◽

Rna Seq ◽

Unsupervised Deep Learning ◽

High Level ◽

Computational Resources

Abstract Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed to provide clustering and post-analysis of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE would provide a more in-depth understanding of cell development and diseases.

Download Full-text

SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2018.10.003 ◽

2019 ◽

Vol 17 (2) ◽

pp. 201-210 ◽

Cited By ~ 5

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Large Scale ◽

Rna Seq ◽

Computational Framework

Download Full-text

Abstract 5297: LCA: A robust and scalable algorithm to reveal subtle diversity in large-scale single-cell RNA-Seq data

10.1158/1538-7445.am2018-5297 ◽

2018 ◽

Author(s):

Changde Cheng ◽

John Easton ◽

Celeste Rosencrance ◽

Yan Li ◽

Bensheng Ju ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Rna Seq ◽

Scalable Algorithm

Download Full-text

scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1820006116 ◽

2019 ◽

Vol 116 (20) ◽

pp. 9775-9784 ◽

Cited By ~ 38

Author(s):

Yingxin Lin ◽

Shila Ghazanfar ◽

Kevin Y. X. Wang ◽

Johann A. Gagnon-Bartsch ◽

Kitty K. Lo ◽

...

Keyword(s):

Factor Analysis ◽

Data Integration ◽

Single Cell ◽

Rna Seq ◽

Cell Type ◽

Large Collection ◽

Single Cell Rna Sequencing ◽

Development Trajectory ◽

Biological Discovery ◽

Public Datasets

Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.

Download Full-text