CIDER: an interpretable meta-clustering framework for single-cell RNA-seq data integration and evaluation

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zhiyuan Hu ◽  
Ahmed A. Ahmed ◽  
Christopher Yau

Abstract: Clustering of joint single-cell RNA-Seq (scRNA-Seq) data is often challenged by confounding factors, such as batch effects and biologically relevant variability. Existing batch effect removal methods typically require strong assumptions on the composition of cell populations being near identical across samples. Here, we present CIDER, a meta-clustering workflow based on inter-group similarity measures. We demonstrate that CIDER outperforms other scRNA-Seq clustering methods and integration approaches in both simulated and real datasets. Moreover, we show that CIDER can be used to assess the biological correctness of integration in real datasets, while it does not require the existence of prior cellular annotations.

2021 ◽  
Author(s):  
Zhiyuan Hu ◽  
Ahmed Ashour Ahmed ◽  
Christopher Yau

Single-cell RNA sequencing (scRNA-Seq) datasets that are produced from clinical samples are often confounded by batch effects and inter-patient variability. Existing batch effect removal methods typically require strong assumptions on the composition of cell populations being near identical across patients. Here we present a novel meta-clustering workflow, CIDER, based on inter-group similarity measures. We demonstrate that CIDER outperforms other scRNA-Seq clustering methods and integration approaches in both simulated and real datasets. Moreover, we show that CIDER can be used to assess the biological correctness of integration in real datasets, while it does not require the existence of prior cellular annotations.


Author(s):  
Krzysztof Polański ◽  
Matthew D Young ◽  
Zhichao Miao ◽  
Kerstin B Meyer ◽  
Sarah A Teichmann ◽  
...  

Abstract: Motivation: Increasing numbers of large scale single cell RNA-Seq projects are leading to a data explosion, which can only be fully exploited through data integration. A number of methods have been developed to combine diverse datasets by removing technical batch effects, but most are computationally intensive. To overcome the challenge of enormous datasets, we have developed BBKNN, an extremely fast graph-based data integration algorithm. We illustrate the power of BBKNN on large scale mouse atlasing data, and favourably benchmark its run time against a number of competing methods. Availability and implementation: BBKNN is available at https://github.com/Teichlab/bbknn, along with documentation and multiple example notebooks, and can be installed from pip. Supplementary information: Supplementary data are available at Bioinformatics online.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.
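The fixed choice of λ described above can be written down as a small helper. This is only an illustrative sketch, not the RGGC implementation: the function name `rggc_lambda` is hypothetical, `p` is taken as a caller-supplied positive integer because the abstract does not define it, and `log` is assumed to be the natural logarithm.

```python
import math


def rggc_lambda(criterion: str, p: int) -> float:
    """Return the penalty stated in the abstract: 2 for AIC, log(p) for BIC.

    Hypothetical helper. `p` is whatever quantity the abstract's log(p)
    refers to (not defined there); natural log is an assumption.
    """
    key = criterion.upper()
    if key == "AIC":
        return 2.0
    if key == "BIC":
        if p < 1:
            raise ValueError("p must be a positive integer")
        return math.log(p)
    raise ValueError(f"unknown criterion: {criterion!r}")
```

Because the penalty depends only on the chosen criterion and on `p`, no cross-validation loop is needed to pick it, which is the point the abstract makes.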


Cells ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 3126
Author(s):  
Dominik Saul ◽  
Robyn Laura Kosinsky

The human aging process is associated with molecular changes and cellular degeneration, resulting in a significant increase in cancer incidence with age. Despite their potential correlation, the relationship between cancer- and aging-related transcriptional changes is largely unknown. In this study, we aimed to analyze aging-associated transcriptional patterns in publicly available bulk mRNA-seq and single-cell RNA-seq (scRNA-seq) datasets for chronic myelogenous leukemia (CML), colorectal cancer (CRC), hepatocellular carcinoma (HCC), lung cancer (LC), and pancreatic ductal adenocarcinoma (PDAC). Indeed, we detected that various aging/senescence-induced genes (ASIGs) were upregulated in malignant diseases compared to healthy control samples. To elucidate the importance of ASIGs during cell development, pseudotime analyses were performed, which revealed a late enrichment of distinct cancer-specific ASIG signatures. Notably, we were able to demonstrate that all cancer entities analyzed in this study comprised cell populations expressing ASIGs. While only minor correlations were detected between ASIGs and transcriptome-wide changes in PDAC, a high proportion of ASIGs was induced in CML, CRC, HCC, and LC samples. These unique cellular subpopulations could serve as a basis for future studies on the role of aging and senescence in human malignancies.


2020 ◽  
Author(s):  
Viacheslav Mylka ◽  
Jeroen Aerts ◽  
Irina Matetovici ◽  
Suresh Poovathingal ◽  
Niels Vandamme ◽  
...  

Abstract: Multiplexing of samples in single-cell RNA-seq studies allows significant reduction of experimental costs, straightforward identification of doublets, increased cell throughput, and reduction of sample-specific batch effects. Recently published multiplexing techniques using oligo-conjugated antibodies or lipids allow barcoding of sample-specific cells, a process called ‘hashing’. Here, we compare the hashing performance of TotalSeq-A and -C antibodies, custom synthesized lipids and MULTI-seq lipid hashes in four cell lines, both for single-cell RNA-seq and single-nucleus RNA-seq. Hashing efficiency was evaluated using the intrinsic genetic variation of the cell lines. Benchmarking of different hashing strategies and computational pipelines indicates that correct demultiplexing can be achieved with both lipid- and antibody-hashed human cells and nuclei, with MULTISeqDemux as the preferred demultiplexing function and antibody-based hashing as the most efficient protocol on cells. Antibody hashing was further evaluated on clinical samples using PBMCs from healthy and SARS-CoV-2-infected patients, where we demonstrate a more affordable approach for large single-cell sequencing clinical studies, while simultaneously reducing batch effects.


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
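The idea of rejecting cells with cell-type-specific confidence thresholds can be illustrated with a minimal sketch. This is not JIND's code or API: the function name `classify_with_rejection`, the `"Unassigned"` label, and the example probabilities and thresholds are all hypothetical, and the thresholds are simply passed in rather than learned as the abstract describes.

```python
from typing import Dict, List, Sequence


def classify_with_rejection(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    thresholds: Dict[str, float],
    reject_label: str = "Unassigned",
) -> List[str]:
    """Assign each cell its most probable cell type, or reject it.

    probs[i][j] is the predicted probability that cell i has type labels[j].
    A call is kept only if the top probability reaches the threshold of
    the predicted type; otherwise the cell gets `reject_label`.
    """
    calls = []
    for row in probs:
        # Index of the most probable cell type for this cell.
        best = max(range(len(labels)), key=lambda j: row[j])
        label = labels[best]
        # Per-type threshold: stricter types reject more borderline cells.
        calls.append(label if row[best] >= thresholds[label] else reject_label)
    return calls


# Made-up example: three cells, two cell types.
example_probs = [[0.90, 0.10], [0.55, 0.45], [0.30, 0.70]]
example_labels = ["T cell", "B cell"]
example_thresholds = {"T cell": 0.8, "B cell": 0.6}
example_calls = classify_with_rejection(
    example_probs, example_labels, example_thresholds
)
```

In this toy input the second cell is the only one whose top probability (0.55) falls below the threshold of its predicted type (0.8), so it is the only one left unassigned.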


2020 ◽  
Author(s):  
Ruben Chazarra-Gil ◽  
Stijn van Dongen ◽  
Vladimir Yu Kiselev ◽  
Martin Hemberg

Abstract: As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.


2019 ◽  
Vol 116 (20) ◽  
pp. 9775-9784 ◽  
Author(s):  
Yingxin Lin ◽  
Shila Ghazanfar ◽  
Kevin Y. X. Wang ◽  
Johann A. Gagnon-Bartsch ◽  
Kitty K. Lo ◽  
...  

Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Rebekka Wegmann ◽  
Marilisa Neri ◽  
Sven Schuierer ◽  
Bilada Bilican ◽  
Huyen Hartkopf ◽  
...  
