Cerebro: Interactive visualization of scRNA-seq data

Abstract Summary: Despite the growing availability of sophisticated bioinformatic methods for the analysis of single-cell RNA-seq data, few tools exist that allow biologists without bioinformatic expertise to directly visualize and interact with their own data and results. Here, we present Cerebro (cell report browser), a Shiny- and Electron-based standalone desktop application for macOS and Windows, which allows investigation and inspection of pre-processed single-cell transcriptomics data without requiring bioinformatic experience of the user. Through an interactive and intuitive graphical interface, users can (i) explore similarities and heterogeneity between samples and cell clusters in 2D or 3D projections such as t-SNE or UMAP, (ii) display the expression level of single genes or gene sets of interest, and (iii) browse tables of most expressed genes and marker genes for each sample and cluster. We provide a simple example to show how Cerebro can be used and what its capabilities are. Through a focus on flexibility and direct access to data and results, we think Cerebro offers a collaborative framework for bioinformaticians and experimental biologists which facilitates effective interaction to shorten the gap between analysis and interpretation of the data. Availability: Cerebro and example data sets are available at https://github.com/romanhaa/Cerebro. Similarly, the cerebroApp and cerebroPrepare R packages are available at https://github.com/romanhaa/cerebroApp and https://github.com/romanhaa/cerebroPrepare, respectively. All components are released under the MIT License.


Cerebro: interactive visualization of scRNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz877 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2311-2313 ◽

Cited By ~ 5

Author(s):

Roman Hillje ◽

Pier Giuseppe Pelicci ◽

Lucilla Luzi

Keyword(s):

Single Cell ◽

Effective Interaction ◽

Three Dimensional ◽

R Package ◽

Direct Access ◽

Supplementary Information ◽

Marker Genes ◽

Transcriptomics Data ◽

Or Gene ◽

Access To Data

Abstract Despite the growing availability of sophisticated bioinformatic methods for the analysis of single-cell RNA-seq data, few tools exist that allow biologists without extensive bioinformatic expertise to directly visualize and interact with their own data and results. Here, we present Cerebro (cell report browser), a Shiny- and Electron-based standalone desktop application for macOS and Windows which allows investigation and inspection of pre-processed single-cell transcriptomics data without requiring bioinformatic experience of the user. Through an interactive and intuitive graphical interface, users can (i) explore similarities and heterogeneity between samples and cell clusters in two-dimensional or three-dimensional projections such as t-SNE or UMAP, (ii) display the expression level of single genes or gene sets of interest, (iii) browse tables of most expressed genes and marker genes for each sample and cluster and (iv) display trajectories calculated with Monocle 2. We provide three examples prepared from publicly available datasets to show how Cerebro can be used and what its capabilities are. Through a focus on flexibility and direct access to data and results, we think Cerebro offers a collaborative framework for bioinformaticians and experimental biologists that facilitates effective interaction to shorten the gap between analysis and interpretation of the data. Availability and implementation: The Cerebro application, additional documentation, and example datasets are available at https://github.com/romanhaa/Cerebro. Similarly, the cerebroApp R package is available at https://github.com/romanhaa/cerebroApp. All components are released under the MIT License. Supplementary information: Supplementary data are available at Bioinformatics online.


Probabilistic Harmonization and Annotation of Single-cell Transcriptomics Data with Deep Generative Models

10.1101/532895 ◽

2019 ◽

Cited By ~ 14

Author(s):

Chenling Xu ◽

Romain Lopez ◽

Edouard Mehlman ◽

Jeffrey Regier ◽

Michael I. Jordan ◽

...

Keyword(s):

Single Cell ◽

Probabilistic Approach ◽

Cell Types ◽

Generative Models ◽

Marker Genes ◽

Data Sets ◽

Data Set ◽

Cell State ◽

Transcriptomics Data ◽

Single Data

Abstract As single-cell transcriptomics becomes a mainstream technology, the natural next step is to integrate the accumulating data in order to achieve a common ontology of cell types and states. However, owing to various nuisance factors of variation, it is not straightforward how to compare gene expression levels across data sets and how to automatically assign cell type labels in a new data set based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of cohorts of single-cell RNA-seq data sets, while accounting for uncertainty caused by biological and measurement noise. We also introduce single-cell ANnotation using Variational Inference (scANVI), a semi-supervised variant of scVI designed to leverage any available cell state annotations, for instance when only one data set in a cohort is annotated, or when only a few cells in a single data set can be labeled using marker genes. We demonstrate that scVI and scANVI compare favorably to the existing methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings such as a hierarchical structure of cell state labels. We further show that, unlike existing methods, scVI and scANVI represent the integrated datasets with a single generative model that can be directly used for any probabilistic decision making task, using differential expression as our case study. scVI and scANVI are available as open source software and can be readily used to facilitate cell state annotation and help ensure consistency and reproducibility across studies.


Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data

Microbiology Research ◽

10.3390/microbiolres12020022 ◽

2021 ◽

Vol 12 (2) ◽

pp. 317-334

Author(s):

Omar Alaqeeli ◽

Li Xing ◽

Xuekui Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Classification Tree ◽

Area Under The Curve ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Tree Algorithms ◽

R Packages

The classification tree is a widely used machine learning method. It has multiple implementations as R packages: rpart, ctree, evtree, tree and C5.0. The details of these implementations are not the same, and hence their performances differ from one application to another. We are interested in their performance in the classification of cells using single-cell RNA-sequencing data. In this paper, we conducted a benchmark study using 22 single-cell RNA-sequencing data sets. Using cross-validation, we compare the packages’ prediction performances based on their Precision, Recall, F1-score and Area Under the Curve (AUC). We also compared the Complexity and Run-time of these R packages. Our study shows that rpart and evtree have the best Precision; evtree is the best in Recall, F1-score and AUC; C5.0 prefers more complex trees; tree is consistently much faster than the others, although its complexity is often higher than that of the others.
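
The metrics named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' benchmark code (the study compares R packages); the function name `precision_recall_f1` and the toy cell-type labels are hypothetical, and the one-vs-rest treatment of a single positive class is an assumption made for illustration.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Compute precision, recall and F1 for one class treated as 'positive'.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy example: five cells with made-up cell-type labels.
true_labels = ["T", "T", "B", "B", "T"]
predicted_labels = ["T", "B", "B", "T", "T"]
print(precision_recall_f1(true_labels, predicted_labels, positive="T"))
```

In a cross-validated benchmark, a function like this would be applied to the held-out fold of each split and the scores averaged across folds.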


Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbz096 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1581-1595 ◽

Cited By ~ 6

Author(s):

Xinlei Zhao ◽

Shuang Wu ◽

Nan Fang ◽

Xiao Sun ◽

Jue Fan

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Reference Data ◽

Predictive Accuracy ◽

Cell Types ◽

Superior Performance ◽

Marker Genes ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
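
The "simple ensemble voting" mentioned in this abstract could look roughly like the Python sketch below. The abstract does not specify the voting scheme, so the per-cell majority vote, the fallback to an 'unassigned' label on ties, and the names `majority_vote` and `ensemble_predict` are all assumptions made for illustration; the tool names and labels in the example are invented.

```python
from collections import Counter


def majority_vote(labels, fallback="unassigned"):
    """Return the most frequent label, or `fallback` if there is no clear winner.

    Hypothetical helper: the tie-handling rule is an assumption, not the paper's.
    """
    counts = Counter(labels).most_common()
    if not counts:
        return fallback
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback  # top two labels are tied
    return counts[0][0]


def ensemble_predict(predictions_by_tool):
    """Combine per-cell predictions from several tools by majority vote.

    `predictions_by_tool` maps a tool name to a list of labels, one per cell,
    with all lists in the same cell order.
    """
    per_cell = zip(*predictions_by_tool.values())
    return [majority_vote(cell_labels) for cell_labels in per_cell]


# Toy example with three made-up tools and three cells.
predictions = {
    "tool_a": ["T cell", "B cell", "NK"],
    "tool_b": ["T cell", "B cell", "T cell"],
    "tool_c": ["B cell", "B cell", "monocyte"],
}
print(ensemble_predict(predictions))  # ['T cell', 'B cell', 'unassigned']
```

Returning 'unassigned' on ties is one conservative choice; an alternative would be to weight each tool's vote by its accuracy on a reference data set.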


VeloViz: RNA-velocity informed 2D embeddings for visualizing cellular trajectories

10.1101/2021.01.28.425293 ◽

2021 ◽

Author(s):

Lyla Atta ◽

Jean Fan

Keyword(s):

Single Cell ◽

Principal Components ◽

R Package ◽

Velocity Analysis ◽

Cell State ◽

Link Type ◽

Transcriptomics Data ◽

Reliable Representation ◽

State Changes

Abstract RNA velocity analysis can predict cell state changes from single cell transcriptomics data. To interpret these cell state changes as part of underlying cellular trajectories, current approaches rely on visualization with 2D embeddings derived from principal components, t-distributed stochastic neighbor embedding, among others. However, these 2D embeddings can yield different representations of the underlying trajectories, hindering the interpretation of cell state changes. To address this challenge, we developed VeloViz to create RNA-velocity-informed 2D embeddings. We show that by taking into consideration the predicted future transcriptional states from RNA velocity analysis, VeloViz can help ensure a more reliable representation of underlying cellular trajectories. VeloViz is available as an R package at https://github.com/JEFworks-Lab/veloviz.


Shape-aware Stochastic Neighbour Embedding for Robust Data Visualisations

10.21203/rs.3.rs-668207/v1 ◽

2021 ◽

Author(s):

Tobias Wängberg ◽

Chun-Biu Li ◽

Joanna Tyrcha

Keyword(s):

Single Cell ◽

Cluster Structure ◽

Synthetic Data ◽

Image Data ◽

Superior Performance ◽

Test Cases ◽

Data Sets ◽

Transcriptomics Data ◽

Quantitative Validation ◽

Graph Distances

Abstract The t-distributed Stochastic Neighbour Embedding (t-SNE) method has emerged as one of the leading methods for visualising High Dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single cell transcriptomics data. However, several shortcomings of the algorithm have been identified. Specifically, t-SNE is often unable to correctly represent hierarchical relationships between clusters and spurious patterns may arise in the embedding due to incorrect parameter settings, which could lead to misinterpretations of the data. Here we incorporate t-SNE with shape-aware graph distances, a method termed shape-aware stochastic neighbour embedding (SASNE), to mitigate these limitations of the t-SNE. The merits of the SASNE are first demonstrated using synthetic data sets, where we see a significant improvement in embedding imbalanced and nonlinear clusters, as well as preservation of hierarchical structure, based on quantitative validation in clustering and dimensionality reductions. Moreover, we propose a data-driven parameter setting which we find consistently optimal in all test cases. Lastly, we demonstrate the superior performance of SASNE in embedding the MNIST image data and the single cell transcriptomics gene expression data.


Dhaka: Variational Autoencoder for Unmasking Tumor Heterogeneity from Single Cell Genomic Data

10.1101/183863 ◽

2017 ◽

Cited By ~ 4

Author(s):

Sabrina Rashid ◽

Sohrab Shah ◽

Ziv Bar-Joseph ◽

Ravi Pandya

Keyword(s):

Gene Expression ◽

Single Cell ◽

Tumor Heterogeneity ◽

Genomic Data ◽

Feature Space ◽

Marker Genes ◽

Tumor Evolution ◽

Evolutionary Trajectory ◽

Link Type ◽

Variational Autoencoder

Abstract Motivation: Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers, and mutation even when originating from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data in order to infer their evolutionary trajectory remains computationally challenging due to the extremely noisy and sparse nature of the data. Results: Here we describe ‘Dhaka’, a variational autoencoder method which transforms single cell genomic data to a reduced dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic and 6 single cell cancer datasets where the number of cells ranges from 250 to 6000 for each sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features are also able to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data. Availability and Implementation: All the datasets used in the paper are publicly available and the developed software package is available on GitHub at https://github.com/MicrosoftGenomics/Dhaka. Supporting info and Software: https://github.com/MicrosoftGenomics/Dhaka


Integration of the Microbiome, Metabolome and Transcriptomics Data Identified Novel Metabolic Pathway Regulation in Colorectal Cancer

International Journal of Molecular Sciences ◽

10.3390/ijms22115763 ◽

2021 ◽

Vol 22 (11) ◽

pp. 5763

Author(s):

Vartika Bisht ◽

Katrina Nash ◽

Yuanwei Xu ◽

Prasoon Agarwal ◽

Sofie Bosch ◽

...

Keyword(s):

Colorectal Cancer ◽

Single Cell ◽

Rna Sequencing ◽

Therapeutic Targets ◽

Data Sets ◽

Cancer Pathogenesis ◽

Oxidative Phosphorylation Pathway ◽

Transcriptomics Data ◽

Pathway Regulation ◽

Potential Interactions

Integrative multiomics data analysis provides a unique opportunity for the mechanistic understanding of colorectal cancer (CRC) in addition to the identification of potential novel therapeutic targets. In this study, we used public omics data sets to investigate potential associations between microbiome, metabolome, bulk transcriptomics and single cell RNA sequencing datasets. We identified multiple potential interactions, for example 5-aminovalerate interacting with Adlercreutzia; cholesteryl ester interacting with bacterial genera Staphylococcus, Blautia and Roseburia. Using public single cell and bulk RNA sequencing, we identified 17 overlapping genes involved in epithelial cell pathways, with particular significance of the oxidative phosphorylation pathway and the ACAT1 gene that indirectly regulates the esterification of cholesterol. These findings demonstrate that the integration of multiomics data sets from diverse populations can help us in untangling the colorectal cancer pathogenesis as well as postulate the disease pathology mechanisms and therapeutic targets.


A descriptive marker gene approach to single-cell pseudotime inference

10.1101/060442 ◽

2016 ◽

Cited By ~ 5

Author(s):

Kieran R Campbell ◽

Christopher Yau

Keyword(s):

Single Cell ◽

Marker Gene ◽

Cell Types ◽

R Package ◽

Estimation Methods ◽

Marker Genes ◽

Peak Time ◽

Transient Behaviour ◽

Link Type ◽

Cell Gene Expression

Abstract Pseudotime estimation from single-cell gene expression allows the recovery of temporal information from otherwise static profiles of individual cells. This pseudotemporal information can be used to characterise transient events in temporally evolving biological systems. Conventional algorithms typically emphasise an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. Here we introduce an orthogonal approach termed “Ouija” that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time, the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. In the following we introduce our model and demonstrate that in many instances a small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify “metastable” states - discrete cell types along the continuous trajectories - that recapitulate known cell types. Ouija therefore provides a powerful complementary approach to existing whole transcriptome based pseudotime estimation methods. An open source implementation is available at http://www.github.com/kieranrcampbell/ouija as an R package and at http://www.github.com/kieranrcampbell/ouijaflow as a Python/TensorFlow package.
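
The distinction between switch-like and transient behaviour along pseudotime can be pictured with two simple mean-expression curves. The Python sketch below is purely illustrative: the abstract does not give the functional forms, so the sigmoid for switch-like genes, the Gaussian-shaped bump for transient genes, and all function and parameter names (`switch_mean`, `transient_mean`, `plateau`, `steepness`, `switch_time`, `height`, `peak_time`, `width`) are assumptions rather than the Ouija model itself.

```python
import math


def switch_mean(t, plateau, steepness, switch_time):
    """Sigmoidal mean expression: rises (steepness > 0) or falls (steepness < 0)
    around `switch_time`. Illustrative parameterisation, assumed for this sketch."""
    return plateau / (1.0 + math.exp(-steepness * (t - switch_time)))


def transient_mean(t, height, peak_time, width):
    """Bump-shaped mean expression that peaks at `peak_time` and decays on
    either side. Illustrative parameterisation, assumed for this sketch."""
    return height * math.exp(-((t - peak_time) ** 2) / (2.0 * width ** 2))


# Evaluate both curves on a coarse pseudotime grid in [0, 1].
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    s = switch_mean(t, plateau=10.0, steepness=20.0, switch_time=0.5)
    b = transient_mean(t, height=10.0, peak_time=0.5, width=0.1)
    print(f"t={t:.2f}  switch={s:.3f}  transient={b:.3f}")
```

Under this parameterisation, the switch time is the pseudotime at which a switch-like gene reaches half of its plateau, and the peak time is where a transient gene attains its maximum, which is what allows genes to be ordered along the trajectory together with the cells.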


Venice: A New Algorithm for Finding Marker Genes in Single-Cell Transcriptomic Data

10.1101/2020.11.16.384479 ◽

2020 ◽

Author(s):

Hy Vuong ◽

Thao Truong ◽

Tan Phan ◽

Son Pham

Keyword(s):

Single Cell ◽

Cell Population ◽

Cell Types ◽

Marker Genes ◽

Data Sets ◽

Interactive Analysis ◽

Expression Of Genes ◽

A Cell ◽

Definition Of ◽

Cell Data

Abstract Most widely used tools for finding marker genes in single cell data (SeuratT/NegBinom/Poisson, CellRanger, EdgeR, limmatrend) use a conventional definition of differentially expressed genes: genes with different mean expression values. However, in single-cell data, a cell population can be a mixture of many cell types/cell states, hence the mean expression of genes cannot represent the whole population. In addition, these tools assume that gene expression of a population belongs to a specific family of distributions. This assumption is often violated in single-cell data. In this work, we define marker genes of a cell population as genes that can be used to distinguish cells in the population from cells in other populations. Besides log-fold change, we devise a new metric to classify genes into up-regulated, down-regulated, and transitional states. In a benchmark for finding up-regulated and down-regulated genes, our tool outperforms all compared methods, including Seurat, ROTS, scDD, edgeR, MAST, limma, normal t-test, Wilcoxon and Kolmogorov–Smirnov test. Our method is much faster than all compared methods and therefore enables interactive analysis for large single-cell data sets in BioTuring Browser. The Venice algorithm is available within the Signac package: https://github.com/bioturing/signac.
