Cerebro: interactive visualization of scRNA-seq data

Roman Hillje; Pier Giuseppe Pelicci; Lucilla Luzi

doi:10.1093/bioinformatics/btz877

Bioinformatics

2019

Vol 36 (7)

pp. 2311-2313

Cited By ~ 5

Keyword(s):

Single Cell

Effective Interaction

Three Dimensional

R Package

Direct Access

Supplementary Information

Marker Genes

Transcriptomics Data

Or Gene

Access To Data

Abstract

Despite the growing availability of sophisticated bioinformatic methods for the analysis of single-cell RNA-seq data, few tools allow biologists without extensive bioinformatic expertise to directly visualize and interact with their own data and results. Here, we present Cerebro (cell report browser), a Shiny- and Electron-based standalone desktop application for macOS and Windows that allows users to investigate and inspect pre-processed single-cell transcriptomics data without requiring bioinformatic experience. Through an interactive and intuitive graphical interface, users can (i) explore similarities and heterogeneity between samples and cell clusters in two-dimensional or three-dimensional projections such as t-SNE or UMAP, (ii) display the expression level of single genes or gene sets of interest, (iii) browse tables of the most expressed genes and marker genes for each sample and cluster and (iv) display trajectories calculated with Monocle 2. We provide three examples prepared from publicly available datasets to show how Cerebro can be used and what its capabilities are. Through a focus on flexibility and direct access to data and results, we think Cerebro offers a collaborative framework for bioinformaticians and experimental biologists that facilitates effective interaction and narrows the gap between analysis and interpretation of the data.

Availability and implementation: The Cerebro application, additional documentation and example datasets are available at https://github.com/romanhaa/Cerebro. Similarly, the cerebroApp R package is available at https://github.com/romanhaa/cerebroApp. All components are released under the MIT License.

Supplementary information: Supplementary data are available at Bioinformatics online.
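To make concrete what a "most expressed genes per cluster" table contains, here is a minimal Python sketch that ranks genes by mean expression within each cluster. It is an illustration under simple assumptions (a dense gene-by-cell table and one cluster label per cell), not cerebroApp's implementation, which is an R package whose internals are not described here; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def most_expressed_genes(expression, clusters, top_n=3):
    """Return, for each cluster, the top_n genes ranked by mean expression.

    expression: dict mapping gene name -> list of per-cell values (hypothetical layout)
    clusters: list of cluster labels, one per cell, in the same order as the values
    """
    # Collect the indices of the cells that belong to each cluster.
    members = defaultdict(list)
    for cell_index, label in enumerate(clusters):
        members[label].append(cell_index)

    table = {}
    for label, cells in members.items():
        # Mean expression of every gene across the cells of this cluster.
        means = {
            gene: sum(values[i] for i in cells) / len(cells)
            for gene, values in expression.items()
        }
        # Sort by descending mean; ties are broken alphabetically for stable output.
        ranked = sorted(means.items(), key=lambda item: (-item[1], item[0]))
        table[label] = ranked[:top_n]
    return table
```

Ranking by mean is only one reasonable choice; a marker-gene table additionally compares each cluster against the remaining cells, which this sketch does not attempt.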

Cerebro: Interactive visualization of scRNA-seq data

10.1101/631705

2019

Cited By ~ 5

Author(s):

Roman Hillje

Pier Giuseppe Pelicci

Lucilla Luzi

Keyword(s):

Single Cell

Effective Interaction

Direct Access

Marker Genes

Data Sets

Link Type

Transcriptomics Data

R Packages

Access To Data

Or Genes

Abstract

Summary: Despite the growing availability of sophisticated bioinformatic methods for the analysis of single-cell RNA-seq data, few tools exist that allow biologists without bioinformatic expertise to directly visualize and interact with their own data and results. Here, we present Cerebro (cell report browser), a Shiny- and Electron-based standalone desktop application for macOS and Windows that allows investigation and inspection of pre-processed single-cell transcriptomics data without requiring the user to have bioinformatic experience. Through an interactive and intuitive graphical interface, users can (i) explore similarities and heterogeneity between samples and cell clusters in 2D or 3D projections such as t-SNE or UMAP, (ii) display the expression level of single genes or gene sets of interest and (iii) browse tables of the most expressed genes and marker genes for each sample and cluster. We provide a simple example to show how Cerebro can be used and what its capabilities are. Through a focus on flexibility and direct access to data and results, we think Cerebro offers a collaborative framework for bioinformaticians and experimental biologists that facilitates effective interaction and narrows the gap between analysis and interpretation of the data.

Availability: Cerebro and example datasets are available at https://github.com/romanhaa/Cerebro. Similarly, the R packages cerebroApp and cerebroPrepare are available at https://github.com/romanhaa/cerebroApp and https://github.com/romanhaa/cerebroPrepare, respectively. All components are released under the MIT License.

Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

Bioinformatics

10.1093/bioinformatics/btab257

2021

Author(s):

Yixuan Qiu

Jiebiao Wang

Jing Lei

Kathryn Roeder

Keyword(s):

Single Cell

Expression Patterns

R Package

Supplementary Information

Marker Genes

Specific Marker

Cell Type

Correlation Pattern

Tissue Samples

Bulk Data

Abstract

Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type are nevertheless highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, these marker genes can be identified from the correlation pattern.

Results: To capitalize on this pattern, we develop a new semi-supervised algorithm that detects marker genes by combining published information about likely marker genes with bulk transcriptome data. The algorithm exploits the correlation structure of the bulk data to refine the published marker genes by adding genes to, or removing genes from, the list.

Availability and implementation: We implement this method as an R package, markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen).

Supplementary information: Supplementary data are available at Bioinformatics online.
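The premise that marker genes co-vary in bulk data because they all track one cell-type proportion can be checked with a toy simulation. The Python sketch below assumes a deliberately simple mixing model (bulk level = per-cell expression × proportion + noise, with made-up constants); it illustrates the correlation pattern only and is not the markerpen algorithm.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_bulk(n_samples=200, seed=1):
    """Simulate bulk expression of two marker genes and one non-marker gene.

    Toy assumption: a marker gene's bulk level is its per-cell expression in
    the target cell type times that cell type's proportion in the sample,
    plus small measurement noise. All constants are illustrative.
    """
    rng = random.Random(seed)
    marker_a, marker_b, other = [], [], []
    for _ in range(n_samples):
        proportion = rng.uniform(0.05, 0.5)  # fraction of the target cell type
        marker_a.append(100 * proportion + rng.gauss(0, 1))
        marker_b.append(40 * proportion + rng.gauss(0, 0.5))
        # Non-marker gene: its level does not depend on the proportion.
        other.append(rng.gauss(50, 5))
    return marker_a, marker_b, other
```

With these settings the two marker genes correlate almost perfectly across samples, while the non-marker gene shows correlation near zero; real bulk data mix many cell types, so the signal is weaker.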

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics

10.1093/bioinformatics/btab179

2021

Author(s):

Irzam Sarfraz

Muhammad Asif

Joshua D Campbell

Keyword(s):

Single Cell

R Package

Poor Quality

Data Matrix

Supplementary Information

Data Provenance

Rna Seq

Efficient Management

The Matrix

The Relationship

Abstract

Motivation: R Experiment objects such as SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is subsetting the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient, as it requires creating an additional object containing a copy of the original assay, and it complicates data provenance.

Results: To overcome these challenges, we developed an R package called ExperimentSubset, a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets.

Availability and implementation: The ExperimentSubset package is available at Bioconductor (https://bioconductor.org/packages/ExperimentSubset/) and GitHub (https://github.com/campbio/ExperimentSubset).

Supplementary information: Supplementary data are available at Bioinformatics online.
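The core idea of storing a subset as indices into its parent, rather than as a copy, can be sketched in a few lines. This Python sketch is not the ExperimentSubset API (an R/Bioconductor package); the classes `Assay` and `SubsetAssay` and their methods are hypothetical names chosen for illustration.

```python
class Assay:
    """A parent assay: a dense matrix (list of rows) stored exactly once."""

    def __init__(self, name, matrix):
        self.name = name
        self.matrix = matrix


class SubsetAssay:
    """A subset that stores only row/column indices into its parent.

    No values are copied; each lookup is resolved through the parent chain,
    which also records provenance (which assay the subset was derived from).
    """

    def __init__(self, name, parent, rows, cols):
        self.name = name
        self.parent = parent
        self.rows = list(rows)
        self.cols = list(cols)

    def value(self, i, j):
        # Translate subset coordinates into the parent's coordinates.
        pi, pj = self.rows[i], self.cols[j]
        if isinstance(self.parent, SubsetAssay):
            return self.parent.value(pi, pj)
        return self.parent.matrix[pi][pj]

    def to_matrix(self):
        """Materialise the subset only when explicitly requested."""
        return [[self.value(i, j) for j in range(len(self.cols))]
                for i in range(len(self.rows))]

    def lineage(self):
        """Names of this subset and all of its ancestors, nearest first."""
        chain, node = [self.name], self.parent
        while isinstance(node, SubsetAssay):
            chain.append(node.name)
            node = node.parent
        chain.append(node.name)
        return chain
```

For example, a quality-filtered subset of a count matrix and a further subset of highly variable genes would both point back to the same stored counts, and `lineage()` reports the chain from the most recent subset to the parent.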

PPIT: an R package for inferring microbial taxonomy from nifH sequences

Bioinformatics

10.1093/bioinformatics/btab100

2021

Author(s):

Bennett J Kapili

Anne E Dekas

Keyword(s):

Gene Transfer

Horizontal Gene Transfer

Query Sequence

Marker Gene

R Package

Supplementary Information

Marker Genes

Pairwise Identity

Metabolic Marker

Microbial Taxonomy

Abstract

Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale.

Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if the references in the phylogenetic neighborhood (1) are taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches, at the cost of fewer total inferences. We demonstrate PPIT on nifH amplicons from deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset, we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes.

Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL and DDBJ databases under BioProject number PRJEB37167.

Supplementary information: Supplementary data are available at Bioinformatics online.
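The two-part inference rule described above (taxonomic consistency plus sufficient pairwise identity) can be sketched as a small decision function. This Python sketch is not PPIT's code: the function name, the input layout and the choice that *every* neighbor must meet a single identity threshold are assumptions made for illustration, and real rank-specific thresholds are not given here.

```python
def infer_taxon(neighbors, rank, min_identity):
    """Return a taxon name for a query, or None if no inference is drawn.

    neighbors: list of (taxonomy, identity) pairs for the reference sequences
        in the query's phylogenetic neighborhood; taxonomy maps a rank name
        (e.g. "genus") to a taxon, identity is pairwise identity (0-1) to the query.
    rank: taxonomic rank at which to attempt the inference.
    min_identity: hypothetical identity threshold required at this rank.
    """
    if not neighbors:
        return None
    taxa = {taxonomy.get(rank) for taxonomy, _ in neighbors}
    # Criterion 1: all references must agree (and be annotated) at this rank.
    if len(taxa) != 1 or None in taxa:
        return None
    # Criterion 2 (assumed form): every reference must be similar enough to the query.
    if min(identity for _, identity in neighbors) < min_identity:
        return None
    return taxa.pop()
```

Requiring both criteria trades coverage for precision: a neighborhood mixing genera, or one whose members are only distantly similar to the query, yields no inference rather than a possibly wrong one.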

DEsingle for detecting three types of differential expression in single-cell RNA-seq data

10.1101/173997

2017

Cited By ~ 1

Author(s):

Zhun Miao

Ke Deng

Xiaowo Wang

Xuegong Zhang

Keyword(s):

Single Cell

Differential Expression

Negative Binomial

Single Cells

R Package

Supplementary Information

Binomial Model

Supplementary Data

Rna Seq

Real Zeros

Abstract

Summary: The excess of zeros in single-cell RNA-seq data includes "real" zeros, due to the on-off nature of gene transcription in single cells, and "dropout" zeros, due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a zero-inflated negative binomial (ZINB) model to estimate the proportions of real and dropout zeros and to define and detect three types of DE genes in single-cell RNA-seq data with higher accuracy.

Availability and implementation: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under consideration for inclusion in Bioconductor.

Supplementary information: Supplementary data are available at bioRxiv online.
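A zero-inflated negative binomial (ZINB) distribution mixes a point mass at zero (weight π) with a negative binomial with mean μ and size r, so the probability of observing zero is π + (1 − π)(r / (r + μ))^r. The Python sketch below writes out that probability mass function and splits the zero probability into its two parts. It illustrates the ZINB model itself, not DEsingle's estimation procedure, and it does not assert which part DEsingle labels "real" versus "dropout"; the function names are hypothetical.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial probability of count k, with mean mu and size (dispersion) parameter."""
    log_p = (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def zinb_pmf(k, pi, mu, size):
    """Zero-inflated NB: a point mass at zero with weight pi, otherwise NB(mu, size)."""
    base = (1 - pi) * nb_pmf(k, mu, size)
    return pi + base if k == 0 else base


def zero_components(pi, mu, size):
    """Split P(count = 0) into the zero-inflation mass and the NB's own zeros."""
    point_mass = pi
    nb_zeros = (1 - pi) * nb_pmf(0, mu, size)
    return point_mass, nb_zeros
```

For instance, with π = 0.3, μ = 5 and r = 2, the NB part alone gives zero with probability (2/7)² ≈ 0.082, so most of the zeros in that gene come from the point mass.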

scEpath: energy landscape-based inference of transition probabilities and cellular trajectories from single-cell transcriptomic data

Bioinformatics

10.1093/bioinformatics/bty058

2018

Vol 34 (12)

pp. 2077-2086

Cited By ~ 33

Author(s):

Suoqin Jin

Adam L MacLean

Tao Peng

Qing Nie

Keyword(s):

Single Cell

Transition Probabilities

Energy Landscape

Control Cell

Robust Inference

Supplementary Information

Myoblast Differentiation

State Transitions

Marker Genes

Cell State

Abstract

Motivation: Single-cell RNA-sequencing (scRNA-seq) offers unprecedented resolution for studying cellular decision-making processes. Robust inference of cell state transition paths and probabilities is an important yet challenging step in the analysis of these data.

Results: Here we present scEpath, an algorithm that calculates energy landscapes and probabilistic directed graphs to reconstruct developmental trajectories. We quantify the energy landscape using 'single-cell energy' and distance-based measures, and find that the combination of these enables robust inference of the transition probabilities and lineage relationships between cell states. We also identify marker genes and gene expression patterns associated with cell state transitions. Our approach produces pseudotemporal orderings that are, in combination, more robust and accurate than those of current methods, and offers higher-resolution dynamics of the cell state transitions, leading to new insight into key transition events during differentiation and development. Moreover, scEpath is robust to variation in the size of the input gene set and is broadly unsupervised, requiring few parameters to be set by the user. Applications of scEpath led to the identification of a cell-cell communication network implicated in early human embryo development, and of novel transcription factors important for myoblast differentiation. scEpath allows us to identify common and specific temporal dynamics and transcription factor programs along branched lineages, as well as the transition probabilities that control cell fates.

Availability and implementation: A MATLAB package of scEpath is available at https://github.com/sqjin/scEpath.

Supplementary information: Supplementary data are available at Bioinformatics online.

multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments

Bioinformatics

10.1093/bioinformatics/btz048

2019

Vol 35 (17)

pp. 2916-2923

Cited By ~ 15

Author(s):

John C Stansfield

Kellen G Cresswell

Mikhail G Dozmorov

Keyword(s):

Comparative Analysis

A Priori

Three Dimensional

R Package

Supplementary Information

Chromatin Interaction

Model Framework

Chromatin Interactions

Loess Regression

Sequencing Studies

Abstract

Motivation: With the development of chromatin conformation capture technology and its high-throughput derivative, Hi-C sequencing, studies of the three-dimensional interactome of the genome that involve multiple Hi-C datasets are becoming available. To account for the technology-driven biases unique to each dataset, methods are needed that jointly normalize multiple Hi-C datasets. Previous attempts at removing biases from Hi-C data have used techniques that normalize individual Hi-C datasets or, at best, jointly normalize two datasets.

Results: Here, we present multiHiCcompare, a cyclic loess regression-based joint normalization technique for removing biases across multiple Hi-C datasets. In contrast to other normalization techniques, it properly handles the Hi-C-specific decay of chromatin interaction frequencies with increasing distance between interacting regions. multiHiCcompare uses the general linear model framework for comparative analysis of multiple Hi-C datasets, adapted for the Hi-C-specific decay of chromatin interaction frequencies. multiHiCcompare outperforms other methods in detecting a priori known chromatin interaction differences from jointly normalized datasets. Applied to the analysis of auxin-treated versus untreated experiments and of CTCF depletion experiments, multiHiCcompare recovered the expected epigenetic and gene expression signatures of loss of chromatin interactions and revealed novel insights.

Availability and implementation: multiHiCcompare is freely available on GitHub and as a Bioconductor R package (https://bioconductor.org/packages/multiHiCcompare).

Supplementary information: Supplementary data are available at Bioinformatics online.
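The key point above is that normalization must respect the distance between interacting regions. The Python sketch below shows that idea for just two datasets, using a deliberately simplified stand-in: instead of loess, the log2 ratio M between the datasets is centred to zero with a per-distance median. It is not multiHiCcompare's algorithm (which, per the abstract, is based on cyclic loess across multiple datasets); the function name and input layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_pair(counts_a, counts_b):
    """Distance-aware normalization of two Hi-C datasets (simplified sketch).

    counts_a, counts_b: dicts mapping (bin_i, bin_j) -> positive interaction
    count, with identical keys. The log2 ratio M is centred separately within
    each genomic distance |bin_j - bin_i|; a per-distance median replaces the
    loess fit used by the real method.
    """
    by_distance = defaultdict(list)
    for key in counts_a:
        m = math.log2(counts_a[key] / counts_b[key])
        by_distance[abs(key[1] - key[0])].append(m)
    offset = {d: median(ms) for d, ms in by_distance.items()}

    norm_a, norm_b = {}, {}
    for key in counts_a:
        half = offset[abs(key[1] - key[0])] / 2
        # Shift each dataset by half the offset so that M is centred while the
        # geometric mean of the two counts (the "A" value) stays unchanged.
        norm_a[key] = counts_a[key] / 2 ** half
        norm_b[key] = counts_b[key] * 2 ** half
    return norm_a, norm_b
```

Splitting the correction symmetrically between the two datasets mirrors the MD-plot convention of adjusting M while leaving the average log intensity untouched.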

DECENT: differential expression with capture efficiency adjustmeNT for single-cell RNA-seq data

Bioinformatics

10.1093/bioinformatics/btz453

2019

Vol 35 (24)

pp. 5155-5162

Cited By ~ 10

Author(s):

Chengzhong Ye

Terence P Speed

Agus Salim

Keyword(s):

Single Cell

Differential Expression

Type I Error

R Package

Supplementary Information

Type I

Common Phenomenon

Rna Seq

Capture Process

Technological Platforms

Abstract

Motivation: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments.

Results: We show that DECENT achieves better DE performance than existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms, and it is especially large when the capture process is overdispersed. DECENT controls the type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model.

Availability and implementation: The method is implemented as a publicly available R package at https://github.com/cz-ye/DECENT.

Supplementary information: Supplementary data are available at Bioinformatics online.
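Why an explicit capture model explains dropout can be seen with the simplest version of such a model: each of a cell's true molecules is captured independently with probability β (the capture efficiency), so the observed count is a binomial thinning of the true count, and a gene with n true molecules is observed as zero with probability (1 − β)^n. The Python sketch below assumes exactly that binomial model with a fixed efficiency; it is not DECENT's implementation, and it ignores the overdispersed capture the abstract mentions. Function names are hypothetical.

```python
import random


def prob_observed_zero(true_count, efficiency):
    """P(no molecule captured) when each of true_count molecules is captured
    independently with probability efficiency (binomial capture model)."""
    return (1 - efficiency) ** true_count


def capture(true_count, efficiency, rng):
    """Simulate one observed count by binomial thinning of the true count."""
    return sum(1 for _ in range(true_count) if rng.random() < efficiency)
```

With 10 true molecules and a 10% capture efficiency, about 35% of cells (0.9^10 ≈ 0.349) would show a zero even though the gene is expressed, which is the kind of technical zero a method that ignores capture would misread.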

schex avoids overplotting for large single-cell RNA-sequencing datasets

Bioinformatics

10.1093/bioinformatics/btz907

2019

Vol 36 (7)

pp. 2291-2292

Cited By ~ 1

Author(s):

Saskia Freytag

Ryan Lister

Keyword(s):

Single Cell

Rna Sequencing

R Package

Supplementary Information

Supplementary Data

Sequencing Data

Single Cell Rna Sequencing

Abstract

Summary: Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots.

Availability and implementation: schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html, and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex.

Supplementary information: Supplementary data are available at Bioinformatics online.
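Hexagonal binning replaces one plotted point per cell with one hexagon per occupied region, colored by how many cells (or what summary value) fall inside it. The Python sketch below shows the general technique: each 2-D point is mapped to the axial coordinates of its pointy-top hexagon via cube-coordinate rounding, and points are counted per hexagon. It does not reproduce schex's own implementation or its R interface; function names and the `size` parameter (hexagon radius) are illustrative.

```python
import math
from collections import Counter


def point_to_hex(x, y, size):
    """Map a 2-D point to the axial (q, r) coordinates of its pointy-top hexagon."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (cx + cy + cz = 0), then repair the
    # coordinate with the largest rounding error to keep the sum at zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hexbin(points, size):
    """Count how many points fall into each hexagon."""
    return Counter(point_to_hex(x, y, size) for x, y in points)
```

Because the number of hexagons is bounded by the plot area rather than by the number of cells, drawing and storing the binned plot stays cheap even for very large datasets, which is the speed and storage benefit the abstract refers to.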

scSVA: an interactive tool for big data visualization and exploration in single-cell omics

10.1101/512582

2019

Cited By ~ 8

Author(s):

Marcin Tabaka

Joshua Gould

Aviv Regev

Keyword(s):

Single Cell

Three Dimensional

R Package

Reproducible Research

Data Embedding

3D Data

Big Data Visualization

Data Visualizations

Cell Data

Memory Efficient

Abstract

We present scSVA (single-cell Scalable Visualization and Analytics), a lightweight R package for interactive two- and three-dimensional visualization and exploration of massive single-cell omics data. Building in part on methods originally developed for astronomy datasets, scSVA remains memory efficient for datasets of hundreds of millions of cells, can be run locally or in the cloud and generates high-quality figures. In particular, we introduce a numerically efficient method for single-cell data embedding in 3D that combines an optimized implementation of diffusion maps with a 3D force-directed layout, enabling the generation of 3D data visualizations at the scale of a million cells. To facilitate reproducible research, scSVA supports interactive analytics in the cloud with containerized tools. scSVA is available online at https://github.com/klarman-cell-observatory/scSVA.
