Natian and Ryabhatta—graphical user interfaces to create, analyze and visualize single-cell transcriptomic datasets

Mapping Intimacies, 10.1101/2021.06.17.448424, 2021

Author(s): Sathiyanarayanan Manivannan, Vidu Garg

Keyword(s): Quality Control, Single Cell, User Interfaces, Dimensional Reduction, Life Sciences, Principal Component, Specific Gene, Gene Count, The Individual, Cell Data

Single-cell transcriptomic analyses permit a high-resolution investigation of biological processes at the individual cell level. Single-cell transcriptomics technologies such as Drop-seq, Smart-seq, MARS-seq, sci-RNA-seq, and CEL-seq produce large volumes of data in the form of sequence reads. In general, the alignment of the reads to genomes and the enumeration of reads mapping to a specific gene results in a gene-count matrix. These gene-count matrix data require robust quality control and statistical analytical pipelines before data mining and interpretation. Among these post-alignment pipelines, the 'Seurat' package in 'R' is the most popular analytical pipeline for the analysis of single-cell data. This package provides quality control, normalization, principal component analysis, dimensional reduction, clustering, and marker identification among other functions needed to process and mine the single-cell transcriptomic data. While the Seurat package is continuously updated and includes a variety of functionalities, the user is still required to be proficient in the 'R' programming language and its data structures to be able to execute the Seurat functions. Hence, there is a demand for a graphical user interface (GUI) that takes in relevant input information and processes the single-cell data using the Seurat pipeline. A GUI will also greatly improve access to single-cell data for life sciences researchers who are not trained in the command-line operation of the 'R' platform. To meet this demand, we present R Shiny apps 'Natian' and 'Ryabhatta' to assist in the generation and analysis of Seurat files from a variety of different sources. The apps and example data can be downloaded from https://singlecelltranscriptomics.org.
Natian allows users to create Seurat files from the output of multiple pipelines, integrate existing Seurat files, add metadata information, perform dimensional reduction analysis or upload dimensional reduction data, resume partially processed Seurat files and find cluster markers. Ryabhatta allows users to visualize gene expression using a variety of plotting options, analyze cluster markers, rename clusters, select cells from a graph or based on expression levels of markers, perform differential expression, count the number of cells in each condition, and perform pseudotime analysis using Monocle. We found that using these apps substantially reduced analysis and processing time and removed needless troubleshooting due to incompatible commands, typographical errors in scripts, and cluttering of the R environment with variables. We hope the use of these apps improves the use of single-cell data for life sciences research while also providing a tool to learn the functionalities of Seurat and R functions available for single-cell data analysis.


baredSC: Bayesian approach to retrieve expression distribution of single-cell data

BMC Bioinformatics, 10.1186/s12859-021-04507-8, 2022, Vol 23 (1)

Author(s): Lucille Lopez-Delisle, Jean-Baptiste Delisle

Keyword(s): Single Cell, Bayesian Approach, Genetic Interaction, Gaussian Mixture, Two Dimensions, Biological Data, Specific Gene, Trimodal Distribution, Embryonic Limb, Cell Data

Abstract. Background: The number of studies using single-cell RNA sequencing (scRNA-seq) is constantly growing. This powerful technique provides a sampling of the whole transcriptome of a cell. However, sparsity of the data can be a major hurdle when studying the distribution of the expression of a specific gene or the correlation between the expressions of two genes. Results: We show that the main technical noise associated with these scRNA-seq experiments is due to the sampling, i.e., Poisson noise. We present a new tool named baredSC, for Bayesian Approach to Retrieve Expression Distribution of Single-Cell data, which infers the intrinsic expression distribution in scRNA-seq data using a Gaussian mixture model. baredSC can be used to obtain the distribution in one dimension for individual genes and in two dimensions for pairs of genes, in particular to estimate the correlation in the two genes’ expressions. We apply baredSC to simulated scRNA-seq data and show that the algorithm is able to uncover the expression distribution used to simulate the data, even in multi-modal cases with very sparse data. We also apply baredSC to two real biological data sets. First, we use it to measure the anti-correlation between Hoxd13 and Hoxa11, two genes with known genetic interaction in embryonic limb. Then, we study the expression of Pitx1 in embryonic hindlimb, for which a trimodal distribution has been identified through flow cytometry. While other methods to analyze scRNA-seq are too sensitive to sampling noise, baredSC reveals this trimodal distribution. Conclusion: baredSC is a powerful tool which aims at retrieving the expression distribution of a few genes of interest from scRNA-seq data.
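The effect of Poisson sampling noise on a multimodal expression distribution can be sketched with a small simulation. This is a minimal illustration, not baredSC's implementation: the two-component mixture, the log-scale transform, the sequencing depth, and all function names are hypothetical choices made for the example.

```python
import math
import random

def poisson_draw(rng, mean):
    """Draw one Poisson sample with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def simulate_counts(n_cells, depth, seed=0):
    """Simulate a gene with bimodal intrinsic expression, then sample observed counts."""
    rng = random.Random(seed)
    true_levels, counts = [], []
    for _ in range(n_cells):
        # Hypothetical two-component Gaussian mixture on a log scale.
        mu = 0.5 if rng.random() < 0.5 else 1.5
        x = max(0.0, rng.gauss(mu, 0.15))
        # Assumed transform: x = log(1 + 1e4 * fraction of the cell's mRNA).
        fraction = (math.exp(x) - 1.0) / 1e4
        true_levels.append(x)
        # Observed count is a Poisson draw around depth * fraction.
        counts.append(poisson_draw(rng, depth * fraction))
    return true_levels, counts

true_levels, counts = simulate_counts(n_cells=2000, depth=2000)
zero_fraction = sum(c == 0 for c in counts) / len(counts)
print(f"fraction of cells with zero counts: {zero_fraction:.2f}")
```

Under these assumed parameters most cells report zero counts even though nearly every cell expresses the gene, so a histogram of raw counts no longer shows the two modes. Recovering them requires modelling the sampling step explicitly, which is the motivation the abstract gives.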


Molecular Cross-Validation for Single-Cell RNA-seq

10.1101/786269, 2019, Cited By ~ 7

Author(s): Joshua Batson, Loïc Royer, James Webber

Keyword(s): Single Cell, Cross Validation, Individual Cell, Principal Component, Ground Truth, Rna Seq, Optimal Parameters, Denoising Method, Data Driven Approach, Cell Data

Single-cell RNA sequencing enables researchers to study the gene expression of individual cells. However, in high-throughput methods the portrait of each individual cell is noisy, representing thousands of the hundreds of thousands of mRNA molecules originally present. While many methods for denoising single-cell data have been proposed, a principled procedure for selecting and calibrating the best method for a given dataset has been lacking. We present “molecular cross-validation,” a statistically principled and data-driven approach for estimating the accuracy of any denoising method without the need for ground-truth. We validate this approach for three denoising methods—principal component analysis, network diffusion, and a deep autoencoder—on a dataset of deeply-sequenced neurons. We show that molecular cross-validation correctly selects the optimal parameters for each method and identifies the best method for the dataset.
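The idea of calibrating a denoiser without ground truth can be sketched by splitting each cell's molecules into two independent halves, denoising one half, and scoring the result against the other. The code below is a toy illustration under stated assumptions rather than the authors' procedure: the shrinkage denoiser, the squared-error loss, the example counts, and all names are hypothetical.

```python
import random

def split_molecules(counts, fraction, rng):
    """Binomially thin each count: each molecule joins the training set with probability `fraction`."""
    train, held_out = [], []
    for c in counts:
        kept = sum(1 for _ in range(c) if rng.random() < fraction)
        train.append(kept)
        held_out.append(c - kept)
    return train, held_out

def shrink_toward_mean(values, weight):
    """Toy denoiser (hypothetical): blend each value with the mean across genes."""
    mean = sum(values) / len(values)
    return [(1 - weight) * v + weight * mean for v in values]

def mcv_loss(counts, weight, rng, fraction=0.5):
    """Denoise the training half, then score it against the held-out half."""
    train, held_out = split_molecules(counts, fraction, rng)
    denoised = shrink_toward_mean(train, weight)
    # Rescale the prediction to the sampling depth of the held-out half.
    scale = (1 - fraction) / fraction
    return sum((scale * d - h) ** 2 for d, h in zip(denoised, held_out)) / len(counts)

cell = [0, 3, 1, 0, 7, 2, 0, 1, 4, 0]  # hypothetical UMI counts for one cell
# Reuse the same seed so every candidate weight is scored on the same split.
losses = {w: mcv_loss(cell, w, random.Random(1)) for w in (0.0, 0.25, 0.5, 0.75, 1.0)}
best_weight = min(losses, key=losses.get)
print(best_weight, losses[best_weight])
```

Because the two halves are independent samples of the same cell, a denoiser that merely memorizes the noise in the training half gains nothing on the held-out half, which is why the held-out loss can be used to choose the parameter.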


Implication of specific retinal cell-type involvement and gene expression changes in AMD progression using integrative analysis of single-cell and bulk RNA-seq profiling

Scientific Reports, 10.1038/s41598-021-95122-3, 2021, Vol 11 (1)

Author(s): Yafei Lyu, Randy Zauhar, Nicholas Dana, Christianne E. Strang, Jian Hu, ...

Keyword(s): Gene Expression, Single Cell, Rna Sequencing, Age Related Macular Degeneration, Specific Gene, Cell Type, Adult Human, Single Cell Rna Sequencing, Cell Type Specific, Cell Data

Abstract: Age-related macular degeneration (AMD) is a blinding eye disease with no unifying theme for its etiology. We used single-cell RNA sequencing to analyze the transcriptomes of ~93,000 cells from the macula and peripheral retina from two adult human donors and bulk RNA sequencing from fifteen adult human donors with and without AMD. Analysis of our single-cell data identified 267 cell-type-specific genes. Comparison of macula and peripheral retinal regions found no cell-type differences but did identify 50 differentially expressed genes (DEGs) with about 1/3 expressed in cones. Integration of our single-cell data with bulk RNA sequencing data from normal and AMD donors showed compositional changes more pronounced in macula in rods, microglia, endothelium, Müller glia, and astrocytes in the transition from normal to advanced AMD. KEGG pathway analysis of our normal vs. advanced AMD eyes identified enrichment in complement and coagulation pathways, antigen presentation, tissue remodeling, and signaling pathways including PI3K-Akt, NOD-like, Toll-like, and Rap1. These results showcase the use of single-cell RNA sequencing to infer cell-type compositional and cell-type-specific gene expression changes in intact bulk tissue and provide a foundation for investigating molecular mechanisms of retinal disease that lead to new therapeutic targets.


Single-Cell Sequencing Reveals Lineage-Specific Dynamic Genetic Regulation of Gene Expression During Human Cardiomyocyte Differentiation

10.1101/2021.06.03.446970, 2021

Author(s): Reem Elorbany, Joshua M Popp, Katherine Rhodes, Benjamin J Strober, Kenneth Barr, ...

Keyword(s): Single Cell, Cell Lines, Specific Gene, Specific Cell, Cardiomyocyte Differentiation, Cell Type, Dynamic Effects, Regulatory Changes, Gene Regulatory, Cell Data

Dynamic and temporally specific gene regulatory changes may underlie unexplained genetic associations with complex disease. During a dynamic process such as cellular differentiation, the overall cell type composition of a tissue (or an in vitro culture) and the gene regulatory profile of each cell can both experience significant changes over time. To identify these dynamic effects in high resolution, we collected single-cell RNA-sequencing data over a differentiation time course from induced pluripotent stem cells to cardiomyocytes, sampled at 7 unique time points in 19 human cell lines. We employed a flexible approach to map dynamic eQTLs whose effects vary significantly over the course of bifurcating differentiation trajectories, including many whose effects are specific to one of these two lineages. Our study design allowed us to distinguish true dynamic eQTLs affecting a specific cell lineage from expression changes driven by potentially non-genetic differences between cell lines such as cell composition. Additionally, we used the cell type profiles learned from single-cell data to deconvolve and re-analyze data from matched bulk RNA-seq samples. Using this approach, we were able to identify a large number of novel dynamic eQTLs in single cell data while also attributing dynamic effects in bulk to a particular lineage. Overall, we found that using single cell data to uncover dynamic eQTLs can provide new insight into the gene regulatory changes that occur among heterogeneous cell types during cardiomyocyte differentiation.


Sharq, A versatile preprocessing and QC pipeline for Single Cell RNA-seq

10.1101/250811, 2018, Cited By ~ 3

Author(s): Tito Candelli, Philip Lijnzaad, Mauro J Muraro, Hindrik Kerstens, Patrick Kemmeren, ...

Keyword(s): Gene Expression, Quality Control, Single Cell, Hierarchical Model, Live Cells, Rna Seq, The Individual, Innovative Approaches

Abstract: Despite the meteoric rise of single cell RNA-seq, only a few preprocessing pipelines exist that are able to perform all steps from the original fastq files to a gene expression table ready for further analysis. Here we present Sharq, a versatile preprocessing pipeline designed to work with plate-based 3’-end protocols that include Unique Molecular Identifiers (UMIs). Sharq performs stringent step-wise trimming of reads, assigns them to features according to a flexible hierarchical model, and uses the barcode and UMI information to avoid amplification biases and produce gene expression tables. Additionally, Sharq provides an extensive plate diagnostics report for quality control and troubleshooting, including that of spatial artefacts. The diagnostics report includes measures of the quality of the individual plate wells as well as a robust assessment of which of them contain material from live cells. Collectively, the innovative approaches presented here provide a valuable tool for processing and quality control of single cell RNA-seq data.
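How barcode and UMI information counteracts amplification bias can be sketched as follows: reads that share the same cell barcode, gene, and UMI are treated as copies of one original molecule and counted once. This is a generic illustration, not Sharq's code; the well names, gene names, and the `count_umis` helper are hypothetical, and the sketch ignores sequencing errors within UMIs.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads sharing (cell barcode, gene, UMI) into one molecule each."""
    molecules = defaultdict(set)
    for barcode, gene, umi in reads:
        molecules[(barcode, gene)].add(umi)
    # The expression value is the number of distinct UMIs, not the number of reads.
    return {key: len(umis) for key, umis in molecules.items()}

reads = [
    ("WELL_A1", "GeneX", "AACG"),
    ("WELL_A1", "GeneX", "AACG"),  # PCR duplicate of the read above
    ("WELL_A1", "GeneX", "TTGC"),
    ("WELL_A2", "GeneX", "AACG"),  # same UMI, different well: a separate molecule
    ("WELL_A1", "GeneY", "GGAT"),
]
table = count_umis(reads)
print(table)
```

Counting distinct UMIs rather than reads means a molecule that happened to amplify well contributes the same single count as one that amplified poorly.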


scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R

10.1101/069633, 2016, Cited By ~ 10

Author(s): Davis J. McCarthy, Kieran R. Campbell, Aaron T. L. Lun, Quin F. Wills

Keyword(s): Quality Control, Single Cell, Sequence Data, Supplementary Information, Processing Quality, Rna Seq, Study Gene Expression, Supplementary Material, Downstream Analysis, Cell Data

Abstract. Motivation: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts, and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalisation. Results: We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalisation and visualisation of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability: The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Supplementary information: Supplementary material is available online at bioRxiv accompanying this manuscript, and all materials required to reproduce the results presented in this paper are available at dx.doi.org/10.5281/zenodo.60139.


Automated quality control and cell identification of droplet-based single-cell data using dropkick

Genome Research, 10.1101/gr.271908.120, 2021, pp. gr.271908.120

Author(s): Cody N Heiser, Victoria M Wang, Bob Chen, Jacob J Hughey, Ken S. Lau

Keyword(s): Quality Control, Single Cell, Cell Identification, Automated Quality Control, Cell Data


Bayesian estimation of cell type-specific gene expression with prior derived from single-cell data

Genome Research, 10.1101/gr.268722.120, 2021, pp. gr.268722.120

Author(s): Jiebiao Wang, Kathryn Roeder, Bernie Devlin

Keyword(s): Gene Expression, Single Cell, Bayesian Estimation, Specific Gene, Cell Type, Specific Gene Expression, Cell Type Specific, Cell Data


A Unified Statistical Framework for Single Cell and Bulk Sequencing Data

10.1101/206532, 2017, Cited By ~ 1

Author(s): Lingxue Zhu, Jing Lei, Bernie Devlin, Kathryn Roeder

Keyword(s): Gene Expression, Single Cell, Cell Types, Accurate Estimation, Specific Gene, Rna Seq, Cell Type, Cell Type Specific, Different Cell Types, Cell Data

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows the strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profile. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data, as well as in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
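The distinction between a "false" zero and a genuine one can be made concrete with Bayes' rule under a simple zero-inflated Poisson model. This is a generic sketch and not URSM's hierarchical model: the constant dropout probability, the Poisson count model, and the function name are assumptions made for illustration.

```python
import math

def dropout_posterior(dropout_prob, mean_count):
    """Probability that an observed zero is a dropout rather than a true Poisson zero.

    Assumes a zero-inflated Poisson: with probability `dropout_prob` the gene is
    dropped (count forced to zero); otherwise the count is Poisson(mean_count).
    """
    if not 0.0 < dropout_prob <= 1.0:
        raise ValueError("dropout_prob must be in (0, 1]")
    true_zero = (1.0 - dropout_prob) * math.exp(-mean_count)
    return dropout_prob / (dropout_prob + true_zero)

# A zero for a gene expected to be highly expressed is almost surely a dropout,
# while a zero for a lowly expressed gene is ambiguous.
p_high = dropout_posterior(dropout_prob=0.3, mean_count=5.0)
p_low = dropout_posterior(dropout_prob=0.3, mean_count=0.1)
print(round(p_high, 3), round(p_low, 3))
```

The expected expression level, which in a joint model can be informed by bulk data or by other cells of the same type, determines how confidently a zero can be attributed to dropout and therefore which entries are worth imputing.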


SampleQC: robust multivariate, multi-celltype, multi-sample quality control for single cell data

10.1101/2021.08.28.458012, 2021

Author(s): Will Macnair, Mark D Robinson

Keyword(s): Quality Control, Single Cell, Real Data, R Package, Gaussian Mixture, Model Fit, Rna Seq, Industry Standard, Multiple Samples, Cell Data

Quality control (QC) is a critical component of single cell RNA-seq processing pipelines. Many single cell methods assume that scRNA-seq data comprises multiple celltypes that are distinct in terms of gene expression; however, this is not reflected in current approaches to QC. We show that the current widely-used methods for QC may have a bias towards exclusion of rarer celltypes, especially those whose QC metrics are more extreme, e.g. those with naturally high mitochondrial proportions. We introduce SampleQC, which improves sensitivity and reduces bias relative to current industry standard approaches, via a robust Gaussian mixture model fit across multiple samples simultaneously. We show via simulations that SampleQC is less susceptible than other methods to exclusion of rarer celltypes. We also demonstrate SampleQC on complex real data, comprising up to 867k cells over 172 samples. The framework for SampleQC is general, and has applications as an outlier detection method for data beyond single cell RNA-seq. SampleQC is parallelized and implemented in Rcpp, and is available as an R package.
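The bias against rarer celltypes can be illustrated with a single global outlier rule on one QC metric. The sketch below uses a median plus scaled-MAD threshold, a common heuristic chosen here for illustration; it is not SampleQC's mixture model, and the mitochondrial proportions are made-up values.

```python
import statistics

def mad_outliers(values, n_mads=3.0):
    """Flag values more than n_mads scaled MADs above the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    # 1.4826 rescales the MAD to match the standard deviation of a normal distribution.
    threshold = median + n_mads * 1.4826 * mad
    return [v > threshold for v in values]

# Hypothetical mitochondrial read proportions: a common celltype plus a rare one
# whose healthy cells naturally carry more mitochondrial transcripts.
common = [0.04, 0.05, 0.05, 0.06, 0.05, 0.04, 0.06, 0.05, 0.05, 0.06, 0.04, 0.05]
rare = [0.18, 0.20, 0.22]
flags = mad_outliers(common + rare)
rare_flagged = flags[len(common):]
print(rare_flagged)
```

Because the threshold is set by the majority population, every cell of the rare type is discarded even though none of them is of poor quality. Fitting separate components per celltype, as a mixture model does, avoids judging the rare cells against the common ones.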
