V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data

Nathan Lawlor; Eladio J Marquez; Donghyung Lee; Duygu Ucar

doi:10.1093/bioinformatics/btaa128

Bioinformatics, 2020, Vol 36 (11), pp. 3582-3584

Keyword(s): Single Cell; Gene Annotation; Supplementary Information; Surrogate Variable Analysis; Batch Correction; Surrogate Variable; R Shiny; Sources Of Variation; Shiny Application; Variable Analysis

Abstract

Summary: Single-cell RNA-sequencing (scRNA-seq) technology enables studying gene expression programs from individual cells. However, these data are subject to diverse sources of variation, including ‘unwanted’ variation that needs to be removed in downstream analyses (e.g. batch effects) and ‘wanted’ or biological sources of variation (e.g. variation associated with a cell type) that need to be precisely described. Surrogate variable analysis (SVA)-based algorithms are commonly used for batch correction and, more recently, for studying ‘wanted’ variation in scRNA-seq data. However, determining whether these variables are biologically meaningful or stem from technical causes remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA or ZINB-WaVE, we developed an R Shiny application [Visual Surrogate Variable Analysis (V-SVA)] that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for discovery of genes associated with detected sources of variation, gene annotation using publicly available databases and gene sets, and data visualization using dimension reduction methods.

Availability and implementation: The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA.

Contact: [email protected] or [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

A statistical framework for the robust detection of hidden variation in single cell transcriptomes

2017

doi:10.1101/151217

Author(s): Donghyung Lee; Anthony Cheng; Mohan Bolisetty; Duygu Ucar

Keyword(s): Gene Expression; Single Cell; Single Cells; Marker Genes; Correlated Sources; Surrogate Variable Analysis; Surrogate Variable; Sources Of Variation; Hidden Variation; Variable Analysis

Abstract

Single-cell RNA-sequencing (scRNA-seq) precisely characterizes gene expression levels and dissects variation in expression associated with the state (technical or biological) and the type of the cell, which is averaged out in bulk measurements. Multiple and correlated sources contribute to gene expression variation in single cells, which makes their estimation difficult with the existing methods developed for bulk measurements (e.g., surrogate variable analysis (SVA)) that estimate orthogonal transformations of these sources. We developed iteratively adjusted surrogate variable analysis (IA-SVA), which can estimate hidden and correlated sources of variation by iteratively identifying a set of genes affected by each hidden factor. Analysis of scRNA-seq data from human cells showed that IA-SVA could accurately capture hidden variation arising from technical (e.g., stacked doublet cells) or biological sources (e.g., cell type or cell-cycle stage). Furthermore, IA-SVA delivers a set of genes associated with the detected hidden source to be used in downstream data analyses. As a proof of concept, IA-SVA recapitulated known marker genes for islet cell subsets (e.g., alpha, beta), which improved the grouping of subsets into distinct clusters. Taken together, IA-SVA is an effective and novel method to dissect multiple and correlated sources of variation in scRNA-seq data.
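To make the idea of a surrogate variable concrete, here is a minimal Python sketch of the simplest residual-based approach: regress each gene on the known covariates, then take the leading singular vectors of the residuals as candidate hidden factors. This is not the IA-SVA algorithm, whose iterative gene-selection step the abstract only outlines; the function names `estimate_surrogate_variables` and `simulate`, and the simulated data, are assumptions made for illustration.

```python
import numpy as np


def estimate_surrogate_variables(expr, known, n_sv=1):
    """Residual-SVD sketch of surrogate variable estimation (illustrative only).

    expr  : (n_genes, n_cells) expression matrix, e.g. log-normalized counts
    known : (n_cells,) or (n_cells, k) array of known covariates
    Returns an (n_cells, n_sv) matrix of candidate surrogate variables.
    """
    expr = np.asarray(expr, dtype=float)
    n_cells = expr.shape[1]
    known = np.asarray(known, dtype=float).reshape(n_cells, -1)
    # Design matrix: intercept plus known covariates.
    design = np.column_stack([np.ones(n_cells), known])
    # Fit every gene on the known covariates by least squares.
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    residuals = expr.T - design @ coef  # shape: (n_cells, n_genes)
    # Leading left singular vectors capture the dominant remaining variation.
    u, _, _ = np.linalg.svd(residuals, full_matrices=False)
    return u[:, :n_sv]


def simulate(n_genes=100, n_cells=200, seed=0):
    """Hypothetical toy data: one known and one hidden factor plus noise."""
    rng = np.random.default_rng(seed)
    known = rng.normal(size=n_cells)
    hidden = rng.normal(size=n_cells)
    expr = (np.outer(rng.normal(size=n_genes), known)
            + np.outer(rng.normal(size=n_genes), hidden)
            + 0.1 * rng.normal(size=(n_genes, n_cells)))
    return expr, known, hidden
```

Calling `estimate_surrogate_variables(*simulate()[:2])` returns one column that tracks the simulated hidden factor. Because the residuals are orthogonal to the known covariates by construction, this sketch can only recover the part of a hidden factor that is uncorrelated with them, which is the limitation of orthogonal estimates that the abstract says IA-SVA was designed to address.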

Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction

Bioinformatics, 2014, Vol 30 (19), pp. 2757-2763

doi:10.1093/bioinformatics/btu375

Cited By ~ 36

Author(s): Hilary S. Parker; Jeffrey T. Leek; Alexander V. Favorov; Michael Considine; Xiaoxin Xia; ...

Keyword(s): Surrogate Variable Analysis; Batch Correction; Surrogate Variable; Biological Heterogeneity; Variable Analysis

An improved and explicit surrogate variable analysis procedure by coefficient adjustment

Biometrika, 2017, Vol 104 (2), pp. 303-316

doi:10.1093/biomet/asx018

Cited By ~ 6

Author(s): Seunggeun Lee; Wei Sun; Fred A. Wright; Fei Zou

Keyword(s): Analysis Procedure; Surrogate Variable Analysis; Surrogate Variable; Variable Analysis

Single Cell Viewer (SCV): An interactive visualization data portal for single cell RNA sequence data

2019

doi:10.1101/664789

Cited By ~ 2

Author(s): Shuoguo Wang; Constance Brett; Mohan Bolisetty; Ryan Golhar; Isaac Neuhaus; ...

Keyword(s): Single Cell; Sequence Data; Single Cells; Link Type; Technological Advances; R Shiny; Data Volume; Exploratory Data; Cell Data; Shiny Application

Abstract

Motivation: Thanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These technologies have been applied widely to study various aspects of biology. Nevertheless, comprehending these large datasets and inferring meaningful biological insights from them is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not yet have effective visualization and comparative analysis tools to realize the full value of these datasets.

Results: To address this gap, we implemented a single cell data visualization portal called Single Cell Viewer (SCV). SCV is an R Shiny application that offers users rich visualization and exploratory data analysis options for single cell datasets.

Availability: Source code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/[email protected]; [email protected]

EnImpute: imputing dropout events in single-cell RNA-sequencing data via ensemble learning

Bioinformatics, 2019, Vol 35 (22), pp. 4827-4829

doi:10.1093/bioinformatics/btz435

Cited By ~ 6

Author(s): Xiao-Fei Zhang; Le Ou-Yang; Shuo Yang; Xing-Ming Zhao; Xiaohua Hu; ...

Keyword(s): Single Cell; Rna Sequencing; Ensemble Learning; R Package; Supplementary Information; Sequencing Data; Single Cell Rna Sequencing; The Individual; Downstream Analysis; Shiny Application

Abstract

Summary: Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experimental results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting noisy scRNA-seq data before performing downstream analysis.

Availability and implementation: The R package and Shiny application are available through GitHub at https://github.com/Zhangxf-ccnu/EnImpute.

Supplementary information: Supplementary data are available at Bioinformatics online.
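As a rough illustration of combining the outputs of several imputation methods, the Python sketch below averages the imputed matrices entry by entry, optionally discarding the most extreme values per entry first. The abstract does not say how EnImpute combines results, so the trimmed average, the function name `combine_imputations` and its `trim` parameter are assumptions made for illustration and not the package's API.

```python
import numpy as np


def combine_imputations(imputed_matrices, trim=0):
    """Element-wise (optionally trimmed) average of imputed matrices.

    imputed_matrices : sequence of equally shaped (genes x cells) arrays,
                       one per imputation method (hypothetical inputs)
    trim             : number of lowest and highest values dropped per entry
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in imputed_matrices])
    n_methods = stack.shape[0]
    if 2 * trim >= n_methods:
        raise ValueError("trim removes every method; use a smaller value")
    # Sort across methods so the extremes of each entry sit at the ends.
    ordered = np.sort(stack, axis=0)
    kept = ordered[trim:n_methods - trim]
    return kept.mean(axis=0)
```

With three methods and `trim=1`, each entry reduces to the median of the three imputed values, which keeps a single badly behaved method from dominating the consensus.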

BMDx: a graphical Shiny application to perform Benchmark Dose analysis for transcriptomics data

Bioinformatics, 2020, Vol 36 (9), pp. 2932-2933

doi:10.1093/bioinformatics/btaa030

Cited By ~ 3

Author(s): Angela Serra; Laura Aliisa Saarimäki; Michele Fratello; Veer Singh Marwah; Dario Greco

Keyword(s): Gene Expression; Information Criterion; Benchmark Dose; Functional Enrichment; Supplementary Information; Gene Expression Matrix; R Shiny; Transcriptomics Data; Dose Dependent; Shiny Application

Abstract

Motivation: The analysis of dose-dependent effects on gene expression is gaining attention in the field of toxicogenomics. Currently available computational methods are usually limited to specific omics platforms or biological annotations and are able to analyse only one experiment at a time.

Results: We developed the software BMDx with a graphical user interface for the Benchmark Dose (BMD) analysis of transcriptomics data. We implemented an approach based on the fitting of multiple models and the selection of the optimal model based on the Akaike Information Criterion. The BMDx tool takes as input a gene expression matrix and a phenotype table, and computes the BMD, its related values, and IC50/EC50 estimations. It reports interactive tables and plots that the user can investigate for further details of the fitting, dose effects and functional enrichment. BMDx allows a fast and convenient comparison of the BMD values of a transcriptomics experiment at different time points and an effortless way to interpret the results. Furthermore, BMDx allows multiple experiments to be analysed and compared at once.

Availability and implementation: BMDx is implemented as R/Shiny software and is available at https://github.com/Greco-Lab/BMDx/.

Supplementary information: Supplementary data are available at Bioinformatics online.
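The model-selection step described in the abstract (fit several candidate models, keep the one with the lowest Akaike Information Criterion) can be sketched in Python as follows. The polynomial candidates are placeholders, since the abstract does not list the model families BMDx fits, and the function names are assumptions made for illustration. For least-squares fits with Gaussian errors, AIC reduces, up to an additive constant, to n ln(RSS/n) + 2k, where k is the number of fitted coefficients.

```python
import numpy as np


def aic_least_squares(y, y_hat, n_params):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    n = len(y)
    rss = float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))
    rss = max(rss, np.finfo(float).tiny)  # guard against log(0) on exact fits
    return n * np.log(rss / n) + 2 * n_params


def select_model_by_aic(dose, response, degrees=(1, 2, 3)):
    """Fit one polynomial per candidate degree and return the AIC-best degree.

    The polynomial family is a stand-in for whatever dose-response models
    a real benchmark dose analysis would use.
    """
    dose = np.asarray(dose, dtype=float)
    response = np.asarray(response, dtype=float)
    scores = {}
    for degree in degrees:
        coefs = np.polyfit(dose, response, degree)
        fitted = np.polyval(coefs, dose)
        scores[degree] = aic_least_squares(response, fitted, degree + 1)
    best = min(scores, key=scores.get)
    return best, scores
```

The 2k penalty is what stops the richest model from always winning: an extra coefficient is accepted only if it lowers the residual sum of squares enough to offset the penalty.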

Peer Review #1 of "Removing batch effects for prediction problems with frozen surrogate variable analysis (v0.1)"

2014

doi:10.7287/peerj.561v0.1/reviews/1

Author(s): E Stone

Keyword(s): Peer Review; Batch Effects; Surrogate Variable Analysis; Surrogate Variable; Prediction Problems; Variable Analysis

WASP: a versatile, web-accessible single cell RNA-Seq processing platform

BMC Genomics, 2021, Vol 22 (1)

doi:10.1186/s12864-021-07469-6

Author(s): Andreas Hoek; Katharina Maibach; Ebru Özmen; Ana Ivonne Vazquez-Armendariz; Jan Philipp Mengel; ...

Keyword(s): Gene Expression; Single Cell; Modular Design; Cellular Heterogeneity; Biological Research; Post Processing; Gene Expression Matrix; R Shiny; Initial Processing; Shiny Application

Abstract

Background: The technology of single cell RNA sequencing (scRNA-seq) has gained massively in popularity as it allows unprecedented insights into cellular heterogeneity as well as identification and characterization of (sub-)cellular populations. Furthermore, scRNA-seq is almost ubiquitously applicable in medical and biological research. However, these new opportunities are accompanied by additional challenges for researchers regarding data analysis, as advanced technical expertise is required in using bioinformatic software.

Results: Here we present WASP, a software tool for the processing of Drop-Seq-based scRNA-Seq data. Our software facilitates the initial processing of raw reads generated with the ddSEQ or 10x protocol and generates demultiplexed gene expression matrices including quality metrics. The processing pipeline is realized as a Snakemake workflow, while an R Shiny application is provided for interactive result visualization. WASP supports comprehensive analysis of gene expression matrices, including detection of differentially expressed genes, clustering of cellular populations and interactive graphical visualization of the results. The R Shiny application can be used with gene expression matrices generated by the WASP pipeline, as well as with externally provided data from other sources.

Conclusions: With WASP we provide an intuitive and easy-to-use tool to process and explore scRNA-seq data. To the best of our knowledge, it is currently the only freely available software package that combines pre- and post-processing of ddSEQ- and 10x-based data. Due to its modular design, it is possible to use any gene expression matrix with WASP’s post-processing R Shiny application. To simplify usage, WASP is provided as a Docker container. Alternatively, pre-processing can be accomplished via Conda, and a standalone version for Windows is available for post-processing, requiring only a web browser.

Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis

BMC Bioinformatics, 2015, Vol 16 (1)

doi:10.1186/s12859-015-0808-5

Cited By ~ 32

Author(s): Andrew E. Jaffe; Thomas Hyde; Joel Kleinman; Daniel R. Weinberger; Joshua G. Chenoweth; ...

Keyword(s): Data Cleaning; Genomic Data; Surrogate Variable Analysis; Surrogate Variable; Biological Discovery; Variable Analysis

SVAw - a web-based application tool for automated surrogate variable analysis of gene expression studies

Source Code for Biology and Medicine, 2013, Vol 8 (1), pp. 8

doi:10.1186/1751-0473-8-8

Cited By ~ 5

Author(s): Mehdi Pirooznia; Fayaz Seifuddin; Fernando S Goes; Jeffrey T Leek; Peter P Zandi

Keyword(s): Gene Expression; Surrogate Variable Analysis; Web Based; Surrogate Variable; Expression Studies; Gene Expression Studies; Variable Analysis
