svaseq: removing batch effects and other unwanted noise from sequencing data

bioRxiv

10.1101/006585

2014

Cited By ~ 11

Author(s):

Jeffrey Leek

Keyword(s):

Count Data

Genomic Data

Data Matrix

Bioconductor Package

Batch Effects

Sequencing Data

Surrogate Variable Analysis

Surrogate Variable

Adjustment Factors

Variable Analysis

It is now well known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. We introduced surrogate variable analysis for estimating these artifacts by (1) identifying the part of the genomic data affected only by artifacts and (2) estimating the artifacts with principal components or singular vectors of that subset of the data matrix. The resulting estimates of artifacts can be used in subsequent analyses as adjustment factors. Here I describe an update to the sva approach that can be applied to count data or FPKMs from sequencing experiments. I also describe the addition of supervised sva (ssva), which uses control probes to identify the part of the genomic data affected only by artifacts. These updates are available through the surrogate variable analysis (sva) Bioconductor package.
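The two steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the sva package's implementation: the artifact-only rows are assumed to be known in advance (the supervised setting, e.g. control genes), counts are log-transformed with a pseudocount of 1, and the estimated surrogate variables are then used as extra covariates in an ordinary least-squares fit. The function names `supervised_surrogate_variables` and `adjusted_fit` are hypothetical.

```python
import numpy as np


def supervised_surrogate_variables(counts, control_rows, n_sv, pseudocount=1.0):
    """Estimate surrogate variables from rows assumed to carry only artifacts.

    counts: genes x samples array of counts (or FPKMs)
    control_rows: indices of rows assumed to be affected only by artifacts
    n_sv: number of surrogate variables to return
    Returns a samples x n_sv array.
    """
    # Log-transform with a pseudocount so zero counts stay finite
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Step (1): keep only the artifact-only subset of the data matrix
    subset = logged[control_rows, :]
    # Center each row so the decomposition describes variation across samples
    centered = subset - subset.mean(axis=1, keepdims=True)
    # Step (2): right singular vectors give one value per sample
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_sv, :].T


def adjusted_fit(y, group, sv):
    """Least-squares fit of one gene on an intercept, the group and the SVs."""
    design = np.column_stack([np.ones(len(group)), group, sv])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, group effect, one coefficient per SV]
```

Because the sign of a singular vector is arbitrary, only the space spanned by the returned columns is meaningful; flipping a column's sign flips its own coefficient but leaves the group coefficient unchanged.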

Peer Review #1 of "Removing batch effects for prediction problems with frozen surrogate variable analysis (v0.1)"

10.7287/peerj.561v0.1/reviews/1

2014

Author(s):

E Stone

Keyword(s):

Peer Review

Batch Effects

Surrogate Variable Analysis

Surrogate Variable

Prediction Problems

Variable Analysis

Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis

BMC Bioinformatics

10.1186/s12859-015-0808-5

2015

Vol 16 (1)

Cited By ~ 32

Author(s):

Andrew E. Jaffe

Thomas Hyde

Joel Kleinman

Daniel R. Weinberger

Joshua G. Chenoweth

...

Keyword(s):

Data Cleaning

Genomic Data

Surrogate Variable Analysis

Surrogate Variable

Biological Discovery

Variable Analysis

Peer Review #2 of "Removing batch effects for prediction problems with frozen surrogate variable analysis (v0.1)"

10.7287/peerj.561v0.1/reviews/2

2014

Author(s):

M Chikina

Keyword(s):

Peer Review

Batch Effects

Surrogate Variable Analysis

Surrogate Variable

Prediction Problems

Variable Analysis

Removing batch effects for prediction problems with frozen surrogate variable analysis

PeerJ

10.7717/peerj.561

2014

Vol 2

pp. e561

Cited By ~ 28

Author(s):

Hilary S. Parker

Héctor Corrada Bravo

Jeffrey T. Leek

Keyword(s):

Batch Effects

Surrogate Variable Analysis

Surrogate Variable

Prediction Problems

Variable Analysis

Erratum to: Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis

BMC Bioinformatics

10.1186/s12859-016-1152-0

2016

Vol 17 (1)

Author(s):

Andrew E. Jaffe

Thomas Hyde

Joel Kleinman

Daniel R. Weinberger

Joshua G. Chenoweth

...

Keyword(s):

Data Cleaning

Genomic Data

Surrogate Variable Analysis

Surrogate Variable

Biological Discovery

Variable Analysis

An improved and explicit surrogate variable analysis procedure by coefficient adjustment

Biometrika

10.1093/biomet/asx018

2017

Vol 104 (2)

pp. 303-316

Cited By ~ 6

Author(s):

Seunggeun Lee

Wei Sun

Fred A. Wright

Fei Zou

Keyword(s):

Analysis Procedure

Surrogate Variable Analysis

Surrogate Variable

Variable Analysis

ComBat-seq: batch effect adjustment for RNA-seq count data

NAR Genomics and Bioinformatics

10.1093/nargab/lqaa078

2020

Vol 2 (3)

Cited By ~ 2

Author(s):

Yuqing Zhang

Giovanni Parmigiani

W Evan Johnson

Keyword(s):

Differential Expression

Count Data

Statistical Power

Negative Binomial

Genomic Data

Negative Binomial Regression

Negative Binomial Regression Model

RNA-seq

Batch Effects

Binomial Regression

The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to address batch effects effectively when combining genomic data. Many existing methods for batch effect adjustment assume that the data follow a continuous, bell-shaped Gaussian distribution. However, in RNA-seq studies the data are typically skewed, over-dispersed counts, so this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have previously been used to better capture the properties of counts. We developed a batch correction method, ComBat-seq, that uses a negative binomial regression model and retains the integer nature of count data in RNA-seq studies, making the batch-adjusted data compatible with common differential expression software packages that require integer counts. We show in realistic simulations that ComBat-seq-adjusted data yield better statistical power and better control of false positives in differential expression than data adjusted by the other available methods. We further demonstrate in a real data example that ComBat-seq successfully removes batch effects and recovers the biological signal in the data.
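A count-preserving adjustment of the kind described above can be illustrated with a pure-Python sketch in the spirit of ComBat-seq: each count is mapped from a negative binomial fitted to its own batch onto a batch-free negative binomial by matching cumulative probabilities, so the output stays integer. This is a simplified illustration, not the published implementation. Assumptions: one gene at a time, no biological covariates, and method-of-moments estimates substituted for the negative binomial regression the paper describes. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance

MIN_PHI = 1e-6  # floor so the dispersion stays positive (near-Poisson genes)


def nb_pmf(k, mu, phi):
    """P(X = k) for a negative binomial with mean mu and variance mu + phi * mu**2."""
    size = 1.0 / phi
    p = size / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def nb_cdf(k, mu, phi):
    """P(X <= k), summed term by term (fine for the small counts used here)."""
    return sum(nb_pmf(i, mu, phi) for i in range(k + 1))


def nb_quantile(q, mu, phi):
    """Smallest integer k whose CDF reaches q."""
    k, total = 0, nb_pmf(0, mu, phi)
    while total < q:
        k += 1
        total += nb_pmf(k, mu, phi)
    return k


def moments(xs):
    """Method-of-moments mean and dispersion for one batch of counts."""
    mu = mean(xs)
    if mu == 0:
        return 0.0, MIN_PHI
    phi = max((pvariance(xs) - mu) / mu ** 2, MIN_PHI)
    return float(mu), phi


def adjust_gene(counts, batches):
    """Map each count from its batch's NB fit onto a batch-free NB fit."""
    fits = {b: moments([c for c, bb in zip(counts, batches) if bb == b])
            for b in set(batches)}
    # Batch-free target: overall mean, average within-batch dispersion
    target_mu = float(mean(counts))
    target_phi = mean(phi for _, phi in fits.values())
    if target_mu == 0 or any(mu == 0 for mu, _ in fits.values()):
        return list(counts)  # leave genes with an all-zero batch unchanged
    adjusted = []
    for c, b in zip(counts, batches):
        mu, phi = fits[b]
        q = nb_cdf(c, mu, phi)
        if q > 1.0 - 1e-9:
            adjusted.append(c)  # extreme tail: keep the original count
        else:
            adjusted.append(nb_quantile(q, target_mu, target_phi))
    return adjusted
```

Matching cumulative probabilities rather than subtracting a batch mean is what keeps the adjusted values on the integer scale, which is the property the abstract highlights for compatibility with count-based differential expression tools.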

V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data

Bioinformatics

10.1093/bioinformatics/btaa128

2020

Vol 36 (11)

pp. 3582-3584

Author(s):

Nathan Lawlor

Eladio J Marquez

Donghyung Lee

Duygu Ucar

Keyword(s):

Single Cell

Gene Annotation

Supplementary Information

Surrogate Variable Analysis

Batch Correction

Surrogate Variable

R Shiny

Sources Of Variation

Shiny Application

Variable Analysis

Summary: Single-cell RNA-sequencing (scRNA-seq) technology enables the study of gene expression programs in individual cells. However, these data are subject to diverse sources of variation, including ‘unwanted’ variation that needs to be removed in downstream analyses (e.g. batch effects) and ‘wanted’ or biological variation (e.g. variation associated with a cell type) that needs to be precisely described. Surrogate variable analysis (SVA)-based algorithms are commonly used for batch correction and, more recently, for studying ‘wanted’ variation in scRNA-seq data. However, interpreting whether these variables are biologically meaningful or stem from technical factors remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA or ZINB-WaVE, we developed an R Shiny application [Visual Surrogate Variable Analysis (V-SVA)] that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for discovering genes associated with detected sources of variation, annotating genes using publicly available databases and gene sets, and visualizing data with dimension reduction methods.

Availability and implementation: The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA.

Supplementary information: Supplementary data are available at Bioinformatics online.

SVAw - a web-based application tool for automated surrogate variable analysis of gene expression studies

Source Code for Biology and Medicine

10.1186/1751-0473-8-8

2013

Vol 8 (1)

pp. 8

Cited By ~ 5

Author(s):

Mehdi Pirooznia

Fayaz Seifuddin

Fernando S Goes

Jeffrey T Leek

Peter P Zandi

Keyword(s):

Gene Expression

Surrogate Variable Analysis

Web Based

Surrogate Variable

Expression Studies

Gene Expression Studies

Variable Analysis

Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis

PLoS Genetics

10.1371/journal.pgen.0030161

2007

Vol 3 (9)

pp. e161

Cited By ~ 884

Author(s):

Jeffrey T Leek

John D Storey

Keyword(s):

Gene Expression

Surrogate Variable Analysis

Surrogate Variable

Expression Studies

Gene Expression Studies

Variable Analysis
