scRMD: Imputation for single cell RNA-seq data via robust matrix decomposition

Mapping Intimacies ◽

10.1101/459404 ◽

2018 ◽

Cited By ~ 3

Author(s):

Chong Chen ◽

Changjing Wu ◽

Linjie Wu ◽

Yishu Wang ◽

Minghua Deng ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Matrix Decomposition ◽

Transcriptome Profiling ◽

Downstream Analysis ◽

Whole Transcriptome ◽

Biological And Medical Applications

Abstract Motivation: Single cell RNA-sequencing (scRNA-seq) technology enables whole transcriptome profiling at single cell resolution and holds great promise in many biological and medical applications. Nevertheless, scRNA-seq often fails to capture expressed genes, leading to the prominent dropout problem. These dropouts cause many problems in downstream analysis, such as significant noise increase, power loss in differential expression analysis and obscuring of gene-to-gene or cell-to-cell relationships. Imputation of these dropout values thus becomes an essential step in scRNA-seq data analysis. Results: In this paper, we model the dropout imputation problem as robust matrix decomposition. This model has minimal assumptions and allows us to develop a computationally efficient imputation method, scRMD. Extensive data analysis shows that scRMD can accurately recover the dropout values and help to improve downstream analysis such as differential expression analysis and clustering analysis.

scRMD: imputation for single cell RNA-seq data via robust matrix decomposition

Bioinformatics ◽

10.1093/bioinformatics/btaa139 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3156-3161 ◽

Cited By ~ 9

Author(s):

Chong Chen ◽

Changjing Wu ◽

Linjie Wu ◽

Xiaochen Wang ◽

Minghua Deng ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Matrix Decomposition ◽

Transcriptome Profiling ◽

R Package ◽

Supplementary Information ◽

Downstream Analysis

Abstract Motivation: Single cell RNA-sequencing (scRNA-seq) technology enables whole transcriptome profiling at single cell resolution and holds great promise in many biological and medical applications. Nevertheless, scRNA-seq often fails to capture expressed genes, leading to the prominent dropout problem. These dropouts cause many problems in downstream analysis, such as a significant increase in noise, power loss in differential expression analysis and obscuring of gene-to-gene or cell-to-cell relationships. Imputation of these dropout values can be beneficial in scRNA-seq data analysis. Results: In this article, we model the dropout imputation problem as robust matrix decomposition. This model has minimal assumptions and allows us to develop a computationally efficient imputation method called scRMD. Extensive data analysis shows that scRMD can accurately recover the dropout values and help to improve downstream analysis such as differential expression analysis and clustering analysis. Availability and implementation: The R package scRMD is available at https://github.com/XiDsLab/scRMD. Supplementary information: Supplementary data are available at Bioinformatics online.
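The decomposition idea can be illustrated with a minimal Python/NumPy sketch. It assumes one simplified formulation chosen for illustration, not taken from the scRMD package: the observed matrix Y is modeled as Y ≈ L − S, where L is low-rank (penalized by the nuclear norm) and S is a non-negative sparse term allowed only where Y is zero, so that it can absorb dropouts. The function names, the penalties `lam_L` and `lam_S`, and the alternating update scheme are hypothetical; the actual scRMD model, tuning rules and solver may differ.

```python
import numpy as np


def singular_value_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def objective(Y, L, S, lam_L, lam_S):
    """0.5 * ||Y - L + S||_F^2 + lam_L * ||L||_* + lam_S * ||S||_1."""
    fit = 0.5 * np.sum((Y - L + S) ** 2)
    nuclear = np.sum(np.linalg.svd(L, compute_uv=False))
    return fit + lam_L * nuclear + lam_S * np.sum(np.abs(S))


def rmd_impute(Y, lam_L, lam_S, n_iter=50):
    """Hypothetical sketch: Y ~ L - S, L low-rank, S >= 0 sparse and only where Y == 0."""
    Y = np.asarray(Y, dtype=float)
    zero = Y == 0  # candidate dropout positions
    S = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        # L-step: the exact minimiser over L is singular value thresholding of (Y + S)
        L = singular_value_threshold(Y + S, lam_L)
        # S-step: exact minimiser is a non-negative soft threshold of (L - Y), zeros only
        S = np.where(zero, np.maximum(L - Y - lam_S, 0.0), 0.0)
        history.append(objective(Y, L, S, lam_L, lam_S))
    # Keep observed non-zero values; fill zeros with the non-negative low-rank estimate
    imputed = np.where(zero, np.maximum(L, 0.0), Y)
    return imputed, L, S, history


# Toy usage: a rank-1 "expression" matrix with roughly 20% of entries zeroed out
rng = np.random.default_rng(0)
truth = np.outer(rng.uniform(1, 5, 40), rng.uniform(1, 5, 30))
Y = truth.copy()
Y[rng.random(Y.shape) < 0.2] = 0.0
imputed, L, S, history = rmd_impute(Y, lam_L=1.0, lam_S=0.1, n_iter=30)
```

Because each step exactly minimizes the objective over one block of variables, the recorded objective should not increase across iterations. Choosing the penalties is the hard part in practice and is not addressed by this sketch.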

Best practices on the differential expression analysis of multi-species RNA-seq

Genome Biology ◽

10.1186/s13059-021-02337-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew Chung ◽

Vincent M. Bruno ◽

David A. Rasko ◽

Christina A. Cuomo ◽

José F. Muñoz ◽

...

Keyword(s):

Best Practices ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Species ◽

RNA-Seq ◽

Species Analysis ◽

Differential Gene ◽

Multiple Species ◽

Downstream Analysis

Abstract Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.
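One consequence of the uneven organism abundances described above is that library sizes should not be pooled across species. The following minimal Python/NumPy sketch assumes a genes × samples count matrix together with a per-gene organism label (both hypothetical inputs) and computes counts per million separately within each organism. It illustrates the general idea only and is not the specific normalization pipeline recommended in the article.

```python
import numpy as np


def per_species_cpm(counts, species):
    """Counts per million computed separately within each organism.

    counts  : genes x samples array of read counts
    species : length-genes sequence of organism labels (e.g. "host", "pathogen")
    """
    counts = np.asarray(counts, dtype=float)
    species = np.asarray(species)
    cpm = np.zeros_like(counts)
    for org in np.unique(species):
        rows = species == org
        # Per-sample library size using only this organism's reads
        lib = counts[rows].sum(axis=0)
        with np.errstate(divide="ignore", invalid="ignore"):
            # Samples with no reads for this organism are left at 0 instead of NaN
            cpm[rows] = np.where(lib > 0, counts[rows] / lib * 1e6, 0.0)
    return cpm


counts = np.array([[90, 900], [10, 100], [5, 1], [5, 9]])
species = ["host", "host", "pathogen", "pathogen"]
cpm = per_species_cpm(counts, species)
```

With pooled normalization, a drop in the pathogen's share of reads in one sample would make every pathogen gene look down-regulated; normalizing within each organism largely removes that composition effect.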

Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

RNA Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell RNA Sequencing

Integrative Differential Expression Analysis for Multiple EXperiments (IDEAMEX): A Web Server Tool for Integrated RNA-Seq Data Analysis

Frontiers in Genetics ◽

10.3389/fgene.2019.00279 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 7

Author(s):

Verónica Jiménez-Jacinto ◽

Alejandro Sanchez-Flores ◽

Leticia Vega-Alvarado

Keyword(s):

Data Analysis ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Web Server ◽

RNA-Seq

Two-phase differential expression analysis for single cell RNA-seq

Bioinformatics ◽

10.1093/bioinformatics/bty329 ◽

2018 ◽

Vol 34 (19) ◽

pp. 3340-3348 ◽

Cited By ~ 11

Author(s):

Zhijin Wu ◽

Yi Zhang ◽

Michael L Stitzel ◽

Hao Wu

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

RNA-Seq ◽

Two Phase

UMI-count modeling and differential expression analysis for single-cell RNA sequencing

Genome Biology ◽

10.1186/s13059-018-1438-9 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 31

Author(s):

Wenan Chen ◽

Yan Li ◽

John Easton ◽

David Finkelstein ◽

Gang Wu ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

RNA Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Cell RNA Sequencing

A guide to creating design matrices for gene expression experiments

F1000Research ◽

10.12688/f1000research.27893.1 ◽

2020 ◽

Vol 9 ◽

pp. 1444

Author(s):

Charity W. Law ◽

Kathleen Zeglinski ◽

Xueyi Dong ◽

Monther Alhamdoosh ◽

Gordon K. Smyth ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Differential Expression ◽

RNA Sequencing ◽

Expression Analysis ◽

Graphical Representation ◽

Differential Expression Analysis ◽

Data Types ◽

Software Packages ◽

Set Up

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by well-described analysis pipelines. However, two crucial steps in the analysis process can be a stumbling block for many: setting up an appropriate model via design matrices and setting up comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
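To make the design/contrast distinction concrete, here is a small Python/NumPy sketch. It is not limma itself (in R one would typically use `model.matrix` and limma's `makeContrasts`), and the helper `design_matrix` and the toy data are hypothetical. It builds the same single-factor model in two ways, with an intercept (treatment coding) and without one (a means model), and shows that the comparison "B versus A" needs a different contrast vector in each parameterization but yields the same estimate.

```python
import numpy as np


def design_matrix(groups, intercept=True):
    """Return (X, column names) for a single-factor design.

    intercept=True  -> treatment coding, analogous to model.matrix(~group) in R
    intercept=False -> means model, analogous to model.matrix(~0 + group) in R
    """
    levels = sorted(set(groups))  # alphabetical levels; the first is the reference
    g = np.asarray(groups)
    if intercept:
        cols = [np.ones(len(g))] + [(g == lv).astype(float) for lv in levels[1:]]
        names = ["(Intercept)"] + [f"group{lv}" for lv in levels[1:]]
    else:
        cols = [(g == lv).astype(float) for lv in levels]
        names = [f"group{lv}" for lv in levels]
    return np.column_stack(cols), names


groups = ["A", "A", "B", "B", "C", "C"]
y = np.array([1.0, 1.2, 2.0, 2.2, 3.1, 2.9])  # toy log-expression for one gene

X_trt, names_trt = design_matrix(groups, intercept=True)
X_means, names_means = design_matrix(groups, intercept=False)
beta_trt, *_ = np.linalg.lstsq(X_trt, y, rcond=None)
beta_means, *_ = np.linalg.lstsq(X_means, y, rcond=None)

# Contrast "B - A" written for each parameterization
c_trt = np.array([0.0, 1.0, 0.0])     # groupB coefficient already is B - A
c_means = np.array([-1.0, 1.0, 0.0])  # groupB - groupA
b_vs_a_trt = c_trt @ beta_trt
b_vs_a_means = c_means @ beta_means
```

Both design matrices span the same column space, so the fitted values are identical; the choice between them mainly affects which contrasts are easiest to write down.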

Bias, robustness and scalability in differential expression analysis of single-cell RNA-seq data

10.1101/143289 ◽

2017 ◽

Cited By ~ 16

Author(s):

Charlotte Soneson ◽

Mark D. Robinson

Keyword(s):

Single Cell ◽

Differential Expression ◽

Statistical Methods ◽

Expression Analysis ◽

Method Development ◽

Differential Expression Analysis ◽

Data Sets ◽

RNA-Seq ◽

Data Set ◽

Extensive Evaluation

Abstract Background: As single-cell RNA-seq (scRNA-seq) becomes increasingly common, the amount of publicly available data grows rapidly, generating a useful resource for computational method development and extension of published results. Although processed data matrices are typically made available in public repositories, the procedures used to obtain them vary widely between data sets, which may complicate reuse and cross-data set comparison. Moreover, while many statistical methods for differential expression analysis of scRNA-seq data are becoming available, their relative merits and their performance compared with methods developed for bulk RNA-seq data are not sufficiently well understood. Results: We present conquer, a collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. Each data set has count and transcripts per million (TPM) estimates for genes and transcripts, as well as quality control and exploratory analysis reports. We use a subset of the data sets available in conquer to perform an extensive evaluation of the performance and characteristics of statistical methods for differential gene expression analysis, evaluating a total of 30 statistical approaches on both experimental and simulated scRNA-seq data. Conclusions: Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.
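The TPM estimates and the pre-filtering step mentioned above can be sketched in a few lines of Python/NumPy. This is a simplification under stated assumptions: TPM is computed here from raw counts and known transcript lengths in kilobases (dedicated transcript quantifiers typically use effective lengths and probabilistic read assignment instead), and the filter thresholds `min_count` and `min_cells` are hypothetical defaults, not the settings evaluated in the study.

```python
import numpy as np


def counts_to_tpm(counts, lengths_kb):
    """TPM: divide counts by length, then scale each cell so its values sum to 1e6.

    Assumes every cell (column) has at least one non-zero count.
    """
    rate = np.asarray(counts, dtype=float) / np.asarray(lengths_kb, dtype=float)[:, None]
    return rate / rate.sum(axis=0, keepdims=True) * 1e6


def filter_lowly_expressed(counts, min_count=1, min_cells=3):
    """Keep genes with at least `min_count` reads in at least `min_cells` cells."""
    counts = np.asarray(counts)
    keep = (counts >= min_count).sum(axis=1) >= min_cells
    return counts[keep], keep


counts = np.array([[10, 0, 5], [0, 0, 1], [4, 8, 2]])  # genes x cells
lengths = np.array([1.0, 2.0, 0.5])                     # transcript lengths in kb
tpm = counts_to_tpm(counts, lengths)
kept, mask = filter_lowly_expressed(counts, min_count=1, min_cells=2)
```

Because the filter changes which genes enter the model, and therefore the mean-variance trends many methods estimate, it is plausible that its effect differs between methods, as the abstract reports.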

RNA-Seq Data Analysis: From Raw Data Quality Control to Differential Expression Analysis

Methods in Molecular Biology - Plant Germline Development ◽

10.1007/978-1-4939-7286-9_23 ◽

2017 ◽

pp. 295-307 ◽

Cited By ~ 3

Author(s):

Weihong Qi ◽

Ralph Schlapbach ◽

Hubert Rehrauer

Keyword(s):

Quality Control ◽

Data Analysis ◽

Data Quality ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

RNA-Seq ◽

Data Quality Control ◽

Raw Data

ideal: an R/Bioconductor package for Interactive Differential Expression Analysis

10.1101/2020.01.10.901652 ◽

2020 ◽

Cited By ~ 4

Author(s):

Federico Marini ◽

Jan Linke ◽

Harald Binder

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Web Application ◽

Differential Expression Analysis ◽

Transcriptome Profiling ◽

Data Interpretation ◽

R Package ◽

RNA-Seq ◽

Fully Integrated ◽

Bioconductor Project

Abstract Background: RNA sequencing (RNA-seq) is an increasingly popular tool for transcriptome profiling. A key requirement for making the best use of the available data is software that is easy to use but still provides flexibility and transparency in the adopted methods. Despite the availability of many packages focused on detecting differential expression, a method to streamline this type of bioinformatics analysis in a comprehensive, accessible, and reproducible way is lacking. Results: We developed the ideal software package, which serves as a web application for interactive and reproducible RNA-seq analysis while producing a wealth of visualizations to facilitate data interpretation. ideal is implemented in R using the Shiny framework and is fully integrated with the existing core structures of the Bioconductor project. Users can perform the essential steps of the differential expression analysis workflow in an assisted way and generate a broad spectrum of publication-ready outputs, including diagnostic and summary visualizations in each module, all the way down to functional analysis. ideal also offers the possibility to seamlessly generate a full HTML report for storing and sharing results together with code for reproducibility. Conclusion: ideal is distributed as an R package in the Bioconductor project (http://bioconductor.org/packages/ideal/) and provides a solution for performing interactive and reproducible analyses of summarized RNA-seq expression data, empowering researchers with many different profiles (life scientists, clinicians, but also experienced bioinformaticians) to make the ideal use of the data at hand.