Discrete Distributional Differential Expression (D3E) - A Tool for Gene Expression Analysis of Single-cell RNA-seq Data

The advent of high-throughput RNA-seq at the single-cell level has opened up new opportunities to elucidate the heterogeneity of gene expression. One of the most widespread applications of RNA-seq is to identify genes that are differentially expressed (DE) between two experimental conditions. Here, we present a discrete, distributional method for differential gene expression (D3E), a novel algorithm specifically designed for single-cell RNA-seq data. We use synthetic data to evaluate D3E, demonstrating that it can detect changes in expression even when the mean level remains unchanged. D3E is based on an analytically tractable stochastic model, and thus it provides additional biological insights by quantifying biologically meaningful properties, such as the average burst size and frequency. We use D3E to investigate experimental data, and with the help of the underlying model, we directly test hypotheses about the driving mechanism behind changes in gene expression.
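The claim that a distributional comparison can flag a gene whose mean is unchanged can be illustrated with a small sketch. The abstract does not say which test D3E applies, so the two-sample Kolmogorov-Smirnov statistic and the permutation p-value below are stand-ins chosen for illustration, and the two count vectors are made-up toy data, not results from the paper.

```python
import random
from bisect import bisect_right
from statistics import mean


def ks_statistic(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in set(xs) | set(ys)
    )


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Share of label shufflings whose KS statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical read counts for one gene in 100 cells per condition.
steady = [8, 9, 10, 11, 12] * 20   # every cell expresses at a moderate level
bursty = [0] * 50 + [20] * 50      # half the cells are silent, half are high

print("means:", mean(steady), mean(bursty))
print("KS statistic:", ks_statistic(steady, bursty))
print("permutation p-value:", permutation_pvalue(steady, bursty))
```

A test on the means alone sees no difference between these two conditions, whereas the comparison of the full count distributions separates them clearly.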


Design and power analysis for multi-sample single cell genomics experiments

10.1101/2020.04.01.019851 ◽

2020 ◽

Cited By ~ 2

Author(s):

Katharina T. Schmid ◽

Cristiana Cruceanu ◽

Anika Böttcher ◽

Heiko Lickert ◽

Elisabeth B. Binder ◽

...

Keyword(s):

Gene Expression ◽

Experimental Design ◽

Single Cell ◽

Expression Analysis ◽

Quantitative Trait ◽

Power Analysis ◽

RNA Seq ◽

Wide Range ◽

Differential Gene ◽

Number Of Cells

Background: The identification of genes associated with specific experimental conditions, genotypes or phenotypes through differential expression analysis has long been the cornerstone of transcriptomic analysis. Single cell RNA-seq is revolutionizing transcriptomics and is enabling interindividual differential gene expression analysis and identification of genetic variants associated with gene expression, so called expression quantitative trait loci at cell-type resolution. Current methods for power analysis and guidance of experimental design either do not account for the specific characteristics of single cell data or are not suitable to model interindividual comparisons. Results: Here we present a statistical framework for experimental design and power analysis of single cell differential gene expression between groups of individuals and expression quantitative trait locus analysis. The model relates sample size, number of cells per individual and sequencing depth to the power of detecting differentially expressed genes within individual cell types. Power analysis is based on data driven priors from literature or pilot experiments across a wide range of application scenarios and single cell RNA-seq platforms. Using these priors we show that, for a fixed budget, the number of cells per individual is the major determinant of power. Conclusion: Our model is general and allows for systematic comparison of alternative experimental designs and can thus be used to guide experimental design to optimize power. For a wide range of applications, shallow sequencing of high numbers of cells per individual leads to higher overall power than deep sequencing of fewer cells. The model is implemented as an R package scPower.


A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples

10.1101/2021.07.10.451910 ◽

2021 ◽

Author(s):

Wenpin Hou ◽

Zhicheng Ji ◽

Zeyu Chen ◽

E John Wherry ◽

Stephanie C Hicks ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Biological Processes ◽

RNA Seq ◽

Experimental Conditions ◽

Computational Framework ◽

Statistical Framework ◽

Gene Regulatory ◽

Multiple Samples ◽

False Discoveries

Pseudotime analysis with single-cell RNA-sequencing (scRNA-seq) data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many computational methods have been developed to infer the pseudo-temporal trajectories of cells within a biological sample, methods that compare pseudo-temporal patterns with multiple samples (or replicates) across different experimental conditions are lacking. Lamian is a comprehensive and statistically-rigorous computational framework for differential multi-sample pseudotime analysis. It can be used to identify changes in a biological process associated with sample covariates, such as different biological conditions, and also to detect changes in gene expression, cell density, and topology of a pseudotemporal trajectory. Unlike existing methods that ignore sample variability, Lamian draws statistical inference after accounting for cross-sample variability and hence substantially reduces sample-specific false discoveries that are not generalizable to new samples. Using both simulations and real scRNA-seq data, including an analysis of differential immune response programs between COVID-19 patients with different disease severity levels, we demonstrate the advantages of Lamian in decoding cellular gene expression programs in continuous biological processes.


scDesign2: an interpretable simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured

10.1101/2020.11.17.387795 ◽

2020 ◽

Author(s):

Tianyi Sun ◽

Dongyuan Song ◽

Wei Vivian Li ◽

Jingyi Jessica Li

Keyword(s):

Gene Expression ◽

Single Cell ◽

Computational Methods ◽

Probabilistic Models ◽

Synthetic Data ◽

Cell Number ◽

High Fidelity ◽

RNA Seq ◽

Cell Gene Expression ◽

Cell Gene

In the burgeoning field of single-cell transcriptomics, a pressing challenge is to benchmark various experimental protocols and numerous computational methods in an unbiased manner. Although dozens of simulators have been developed for single-cell RNA-seq (scRNA-seq) data, they lack the capacity to simultaneously achieve all the three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill in this gap, here we propose scDesign2, an interpretable simulator that achieves all the three goals and generates high-fidelity synthetic data for multiple scRNA-seq protocols and other single-cell gene expression count-based technologies. Compared with existing simulators, scDesign2 is advantageous in its transparent use of probabilistic models and is unique in its ability to capture gene correlations via copula. We verify that scDesign2 generates more realistic synthetic data for four scRNA-seq protocols (10x Genomics, CEL-Seq2, Fluidigm C1, and Smart-Seq2) and two single-cell spatial transcriptomics protocols (MERFISH and pciSeq) than existing simulators do. Under two typical computational tasks, cell clustering and rare cell type detection, we demonstrate that scDesign2 provides informative guidance on deciding the optimal sequencing depth and cell number in single-cell RNA-seq experimental design, and that scDesign2 can effectively benchmark computational methods under varying sequencing depths and cell numbers. With these advantages, scDesign2 is a powerful tool for single-cell researchers to design experiments, develop computational methods, and choose appropriate methods for specific data analysis needs.
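The idea of capturing gene correlations "via copula" can be sketched in a few lines. This is a generic two-gene Gaussian copula with Poisson marginals, written as an illustration only: it is not scDesign2's code, and the marginal family, the means, the correlation `rho`, and all function names are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam):
    """Smallest count k whose Poisson(lam) CDF is at least u."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # The pmf guard stops the loop if rounding keeps cdf just below u.
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def sample_correlated_counts(n_cells, lam_a, lam_b, rho, seed=0):
    """Draw (gene A, gene B) counts per cell from a Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    cells = []
    for _ in range(n_cells):
        # Step 1: correlated standard normals with correlation rho.
        z_a = rng.gauss(0.0, 1.0)
        z_b = rho * z_a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: map to uniforms, then through each marginal's quantile function.
        cells.append((poisson_quantile(phi(z_a), lam_a),
                      poisson_quantile(phi(z_b), lam_b)))
    return cells


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


cells = sample_correlated_counts(2000, lam_a=5.0, lam_b=20.0, rho=0.8)
gene_a = [a for a, _ in cells]
gene_b = [b for _, b in cells]
print("mean A:", sum(gene_a) / len(gene_a), "mean B:", sum(gene_b) / len(gene_b))
print("count correlation:", pearson(gene_a, gene_b))
```

The copula separates the two modelling choices: each gene keeps its own count distribution, while the dependence between genes is carried entirely by the correlated normals, which is what makes the number of simulated cells a free parameter.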


Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations

Science Advances ◽

10.1126/sciadv.aav2249 ◽

2019 ◽

Vol 5 (5) ◽

pp. eaav2249 ◽

Cited By ~ 22

Author(s):

Dongju Shin ◽

Wookjae Lee ◽

Ji Hyun Lee ◽

Duhee Bang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cost Effective ◽

Specific Gene ◽

RNA Seq ◽

Experimental Conditions ◽

Cost Effective Method ◽

Treatment Experiment ◽

Single Cell Profiling ◽

Multiple Samples

The development of high-throughput single-cell RNA sequencing (scRNA-seq) has enabled access to information about gene expression in individual cells and insights into new biological areas. Although the interest in scRNA-seq has rapidly grown in recent years, the existing methods are plagued by many challenges when performing scRNA-seq on multiple samples. To simultaneously analyze multiple samples with scRNA-seq, we developed a universal sample barcoding method through transient transfection with short barcode oligonucleotides. By conducting a species-mixing experiment, we have validated the accuracy of our method and confirmed the ability to identify multiplets and negatives. Samples from a 48-plex drug treatment experiment were pooled and analyzed by a single run of Drop-Seq. This revealed unique transcriptome responses for each drug and target-specific gene expression signatures at the single-cell level. Our cost-effective method is widely applicable for the single-cell profiling of multiple experimental conditions, enabling the widespread adoption of scRNA-seq for various applications.


Alignment of time-course single-cell RNA-seq data with CAPITAL

10.1101/859751 ◽

2019 ◽

Author(s):

Reiichi Sugihara ◽

Yuki Kato ◽

Tomoya Mori ◽

Yukio Kawahara

Keyword(s):

Gene Expression ◽

Single Cell ◽

Time Course ◽

RNA Seq ◽

Experimental Conditions ◽

Tree Alignment ◽

Public Data ◽

Gene Expression Dynamics ◽

Time Course Data ◽

Cell Trajectory

Recent techniques on single-cell RNA sequencing have boosted transcriptome-wide observation of gene expression dynamics of time-course data at a single-cell scale. Typical examples of such analysis include inference of a pseudotime cell trajectory, and comparison of pseudotime trajectories between different experimental conditions will tell us how feature genes regulate a dynamic cellular process. Existing methods for comparing pseudotime trajectories, however, force users to select trajectories to be compared because they can deal only with simple linear trajectories, leading to the possibility of making a biased interpretation. Here we present CAPITAL, a method for comparing pseudotime trajectories with tree alignment whereby trajectories including branching can be compared without any knowledge of paths to be compared. Computational tests on time-series public data indicate that CAPITAL can align non-linear pseudotime trajectories and reveal gene expression dynamics.


Dashboard-style interactive plots for RNA-seq analysis are R Markdown ready with Glimma 2.0

10.1101/2021.07.30.454464 ◽

2021 ◽

Author(s):

Has Kariyawasam ◽

Shian Su ◽

Oliver Voogd ◽

Matthew E Ritchie ◽

Charity W Law

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Interactive Graphics ◽

RNA Seq ◽

Experimental Conditions ◽

Differential Gene Expression Analysis ◽

Differential Gene ◽

High Level ◽

User Friendly

Glimma 1.0 introduced intuitive, point-and-click interactive graphics for differential gene expression analysis. Here, we present a major update to Glimma which brings improved interactivity and reproducibility using high-level visualisation frameworks for R and JavaScript. Glimma 2.0 plots are now readily embeddable in R Markdown, thus allowing users to create reproducible reports containing interactive graphics. The revamped multidimensional scaling plot features dashboard-style controls allowing the user to dynamically change the colour, shape and size of sample points according to different experimental conditions. Interactivity was enhanced in the MA-style plot for comparing differences to average expression, which now supports selecting multiple genes, export options to PNG, SVG or CSV formats and includes a new volcano plot function. Feature-rich and user-friendly, Glimma makes exploring data for gene expression analysis more accessible and intuitive and is available on Bioconductor and GitHub.


Dashboard-style interactive plots for RNA-seq analysis are R Markdown ready with Glimma 2.0

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab116 ◽

2021 ◽

Vol 3 (4) ◽

Author(s):

Hasaru Kariyawasam ◽

Shian Su ◽

Oliver Voogd ◽

Matthew E Ritchie ◽

Charity W Law

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Interactive Graphics ◽

RNA Seq ◽

Experimental Conditions ◽

Differential Gene Expression Analysis ◽

Differential Gene ◽

High Level ◽

User Friendly

Glimma 1.0 introduced intuitive, point-and-click interactive graphics for differential gene expression analysis. Here, we present a major update to Glimma that brings improved interactivity and reproducibility using high-level visualization frameworks for R and JavaScript. Glimma 2.0 plots are now readily embeddable in R Markdown, thus allowing users to create reproducible reports containing interactive graphics. The revamped multidimensional scaling plot features dashboard-style controls allowing the user to dynamically change the colour, shape and size of sample points according to different experimental conditions. Interactivity was enhanced in the MA-style plot for comparing differences to average expression, which now supports selecting multiple genes, export options to PNG, SVG or CSV formats and includes a new volcano plot function. Feature-rich and user-friendly, Glimma makes exploring data for gene expression analysis more accessible and intuitive and is available on Bioconductor and GitHub.


Faculty Opinions recommendation of Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.717971189.793469500 ◽

2013 ◽

Author(s):

Stephen Turner

Keyword(s):

Gene Expression ◽

Differential Gene Expression ◽

RNA Seq ◽

Web Tool ◽

Differential Gene


treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses

Genome Biology ◽

10.1186/s13059-021-02368-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ruizhu Huang ◽

Charlotte Soneson ◽

Pierre-Luc Germain ◽

Thomas S.B. Schmidt ◽

Christian Von Mering ◽

...

Keyword(s):

Single Cell ◽

Synthetic Data ◽

Cell Types ◽

Data Driven ◽

RNA Seq ◽

Hierarchical Trees

treeclimbR is for analyzing hierarchical trees of entities, such as phylogenies or cell types, at different resolutions. It proposes multiple candidates that capture the latent signal and pinpoints branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single-cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.
