schex avoids overplotting for large single-cell RNA-sequencing datasets

2019 ◽  
Vol 36 (7) ◽  
pp. 2291-2292 ◽  
Author(s):  
Saskia Freytag ◽  
Ryan Lister

Abstract Summary Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots. Availability and implementation schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex. Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Vol 35 (22) ◽  
pp. 4827-4829 ◽  
Author(s):  
Xiao-Fei Zhang ◽  
Le Ou-Yang ◽  
Shuo Yang ◽  
Xing-Ming Zhao ◽  
Xiaohua Hu ◽  
...  

Abstract Summary Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experiment results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting the noisy scRNA-seq data before performing downstream analysis. Availability and implementation The R package and Shiny application are available through Github at https://github.com/Zhangxf-ccnu/EnImpute. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Abha S Bais ◽  
Dennis Kostka

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, it is known that these methods sometimes wrongly consider two or more cells as single cells, and that a number of so-called doublets is present in the output of such experiments. Treating doublets as single cells in downstream analyses can severely bias a study’s conclusions, and therefore computational strategies for the identification of doublets are needed. Results With scds, we propose two new approaches for in silico doublet identification: Co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds). The co-expression based approach, cxds, utilizes binarized (absence/presence) gene expression data and, employing a binomial model for the co-expression of pairs of genes, yields interpretable doublet annotations. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from original data. We apply our methods and existing computational doublet identification approaches to four datasets with experimental doublet annotations and find that our methods perform at least as well as the state of the art, at comparably little computational cost. We observe appreciable differences between methods and across datasets and that no approach dominates all others. In summary, scds presents a scalable, competitive approach that allows for doublet annotation of datasets with thousands of cells in a matter of seconds. Availability and implementation scds is implemented as a Bioconductor R package (doi: 10.18129/B9.bioc.scds). Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (10) ◽  
pp. 3276-3278 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract Summary SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data. It is reasonably flexible to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects. Availability and implementation The R package and associated documentation is available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

SummarySPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts.Availability and implementationThe R package and associated documentation is available from https://github.com/CenterForStatistics-UGent/SPsimSeq.Supplementary informationSupplementary data are available at bioRχiv online.


2021 ◽  
Author(s):  
Federico Agostinis ◽  
Chiara Romualdi ◽  
Gabriele Sales ◽  
Davide Risso

Summary: We present NewWave, a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA sequencing data. To achieve scalability, NewWave uses mini-batch optimization and can work with out-of-memory data, enabling users to analyze datasets with millions of cells. Availability and implementation: NewWave is implemented as an open-source R package available through the Bioconductor project at https://bioconductor.org/packages/NewWave/ Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Jared Brown ◽  
Zijian Ni ◽  
Chitrasen Mohanty ◽  
Rhonda Bacher ◽  
Christina Kendziorski

AbstractMotivationNormalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers (UMIs) are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expression in sequencing depth, but allow the variance and other properties of the gene-specific expression distribution to be non-constant in depth, which often results in reduced power and increased false discoveries in downstream analyses. This problem is exacerbated by the high proportion of zeros present in most datasets.ResultsTo address this, we present Dino, a normalization method based on a flexible negative-binomial mixture model of gene expression. As demonstrated in both simulated and case study datasets, by normalizing the entire gene expression distribution, Dino is robust to shallow sequencing depth, sample heterogeneity, and varying zero proportions, leading to improved performance in downstream analyses in a number of settings.Availability and implementationThe R package, Dino, is available on GitHub at https://github.com/JBrownBiostat/[email protected], [email protected]


Author(s):  
Roman Schefzik ◽  
Julian Flesch ◽  
Angela Goncalves

Abstract Motivation Single-cell gene expression distributions measured by single-cell RNA-sequencing (scRNA-seq) often display complex differences between samples. These differences are biologically meaningful but cannot be identified using standard methods for differential expression. Results Here, we derive and implement a flexible and fast differential distribution testing procedure based on the 2-Wasserstein distance. Our method is able to detect any type of difference in distribution between conditions. To interpret distributional differences, we decompose the 2-Wasserstein distance into terms that capture the relative contribution of changes in mean, variance and shape to the overall difference. Finally, we derive mathematical generalisations that allow our method to be used in a broad range of disciplines other than scRNA-seq or bioinformatics. Availability Our methods are implemented in the R/Bioconductor package waddR, which is freely available at https://github.com/goncalves-lab/waddR, along with documentation and examples. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Helena L. Crowell ◽  
Charlotte Soneson ◽  
Pierre-Luc Germain ◽  
Daniela Calini ◽  
Ludovic Collin ◽  
...  

AbstractSingle-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to profile the transcriptomes of individual cells on a large scale. Many early analyses of differential expression have aimed at identifying differences between subpopulations, and thus are focused on finding subpopulation markers either in a single sample or across multiple samples. More generally, such methods can compare expression levels in multiple sets of cells, thus leading to cross-condition analyses. However, given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis. For example, one could investigate the condition-specific responses of cell subpopulations measured from patients from each condition; however, it is not clear which statistical framework best handles this situation. In this work, we surveyed the methods available to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated “pseudobulk” data. We developed a flexible simulation platform that mimics both single and multi-sample scRNA-seq data and provide robust tools for multi-condition analysis within the muscat R package.


2019 ◽  
Author(s):  
James J. Cai

AbstractMotivationThe recent development of single-cell technologies, especially single-cell RNA sequencing (scRNA-seq), provides an unprecedented level of resolution to the cell type heterogeneity. It also enables the study of gene expression variability across individual cells within a homogenous cell population. Feature selection algorithms have been used to select biologically meaningful genes while controlling for sampling noise. An easy-to-use application for feature selection on scRNA-seq data requires integration of functions for data filtering, normalization, visualization, and enrichment analyses. Graphic user interfaces (GUIs) are desired for such an application.ResultsWe used native Matlab and App Designer to develop scGEApp for feature selection on singlecell gene expression data. We specifically designed a new feature selection algorithm based on the 3D spline fitting of expression mean (μ), coefficient of variance (CV), and dropout rate (rdrop), making scGEApp a unique tool for feature selection on scRNA-seq data. Our method can be applied to single-sample or two-sample scRNA-seq data, identify feature genes, e.g., those with unexpectedly high CV for given μ and rdrop of those genes, or genes with the most feature changes. Users can operate scGEApp through GUIs to use the full spectrum of functions including normalization, batch effect correction, imputation, visualization, feature selection, and downstream analyses with GSEA and GOrilla.Availabilityhttps://github.com/jamesjcai/scGEAppContact:[email protected] informationSupplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3588-3589 ◽  
Author(s):  
Kaiyi Zhu ◽  
Dimitris Anastassiou

Abstract Summary We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It features preventing excessive correction by predicting the false zeros and imputing their values by making use of the interrelationships between both genes and cells in the expression matrix. We showed that 2DImpute outperforms several leading imputation methods by applying it on datasets from various scRNA-seq protocols. Availability and implementation The R package of 2DImpute is freely available at GitHub (https://github.com/zky0708/2DImpute). Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document