scholarly journals Fast identification of differential distributions in single-cell RNA-sequencing data with waddR

Author(s):  
Roman Schefzik ◽  
Julian Flesch ◽  
Angela Goncalves

Abstract Motivation Single-cell gene expression distributions measured by single-cell RNA-sequencing (scRNA-seq) often display complex differences between samples. These differences are biologically meaningful but cannot be identified using standard methods for differential expression. Results Here, we derive and implement a flexible and fast differential distribution testing procedure based on the 2-Wasserstein distance. Our method is able to detect any type of difference in distribution between conditions. To interpret distributional differences, we decompose the 2-Wasserstein distance into terms that capture the relative contribution of changes in mean, variance and shape to the overall difference. Finally, we derive mathematical generalisations that allow our method to be used in a broad range of disciplines other than scRNA-seq or bioinformatics. Availability Our methods are implemented in the R/Bioconductor package waddR, which is freely available at https://github.com/goncalves-lab/waddR, along with documentation and examples. Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Vol 36 (7) ◽  
pp. 2291-2292 ◽  
Author(s):  
Saskia Freytag ◽  
Ryan Lister

Abstract Summary Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots. Availability and implementation schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

SummarySPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts.Availability and implementationThe R package and associated documentation is available from https://github.com/CenterForStatistics-UGent/SPsimSeq.Supplementary informationSupplementary data are available at bioRχiv online.


2019 ◽  
Vol 35 (22) ◽  
pp. 4827-4829 ◽  
Author(s):  
Xiao-Fei Zhang ◽  
Le Ou-Yang ◽  
Shuo Yang ◽  
Xing-Ming Zhao ◽  
Xiaohua Hu ◽  
...  

Abstract Summary Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experiment results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting the noisy scRNA-seq data before performing downstream analysis. Availability and implementation The R package and Shiny application are available through Github at https://github.com/Zhangxf-ccnu/EnImpute. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Abha S Bais ◽  
Dennis Kostka

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, it is known that these methods sometimes wrongly consider two or more cells as single cells, and that a number of so-called doublets is present in the output of such experiments. Treating doublets as single cells in downstream analyses can severely bias a study’s conclusions, and therefore computational strategies for the identification of doublets are needed. Results With scds, we propose two new approaches for in silico doublet identification: Co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds). The co-expression based approach, cxds, utilizes binarized (absence/presence) gene expression data and, employing a binomial model for the co-expression of pairs of genes, yields interpretable doublet annotations. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from original data. We apply our methods and existing computational doublet identification approaches to four datasets with experimental doublet annotations and find that our methods perform at least as well as the state of the art, at comparably little computational cost. We observe appreciable differences between methods and across datasets and that no approach dominates all others. In summary, scds presents a scalable, competitive approach that allows for doublet annotation of datasets with thousands of cells in a matter of seconds. Availability and implementation scds is implemented as a Bioconductor R package (doi: 10.18129/B9.bioc.scds). Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (10) ◽  
pp. 3276-3278 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract Summary SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data. It is reasonably flexible to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects. Availability and implementation The R package and associated documentation is available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
James J. Cai

AbstractMotivationThe recent development of single-cell technologies, especially single-cell RNA sequencing (scRNA-seq), provides an unprecedented level of resolution to the cell type heterogeneity. It also enables the study of gene expression variability across individual cells within a homogenous cell population. Feature selection algorithms have been used to select biologically meaningful genes while controlling for sampling noise. An easy-to-use application for feature selection on scRNA-seq data requires integration of functions for data filtering, normalization, visualization, and enrichment analyses. Graphic user interfaces (GUIs) are desired for such an application.ResultsWe used native Matlab and App Designer to develop scGEApp for feature selection on singlecell gene expression data. We specifically designed a new feature selection algorithm based on the 3D spline fitting of expression mean (μ), coefficient of variance (CV), and dropout rate (rdrop), making scGEApp a unique tool for feature selection on scRNA-seq data. Our method can be applied to single-sample or two-sample scRNA-seq data, identify feature genes, e.g., those with unexpectedly high CV for given μ and rdrop of those genes, or genes with the most feature changes. Users can operate scGEApp through GUIs to use the full spectrum of functions including normalization, batch effect correction, imputation, visualization, feature selection, and downstream analyses with GSEA and GOrilla.Availabilityhttps://github.com/jamesjcai/scGEAppContact:[email protected] informationSupplementary data are available at Bioinformatics online.


Author(s):  
James J Cai

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) technology has revolutionized the way research is done in biomedical sciences. It provides an unprecedented level of resolution across individual cells for studying cell heterogeneity and gene expression variability. Analyzing scRNA-seq data is challenging though, due to the sparsity and high dimensionality of the data. Results I developed scGEAToolbox—a Matlab toolbox for scRNA-seq data analysis. It contains a comprehensive set of functions for data normalization, feature selection, batch correction, imputation, cell clustering, trajectory/pseudotime analysis, and network construction, which can be combined and integrated to building custom workflow. While most of the functions are implemented in native Matlab, wrapper functions are provided to allow users to call the “third-party” tools developed in Matlab or other languages. Furthermore, scGEAToolbox is equipped with sophisticated graphical user interfaces (GUIs) generated with App Designer, making it an easy-to-use application for quick data processing. Availability https://github.com/jamesjcai/scGEAToolbox Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii110-ii110
Author(s):  
Christina Jackson ◽  
Christopher Cherry ◽  
Sadhana Bom ◽  
Hao Zhang ◽  
John Choi ◽  
...  

Abstract BACKGROUND Glioma associated myeloid cells (GAMs) can be induced to adopt an immunosuppressive phenotype that can lead to inhibition of anti-tumor responses in glioblastoma (GBM). Understanding the composition and phenotypes of GAMs is essential to modulating the myeloid compartment as a therapeutic adjunct to improve anti-tumor immune response. METHODS We performed single-cell RNA-sequencing (sc-RNAseq) of 435,400 myeloid and tumor cells to identify transcriptomic and phenotypic differences in GAMs across glioma grades. We further correlated the heterogeneity of the GAM landscape with tumor cell transcriptomics to investigate interactions between GAMs and tumor cells. RESULTS sc-RNAseq revealed a diverse landscape of myeloid-lineage cells in gliomas with an increase in preponderance of bone marrow derived myeloid cells (BMDMs) with increasing tumor grade. We identified two populations of BMDMs unique to GBMs; Mac-1and Mac-2. Mac-1 demonstrates upregulation of immature myeloid gene signature and altered metabolic pathways. Mac-2 is characterized by expression of scavenger receptor MARCO. Pseudotime and RNA velocity analysis revealed the ability of Mac-1 to transition and differentiate to Mac-2 and other GAM subtypes. We further found that the presence of these two populations of BMDMs are associated with the presence of tumor cells with stem cell and mesenchymal features. Bulk RNA-sequencing data demonstrates that gene signatures of these populations are associated with worse survival in GBM. CONCLUSION We used sc-RNAseq to identify a novel population of immature BMDMs that is associated with higher glioma grades. This population exhibited altered metabolic pathways and stem-like potentials to differentiate into other GAM populations including GAMs with upregulation of immunosuppressive pathways. Our results elucidate unique interactions between BMDMs and GBM tumor cells that potentially drives GBM progression and the more aggressive mesenchymal subtype. Our discovery of these novel BMDMs have implications in new therapeutic targets in improving the efficacy of immune-based therapies in GBM.


2021 ◽  
Vol 12 (2) ◽  
pp. 317-334
Author(s):  
Omar Alaqeeli ◽  
Li Xing ◽  
Xuekui Zhang

Classification tree is a widely used machine learning method. It has multiple implementations as R packages; rpart, ctree, evtree, tree and C5.0. The details of these implementations are not the same, and hence their performances differ from one application to another. We are interested in their performance in the classification of cells using the single-cell RNA-Sequencing data. In this paper, we conducted a benchmark study using 22 Single-Cell RNA-sequencing data sets. Using cross-validation, we compare packages’ prediction performances based on their Precision, Recall, F1-score, Area Under the Curve (AUC). We also compared the Complexity and Run-time of these R packages. Our study shows that rpart and evtree have the best Precision; evtree is the best in Recall, F1-score and AUC; C5.0 prefers more complex trees; tree is consistently much faster than others, although its complexity is often higher than others.


Sign in / Sign up

Export Citation Format

Share Document