Sincell: Bioconductor package for the statistical assessment of cell-state hierarchies from single-cell RNA-seq data

2015 ◽  
Author(s):  
Miguel Juliá ◽  
Amalio Telenti ◽  
Antonio Rausell

Summary: Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell-states that might be captured by single-cell RNA-seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data can be formalized under a general framework composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cell-clustering step). The Sincell R package implements a methodological toolbox allowing flexible workflows under this framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA-seq. Graphical representations and functional association tests are provided to interpret hierarchies. Sincell functionalities are illustrated in a real case study where its ability to discriminate noisy from stable cell-state hierarchies is demonstrated. Availability and implementation: Sincell is an open-source R/Bioconductor package available at http://bioconductor.org/packages/3.1/bioc/html/sincell.html. A detailed vignette describing functions and workflows is provided with the package.
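
The two-part framework described above (a cell-to-cell metric followed by a graph-building step) can be illustrated with a minimal, generic sketch. This is not Sincell's API: the function names, the Euclidean metric, the choice of a minimum spanning tree as the graph, and the toy expression values are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Distance between two cells' expression vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(cells):
    """Step i: pairwise cell-to-cell distances."""
    n = len(cells)
    return [[euclidean(cells[i], cells[j]) for j in range(n)] for i in range(n)]


def minimum_spanning_tree(dist):
    """Step ii: build a graph over cells with Prim's algorithm.

    Returns a list of (from_cell, to_cell, distance) edges.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        # Pick the cheapest edge that connects the tree to a new cell.
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i][j] < best[2]):
                    best = (i, j, dist[i][j])
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical toy data: four cells measured on two genes.
cells = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.5]]
tree = minimum_spanning_tree(distance_matrix(cells))
```

On this toy input the tree chains the cells in order (0 to 1, 1 to 2, 2 to 3), which is the kind of ordered hierarchy such a workflow aims to recover. Assessing how stable that hierarchy is under noise, which is the package's stated contribution, is not covered by this sketch.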

Author(s):  
Massimo Andreatta ◽  
Santiago J. Carmona

Abstract: Computational tools for the integration of single-cell transcriptomics data are designed to correct batch effects between technical replicates or different technologies applied to the same population of cells. However, they have inherent limitations when applied to heterogeneous sets of data with moderate overlap in cell states or sub-types. STACAS is a package for the identification of integration anchors in the Seurat environment, optimized for the integration of datasets that share only a subset of cell types. We demonstrate that by i) correcting batch effects while preserving relevant biological variability across datasets, ii) filtering aberrant integration anchors with a quantitative distance measure, and iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. We anticipate that the algorithm will be a useful tool for the construction of comprehensive single-cell atlases by integration of the growing amount of single-cell data becoming available in public repositories. Code availability: R package: https://github.com/carmonalab/STACAS; Docker image: https://hub.docker.com/repository/docker/mandrea1/stacas_demo


2021 ◽  
Author(s):  
Federico Agostinis ◽  
Chiara Romualdi ◽  
Gabriele Sales ◽  
Davide Risso

Summary: We present NewWave, a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA sequencing data. To achieve scalability, NewWave uses mini-batch optimization and can work with out-of-memory data, enabling users to analyze datasets with millions of cells. Availability and implementation: NewWave is implemented as an open-source R package available through the Bioconductor project at https://bioconductor.org/packages/NewWave/ Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Will Macnair ◽  
Mark D Robinson

Quality control (QC) is a critical component of single-cell RNA-seq processing pipelines. Many single-cell methods assume that scRNA-seq data comprise multiple cell types that are distinct in terms of gene expression; however, this is not reflected in current approaches to QC. We show that the current widely used methods for QC may have a bias towards exclusion of rarer cell types, especially those whose QC metrics are more extreme, e.g. those with naturally high mitochondrial proportions. We introduce SampleQC, which improves sensitivity and reduces bias relative to current industry-standard approaches, via a robust Gaussian mixture model fit across multiple samples simultaneously. We show via simulations that SampleQC is less susceptible than other methods to exclusion of rarer cell types. We also demonstrate SampleQC on complex real data, comprising up to 867k cells over 172 samples. The framework for SampleQC is general, and has applications as an outlier detection method for data beyond single-cell RNA-seq. SampleQC is parallelized and implemented in Rcpp, and is available as an R package.


Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results: To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information: Supplementary data are available at Bioinformatics online.
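
The idea of storing a subset as a reference to its parent assay, rather than as a copy, can be sketched in a few lines. The class below is a hypothetical Python illustration of that design only; it is not the ExperimentSubset API (which is an R package), and the class name, method name, and toy counts are assumptions.

```python
class AssaySubset:
    """A subset that keeps only row/column indices into a parent assay.

    No values are copied at creation time, and the link back to the
    parent is retained, which is what gives provenance in this sketch.
    """

    def __init__(self, parent, rows, cols):
        self.parent = parent      # reference to the parent matrix, not a copy
        self.rows = list(rows)    # row indices kept (e.g. selected features)
        self.cols = list(cols)    # column indices kept (e.g. cells passing QC)

    def materialize(self):
        """Build the subsetted matrix on demand from the parent."""
        return [[self.parent[r][c] for c in self.cols] for r in self.rows]


# Hypothetical toy assay: 3 genes (rows) by 3 cells (columns).
counts = [
    [5, 0, 3],
    [1, 2, 0],
    [0, 7, 4],
]

# Keep genes 0 and 2, and drop the first cell.
subset = AssaySubset(counts, rows=[0, 2], cols=[1, 2])
values = subset.materialize()
```

Because the subset holds indices only, its memory cost scales with the number of selected rows and columns rather than with the size of the selected block, and the parent can always be recovered from `subset.parent`.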


2019 ◽  
Author(s):  
Anna Danese ◽  
Maria L. Richter ◽  
David S. Fischer ◽  
Fabian J. Theis ◽  
Maria Colomé-Tatché

Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.


2017 ◽  
Author(s):  
Zhun Miao ◽  
Ke Deng ◽  
Xiaowo Wang ◽  
Xuegong Zhang

Summary: The excessive number of zeros in single-cell RNA-seq data includes “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. Availability and implementation: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration. [email protected] Supplementary information: Supplementary data are available at bioRxiv online.
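
For readers unfamiliar with the Zero-Inflated Negative Binomial (ZINB) model named above, one common textbook parameterization is written out below. The symbols are assumptions for illustration (the abstract does not give a parameterization): \theta is the zero-inflation weight, r the size parameter, and p the success probability of the negative binomial component.

```latex
\begin{aligned}
P(Y = y \mid \theta, r, p)
  &= \theta \,\mathbf{1}[y = 0]
   + (1 - \theta)\,\frac{\Gamma(y + r)}{\Gamma(r)\, y!}\, p^{r} (1 - p)^{y},
   \qquad y = 0, 1, 2, \dots \\[4pt]
P(Y = 0 \mid \theta, r, p)
  &= \underbrace{\theta}_{\text{zero-inflation component}}
   + \underbrace{(1 - \theta)\, p^{r}}_{\text{zeros from the NB component}}
\end{aligned}
```

The second line shows why such a model can separate two sources of zeros: the total zero probability splits into a point mass at zero and the zeros that a negative binomial count would produce anyway. Which of the two terms is read as "real" and which as "dropout" is a modelling interpretation that the abstract does not spell out.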


2020 ◽  
Author(s):  
Jinjin Tian ◽  
Jiebiao Wang ◽  
Kathryn Roeder

Motivation: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner. Results: Therefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data. Availability: The ESCO implementation is available as R package SplatterESCO (https://github.com/JINJINT/SplatterESCO). [email protected]
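
The copula idea mentioned above (impose dependence between genes while keeping each gene's marginal distribution) can be sketched for two genes with a Gaussian copula. This is a simplified illustration, not ESCO's implementation: the function names are hypothetical, the Poisson marginals are an assumption chosen to keep the example short, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # The cap on k guards against floating-point stalls for u close to 1.
    while cdf < u and k < 10_000:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_coexpressed_counts(n_cells, rho, lam_a, lam_b, seed=0):
    """Draw (gene_a, gene_b) counts whose dependence comes from a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n_cells):
        # 1. Correlated standard normals with correlation rho.
        z_a = rng.gauss(0.0, 1.0)
        z_b = rho * z_a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # 2. Map to uniforms through the normal CDF (this is the copula).
        u_a = min(std_normal.cdf(z_a), 1.0 - 1e-12)
        u_b = min(std_normal.cdf(z_b), 1.0 - 1e-12)
        # 3. Map uniforms to counts through each gene's own quantile function.
        pairs.append((poisson_quantile(u_a, lam_a), poisson_quantile(u_b, lam_b)))
    return pairs


def pearson(xs, ys):
    """Plain Pearson correlation, used only to inspect the simulated counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


pairs = simulate_coexpressed_counts(n_cells=2000, rho=0.8, lam_a=5.0, lam_b=8.0, seed=1)
corr = pearson([a for a, _ in pairs], [b for _, b in pairs])
```

The design point is that step 3 never looks at the other gene: each gene keeps its own marginal, and all of the co-expression enters through the correlated normals in step 1. Setting `rho=0.0` in the same function yields approximately independent counts.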


2019 ◽  
Vol 35 (24) ◽  
pp. 5155-5162 ◽  
Author(s):  
Chengzhong Ye ◽  
Terence P Speed ◽  
Agus Salim

Motivation: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. Results: We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The gain in improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. Availability and implementation: The method is implemented as a publicly available R package at https://github.com/cz-ye/DECENT. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Massimo Andreatta ◽  
Santiago J Carmona

Summary: STACAS is a computational method for the identification of integration anchors in the Seurat environment, optimized for the integration of single-cell (sc) RNA-seq datasets that share only a subset of cell types. We demonstrate that by (i) correcting batch effects while preserving relevant biological variability across datasets, (ii) filtering aberrant integration anchors with a quantitative distance measure and (iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. Availability and implementation: Source code and R package available at https://github.com/carmonalab/STACAS; Docker image available at https://hub.docker.com/repository/docker/mandrea1/stacas_demo.

