scRNABatchQC: multi-samples quality control for single cell RNA-seq data

2019 ◽  
Vol 35 (24) ◽  
pp. 5306-5308
Author(s):  
Qi Liu ◽  
Quanhu Sheng ◽  
Jie Ping ◽  
Marisol Adelina Ramirez ◽  
Ken S Lau ◽  
...  

Abstract Summary Single cell RNA sequencing is a revolutionary technique to characterize inter-cellular transcriptomics heterogeneity. However, the data are noise-prone because gene expression is often driven by both technical artifacts and genuine biological variations. Proper disentanglement of these two effects is critical to prevent spurious results. While several tools exist to detect and remove low-quality cells in one single cell RNA-seq dataset, there is a lack of approaches for examining consistency between sample sets and detecting systematic biases, batch effects and outliers. We present scRNABatchQC, an R package to compare multiple sample sets simultaneously over numerous technical and biological features, which gives valuable hints to distinguish technical artifacts from biological variations. scRNABatchQC helps identify and systematically characterize sources of variability in single cell transcriptome data. The examination of consistency across datasets allows visual detection of biases and outliers. Availability and implementation scRNABatchQC is freely available at https://github.com/liuqivandy/scRNABatchQC as an R package. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
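The idea in the abstract above, keeping a subset as a reference to its parent assay rather than as a second copy, can be illustrated with a minimal sketch. This is not the ExperimentSubset API (which is an R package); the class and method names below (`Experiment`, `AssaySubset`, `create_subset`, `get_subset`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AssaySubset:
    """Index-only record of a subset; 'parent' preserves provenance."""
    parent: str
    rows: List[int]
    cols: List[int]


class Experiment:
    """Hypothetical container holding full assays plus index-only subsets."""

    def __init__(self, counts: List[List[int]]):
        self.assays: Dict[str, List[List[int]]] = {"counts": counts}
        self.subsets: Dict[str, AssaySubset] = {}

    def create_subset(self, name, rows, cols, parent="counts"):
        # Store only the indices and the parent's name, never the values.
        self.subsets[name] = AssaySubset(parent, list(rows), list(cols))

    def get_subset(self, name):
        # Materialize the subset on demand from the parent assay.
        s = self.subsets[name]
        parent = self.assays[s.parent]
        return [[parent[r][c] for c in s.cols] for r in s.rows]


# Genes in rows, cells in columns (toy data).
counts = [
    [0, 5, 3, 0],
    [2, 9, 7, 0],
    [0, 1, 0, 0],
]
exp = Experiment(counts)

# Keep cells (columns) with at least one count; the last cell is empty.
good_cells = [c for c in range(4) if sum(row[c] for row in counts) > 0]
exp.create_subset("filtered", rows=range(3), cols=good_cells)
filtered = exp.get_subset("filtered")
```

Because the subset records the name of its parent assay, the link between filtered and original data stays explicit, and memory grows with the number of indices rather than with the size of the matrix.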


2017 ◽  
Author(s):  
Zhun Miao ◽  
Ke Deng ◽  
Xiaowo Wang ◽  
Xuegong Zhang

Abstract Summary The excessive amount of zeros in single-cell RNA-seq data includes “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. Availability and Implementation The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration. Supplementary information Supplementary data are available at bioRxiv online.
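To make the zero-inflated negative binomial (ZINB) model mentioned above concrete, here is a minimal sketch of its probability mass function in a standard textbook parameterization. The parameter names (`theta` for the zero-inflation weight, `r` for the size, `mu` for the mean) are assumptions for illustration and may differ from the parameterization DEsingle uses; the abstract does not say which mixture component the package treats as "real" and which as "dropout" zeros, so the sketch leaves that mapping open.

```python
import math


def zinb_pmf(y, theta, r, mu):
    """P(Y = y) under a zero-inflated negative binomial.

    theta: weight of the extra point mass at zero (0 <= theta < 1)
    r:     negative binomial size (inverse dispersion), r > 0
    mu:    negative binomial mean, mu > 0
    """
    # Negative binomial log-probability, computed via log-gamma for stability.
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + r * math.log(r / (r + mu))
              + y * math.log(mu / (r + mu)))
    point_mass = theta if y == 0 else 0.0
    return point_mass + (1.0 - theta) * math.exp(log_nb)


def zero_inflation_share(theta, r, mu):
    """Fraction of all expected zeros contributed by the point mass."""
    return theta / zinb_pmf(0, theta, r, mu)


p0 = zinb_pmf(0, theta=0.3, r=2.0, mu=5.0)
share = zero_inflation_share(theta=0.3, r=2.0, mu=5.0)
```

Splitting the zero probability into its two components is what allows a model of this kind to estimate how many observed zeros come from each source.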


2019 ◽  
Vol 35 (24) ◽  
pp. 5155-5162 ◽  
Author(s):  
Chengzhong Ye ◽  
Terence P Speed ◽  
Agus Salim

Abstract Motivation Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. Results We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The gain in improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. Availability and implementation The method is implemented as a publicly available R package available from https://github.com/cz-ye/DECENT. Supplementary information Supplementary data are available at Bioinformatics online.


GigaScience ◽  
2021 ◽  
Vol 10 (10) ◽  
Author(s):  
Vinay S Swamy ◽  
Temesgen D Fufa ◽  
Robert B Hufnagel ◽  
David M McGaughey

Abstract Background: The development of highly scalable single-cell transcriptome technology has resulted in the creation of thousands of datasets, >30 in the retina alone. Analyzing the transcriptomes between different projects is highly desirable because this would allow for better assessment of which biological effects are consistent across independent studies. However, it is difficult to compare and contrast data across different projects because there are substantial batch effects from computational processing, single-cell technology utilized, and the natural biological variation. While many single-cell transcriptome-specific batch correction methods purport to remove the technical noise, it is difficult to ascertain which method functions best. Results: We developed a lightweight R package (scPOP, single-cell Pick Optimal Parameters) that brings in batch integration methods and uses a simple heuristic to balance batch merging and cell type/cluster purity. We use this package along with a Snakefile-based workflow system to demonstrate how to optimally merge 766,615 cells from 33 retina datasets and 3 species to create a massive ocular single-cell transcriptome meta-atlas. Conclusions: This provides a model for how to efficiently create meta-atlases for tissues and cells of interest.


2018 ◽  
Author(s):  
Zhe Sun ◽  
Li Chen ◽  
Hongyi Xin ◽  
Qianhui Huang ◽  
Anthony R Cillo ◽  
...  

Abstract The recently developed droplet-based single cell transcriptome sequencing (scRNA-seq) technology makes it feasible to perform a population-scale scRNA-seq study, in which the transcriptome is measured for tens of thousands of single cells from multiple individuals. Despite the advances of many clustering methods, there are few tailored methods for population-scale scRNA-seq studies. Here, we have developed a BAyesian Mixture Model for Single Cell sequencing (BAMM-SC) method to cluster scRNA-seq data from multiple individuals simultaneously. Specifically, BAMM-SC takes raw data as input and can account for data heterogeneity and batch effect among multiple individuals in a unified Bayesian hierarchical model framework. Results from extensive simulations and application of BAMM-SC to in-house scRNA-seq datasets using blood, lung and skin cells from humans or mice demonstrated that BAMM-SC outperformed existing clustering methods with improved clustering accuracy and reduced impact from batch effects. BAMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/~Cwec47/singlecell.html.


2021 ◽  
Author(s):  
Carla A. Gonçalves ◽  
Michael Larsen ◽  
Sascha Jung ◽  
Johannes Stratmann ◽  
Akiko Nakamura ◽  
...  

Abstract Human organogenesis remains relatively unexplored for ethical and practical reasons. Here we report the establishment of a single cell transcriptome atlas of the human fetal pancreas between 7 and 10 post-conceptional weeks of development. To interrogate cell-cell interactions we developed InterCom, an R package for identifying receptor-ligand pairs and their downstream effects. We further report the establishment of a human pancreas culture system starting from fetal tissue or human pluripotent stem cells, enabling the long-term maintenance of pancreas progenitors in a minimal, defined medium in three dimensions. Benchmarking the cells produced in 2D and those expanded in 3D to fetal tissue reveals that progenitors expanded in 3D are transcriptionally closer to the fetal pancreas. We further demonstrate the potential of this system as a screening platform and identify the importance of the EGF and FGF pathways controlling human pancreas progenitor expansion.


2019 ◽  
Author(s):  
Monica Tambalo ◽  
Richard Mitter ◽  
David G. Wilkinson

Abstract Segmentation of the vertebrate hindbrain leads to the formation of rhombomeres, each with a distinct anteroposterior identity. Specialised boundary cells form at segment borders that act as a source or regulator of neuronal differentiation. In zebrafish, there is spatial patterning of neurogenesis in which non-neurogenic zones form at boundaries and segment centres, in part mediated by Fgf20 signaling. To further understand the control of neurogenesis, we have carried out single cell RNA sequencing of the zebrafish hindbrain at three different stages of patterning. Analyses of the data reveal known and novel markers of distinct hindbrain segments, of cell types along the dorsoventral axis, and of the transition of progenitors to neuronal differentiation. We find major shifts in the transcriptome of progenitors and of differentiating cells between the different stages analysed. Supervised clustering with markers of boundary cells and segment centres, together with RNA-seq analysis of Fgf-regulated genes, has revealed new candidate regulators of cell differentiation in the hindbrain. These data provide a valuable resource for functional investigations of the patterning of neurogenesis and the transition of progenitors to neuronal differentiation.


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Wei-Wei Lin ◽  
Lin-Tao Xu ◽  
Yi-Sheng Chen ◽  
Ken Go ◽  
Chenyu Sun ◽  
...  

Background. The critical role of vascular health on brain function has received much attention in recent years. At the single-cell level, studies on the developmental processes of cerebral vascular growth are still relatively few. Techniques for constructing gene regulatory networks (GRNs) based on single-cell transcriptome expression data have made significant progress in recent years. Herein, we constructed a single-cell transcriptional regulatory network of mouse cerebrovascular cells. Methods. The single-cell RNA-seq dataset of mouse brain vessels was downloaded from GEO (GSE98816). This cell clustering was annotated separately using singleR and CellMarker. We then used a modified version of the SCENIC method to construct GRNs. Next, we used a mouse version of SEEK to assess whether genes in the regulon were coexpressed. Finally, regulatory module analysis was performed to complete the cell type relationship quantification. Results. Single-cell RNA-seq data were used to analyze the heterogeneity of mouse cerebrovascular cells, whereby four cell types including endothelial cells, fibroblasts, microglia, and oligodendrocytes were defined. These subpopulations of cells and marker genes together characterize the molecular profile of mouse cerebrovascular cells. Through these signatures, key transcriptional regulators that maintain cell identity were identified. Our findings identified genes like Lmo2, which play an important role in endothelial cells. The same cell type, for instance, fibroblasts, was found to have different regulatory networks, which may influence the functional characteristics of local tissues. Conclusions. In this study, a transcriptional regulatory network based on single-cell analysis was constructed. Additionally, the study identified and profiled mouse cerebrovascular cells using single-cell transcriptome data as well as defined TFs that affect the regulatory network of the mouse brain vasculature.


2020 ◽  
Vol 36 (10) ◽  
pp. 3115-3123 ◽  
Author(s):  
Teng Fei ◽  
Tianwei Yu

Abstract Motivation Batch effect is a frequent challenge in deep sequencing data analysis that can lead to misleading conclusions. Existing methods do not correct batch effects satisfactorily, especially with single-cell RNA sequencing (RNA-seq) data. Results We present scBatch, a numerical algorithm for batch-effect correction on bulk and single-cell RNA-seq data with emphasis on improving both clustering and gene differential expression analysis. scBatch is not restricted by assumptions on the mechanism of batch-effect generation. As shown in simulations and real data analyses, scBatch outperforms benchmark batch-effect correction methods. Availability and implementation The R package is available at github.com/tengfei-emory/scBatch. The code to generate results and figures in this article is available at github.com/tengfei-emory/scBatch-paper-scripts. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Giacomo Baruzzo ◽  
Ilaria Patuzzi ◽  
Barbara Di Camillo

Abstract Motivation Single cell RNA-seq (scRNA-seq) count data show many differences compared with bulk RNA-seq count data, making the application of many RNA-seq pre-processing/analysis methods not straightforward or even inappropriate. For this reason, the development of new methods for handling scRNA-seq count data is currently one of the most active research fields in bioinformatics. To help the development of such new methods, the availability of simulated data could play a pivotal role. However, only a few scRNA-seq count data simulators are available, and they often show poor or undemonstrated similarity to real data. Results In this article we present SPARSim, a scRNA-seq count data simulator based on a Gamma-Multivariate Hypergeometric model. We demonstrate that SPARSim can generate count data that resemble real data in terms of count intensity, variability and sparsity, performing comparably or better than one of the most used scRNA-seq simulators, Splat. In particular, SPARSim-simulated count matrices closely resemble the distribution of zeros across different expression intensities observed in real count data. Availability and implementation SPARSim R package is freely available at http://sysbiobig.dei.unipd.it/?q=SPARSim and at https://gitlab.com/sysbiobig/sparsim. Supplementary information Supplementary data are available at Bioinformatics online.
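The abstract above names only the model family, so the following is a rough sketch of how a Gamma step and a multivariate hypergeometric step could be combined to simulate one cell, not a reproduction of SPARSim. Everything specific here is an assumption: the function name `simulate_cell`, the way gene levels are scaled into an integer molecule pool, and the `pool_size` parameter. The `counts=` argument of `random.sample` requires Python 3.9 or later.

```python
import random
from collections import Counter


def simulate_cell(intensity, variability, depth, pool_size=10_000, seed=0):
    """Simulate counts for one cell (illustrative assumptions throughout).

    intensity:   per-gene mean expression level
    variability: per-gene dispersion phi (larger means noisier)
    depth:       number of molecules sequenced from the cell
    """
    rng = random.Random(seed)
    # 1) Gamma step: draw each gene's expression level with mean
    #    intensity[g] (shape = 1/phi, scale = mean * phi).
    levels = [rng.gammavariate(1.0 / phi, mean * phi)
              for mean, phi in zip(intensity, variability)]
    total = sum(levels)
    # 2) Convert levels into an integer pool of molecules per gene.
    pool = [round(pool_size * x / total) for x in levels]
    # 3) Multivariate hypergeometric step: draw `depth` molecules from
    #    the pool without replacement and tally them by gene.
    drawn = rng.sample(range(len(pool)), k=depth, counts=pool)
    tally = Counter(drawn)
    return [tally.get(g, 0) for g in range(len(pool))], pool


cell_counts, molecule_pool = simulate_cell(
    intensity=[50, 30, 15, 5],
    variability=[0.2, 0.2, 0.5, 1.0],
    depth=500,
)
```

Sampling without replacement at a limited depth is one way such a model can produce zeros for lowly expressed genes, which relates to the sparsity the abstract highlights.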

