CellBench: R/Bioconductor software for comparing single-cell RNA-seq analysis methods

2019 ◽  
Vol 36 (7) ◽  
pp. 2288-2290 ◽  
Author(s):  
Shian Su ◽  
Luyi Tian ◽  
Xueyi Dong ◽  
Peter F Hickey ◽  
Saskia Freytag ◽  
...  

Abstract Motivation Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking. Results The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way to allow pipelines of methods to be evaluated in an effective manner. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in tabular form which is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks. Availability and implementation Available from https://bioconductor.org/packages/CellBench.
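The combinatorial idea described in this abstract (run every combination of methods across pipeline steps, time each run, and collect the results as a table) can be illustrated with a minimal Python sketch. This is not the CellBench API, which is an R package; the function `run_pipelines`, the toy methods and the row layout are all hypothetical names chosen for illustration.

```python
import itertools
import time


def run_pipelines(datasets, *steps):
    """Apply every combination of methods (one per step) to every dataset.

    datasets: dict mapping dataset name -> data
    steps: one dict per pipeline step, mapping method name -> callable
    Returns a list of row dicts, one per (dataset, method combination).
    """
    rows = []
    for data_name, data in datasets.items():
        # Cartesian product over the methods available at each step.
        for combo in itertools.product(*(step.items() for step in steps)):
            result = data
            start = time.perf_counter()
            for _, method in combo:
                result = method(result)  # feed each step's output to the next
            elapsed = time.perf_counter() - start
            rows.append({
                "data": data_name,
                "methods": tuple(name for name, _ in combo),
                "result": result,
                "seconds": elapsed,
            })
    return rows


# Toy stand-ins for real normalization and summary methods (hypothetical).
datasets = {"toy": [1.0, 2.0, 4.0]}
normalize = {
    "identity": lambda xs: list(xs),
    "scale_to_max": lambda xs: [x / max(xs) for x in xs],
}
summarize = {
    "mean": lambda xs: sum(xs) / len(xs),
    "max": lambda xs: max(xs),
}

table = run_pipelines(datasets, normalize, summarize)
for row in table:
    print(row["data"], row["methods"], row["result"])
```

Two normalizers and two summaries give four rows for the single toy dataset. Keeping one flat row per combination is what makes the output easy to filter, group and plot with table-oriented tools.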

2019 ◽  
Author(s):  
Anna Danese ◽  
Maria L. Richter ◽  
David S. Fischer ◽  
Fabian J. Theis ◽  
Maria Colomé-Tatché

Abstract Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.


2020 ◽  
Author(s):  
Naim Al Mahi ◽  
Erik Y. Zhang ◽  
Susan Sherman ◽  
Jane J. Yu ◽  
Mario Medvedovic

Abstract Lymphangioleiomyomatosis (LAM) is a rare pulmonary disease affecting women of childbearing age that is characterized by the aberrant proliferation of smooth-muscle (SM)-like cells and emphysema-like lung remodeling. In LAM, mutations in the TSC1 or TSC2 genes result in the activation of the mechanistic target of rapamycin complex 1 (mTORC1), and thus sirolimus, an mTORC1 inhibitor, has been approved by the FDA to treat LAM patients. Sirolimus stabilizes lung function and improves symptoms. However, the disease recurs with discontinuation of the drug, potentially because of the sirolimus-induced refractoriness of the LAM cells. Therefore, there is a critical need to identify remission-inducing cytocidal treatments for LAM. The recently released Library of Integrated Network-based Cellular Signatures (LINCS) L1000 transcriptional signatures of chemical perturbations have opened new avenues to study cellular responses to existing drugs and new bioactive compounds. Connecting the transcriptional signature of a disease to these chemical perturbation signatures to identify bioactive chemicals that can “revert” the disease signature can lead to novel drug discovery. We developed methods for constructing disease transcriptional signatures and performing connectivity analysis using single-cell RNA-seq data. The methods were applied in the analysis of scRNA-seq data of naïve and sirolimus-treated LAM cells. The single-cell connectivity analyses implicated mTORC1 inhibitors as capable of reverting the LAM transcriptional signatures, while the corresponding standard bulk analysis did not. This indicates the importance of using single-cell analysis in constructing disease signatures. The analysis also implicated other classes of drugs, CDK, MEK/MAPK and EGFR/JAK inhibitors, as potential therapeutic agents for LAM.
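As a rough illustration of the connectivity idea above, the sketch below scores each perturbation signature by its Pearson correlation with a disease signature and ranks the most negatively correlated (most "reverting") first. The abstract does not state which connectivity score the authors used, so the choice of Pearson correlation is an assumption, and the signatures, drug names and the helper `rank_reverting` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_reverting(disease_sig, drug_sigs):
    """Rank perturbations from most negatively to most positively correlated.

    Assumption: a strongly negative correlation with the disease signature
    is read as a candidate for reverting it.
    """
    scores = {name: pearson(disease_sig, sig) for name, sig in drug_sigs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Toy log-fold-change signatures over the same four genes (made-up values).
disease = [2.0, -1.0, 0.5, -1.5]
drugs = {
    "drug_a": [-2.0, 1.0, -0.5, 1.5],    # mirrors the disease signature
    "drug_b": [1.0, -0.5, 0.25, -0.75],  # mimics the disease signature
    "drug_c": [0.5, 0.5, -0.5, -0.5],    # weakly related
}

ranked = rank_reverting(disease, drugs)
for name, score in ranked:
    print(name, round(score, 3))
```

In this toy setup `drug_a` ranks first because it is the exact negation of the disease signature. A real analysis would need the signatures defined over matched genes and some assessment of significance.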


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Chunxiang Wang ◽  
Xin Gao ◽  
Juntao Liu

Abstract Background Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. Results We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. Conclusion The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.


2020 ◽  
Vol 48 (W1) ◽  
pp. W403-W414
Author(s):  
Fabrice P A David ◽  
Maria Litovchenko ◽  
Bart Deplancke ◽  
Vincent Gardeux

Abstract Single-cell omics enables researchers to dissect biological systems at a resolution that was unthinkable just 10 years ago. However, this analytical revolution also triggered new demands in ‘big data’ management, forcing researchers to stay up to speed with increasingly complex analytical processes and rapidly evolving methods. To render these processes and approaches more accessible, we developed the web-based, collaborative portal ASAP (Automated Single-cell Analysis Portal). Our primary goal is thereby to democratize single-cell omics data analyses (scRNA-seq and more recently scATAC-seq). By taking advantage of a Docker system to enhance reproducibility, and novel bioinformatics approaches that were recently developed for improving scalability, ASAP meets challenging requirements set by recent cell atlasing efforts such as the Human (HCA) and Fly (FCA) Cell Atlas Projects. Specifically, ASAP can now handle datasets containing millions of cells, integrating intuitive tools that allow researchers to collaborate on the same project synchronously. ASAP tools are versioned, and researchers can create unique access IDs for storing complete analyses that can be reproduced or completed by others. Finally, ASAP does not require any installation and provides a full and modular single-cell RNA-seq analysis pipeline. ASAP is freely available at https://asap.epfl.ch.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
F. William Townes ◽  
Stephanie C. Hicks ◽  
Martin J. Aryee ◽  
Rafael A. Irizarry

Abstract Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
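Feature selection by deviance can be sketched as follows. The abstract gives no formula, so this example assumes a per-gene binomial deviance against a null model in which the gene takes the same fraction of UMIs in every cell. Genes with large deviance depart most from constant expression and would be kept. The function names and the toy counts are hypothetical.

```python
import math


def binomial_deviance(counts, totals):
    """Deviance of one gene against a constant-proportion null model.

    counts: UMI counts of the gene in each cell
    totals: total UMI counts of each cell
    """
    pi = sum(counts) / sum(totals)  # pooled proportion under the null
    dev = 0.0
    for y, n in zip(counts, totals):
        if y > 0:  # y * log(y / expected) tends to 0 as y -> 0
            dev += y * math.log(y / (n * pi))
        if n - y > 0:
            dev += (n - y) * math.log((n - y) / (n * (1 - pi)))
    return 2 * dev


def rank_genes_by_deviance(gene_counts, totals):
    """Return (gene, deviance) pairs, most informative gene first."""
    scores = {g: binomial_deviance(c, totals) for g, c in gene_counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Four toy cells with 100 UMIs each (made-up values).
totals = [100, 100, 100, 100]
gene_counts = {
    "flat": [10, 10, 10, 10],    # same proportion in every cell
    "variable": [0, 0, 20, 20],  # same overall abundance, expressed in half the cells
}

ranking = rank_genes_by_deviance(gene_counts, totals)
for gene, dev in ranking:
    print(gene, round(dev, 2))
```

Both toy genes have the same overall abundance, so a selection based on mean expression alone could not separate them, whereas the deviance is near zero for `flat` and clearly positive for `variable`.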


2019 ◽  
Author(s):  
Gaurav Sharma ◽  
Carlo Colantuoni ◽  
Loyal A Goff ◽  
Elana J Fertig ◽  
Genevieve Stein-O’Brien

Abstract Motivation Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important for large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation, and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. Availability projectR is available on Bioconductor and at https://github.com/genesofeve/[email protected]; [email protected]


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1947
Author(s):  
Samarendra Das ◽  
Anil Rai ◽  
Michael L. Merchant ◽  
Matthew C. Cave ◽  
Shesh N. Rai

Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the cell level. Differential Expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practices for addressing this involved borrowing methods from bulk RNA-seq, which are based on non-zero differences in average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of different DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to identify a method that performs best globally on any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Greg Holmes ◽  
Ana S. Gonzalez-Reiche ◽  
Madrikha Saturne ◽  
Susan M. Motch Perrine ◽  
Xianxiao Zhou ◽  
...  

Abstract Craniofacial development depends on formation and maintenance of sutures between bones of the skull. In sutures, growth occurs at osteogenic fronts along the edge of each bone, and suture mesenchyme separates adjacent bones. Here, we perform single-cell RNA-seq analysis of the embryonic, wild type murine coronal suture to define its population structure. Seven populations at E16.5 and nine at E18.5 comprise the suture mesenchyme, osteogenic cells, and associated populations. Expression of Hhip, an inhibitor of hedgehog signaling, marks a mesenchymal population distinct from those of other neurocranial sutures. Tracing of the neonatal Hhip-expressing population shows that descendant cells persist in the coronal suture and contribute to calvarial bone growth. In Hhip−/− coronal sutures at E18.5, the osteogenic fronts are closely apposed and the suture mesenchyme is depleted with increased hedgehog signaling compared to those of the wild type. Collectively, these data demonstrate that Hhip is required for normal coronal suture development.


2021 ◽  
Vol 2021 ◽  
pp. 1-21
Author(s):  
Xinbing Liu ◽  
Wei Gao ◽  
Wei Liu

Background. To further understand the development of the spinal cord, the patterns and transcriptional features of spinal cord development in newborn mice were explored at the cellular transcriptome level. Methods. The mouse single-cell RNA sequencing (scRNA-seq) dataset GSE108788 was downloaded. scRNA-seq was conducted on cervical and lumbar spinal V2a interneurons from 2 P0 neonates. Single-cell analysis using the Seurat package was completed, and marker mRNAs were identified for each cluster. Then, pseudotemporal analysis was used to analyze the transcription changes of marker mRNAs in different clusters over time. Finally, the functions of these marker mRNAs were assessed by enrichment analysis and protein-protein interaction (PPI) networks. A transcriptional regulatory network was then constructed using the TRRUST dataset. Results. A total of 949 cells were screened. Single-cell analysis was conducted based on marker mRNAs of each cluster, which revealed the heterogeneity of neonatal mouse spinal cord neuronal cells. Functional analysis of pseudotemporal trajectory-related marker mRNAs suggested that pregnancy-specific glycoproteins (PSGs) and carcinoembryonic antigen cell adhesion molecules (CEACAMs) were the core mRNAs in cluster 3. GSVA analysis then demonstrated that the different clusters had differences in pathway activity. By constructing a transcriptional regulatory network, USF2 was identified as a transcriptional regulator of CEACAM1 and CEACAM5, while KLF6 was identified as a transcriptional regulator of PSG3 and PSG5. This conclusion was then validated using the Genotype-Tissue Expression (GTEx) spinal cord transcriptome dataset. Conclusions. This study completed an integrated analysis of a single-cell dataset with the utilization of marker mRNAs. USF2/CEACAM1&5 and KLF6/PSG3&5 transcriptional regulatory networks were identified by spinal cord single-cell analysis.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12233
Author(s):  
Diem-Trang Tran ◽  
Matthew Might

Normalization of RNA-seq data has been an active area of research since the problem was first recognized a decade ago. Despite the active development of new normalizers, their performance measures have been given little attention. To evaluate normalizers, researchers have been relying on ad hoc measures, most of which are either qualitative, potentially biased, or easily confounded by parametric choices of downstream analysis. We propose a metric called condition-number based deviation, or cdev, to quantify normalization success. cdev measures how much an expression matrix differs from another. If a ground truth normalization is given, cdev can then be used to evaluate the performance of normalizers. To establish experimental ground truth, we compiled an extensive set of public RNA-seq assays with external spike-ins. This data collection, together with cdev, provides a valuable toolset for benchmarking new and existing normalization methods.

