scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts, and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalisation. Results We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalisation and visualisation of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Supplementary information Supplementary material is available online at bioRxiv accompanying this manuscript, and all materials required to reproduce the results presented in this paper are available at dx.doi.org/10.5281/zenodo.60139.
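To make the QC and normalisation steps mentioned in this abstract concrete, here is a minimal Python sketch. It is not the scater API (scater is an R package); the function names `per_cell_qc` and `log_cpm`, the genes-by-cells list layout, and the prior count of 1 are all assumptions chosen for illustration.

```python
import math


def per_cell_qc(counts):
    """Compute simple per-cell QC metrics.

    `counts` is a hypothetical genes x cells matrix given as a list of rows,
    one row per gene and one column per cell.
    """
    n_cells = len(counts[0])
    metrics = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        metrics.append({
            "total_counts": sum(column),                        # library size
            "detected_genes": sum(1 for c in column if c > 0),  # genes with any count
        })
    return metrics


def log_cpm(counts, prior_count=1.0):
    """Return log2(counts-per-million + prior_count) for every entry."""
    n_cells = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    if any(t == 0 for t in totals):
        # Empty cells should be removed during QC before normalising.
        raise ValueError("cells with zero total counts must be filtered first")
    return [
        [math.log2(row[j] / totals[j] * 1e6 + prior_count) for j in range(n_cells)]
        for row in counts
    ]


counts = [
    [10, 0, 5],
    [0, 0, 5],
    [90, 1, 0],
]
metrics = per_cell_qc(counts)
normalised = log_cpm(counts)
```

In a workflow of this kind, cells with very low `total_counts` or `detected_genes` would typically be dropped before normalisation; the thresholds are dataset-specific and are not given in the abstract.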

SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data

Bioinformatics ◽ 10.1093/bioinformatics/btz801 ◽ 2019 ◽ Vol 36 (6) ◽ pp. 1779-1784 ◽ Cited By ~ 1

Author(s): Chuanqi Wang ◽ Jun Li

Keyword(s): Neural Network ◽ Single Cell ◽ Count Data ◽ Deep Neural Network ◽ Sequencing Depth ◽ Supplementary Information ◽ Neural Network Classifier ◽ Rna Seq ◽ Scale Invariant ◽ Downstream Analysis

Abstract Motivation Scaling by sequencing depth is usually the first step of analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, risking the validity of downstream analysis. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. Results We call an analysis method ‘scale-invariant’ (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive with the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. Availability and implementation The source code of SINC is available at https://www.nd.edu/~jli9/SINC.zip. Supplementary information Supplementary data are available at Bioinformatics online.
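The scale-invariance property defined in this abstract can be illustrated with a toy example. The sketch below is not SINC and contains no neural network; it swaps in a nearest-centroid classifier based on cosine similarity, which happens to be scale-invariant because cosine similarity ignores the length of a vector. The centroids, labels and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for all-zero vectors")
    return dot / (norm_u * norm_v)


def classify(sample, centroids):
    """Return the label whose centroid is most similar to the sample."""
    return max(centroids, key=lambda label: cosine(sample, centroids[label]))


def rescale(sample, depth_estimate):
    """Divide raw counts by some estimate of sequencing depth."""
    return [c / depth_estimate for c in sample]


# Hypothetical class centroids over three genes.
centroids = {"normal": [10.0, 1.0, 1.0], "cancer": [1.0, 10.0, 1.0]}
raw_counts = [50, 4, 6]

label_raw = classify(raw_counts, centroids)
label_scaled_a = classify(rescale(raw_counts, 0.25), centroids)
label_scaled_b = classify(rescale(raw_counts, 3.7), centroids)
```

All three calls return the same label, so any depth estimate, or none at all, leads to the same classification. That is the behaviour the abstract calls scale-invariant.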

Single cell network analysis with a mixture of Nested Effects Models

10.1101/258202 ◽ 2018

Author(s): Martin Pirkl ◽ Niko Beerenwinkel

Keyword(s): Single Cell ◽ New Technologies ◽ Single Cells ◽ R Package ◽ Supplementary Information ◽ Data Sets ◽ Cell Network ◽ A Cell ◽ Supplementary Material ◽ Cell Data

Abstract Motivation New technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous. Results We developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq. Availability The mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/. Supplementary information Supplementary data are available online.
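The Expectation Maximization scheme described here (soft assignment of cells to components, followed by re-estimation of each component) can be sketched generically. The code below is not the mnem package and does not infer networks: as a loudly stated simplification, each mixture component is a vector of Bernoulli probabilities over binary effect patterns rather than a Nested Effects Model. All names, the smoothing constant and the toy data are assumptions.

```python
import math


def e_step(cells, profiles, weights):
    """Responsibilities: probability of each component for each cell."""
    responsibilities = []
    for x in cells:
        log_probs = []
        for theta, w in zip(profiles, weights):
            ll = math.log(w)
            for xi, t in zip(x, theta):
                ll += math.log(t if xi else 1.0 - t)
            log_probs.append(ll)
        m = max(log_probs)  # subtract the max for numerical stability
        unnorm = [math.exp(lp - m) for lp in log_probs]
        total = sum(unnorm)
        responsibilities.append([u / total for u in unnorm])
    return responsibilities


def m_step(cells, responsibilities, k, eps=1e-3):
    """Re-estimate mixture weights and component profiles (with smoothing)."""
    n, d = len(cells), len(cells[0])
    profiles, weights = [], []
    for c in range(k):
        rc = sum(r[c] for r in responsibilities)
        weights.append((rc + eps) / (n + k * eps))
        profiles.append([
            (sum(r[c] * x[i] for r, x in zip(responsibilities, cells)) + eps)
            / (rc + 2 * eps)
            for i in range(d)
        ])
    return profiles, weights


def fit(cells, profiles, weights, n_iter=20):
    """Alternate E and M steps for a fixed number of iterations."""
    for _ in range(n_iter):
        resp = e_step(cells, profiles, weights)
        profiles, weights = m_step(cells, resp, len(profiles))
    return profiles, weights, e_step(cells, profiles, weights)


# Toy binary data: six cells, four observed effects, two sub-populations.
cells = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
]
init_profiles = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
profiles, weights, resp = fit(cells, init_profiles, [0.5, 0.5])
```

In the method described in the abstract, the M-step would search for an optimal network per component instead of averaging binary profiles; only the alternating structure carries over from this sketch.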

Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R

Bioinformatics ◽ 10.1093/bioinformatics/btw777 ◽ 2017 ◽ pp. btw777 ◽ Cited By ~ 240

Author(s): Davis J. McCarthy ◽ Kieran R. Campbell ◽ Aaron T. L. Lun ◽ Quin F. Wills

Keyword(s): Quality Control ◽ Single Cell ◽ Processing Quality ◽ Rna Seq

SIMLR: a tool for large-scale single-cell analysis by multi-kernel learning

10.1101/118901 ◽ 2017 ◽ Cited By ~ 9

Author(s): Bo Wang ◽ Daniele Ramazzotti ◽ Luca De Sano ◽ Junjie Zhu ◽ Emma Pierson ◽ ...

Keyword(s): Single Cell ◽ Large Scale ◽ Single Cell Analysis ◽ R Package ◽ Supplementary Information ◽ Cell Analysis ◽ Rna Seq ◽ A Cell ◽ Supplementary Material ◽ Public Datasets

Abstract Motivation We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of cells. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via a better visualization. Availability and Implementation SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package. Supplementary Information Supplementary data are available at Bioinformatics online.
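A cell-to-cell similarity built from several kernels can be sketched as follows. This is not the SIMLR algorithm: SIMLR learns the similarity and the kernel weights, whereas this illustration simply averages Gaussian kernels of different bandwidths with fixed uniform weights. The bandwidths, function names and toy cells are assumptions.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def gaussian_kernel(u, v, sigma):
    """Gaussian (RBF) kernel with bandwidth sigma."""
    return math.exp(-squared_distance(u, v) / (2.0 * sigma ** 2))


def multi_kernel_similarity(cells, sigmas=(0.5, 1.0, 2.0), weights=None):
    """Weighted combination of Gaussian kernels; uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    n = len(cells)
    return [
        [
            sum(w * gaussian_kernel(cells[i], cells[j], s)
                for w, s in zip(weights, sigmas))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three toy cells in a two-gene space: two close together, one far away.
cells = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
similarity = multi_kernel_similarity(cells)
```

The resulting matrix could be passed to a clustering or embedding step. Combining several bandwidths avoids committing to a single distance scale, which is the general motivation for using multiple kernels.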

sc-REnF: An Entropy Guided Robust Feature Selection for Single-Cell RNA-seq Data

10.21203/rs.3.rs-355014/v1 ◽ 2021

Author(s): Snehalika Lall ◽ Abhik Ghosh ◽ Sumanta Ray ◽ Sanghamitra Bandyopadhyay

Keyword(s): Single Cell ◽ Gene Selection ◽ Small Sample ◽ Rna Seq ◽ Homogeneous Grouping ◽ Cell Clustering ◽ Selection For ◽ Downstream Analysis ◽ Cell Data

Abstract Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Since single-cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. Robust gene selection has therefore gained considerable attention in recent years. We introduce sc-REnF (a robust entropy-based feature (gene) selection method), which aims to leverage the advantages of Rényi and Tsallis entropies in gene selection for single-cell clustering. Experiments demonstrate that, with a tuned parameter (q), Rényi and Tsallis entropies select genes that improve the clustering results significantly over competing methods. sc-REnF can capture relevancy and redundancy among the features of noisy data extremely well due to its robust objective function. Moreover, the selected features/genes can cluster unknown cells with high accuracy. Finally, sc-REnF yields good clustering performance on small-sample, large-feature scRNA-seq data.
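For readers unfamiliar with the two entropies named in this abstract, the sketch below computes the standard Rényi and Tsallis entropies of a discrete distribution for a parameter q (with q not equal to 1). It shows only the textbook definitions; the sc-REnF objective function itself is not described in the abstract and is not reproduced here. The function names are assumptions.

```python
import math


def renyi_entropy(probs, q):
    """Rényi entropy: H_q(p) = log(sum_i p_i^q) / (1 - q), for q > 0, q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return math.log(sum(p ** q for p in probs if p > 0)) / (1.0 - q)


def tsallis_entropy(probs, q):
    """Tsallis entropy: S_q(p) = (1 - sum_i p_i^q) / (q - 1), for q > 0, q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


uniform = [0.25, 0.25, 0.25, 0.25]
point_mass = [1.0, 0.0, 0.0, 0.0]

renyi_uniform = renyi_entropy(uniform, q=2.0)      # equals log(4) for any valid q
tsallis_uniform = tsallis_entropy(uniform, q=2.0)  # 1 - 4 * (1/16) = 0.75
renyi_point = renyi_entropy(point_mass, q=2.0)     # a certain outcome has zero entropy
tsallis_point = tsallis_entropy(point_mass, q=2.0)
```

Both quantities approach the Shannon entropy as q tends to 1, so q acts as a tuning knob that changes how strongly high-probability outcomes dominate the measure.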

GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

Bioinformatics ◽ 10.1093/bioinformatics/btz778 ◽ 2019

Author(s): Magdalena E Strauss ◽ Paul D W Kirk ◽ John E Reid ◽ Lorenz Wernisch

Keyword(s): Single Cell ◽ Time Course ◽ Gene Clusters ◽ Supplementary Information ◽ Rna Seq ◽ Clustering Methods ◽ Novel Approach ◽ Broad Array ◽ Recent Method ◽ Cell Data

Abstract Motivation Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. Results The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. Availability An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. Supplementary Information Supplementary data are available at Bioinformatics online.

Optimal Gene Filtering for Single-Cell data (OGFSC)—a gene filtering algorithm for single-cell RNA-seq data

Bioinformatics ◽ 10.1093/bioinformatics/bty1016 ◽ 2018 ◽ Vol 35 (15) ◽ pp. 2602-2609 ◽ Cited By ~ 3

Author(s): Jie Hao ◽ Wei Cao ◽ Jian Huang ◽ Xin Zou ◽ Ze-Guang Han

Keyword(s): Single Cell ◽ Supplementary Information ◽ Rna Seq ◽ Aging Research ◽ Technical Noise ◽ Transcriptomic Data ◽ Knowledge Based ◽ Gene Filtering ◽ Cell Data ◽ Gene Expression Levels

Abstract Motivation Single-cell transcriptomic data are commonly accompanied by extremely high technical noise due to the low RNA concentrations from individual cells. Precise identification of differentially expressed genes and cell populations is heavily dependent on the effective reduction of technical noise, e.g. by gene filtering. However, there is still no well-established standard among current approaches to gene filtering. Investigators usually filter out genes based on a single fixed threshold, which commonly leads to both over- and under-stringent errors. Results In this study, we propose a novel algorithm, termed Optimal Gene Filtering for Single-Cell data, to construct a thresholding curve based on gene expression levels and the corresponding variances. We validated our method on multiple single-cell RNA-seq datasets, including simulated and published experimental datasets. The results show that the known signal and known noise are reliably discriminated in the simulated datasets. In addition, the results on seven experimental datasets demonstrate that cells of the same annotated type are more sharply clustered using our method. Interestingly, when we re-analyze the dataset from an aging study recently published in Science, we find a list of regulated genes that differs from the one reported in the original study because different filtering methods were used. However, the knowledge based on our findings better matches the progression of immunosenescence. In summary, we here provide an alternative opportunity to probe into the true level of technical noise in single-cell transcriptomic data. Availability and implementation https://github.com/XZouProjects/OGFSC.git Supplementary information Supplementary data are available at Bioinformatics online.
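The contrast between a single fixed threshold and a thresholding curve that depends on expression level can be sketched in a few lines. This is not the OGFSC algorithm, whose curve construction is not given in the abstract. As a stated assumption, the sketch uses a curve of the form a1 / mean + a0 on the squared coefficient of variation (CV²), a common way to model technical noise that shrinks as expression grows. The parameter values, gene names and function names are hypothetical.

```python
def mean_and_cv2(values):
    """Return the mean and squared coefficient of variation of one gene."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0, 0.0
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance / mean ** 2


def threshold_curve(mean, a0=0.1, a1=1.0):
    """Assumed noise curve: expected technical CV^2 at a given mean expression."""
    return a1 / mean + a0


def filter_genes(expression, a0=0.1, a1=1.0):
    """Keep genes whose CV^2 lies above the mean-dependent threshold curve."""
    kept = []
    for gene, values in expression.items():
        mean, cv2 = mean_and_cv2(values)
        if mean > 0 and cv2 > threshold_curve(mean, a0, a1):
            kept.append(gene)
    return kept


# Hypothetical expression values for three genes across six cells.
expression = {
    "noisy_low": [0, 1, 0, 1, 0, 1],
    "variable_high": [10, 200, 15, 180, 12, 190],
    "flat_high": [100, 101, 99, 100, 102, 98],
}
kept_genes = filter_genes(expression)
```

A fixed CV² cutoff of 1.0 would keep `noisy_low` (CV² of 1.2) and discard `variable_high` (CV² of roughly 0.93), while the mean-dependent curve does the opposite. This is the kind of over- and under-stringent error the abstract attributes to fixed thresholds.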

snakePipes: facilitating flexible, scalable and integrative epigenomic analysis

Bioinformatics ◽ 10.1093/bioinformatics/btz436 ◽ 2019 ◽ Vol 35 (22) ◽ pp. 4757-4759 ◽ Cited By ~ 18

Author(s): Vivek Bhardwaj ◽ Steffen Heyne ◽ Katarzyna Sikora ◽ Leily Rabbani ◽ Michael Rauer ◽ ...

Keyword(s): Single Cell ◽ Source Code ◽ Supplementary Information ◽ Command Line ◽ Supplementary Data ◽ Rna Seq ◽ Downstream Analysis ◽ Scalable Analysis

Abstract Summary Due to the rapidly increasing scale and diversity of epigenomic data, modular and scalable analysis workflows are of wide interest. Here we present snakePipes, a workflow package for processing and downstream analysis of data from common epigenomic assays: ChIP-seq, RNA-seq, Bisulfite-seq, ATAC-seq, Hi-C and single-cell RNA-seq. snakePipes enables users to assemble variants of each workflow and to easily install and upgrade the underlying tools, via its simple command-line wrappers and yaml files. Availability and implementation snakePipes can be installed via conda: ‘conda install -c mpi-ie -c bioconda -c conda-forge snakePipes’. Source code (https://github.com/maxplanck-ie/snakepipes) and documentation (https://snakepipes.readthedocs.io/en/latest/) are available online. Supplementary information Supplementary data are available at Bioinformatics online.

ascend: R package for analysis of single cell RNA-seq data

10.1101/207704 ◽ 2017 ◽ Cited By ~ 11

Author(s): Anne Senabouth ◽ Samuel W Lukowski ◽ Jose Alquicira Hernandez ◽ Stacey Andersen ◽ Xin Mei ◽ ...

Keyword(s): Single Cell ◽ R Package ◽ Computational Genomics ◽ Supplementary Information ◽ Rna Seq ◽ Software Packages ◽ Wide Range ◽ Flexible Framework ◽ Supplementary Material ◽ Data Objects

Abstract Summary ascend is an R package comprised of fast, streamlined analysis functions optimized to address the statistical challenges of single cell RNA-seq. The package incorporates novel and established methods to provide a flexible framework to perform filtering, quality control, normalization, dimension reduction, clustering, differential expression and a wide range of plotting. ascend is designed to work with scRNA-seq data generated by any high-throughput platform, and includes functions to convert data objects between software packages. Availability The R package and associated vignettes are freely available at https://github.com/IMB-Computational-Genomics-Lab/ Supplementary information An example dataset is available at ArrayExpress, accession number E-MTAB-6108.

The winning methods for predicting cellular position in the DREAM single cell transcriptomics challenge

10.1101/2020.05.09.086397 ◽ 2020

Author(s): Vu VH Pham ◽ Xiaomei Li ◽ Buu Truong ◽ Thin Nguyen ◽ Lin Liu ◽ ...

Keyword(s): Single Cell ◽ Web Application ◽ Single Cells ◽ Drosophila Embryo ◽ Supplementary Information ◽ Rna Seq ◽ Link Type ◽ Spatial Reconstruction ◽ Spatial Environment ◽ Supplementary Material

Abstract Motivation Predicting cell locations is important because knowing where cells are located helps us estimate their function and their integration with the spatial environment. Thus, the DREAM Challenge on Single Cell Transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single cell transcriptomic data. Results We have developed over 50 pipelines by combining different ways of pre-processing the RNA-seq data, selecting the genes, predicting the cell locations, and validating predicted cell locations, resulting in the winning methods for two out of three sub-challenges in the competition. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and the Shiny web-application to facilitate the research on single cell spatial reconstruction. All the data and the example use cases are available in the Supplementary material. Availability The scripts of the package are available at https://github.com/thanhbuu04/SCTCwhatateam and the Shiny application is available at https://github.com/pvvhoang/ Supplementary information Supplementary data are available at Briefings in Bioinformatics online.
