IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis

Bioinformatics ◽

10.1093/bioinformatics/btab108 ◽

2021 ◽

Author(s):

Yuzhou Chang ◽

Carter Allen ◽

Changlin Wan ◽

Dongjun Chung ◽

Chi Zhang ◽

...

Keyword(s):

Single Cell ◽

Gene Networks ◽

Functional Gene ◽

Supplementary Information ◽

Pathway Enrichment Analysis ◽

Specific Cell ◽

Gene Module ◽

Rna Seq ◽

Module Analysis ◽

Interpretation System

Abstract Motivation Single-cell RNA-Seq (scRNA-Seq) data is useful in discovering cell heterogeneity and signature genes in specific cell populations in cancer and other complex diseases. Specifically, the investigation of condition-specific functional gene modules (FGM) can help to understand interactive gene networks and complex biological processes in different cell clusters. QUBIC2 is recognized as one of the most efficient and effective biclustering tools for condition-specific FGM identification from scRNA-Seq data. However, its limited availability to a C implementation restricted its application to only a few downstream analysis functionalities. We developed an R package named IRIS-FGM (Integrative scRNA-Seq Interpretation System for Functional Gene Module analysis) to support the investigation of FGMs and cell clustering using scRNA-Seq data. Empowered by QUBIC2, IRIS-FGM can effectively identify condition-specific FGMs, predict cell types/clusters, uncover differentially expressed genes, and perform pathway enrichment analysis. It is noteworthy that IRIS-FGM can also take Seurat objects as input, facilitating easy integration with the existing analysis pipeline. Availability and Implementation IRIS-FGM is implemented in the R environment (as of version 3.6) with the source code freely available at https://github.com/BMEngineeR/IRISFGM. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis

10.1101/2020.11.04.369108 ◽

2020 ◽

Author(s):

Yuzhou Chang ◽

Carter Allen ◽

Changlin Wan ◽

Dongjun Chung ◽

Chi Zhang ◽

...

Keyword(s):

Single Cell ◽

Enrichment Analysis ◽

Functional Gene ◽

Functional Enrichment ◽

Supplementary Information ◽

Specific Cell ◽

Gene Module ◽

Rna Seq ◽

Module Analysis ◽

Interpretation System

AbstractSummarySingle-cell RNA-Seq (scRNA-Seq) data is useful in discovering cell heterogeneity and signature genes in specific cell populations in cancer and other complex diseases. Specifically, the investigation of functional gene modules (FGM) can help to understand gene interactive networks and complex biological processes. QUBIC2 is recognized as one of the most efficient and effective tools for FGM identification from scRNA-Seq data. However, its limited availability to a C implementation restricted its application to only a few downstream analyses functionalities. We developed an R package named IRIS-FGM (Integrative scRNA-Seq Interpretation System for Functional Gene Module analysis) to support the investigation of FGMs and cell clustering using scRNA-Seq data. Empowered by QUBIC2, IRIS-FGM can effectively identify co-expressed and co-regulated FGMs, predict cell types/clusters, uncover differentially expressed genes, and perform functional enrichment analysis. It is noteworthy that IRIS-FGM can also takes Seurat objects as input, which facilitate easy integration with existing analysis pipeline.Availability and ImplementationIRIS-FGM is implemented in R environment (as of version 3.6) with the source code freely available at https://github.com/OSU-BMBL/[email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

Rna Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

SMILE: Mutual Information Learning for Integration of Single-cell Omics Data

Bioinformatics ◽

10.1093/bioinformatics/btab706 ◽

2021 ◽

Author(s):

Yang Xu ◽

Priyojit Das ◽

Rachel Patton McCord

Keyword(s):

Deep Learning ◽

Mutual Information ◽

Single Cell ◽

Learning Algorithm ◽

Cellular Systems ◽

Supplementary Information ◽

Omics Data ◽

Learning Approaches ◽

Rna Seq ◽

Integrate Data

Abstract Motivation Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning). Results Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched feature, such as genes for RNA-seq and genome wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single source data. Supplementary information Supplementary data are available at Bioinformatics online. The source code of SMILE including analyses of key results in the study can be found at: https://github.com/rpmccordlab/SMILE.

Download Full-text

DEsingle for detecting three types of differential expression in single-cell RNA-seq data

10.1101/173997 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zhun Miao ◽

Ke Deng ◽

Xiaowo Wang ◽

Xuegong Zhang

Keyword(s):

Single Cell ◽

Differential Expression ◽

Negative Binomial ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Binomial Model ◽

Supplementary Data ◽

Rna Seq ◽

Real Zeros

AbstractSummaryThe excessive amount of zeros in single-cell RNA-seq data include “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package DEsingle which employed Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy.Availability and ImplementationThe R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration [email protected] informationSupplementary data are available at bioRxiv online.

Download Full-text

DECENT: differential expression with capture efficiency adjustmeNT for single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz453 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5155-5162 ◽

Cited By ~ 10

Author(s):

Chengzhong Ye ◽

Terence P Speed ◽

Agus Salim

Keyword(s):

Single Cell ◽

Differential Expression ◽

Type I Error ◽

R Package ◽

Supplementary Information ◽

Type I ◽

Common Phenomenon ◽

Rna Seq ◽

Capture Process ◽

Technological Platforms

Abstract Motivation Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. Results We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The gain in improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. Availability and implementation The method is implemented as a publicly available R package available from https://github.com/cz-ye/DECENT. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

scDAPA: detection and visualization of dynamic alternative polyadenylation from single cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz701 ◽

2019 ◽

Cited By ~ 2

Author(s):

Congting Ye ◽

Qian Zhou ◽

Xiaohui Wu ◽

Chen Yu ◽

Guoli Ji ◽

...

Keyword(s):

Single Cell ◽

Alternative Polyadenylation ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Rna Seq ◽

Computational Tool ◽

Cell Level ◽

Wilcoxon Rank Sum Test ◽

Transcriptional Regulatory ◽

Cell Groups

Abstract Motivation Alternative polyadenylation (APA) plays a key post-transcriptional regulatory role in mRNA stability and functions in eukaryotes. Single cell RNA-seq (scRNA-seq) is a powerful tool to discover cellular heterogeneity at gene expression level. Given 3′ enriched strategy in library construction, the most commonly used scRNA-seq protocol—10× Genomics enables us to improve the study resolution of APA to the single cell level. However, currently there is no computational tool available for investigating APA profiles from scRNA-seq data. Results Here, we present a package scDAPA for detecting and visualizing dynamic APA from scRNA-seq data. Taking bam/sam files and cell cluster labels as inputs, scDAPA detects APA dynamics using a histogram-based method and the Wilcoxon rank-sum test, and visualizes candidate genes with dynamic APA. Benchmarking results demonstrated that scDAPA can effectively identify genes with dynamic APA among different cell groups from scRNA-seq data. Availability and implementation The scDAPA package is implemented in Shell and R, and is freely available at https://scdapa.sourceforge.io. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz801 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1779-1784 ◽

Cited By ~ 1

Author(s):

Chuanqi Wang ◽

Jun Li

Keyword(s):

Neural Network ◽

Single Cell ◽

Count Data ◽

Deep Neural Network ◽

Sequencing Depth ◽

Supplementary Information ◽

Neural Network Classifier ◽

Rna Seq ◽

Scale Invariant ◽

Downstream Analysis

Abstract Motivation Scaling by sequencing depth is usually the first step of analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, risking the validity of downstream analysis. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. Results We call an analysis method ‘scale-invariant’ (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive to the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. Availability and implementation This source code of SINC is available at https://www.nd.edu/∼jli9/SINC.zip. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

scDoc: correcting drop-out events in single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa283 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4233-4239

Author(s):

Di Ran ◽

Shanshan Zhang ◽

Nicholas Lytal ◽

Lingling An

Keyword(s):

Single Cell ◽

Simulated Data ◽

Drop Out ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Cell Subpopulation ◽

Rna Seq ◽

Imputation Methods ◽

Similarity Estimation ◽

Differential Expression Detection

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell (sub)types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of ‘drop-out’ events, which usually is due to extremely low mRNA input or the stochastic nature of gene expression. In this article, we present a novel single-cell RNA-seq drop-out correction (scDoc) method, imputing drop-out events by borrowing information for the same gene from highly similar cells. Results scDoc is the first method that directly involves drop-out information to accounting for cell-to-cell similarity estimation, which is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc outperforms the existing imputation methods in reference to data visualization, cell subpopulation identification and differential expression detection in scRNA-seq data. Availability and implementation R code is available at https://github.com/anlingUA/scDoc. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

CellWalker integrates single-cell and bulk data to resolve regulatory elements across cell types in complex tissues

10.1101/847657 ◽

2019 ◽

Cited By ~ 1

Author(s):

Pawel F. Przytycki ◽

Katherine S. Pollard

Keyword(s):

Single Cell ◽

Cell Types ◽

Regulatory Elements ◽

Cell Labeling ◽

Open Chromatin ◽

Specific Cell ◽

Rna Seq ◽

Data Types ◽

Bulk Data ◽

Cell Type Specific

Single-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell-type specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve enhancers to specific cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their enhancers.

Download Full-text

Noise regularization removes correlation artifacts in single-cell RNA-seq data preprocessing

10.1101/2020.07.29.227546 ◽

2020 ◽

Author(s):

Ruoyu Zhang ◽

Gurinder S. Atwal ◽

Wei Keat Lim

Keyword(s):

Single Cell ◽

Gene Networks ◽

Immune Cell ◽

Data Preprocessing ◽

Rna Seq ◽

Imputation Methods ◽

Individual Gene ◽

Gene Associations ◽

Cell Modules ◽

Noise Regularization

AbstractWith the rapid advancement of single-cell RNA-seq (scRNA-seq) technology, many data preprocessing methods have been proposed to address numerous systematic errors and technical variabilities inherent in this technology. While these methods have been demonstrated to be effective in recovering individual gene expression, the suitability to the inference of gene-gene associations and subsequent gene networks reconstruction have not been systemically investigated. In this study, we benchmarked five representative scRNA-seq normalization/imputation methods on human cell atlas bone marrow data with respect to their impact on inferred gene-gene associations. Our results suggested that a considerable amount of spurious correlations was introduced during the data preprocessing steps due to over-smoothing of the raw data. We proposed a model-agnostic noise regularization method that can effectively eliminate the correlation artifacts. The noise regularized gene-gene correlations were further used to reconstruct gene co-expression network and successfully revealed several known immune cell modules.

Download Full-text