DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics

The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate in a sequential narrative R code with explanatory text and output data and images. R users can use the notebooks to understand the different steps of the pipeline, and the notebooks will guide them in exploring their scRNA-seq data. We also provide a cloud version using Binder that allows the execution of the pipeline without the need to download R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, in order to do meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.

DIscBIO: a user-friendly pipeline for biomarker discovery in single-cell transcriptomics

10.1101/700989 ◽

2019 ◽

Author(s):

Salim Ghannoum ◽

Waldir Leoncio Netto ◽

Damiano Fantini ◽

Benjamin Ragan-Kelley ◽

Amirabbas Parizadeh ◽

...

Keyword(s):

Single Cell ◽

Biomarker Discovery ◽

Enrichment Analysis ◽

Myxoid Liposarcoma ◽

R Package ◽

Differential Analysis ◽

A Cell ◽

Reproducible Analysis ◽

User Friendly ◽

Cycle Regulation

Abstract The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a computational pipeline using Jupyter notebooks. We also provide a user-friendly cloud version of the notebook for researchers with very limited programming skills. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate in a sequential narrative R code with explanatory text and output data and images. These notebooks can be used as tutorials for training purposes and will guide researchers in exploring their scRNA-seq data.

scCancer: a package for automated processing of single-cell RNA-seq data in cancer

Briefings in Bioinformatics ◽

10.1093/bib/bbaa127 ◽

2020 ◽

Author(s):

Wenbo Guo ◽

Dongfang Wang ◽

Shicheng Wang ◽

Yiran Shan ◽

Changyi Liu ◽

...

Keyword(s):

Single Cell ◽

Learning Algorithm ◽

R Package ◽

Quality Control Metrics ◽

Sample Data ◽

Automated Processing ◽

Transcriptomic Level ◽

Cellular Phenotypes ◽

User Friendly ◽

Integration Analysis

Abstract Molecular heterogeneities and complex microenvironments bring great challenges for cancer diagnosis and treatment. Recent advances in single-cell RNA-sequencing (scRNA-seq) technology make it possible to study cancer cell heterogeneities and microenvironments at the single-cell transcriptomic level. Here, we develop an R package named scCancer, which focuses on processing and analyzing scRNA-seq data for cancer research. Besides basic data processing steps, this package takes several special considerations for cancer-specific features. Firstly, the package introduced comprehensive quality control metrics. Secondly, it used a data-driven machine learning algorithm to accurately identify major cancer microenvironment cell populations. Thirdly, it estimated a malignancy score to classify malignant (cancerous) and non-malignant cells. Then, it analyzed intra-tumor heterogeneities by key cellular phenotypes (such as cell cycle and stemness), gene signatures and cell–cell interactions. Besides, it provided multi-sample data integration analysis with different batch-effect correction strategies. Finally, user-friendly graphic reports were generated for all the analyses. By testing on 56 samples with 433 405 cells in total, we demonstrated its good performance. The package is available at: http://lifeome.net/software/sccancer/.

ESCO: single cell expression simulation incorporating gene co-expression

10.1101/2020.10.20.347211 ◽

2020 ◽

Author(s):

Jinjin Tian ◽

Jiebiao Wang ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

R Package ◽

Brain Cell ◽

Gene Interactions ◽

Cell Type ◽

Imputation Methods ◽

Biological Interest ◽

A Cell ◽

Cell Expression ◽

Cell Data

Abstract Motivation Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner. Results Therefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data. Availability The ESCO implementation is available as R package SplatterESCO (https://github.com/JINJINT/SplatterESCO).
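The copula idea mentioned in this abstract can be illustrated with a minimal sketch. This is not ESCO's implementation: it assumes a Gaussian copula, Poisson marginals, and only two genes, and the names `poisson_quantile`, `simulate_pair`, and `pearson` are hypothetical helpers written for this illustration. Correlated standard normals are mapped to uniforms and then pushed through each gene's marginal quantile function, so the marginals stay as specified while the counts become co-expressed.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam, k_max=1000):
    """Smallest k whose Poisson(lam) CDF is >= u (capped at k_max)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u and k < k_max:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_pair(n_cells, rho, lam_a, lam_b, seed=0):
    """Simulate counts for two genes coupled by a Gaussian copula.

    rho is the correlation of the latent normals (an assumption of this
    sketch), not the exact correlation of the resulting counts.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    counts_a, counts_b = [], []
    for _ in range(n_cells):
        z1 = rng.gauss(0.0, 1.0)
        # Second latent normal shares a fraction rho of the first one.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Normal -> uniform -> Poisson count via the marginal quantile.
        counts_a.append(poisson_quantile(phi(z1), lam_a))
        counts_b.append(poisson_quantile(phi(z2), lam_b))
    return counts_a, counts_b


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    a, b = simulate_pair(n_cells=2000, rho=0.8, lam_a=5.0, lam_b=5.0)
    print("count correlation:", round(pearson(a, b), 3))
```

Setting `rho` to zero in the same function gives independent genes with identical marginals, which is the kind of baseline a benchmarking study would compare against.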

MetENP/MetENPWeb: An R package and web application for metabolomics enrichment and pathway analysis in Metabolomics Workbench

10.1101/2020.11.20.391912 ◽

2020 ◽

Author(s):

Kumari Sonal Choudhary ◽

Eoin Fahy ◽

Kevin Coakley ◽

Manish Sud ◽

Mano R Maurya ◽

...

Keyword(s):

Pathway Analysis ◽

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Pathway Enrichment Analysis ◽

Pathway Enrichment ◽

Kegg Pathways ◽

Link Type ◽

Species Specific ◽

User Friendly

Abstract With the advent of high throughput mass spectrometric methods, metabolomics has emerged as an essential area of research in biomedicine with the potential to provide deep biological insights into normal and diseased functions in physiology. However, to achieve the potential offered by metabolomics measures, there is a need for biologist-friendly integrative analysis tools that can transform data into mechanisms that relate to phenotypes. Here, we describe MetENP, an R package, and a user-friendly web application deployed at the Metabolomics Workbench site extending the metabolomics enrichment analysis to include species-specific pathway analysis, pathway enrichment scores, gene-enzyme information, and enzymatic activities of the significantly altered metabolites. MetENP provides a highly customizable workflow through various user-specified options and includes support for all metabolite species with available KEGG pathways. MetENPweb is a web application for calculating metabolite and pathway enrichment analysis. Availability and Implementation The MetENP package is freely available from Metabolomics Workbench GitHub (https://github.com/metabolomicsworkbench/MetENP); the web application is freely available at https://www.metabolomicsworkbench.org/data/analyze.php.

Comparison of visualization tools for single-cell RNAseq data

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa052 ◽

2020 ◽

Vol 2 (3) ◽

Cited By ~ 5

Author(s):

Batuhan Cakir ◽

Martin Prete ◽

Ni Huang ◽

Stijn van Dongen ◽

Pinar Pir ◽

...

Keyword(s):

Single Cell ◽

R Package ◽

Data Format ◽

Interactive Analysis ◽

Rnaseq Data ◽

Scientific Report ◽

Visualization Tools ◽

Time Required ◽

User Friendly ◽

The Web

Abstract In the last decade, single cell RNAseq (scRNAseq) datasets have grown in size from a single cell to millions of cells. Due to its high dimensionality, it is not always feasible to visualize scRNAseq data and share it in a scientific report or an article publication format. Recently, many interactive analysis and visualization tools have been developed to address this issue and facilitate knowledge transfer in the scientific community. In this study, we review several of the currently available scRNAseq visualization tools and benchmark the subset that allows users to visualize the data on the web and share it with others. We consider the memory and time required to prepare datasets for sharing as the number of cells increases, and additionally review the user experience and features available in the web interface. To address the problem of format compatibility we have also developed a user-friendly R package, sceasy, which allows users to convert their own scRNAseq datasets into a specific data format for visualization.

Single cell network analysis with a mixture of Nested Effects Models

10.1101/258202 ◽

2018 ◽

Author(s):

Martin Pirkl ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

New Technologies ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Data Sets ◽

Cell Network ◽

A Cell ◽

Supplementary Material ◽

Cell Data

Abstract Motivation New technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous. Results We developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq. Availability The mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/. Supplementary information Supplementary data are available online.
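The Expectation Maximization scheme described here, soft assignment of cells to components followed by re-estimation of each component, can be sketched on a much simpler model. The example below is not the mnem algorithm: as a stated simplification, each "network" is replaced by a vector of Bernoulli probabilities over binary effect readouts, and the function names and toy data are hypothetical. It only shows the alternation between cell probabilities (E-step) and component updates (M-step).

```python
import math


def em_bernoulli_mixture(cells, init_profiles, n_iter=20, eps=1e-6):
    """EM for a mixture of Bernoulli profiles over binary cell vectors.

    Returns (weights, profiles, responsibilities). Each profile stands in
    for a component-specific model; this is an illustrative assumption.
    """
    n_comp = len(init_profiles)
    n_feat = len(cells[0])
    profiles = [list(p) for p in init_profiles]
    weights = [1.0 / n_comp] * n_comp
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each cell belongs to each component.
        resp = []
        for cell in cells:
            log_post = []
            for k in range(n_comp):
                ll = math.log(weights[k])
                for x, p in zip(cell, profiles[k]):
                    ll += math.log(p if x else 1.0 - p)
                log_post.append(ll)
            top = max(log_post)  # log-sum-exp trick for numerical stability
            unnorm = [math.exp(v - top) for v in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and profiles from soft assignments.
        for k in range(n_comp):
            mass = sum(r[k] for r in resp)
            weights[k] = max(mass / len(cells), eps)
            for j in range(n_feat):
                p = sum(r[k] * c[j] for r, c in zip(resp, cells)) / max(mass, eps)
                profiles[k][j] = min(max(p, eps), 1.0 - eps)
    return weights, profiles, resp


def toy_cells():
    """Two groups of 10 cells; each cell is its prototype with one bit flipped."""
    prototypes = ([1] * 5 + [0] * 5, [0] * 5 + [1] * 5)
    cells = []
    for proto in prototypes:
        for i in range(10):
            cell = list(proto)
            cell[i] = 1 - cell[i]
            cells.append(cell)
    return cells


if __name__ == "__main__":
    cells = toy_cells()
    # Initialise the two profiles from the first and last cell (softened).
    init = [[0.75 if x else 0.25 for x in cells[0]],
            [0.75 if x else 0.25 for x in cells[-1]]]
    weights, profiles, resp = em_bernoulli_mixture(cells, init)
    print("mixing weights:", [round(w, 3) for w in weights])
```

Initialising from two observed cells keeps the toy run deterministic; a real analysis would typically use several random restarts, since EM only finds a local optimum.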

powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis

10.1101/2020.12.15.422954 ◽

2020 ◽

Author(s):

Xianjun Dong ◽

Xiaoqi Li ◽

Tzuu-Wang Chang ◽

Scott T Weiss ◽

Weiliang Qiu

Keyword(s):

Gene Expression ◽

Sample Size ◽

Single Cell ◽

Allele Frequency ◽

R Package ◽

Power Calculation ◽

Eqtl Analysis ◽

Genome Wide Association Studies ◽

User Friendly ◽

Bulk Tissue

Genome-wide association studies (GWAS) have revealed thousands of genetic loci for common diseases. One of the main challenges in the post-GWAS era is to understand the causality of the genetic variants. Expression quantitative trait locus (eQTL) analysis has been proven to be an effective way to address this question by examining the relationship between gene expression and genetic variation in a sufficiently powered cohort. However, it is often tricky to determine the sample size at which a variant with a specific allele frequency will be detected to associate with gene expression with sufficient power. This is particularly demanding with single-cell RNAseq studies. Therefore, a user-friendly tool to perform power analysis for eQTL at both bulk tissue and single-cell level will be critical. Here, we presented an R package called powerEQTL with flexible functions to calculate power, minimal sample size, or detectable minor allele frequency in both bulk tissue and single-cell eQTL analysis. A user-friendly, program-free web application is also provided, allowing customers to calculate and visualize the parameters interactively.
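The relationship between sample size, allele frequency, and power described above can be made concrete with a rough sketch. This is not powerEQTL's formula or API: it assumes a simple linear regression of expression on additive genotype (0, 1, 2), Hardy-Weinberg equilibrium, a normal approximation to the slope test, and a single two-sided test at level `alpha`. The function names `eqtl_power` and `min_sample_size` are hypothetical.

```python
import math
from statistics import NormalDist


def eqtl_power(n, maf, beta, sigma, alpha=0.05):
    """Approximate power to detect slope beta in expression ~ genotype.

    n     : number of samples
    maf   : minor allele frequency (0 < maf <= 0.5)
    beta  : true effect per allele copy
    sigma : residual standard deviation of expression
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    std_normal = NormalDist()
    # Variance of an additive genotype under Hardy-Weinberg equilibrium.
    genotype_var = 2.0 * maf * (1.0 - maf)
    # Expected z-statistic of the slope estimate (normal approximation).
    ncp = beta * math.sqrt(n * genotype_var) / sigma
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


def min_sample_size(maf, beta, sigma, target_power=0.8, alpha=0.05, n_max=100000):
    """Smallest n reaching target_power, or None if n_max is not enough."""
    for n in range(3, n_max + 1):
        if eqtl_power(n, maf, beta, sigma, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    n = min_sample_size(maf=0.2, beta=0.5, sigma=1.0)
    print("minimal sample size:", n)
```

In a genome-wide setting, `alpha` would normally be replaced by a multiple-testing-adjusted threshold, which raises the required sample size considerably.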

dittoSeq: universal user-friendly single-cell and bulk RNA sequencing visualization toolkit

Bioinformatics ◽

10.1093/bioinformatics/btaa1011 ◽

2020 ◽

Author(s):

Daniel G Bunis ◽

Jared Andrews ◽

Gabriela K Fragiadakis ◽

Trevor D Burt ◽

Marina Sirota

Keyword(s):

Single Cell ◽

R Package ◽

Color Blindness ◽

Ease Of Use ◽

Supplementary Information ◽

Supplementary Data ◽

Rnaseq Data ◽

Visualization Toolkit ◽

User Friendly ◽

Publication Quality

Abstract Summary A visualization suite for major forms of bulk and single-cell RNAseq data in R. dittoSeq is color blindness-friendly by default, robustly documented to power ease-of-use and allows highly customizable generation of both daily-use and publication-quality figures. Availability and implementation dittoSeq is an R package available through Bioconductor via an open source MIT license. Supplementary information Supplementary data are available at Bioinformatics online.

GAPGOM—an R package for gene annotation prediction using GO Metrics

BMC Research Notes ◽

10.1186/s13104-021-05580-1 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Casper van Mourik ◽

Rezvan Ehsani ◽

Finn Drabløs

Keyword(s):

Gene Ontology ◽

Gene Annotation ◽

Enrichment Analysis ◽

R Package ◽

Prediction Performance ◽

Gene Products ◽

Limited Information ◽

Non Coding Rnas ◽

User Friendly ◽

Go Terms

Abstract Objective Properties of gene products can be described or annotated with Gene Ontology (GO) terms. But for many genes we have limited information about their products, for example with respect to function. This is particularly true for long non-coding RNAs (lncRNAs), where the function in most cases is unknown. However, it has been shown that annotation as described by GO terms to some extent can be predicted by enrichment analysis on properties of co-expressed genes. Results GAPGOM integrates two relevant algorithms, lncRNA2GOA and TopoICSim, into a user-friendly R package. Here lncRNA2GOA does annotation prediction by co-expression, whereas TopoICSim estimates similarity between GO graphs, which can be used for benchmarking of prediction performance, but also for comparison of GO graphs in general. The package provides an improved implementation of the original tools, with substantial improvements in performance and documentation, unified interfaces, and additional features.
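Enrichment analysis on co-expressed genes is often done with a one-sided hypergeometric test, and a minimal sketch of that test is given below. Whether lncRNA2GOA uses this exact statistic is not stated in the abstract, so treat it as an assumption; the function name and the numbers in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background : genes in the background set
    n_annotated  : background genes carrying the GO term
    n_selected   : co-expressed genes drawn from the background
    n_overlap    : co-expressed genes carrying the GO term
    """
    total = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 2000 background genes carry the term,
    # and 8 of the 100 co-expressed genes do.
    p = hypergeom_enrichment_p(2000, 40, 100, 8)
    print("enrichment p-value:", p)
```

A small p-value suggests the term is over-represented among the co-expressed genes, which is the signal an annotation-prediction step would rank on; correcting for the number of GO terms tested is left out of this sketch.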

scCancer: a package for automated processing of single cell RNA-seq data in cancer

10.1101/800490 ◽

2019 ◽

Author(s):

Wenbo Guo ◽

Dongfang Wang ◽

Shicheng Wang ◽

Yiran Shan ◽

Jin Gu

Keyword(s):

Single Cell ◽

Learning Algorithm ◽

R Package ◽

Rna Seq ◽

Cell Level ◽

Quality Control Metrics ◽

Automated Processing ◽

Cellular Phenotypes ◽

User Friendly ◽

Processing Steps

Abstract Summary Molecular heterogeneities bring great challenges for cancer diagnosis and treatment. Recent advances in single cell RNA-sequencing (scRNA-seq) technology make it possible to study cancer transcriptomic heterogeneities at the single cell level. Here, we develop an R package named scCancer which focuses on processing and analyzing scRNA-seq data for cancer research. Besides basic data processing steps, this package takes several special considerations for cancer-specific features. Firstly, the package introduced comprehensive quality control metrics. Secondly, it used a data-driven machine learning algorithm to accurately identify major cancer microenvironment cell populations. Thirdly, it estimated a malignancy score to classify malignant (cancerous) and non-malignant cells. Then, it analyzed intra-tumor heterogeneities by key cellular phenotypes (such as cell cycle and stemness) and gene signatures. Finally, a user-friendly graphic report was generated for all the analyses. Availability http://lifeome.net/software/sccancer/
