rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database

2021
Authors: Daniel Osorio, Marieke Lydia Kuijjer, James J. Cai

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high-throughput single-cell RNA sequencing (scRNA-seq). However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, because it requires several sample-preprocessing steps to filter and collect the desired cells before sequencing. Integrating data from multiple public single-cell experiments offers a solution, allowing enough cells with the desired molecular signature to be collected. By increasing the sample size of the desired cell type, this approach enables robust characterization of its transcriptome.

Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application in which we collect and characterize a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue-remodeling properties of fibroblasts. This is the first report of an unbiased transcriptome profile of fibrocytes. We compared the transcriptomic profile of the fibrocytes against that of fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through activation of the prostaglandin biosynthesis and regulation pathway.

Availability and Implementation: rPanglaoDB is implemented as an R package available from CRAN at https://CRAN.R-project.org/package=rPanglaoDB.
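The core operation described here, pooling cells that carry one annotation from several independently processed datasets, can be sketched in Python. This is not rPanglaoDB's API: the function name `merge_labeled`, the per-dataset dictionary layout, and zero-filling genes that a dataset lacks are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: sample id -> {cell barcode: (cell-type label, {gene: count})}.
Dataset = Dict[str, Tuple[str, Dict[str, int]]]


def merge_labeled(datasets: Dict[str, Dataset],
                  cell_type: str) -> Tuple[List[str], List[str], List[List[int]]]:
    """Pool the cells annotated as `cell_type` from several datasets.

    Returns (genes, cell_ids, matrix) where matrix[i][j] is the count of
    genes[i] in cell_ids[j]. Genes missing from a dataset are filled with 0
    (an assumption; a pipeline could instead keep only genes shared by all).
    """
    selected = []  # (prefixed cell id, gene counts)
    for sample_id, cells in sorted(datasets.items()):
        for barcode, (label, counts) in sorted(cells.items()):
            if label == cell_type:
                # Prefix barcodes with the sample id so they stay unique after merging.
                selected.append((f"{sample_id}_{barcode}", counts))

    genes = sorted({g for _, counts in selected for g in counts})
    cell_ids = [cid for cid, _ in selected]
    matrix = [[counts.get(g, 0) for _, counts in selected] for g in genes]
    return genes, cell_ids, matrix


if __name__ == "__main__":
    data = {
        "SRA1": {"AAAC": ("Fibrocytes", {"COL1A1": 5, "PTPRC": 2}),
                 "AAAG": ("Fibroblasts", {"COL1A1": 9})},
        "SRA2": {"CCCT": ("Fibrocytes", {"PTPRC": 4, "CD34": 1})},
    }
    genes, cells, mat = merge_labeled(data, "Fibrocytes")
    print(genes)  # ['CD34', 'COL1A1', 'PTPRC']
    print(cells)  # ['SRA1_AAAC', 'SRA2_CCCT']
    print(mat)    # [[0, 1], [5, 0], [2, 4]]
```

Prefixing barcodes with the sample identifier matters because different experiments can reuse the same barcode sequences.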

2021
Authors: Dongshunyi Li, Jun Ding, Ziv Bar-Joseph

One of the first steps in the analysis of single-cell RNA-sequencing (scRNA-Seq) data is the assignment of cell types. Although a number of supervised methods have been developed for this task, in most cases cell types are assigned by first clustering cells in a low-dimensional space and then annotating the clusters. To overcome noise and improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets that UNIFAN assigns to each cluster provide strong evidence for the cell type the cluster represents, making annotation easier.
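UNIFAN itself trains a neural network; the snippet below only illustrates the two ingredients the abstract names: per-cell gene set activity scores, and using those scores to label clusters. The scoring rule (mean z-scored expression of a set's genes), the function names, and the toy gene sets are simplifying assumptions, not UNIFAN's learned scores.

```python
import numpy as np


def gene_set_scores(expr: np.ndarray, genes: list, gene_sets: dict) -> dict:
    """Score every cell for every gene set.

    expr: cells x genes matrix (e.g. log-normalized counts).
    Returns {set name: per-cell scores}; a score is the mean z-scored
    expression of the set's genes that are present in `genes`.
    """
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - mu) / sd
    column = {g: i for i, g in enumerate(genes)}
    scores = {}
    for name, members in gene_sets.items():
        cols = [column[g] for g in members if g in column]
        if cols:
            scores[name] = z[:, cols].mean(axis=1)
    return scores


def annotate_clusters(labels: np.ndarray, scores: dict) -> dict:
    """Label each cluster with the gene set whose mean activity is highest in it."""
    names = sorted(scores)
    result = {}
    for c in np.unique(labels):
        mask = labels == c
        means = [scores[n][mask].mean() for n in names]
        result[int(c)] = names[int(np.argmax(means))]
    return result


if __name__ == "__main__":
    expr = np.array([[5., 0., 4., 0.],
                     [6., 1., 5., 0.],
                     [0., 7., 0., 6.],
                     [1., 8., 0., 5.]])
    genes = ["CD3E", "CD19", "CD2", "MS4A1"]
    sets = {"T cell": ["CD3E", "CD2"], "B cell": ["CD19", "MS4A1"]}
    labels = np.array([0, 0, 1, 1])  # e.g. from any clustering step
    print(annotate_clusters(labels, gene_set_scores(expr, genes, sets)))  # {0: 'T cell', 1: 'B cell'}
```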


2020
Authors: Yun Zhang, Brian D. Aevermann, Trygve E. Bakken, Jeremy A. Miller, Rebecca D. Hodge, ...

Single-cell/nucleus RNA sequencing (scRNAseq) is emerging as an essential tool to unravel the phenotypic heterogeneity of cells in complex biological systems. While computational methods for scRNAseq cell type clustering have advanced, the ability to integrate datasets to identify common and novel cell types across experiments remains a challenge. Here, we introduce FR-Match, a cluster-to-cluster cell type matching method that uses supervised feature selection for dimensionality reduction and incorporates shared information among cells to determine whether two cell type clusters share the same underlying multivariate gene expression distribution. FR-Match is benchmarked against existing cell-to-cell and cell-to-cluster cell type matching methods using both simulated and real scRNAseq data. FR-Match proved to be a stringent method that produced fewer erroneous matches of distinct cell subtypes and had the unique ability to identify novel cell phenotypes in new datasets. In silico validation demonstrated that the proposed workflow is the only self-contained algorithm that remained robust to increasing numbers of true negatives (i.e. non-represented cell types). FR-Match was applied to two human brain scRNAseq datasets sampled from cortical layer 1 and full-thickness middle temporal gyrus. When mapping cell types identified in specimens isolated from these overlapping human brain regions, FR-Match precisely recapitulated the laminar characteristics of matched cell type clusters, reflecting their distinct neuroanatomical distributions. An R package and Shiny application are provided at https://github.com/JCVenterInstitute/FRmatch for users to interactively explore and match scRNAseq cell type clusters with complementary visualization tools.
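The "FR" in FR-Match refers to the Friedman-Rafsky test, a multivariate two-sample test built on a minimum spanning tree (MST) of the pooled observations. Below is a minimal sketch of that statistic, assuming two clusters already projected onto a common reduced feature space. It omits FR-Match's supervised feature selection and decision rules, and the permutation p-value is one common way to calibrate the statistic, not necessarily the package's choice.

```python
import numpy as np


def mst_edges(points: np.ndarray) -> list:
    """Minimum spanning tree of the complete Euclidean graph (Prim's algorithm)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest edge from each node to the tree
    parent = np.zeros(n, dtype=int)  # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges


def cross_edges(edges: list, labels: np.ndarray) -> int:
    """Friedman-Rafsky runs statistic: number of MST edges linking the two samples."""
    return int(sum(labels[i] != labels[j] for i, j in edges))


def fr_test(x: np.ndarray, y: np.ndarray, n_perm: int = 999, seed: int = 0):
    """Return the observed cross-edge count and a one-sided permutation p-value.

    Few cross edges mean the pooled cells separate by origin, i.e. the two
    clusters likely do not share the same expression distribution.
    """
    pooled = np.vstack([x, y])
    labels = np.array([0] * len(x) + [1] * len(y))
    edges = mst_edges(pooled)  # the MST depends only on the pooled points
    observed = cross_edges(edges, labels)
    rng = np.random.default_rng(seed)
    hits = sum(cross_edges(edges, rng.permutation(labels)) <= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 0.5, size=(20, 2))   # hypothetical cluster A, 2 reduced features
    b = rng.normal(10.0, 0.5, size=(20, 2))  # hypothetical cluster B, far from A
    print(fr_test(a, b, n_perm=199))
```

Because the tree is computed once on the pooled points, each permutation only relabels nodes, which keeps the test cheap.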


2021
Authors: Kai Kang, Caizhi David Huang, Yuanyuan Li, David M. Umbach, Leping Li

Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions of all cells in the tissue, understanding the contribution of individual cell types to overall tissue gene expression is fundamentally important. We recently developed a computational method, CDSeq, that simultaneously estimates both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) that performs substantially better than the original MATLAB implementation and adds a new function to aid interpretation of deconvolution outcomes. The R package should be of interest to the broader R community.

Results: We developed a novel strategy that substantially improves computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating CDSeq-estimated cell types using publicly available single-cell RNA sequencing (scRNA-seq) data (single-cell data from 20 major organs are included in the R package). This function allows users to readily interpret and visualize the CDSeq-estimated cell types. We carried out additional validations of CDSeqR with in silico and in vitro mixtures and with real experimental data, including RNA-seq data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project.

Conclusions: Existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for understanding transcriptomic changes and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. However, bulk-level analyses neglect tissue heterogeneity and hinder cell-type-specific investigation. The CDSeqR package can be viewed as providing in silico single-cell dissection of bulk measurements, enabling researchers to gain cell-type-specific information from bulk RNA-seq data.
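The annotation function is described only at a high level. One simple way to attach labels to estimated cell-type profiles is to correlate each with mean profiles from labeled scRNA-seq data and pick the best match, as sketched below. Pearson correlation on log1p values, the function name, and the toy matrices are assumptions, not CDSeqR's documented behavior.

```python
import numpy as np


def annotate_profiles(estimated, reference, ref_names):
    """Label each estimated cell-type profile with its best-correlated reference.

    estimated: genes x K matrix of deconvolved cell-type expression profiles.
    reference: genes x M matrix of mean scRNA-seq expression per annotated cell
               type, with rows in the same gene order.
    Returns one (reference name, Pearson r on log1p values) pair per column.
    """
    est = np.log1p(np.asarray(estimated, dtype=float))
    ref = np.log1p(np.asarray(reference, dtype=float))
    k = est.shape[1]
    # np.corrcoef treats rows as variables, so stack the profiles as rows and
    # keep the estimated-vs-reference block of the correlation matrix.
    corr = np.corrcoef(np.vstack([est.T, ref.T]))[:k, k:]
    best = corr.argmax(axis=1)
    return [(ref_names[m], float(corr[i, m])) for i, m in enumerate(best)]


if __name__ == "__main__":
    reference = [[50, 1], [40, 2], [1, 30], [2, 60]]  # hypothetical 4 genes x 2 types
    estimated = [[3, 45], [1, 38], [28, 2], [55, 1]]  # columns in unknown order
    print(annotate_profiles(estimated, reference, ["Hepatocyte", "T cell"]))
```

Reporting the correlation alongside the label lets a user flag estimated components that match no reference well.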


2019, Vol 35 (14), pp. i436-i445
Authors: Gregor Sturm, Francesca Finotello, Florent Petitprez, Jitao David Zhang, Jan Baumbach, ...

Motivation: The composition and density of immune cells in the tumor microenvironment (TME) profoundly influence tumor progression and the success of anti-cancer therapies. Because flow cytometry, immunohistochemistry staining, and single-cell sequencing are often unavailable, researchers rely on computational methods to estimate immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically, and a general guideline to lead the research community through cell-type deconvolution is missing.

Results: We developed a systematic approach for benchmarking such computational methods and assessed the accuracy of tools at estimating nine different immune and stromal cell types from bulk RNA-seq samples. We used a single-cell RNA-seq dataset of ∼11 000 cells from the TME to simulate bulk samples of known cell-type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed us to analyze and condense the results of more than a hundred thousand predictions into an exhaustive evaluation of seven computational methods over nine cell types and ∼1800 samples from five simulated and real-world datasets. We demonstrate that computational deconvolution achieves high accuracy for well-defined cell-type signatures, and we propose how fuzzy cell-type signatures can be improved. We suggest that future efforts be dedicated to refining cell population definitions and finding reliable signatures.

Availability and implementation: A Snakemake pipeline to reproduce the benchmark is available at https://github.com/grst/immune_deconvolution_benchmark. An R package allows the community to perform integrated deconvolution using different methods (https://grst.github.io/immunedeconv).

Supplementary information: Supplementary data are available at Bioinformatics online.
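The benchmarking strategy, building bulk samples of known composition from single-cell data and scoring a method by how well its estimates track the truth, can be sketched as follows. Summing randomly drawn cells per cell type, the fixed cell count per sample, and Pearson correlation as the score are assumptions for illustration; this is not the benchmark's actual pipeline.

```python
import numpy as np


def simulate_bulk(counts, cell_types, proportions, n_cells, rng):
    """Sum randomly drawn cells into one pseudo-bulk expression vector.

    counts: cells x genes matrix; cell_types: label of each cell.
    proportions: {cell type: fraction}; fractions should sum to 1. Per-type
    cell numbers are rounded, so the total may differ slightly from n_cells.
    """
    counts = np.asarray(counts, dtype=float)
    cell_types = np.asarray(cell_types)
    bulk = np.zeros(counts.shape[1])
    for cell_type, fraction in proportions.items():
        pool = np.flatnonzero(cell_types == cell_type)
        chosen = rng.choice(pool, size=int(round(fraction * n_cells)), replace=True)
        bulk += counts[chosen].sum(axis=0)
    return bulk


def correlation_score(true_fractions, estimated_fractions):
    """Pearson correlation between known and estimated fractions across samples."""
    return float(np.corrcoef(true_fractions, estimated_fractions)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(5, size=(300, 50))                  # hypothetical cells x genes
    labels = np.repeat(["B cell", "T cell", "Macrophage"], 100)
    truth = [0.1, 0.3, 0.5]                                  # B-cell fraction per sample
    bulks = [simulate_bulk(counts, labels, {"B cell": f, "T cell": 0.5 - f / 2,
                                            "Macrophage": 0.5 - f / 2}, 500, rng)
             for f in truth]
    print(np.vstack(bulks).shape)  # (3, 50)
```

Any deconvolution method can then be run on the simulated samples, and `correlation_score` compares its estimates for one cell type with `truth`.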


2020
Authors: Dustin J. Sokolowski, Mariela Faykoo-Martinez, Lauren Erdman, Huayun Hou, Cadia Chan, ...

RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal the biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples, and the resulting DEGs do not necessarily indicate the cell types in which the differential expression occurred. Although single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit their widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating scRNA-seq-derived cell-type expression data with existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked whether scMappR could reveal known cell-type-specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user-supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package, available on CRAN, that complements traditional differential expression analysis.

Highlights:
- scMappR integrates scRNA-seq and bulk RNA-seq to recalibrate bulk differentially expressed genes (DEGs).
- scMappR correctly identified immune-cell-expressed DEGs from a bulk RNA-seq analysis of mouse kidney regeneration.
- scMappR is deployed as a user-friendly R package available on CRAN.
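To illustrate what assigning cell-type specificity to a bulk DEG means, the sketch below splits each gene's expected bulk expression across cell types in proportion to (estimated abundance) x (reference expression of that gene in that type), then reports the dominant cell type next to the bulk fold change. This weighting is a simplified assumption and is not scMappR's exact scoring formula; all names and numbers are hypothetical.

```python
import numpy as np


def cell_type_shares(signature, proportions):
    """Share of each gene's expected bulk expression contributed by each cell type.

    signature: genes x cell types, mean expression per cell type (from scRNA-seq).
    proportions: estimated cell-type proportions in the bulk sample.
    Returns a genes x cell types array whose non-zero rows sum to 1.
    """
    signature = np.asarray(signature, dtype=float)
    contrib = signature * np.asarray(proportions, dtype=float)  # broadcast over columns
    total = contrib.sum(axis=1, keepdims=True)
    return np.divide(contrib, total, out=np.zeros_like(contrib), where=total > 0)


def attribute_degs(degs, genes, signature, proportions, cell_types):
    """Map each DEG to (bulk log2 fold change, most likely cell type, its share)."""
    shares = cell_type_shares(signature, proportions)
    row_of = {g: i for i, g in enumerate(genes)}
    result = {}
    for gene, log2fc in degs.items():
        row = shares[row_of[gene]]
        top = int(np.argmax(row))
        result[gene] = (log2fc, cell_types[top], float(row[top]))
    return result


if __name__ == "__main__":
    genes = ["Havcr1", "Cd68", "Umod"]                   # toy gene list
    cell_types = ["Tubule", "Macrophage"]
    signature = [[20.0, 1.0], [0.5, 30.0], [40.0, 0.0]]  # hypothetical values
    proportions = [0.95, 0.05]                           # a small immune population
    degs = {"Havcr1": 3.1, "Cd68": 1.4}                  # hypothetical bulk log2 fold changes
    print(attribute_degs(degs, genes, signature, proportions, cell_types))
```

Even at a 5% abundance, a gene expressed mainly by macrophages is still attributed to them, which mirrors the abstract's point about detecting signals from a small immune population.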


2021
Authors: Boyang Zhang, Zhicheng Ji, Hongkai Ji

Single-cell RNA-seq experiments with multiple samples are increasingly used to discover cell types and molecular features that may influence sample phenotype (e.g. disease). However, analyzing and visualizing the complex association between cell types and phenotype remains nontrivial. TreeCorTreat is an open-source R package that tackles this problem by using a tree-based correlation screen to analyze and visualize how phenotype is associated with transcriptomic features and cell types at multiple levels of cell type resolution. With TreeCorTreat, one can conveniently explore and compare different feature types, phenotypic traits, analysis protocols, and datasets, and evaluate the impact of potential confounders.
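A tree-based correlation screen can be pictured like this: cell types sit in a hierarchy, a per-sample feature is computed at every node by pooling its descendants, and the feature is correlated with the phenotype at each node. The hierarchy, the use of cell-type proportions as the only feature, and the function names are assumptions; TreeCorTreat supports other feature types and its own statistics.

```python
from math import nan, sqrt


def pearson(x, y):
    """Pearson correlation; NaN when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return nan if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def node_feature(tree, leaf_props, node):
    """Per-sample proportion of a node: a leaf's own value, or the sum over its children."""
    children = tree.get(node, [])
    if not children:
        return leaf_props[node]
    parts = [node_feature(tree, leaf_props, child) for child in children]
    return [sum(values) for values in zip(*parts)]


def tree_screen(tree, root, leaf_props, phenotype):
    """Correlate the phenotype with the feature of every node, from root to leaves."""
    results, stack = {}, [root]
    while stack:
        node = stack.pop()
        results[node] = pearson(node_feature(tree, leaf_props, node), phenotype)
        stack.extend(tree.get(node, []))
    return results


if __name__ == "__main__":
    tree = {"All": ["T", "Myeloid"], "T": ["CD4 T", "CD8 T"], "Myeloid": ["Mono"]}
    leaf_props = {                        # hypothetical proportions in 4 samples
        "CD4 T": [0.20, 0.25, 0.30, 0.35],
        "CD8 T": [0.15, 0.10, 0.20, 0.10],
        "Mono": [0.60, 0.50, 0.40, 0.30],
    }
    severity = [1, 2, 3, 4]               # hypothetical sample phenotype
    for node, r in tree_screen(tree, "All", leaf_props, severity).items():
        print(f"{node}: r = {r:.2f}")
```

Screening every level at once shows whether an association belongs to a broad lineage or only to one of its subtypes.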


2021
Authors: April R Kriebel, Joshua D Welch

Single-cell genomic technologies provide an unprecedented opportunity to define molecular cell types in a data-driven fashion, but they present unique data integration challenges. Integration analyses often involve datasets with partially overlapping features, including both shared features that occur in all datasets and features exclusive to a single experiment. Previous computational integration approaches require the input matrices to share the same number of either genes or cells, and thus can use only shared features. To address this limitation, we derive a novel nonnegative matrix factorization algorithm, UINMF, for integrating single-cell datasets containing both shared and unshared features. The key advance is an additional metagene matrix that allows unshared features to inform the factorization. We demonstrate that incorporating unshared features significantly improves integration of single-cell RNA-seq, spatial transcriptomic, SHARE-seq, and cross-species datasets. We have incorporated the UINMF algorithm into the open-source LIGER R package (https://github.com/welch-lab/liger).
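As we read the formulation, each dataset i contributes a shared-feature block X_i ≈ (W + V_i)H_i and an unshared-feature block Y_i ≈ U_i H_i, where U_i is the additional metagene matrix, with a penalty λ on the dataset-specific parts. The NumPy function below only evaluates that objective for given factor matrices (features x cells orientation); it does not fit the factors, and LIGER's exact notation, orientation, and scaling may differ.

```python
import numpy as np


def uinmf_objective(X, Y, W, V, U, H, lam):
    """Evaluate an unshared-feature iNMF objective (features x cells orientation).

    For each dataset i (parallel lists):
      X[i]  shared features x cells,    modeled as (W + V[i]) @ H[i]
      Y[i]  unshared features x cells,  modeled as U[i] @ H[i]  (None if absent)
    lam weights the penalty ||[V_i; U_i] H_i||_F^2 on dataset-specific parts.
    """
    total = 0.0
    for i in range(len(X)):
        total += np.sum((X[i] - (W + V[i]) @ H[i]) ** 2)
        total += lam * np.sum((V[i] @ H[i]) ** 2)
        if Y[i] is not None:
            unshared_fit = U[i] @ H[i]
            total += np.sum((Y[i] - unshared_fit) ** 2)
            total += lam * np.sum(unshared_fit ** 2)
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 3                                  # number of metagenes (factors)
    W = rng.random((5, k))                 # shared metagenes over 5 shared genes
    H = [rng.random((k, 4)), rng.random((k, 6))]
    V = [np.zeros((5, k)), np.zeros((5, k))]
    U = [rng.random((2, k)), None]         # dataset 0 has 2 unshared genes
    X = [W @ H[0], W @ H[1]]
    Y = [U[0] @ H[0], None]
    print(uinmf_objective(X, Y, W, V, U, H, lam=0.0))  # 0.0 for an exact fit
```

The cell factors H_i appear in both blocks, which is how the unshared features can influence the shared embedding.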


2019
Authors: Brandon Jew, Marcus Alvarez, Elior Rahmani, Zong Miao, Arthur Ko, ...

We present Bisque, a tool for estimating cell type proportions in bulk expression data. Bisque implements a regression-based approach that uses single-cell RNA-seq (scRNA-seq) data to generate a reference expression profile and learns gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is significant technical variation between the generation of the reference profile and the observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of the large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and single-nucleus RNA-seq (snRNA-seq) data, Bisque replicated previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. Bisque requires a single-cell reference dataset that reflects physiological cell type composition, and it can further leverage datasets that include both bulk and single-cell measurements of the same samples for improved accuracy. We further propose an additional mode of operation that requires only a set of known marker genes. Bisque is available as an R package at https://github.com/cozygene/bisque.
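The reference-based step common to this family of methods, expressing each bulk sample as a non-negative combination of cell-type reference profiles and rescaling the weights to proportions, can be sketched with SciPy's non-negative least squares. The sketch leaves out Bisque's distinguishing feature, the learned gene-specific transformation of bulk expression; the reference matrix and the function name are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(reference, bulk):
    """Reference-based decomposition of bulk expression via NNLS.

    reference: genes x cell types, mean expression per cell type
               (e.g. averaged from scRNA-seq cells of each type).
    bulk: genes x samples, on the same gene order and scale.
    Returns cell types x samples; each column is rescaled to sum to 1.
    """
    reference = np.asarray(reference, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    props = np.zeros((reference.shape[1], bulk.shape[1]))
    for s in range(bulk.shape[1]):
        weights, _residual = nnls(reference, bulk[:, s])  # min ||A x - b||, x >= 0
        total = weights.sum()
        props[:, s] = weights / total if total > 0 else weights
    return props


if __name__ == "__main__":
    reference = np.array([[10, 0, 1], [0, 8, 1], [2, 1, 9], [5, 5, 0]], dtype=float)
    truth = np.array([[0.5, 0.2], [0.3, 0.2], [0.2, 0.6]])  # hypothetical proportions
    bulk = 100 * reference @ truth                           # noise-free mixtures
    print(np.round(estimate_proportions(reference, bulk), 3))
```

With noise-free mixtures the true proportions are recovered; in practice, a systematic mismatch between reference and bulk (the technical variation the abstract mentions) biases these weights, which is what a per-gene transformation is meant to correct.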


Authors: Yixuan Qiu, Jiebiao Wang, Jing Lei, Kathryn Roeder

Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type are nevertheless highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern.

Results: To capitalize on this pattern, we developed a new semi-supervised algorithm that detects marker genes by combining published information about likely marker genes with bulk transcriptome data. The algorithm exploits the correlation structure of the bulk data to refine the published marker list by adding genes to it or removing genes from it.

Availability and implementation: We implemented this method as an R package, markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen).

Supplementary information: Supplementary data are available at Bioinformatics online.
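The observation that markers of one cell type co-vary across bulk samples suggests a simple refinement loop: score every gene by its correlation with the mean profile of the current marker list, add strongly correlated genes, and drop weakly correlated prior markers. The sketch below implements that heuristic with hypothetical thresholds and toy genes; it does not reproduce markerpen's own semi-supervised algorithm.

```python
import numpy as np


def refine_markers(bulk, genes, prior, add_thresh=0.9, drop_thresh=0.5):
    """Refine a prior marker list from co-expression in bulk samples.

    bulk: genes x samples expression matrix.
    Each gene is scored by its Pearson correlation, across samples, with the
    mean profile of the prior markers. Genes scoring >= add_thresh are added;
    prior markers scoring < drop_thresh are removed. Thresholds are arbitrary.
    """
    bulk = np.asarray(bulk, dtype=float)
    row_of = {g: i for i, g in enumerate(genes)}
    prior_rows = [row_of[g] for g in prior if g in row_of]
    anchor = bulk[prior_rows].mean(axis=0)
    scores = np.array([np.corrcoef(row, anchor)[0, 1] for row in bulk])
    kept = {genes[i] for i in prior_rows if scores[i] >= drop_thresh}
    added = {g for g, s in zip(genes, scores) if s >= add_thresh}
    return sorted(kept | added)


if __name__ == "__main__":
    p = np.array([0.1, 0.3, 0.2, 0.5, 0.4, 0.6])  # hypothetical cell-type fraction per sample
    bulk = np.vstack([
        10 * p,               # M1: true marker
        20 * p + 1,           # M2: true marker
        5 * p + 0.5,          # M3: true marker
        8 * p + 2,            # H: marker missing from the prior list
        [2, 3, 1, 1, 3, 2],   # F: listed, but not a marker
        [6, 5, 4, 4, 5, 6],   # K: unrelated gene
    ])
    genes = ["M1", "M2", "M3", "H", "F", "K"]
    print(refine_markers(bulk, genes, prior=["M1", "M2", "M3", "F"]))  # ['H', 'M1', 'M2', 'M3']
```

Because true markers track the cell-type fraction, their average dominates the anchor profile even when the prior list contains a wrong gene.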

