LRcell: detecting the source of differential expression at the sub-cell type level from bulk RNA-seq data

2021 ◽  
Author(s):  
Wenjing Ma ◽  
Sumeet Sharma ◽  
Peng Jin ◽  
Shannon L Gourley ◽  
Zhaohui Qin

The rapid proliferation of single-cell RNA-sequencing (scRNA-seq) datasets has revealed cell heterogeneity at unprecedented scales. Several deconvolution methods have been developed to decompose bulk experiments and reveal cell type contributions. However, these methods lack the power to accurately identify cell type composition when the reference dataset contains a considerable number of sub-cell types. Here, we present LRcell, an R Bioconductor package (http://bioconductor.org/packages/release/bioc/html/LRcell.html) that aims to identify the specific sub-cell type(s) that drive the changes observed in a bulk RNA-seq differential gene expression experiment. In addition, LRcell provides pre-embedded marker genes, computed from putative single-cell RNA-seq experiments, as options for running the analyses.

2018 ◽  
Author(s):  
Yuqi Tan ◽  
Patrick Cahan

Single cell RNA-Seq has emerged as a powerful tool in diverse applications, ranging from determining the cell-type composition of tissues to uncovering the regulators of developmental programs. A near-universal step in the analysis of single cell RNA-Seq data is to hypothesize the identity of each cell. Often, this is achieved by finding cells that express combinations of marker genes that had previously been implicated as being cell-type specific, an approach that is not quantitative and does not explicitly take advantage of other single cell RNA-Seq studies. Here, we describe our tool, SingleCellNet, which addresses these issues and enables the classification of query single cell RNA-Seq data in comparison to reference single cell RNA-Seq data. SingleCellNet compares favorably to other methods, and it is notably able to make sensitive and accurate classifications across platforms and species. We demonstrate how SingleCellNet can be used to classify previously undetermined cells, and how it can be used to assess the outcome of cell fate engineering experiments.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 750
Author(s):  
Olukayode A. Sosina ◽  
Matthew N. Tran ◽  
Kristen R. Maynard ◽  
Ran Tao ◽  
Margaret A. Taub ◽  
...  

Background: Statistical deconvolution strategies have emerged over the past decade to estimate the proportion of various cell populations in homogenate tissue sources like brain using gene expression data. However, no study has been undertaken to assess the extent to which expression-based and DNAm-based cell type composition estimates agree. Results: Using neuronal fractions estimated from DNAm data from the same brain region as our bulk RNA-Seq dataset (i.e., matched) as proxies for the true, unobserved cell-type fractions (i.e., as the gold standard), we assessed the accuracy (RMSE) and concordance (R²) of four reference-based deconvolution algorithms: Houseman, CIBERSORT, non-negative least squares (NNLS)/MIND, and MuSiC. We did this for two cell-type populations (neurons and non-neurons/glia) using matched single nuclei RNA-Seq and mismatched single cell RNA-Seq reference datasets. With the mismatched single cell RNA-Seq reference dataset, Houseman, MuSiC, and NNLS produced concordant (high correlation; Houseman R² = 0.51, 95% CI [0.39, 0.65]; MuSiC R² = 0.56, 95% CI [0.43, 0.69]; NNLS R² = 0.54, 95% CI [0.32, 0.68]) but biased (high RMSE, >0.35) neuronal fraction estimates. CIBERSORT produced more discordant (moderate correlation; R² = 0.25, 95% CI [0.15, 0.38]) neuronal fraction estimates, but with less bias (low RMSE, 0.09). Using the matched single nuclei RNA-Seq reference dataset did not eliminate bias (MuSiC RMSE = 0.17). Conclusions: Our results together suggest that many existing RNA deconvolution algorithms estimate the RNA composition of homogenate tissue, i.e., the amount of RNA attributable to each cell type, and not the cellular composition, which relates to the underlying fraction of cells.
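The conclusion that deconvolution recovers RNA composition rather than cell composition can be illustrated with a toy calculation. The sketch below is not the study's pipeline: it assumes a hypothetical two-cell-type signature matrix, a hypothetical 3:1 ratio of RNA per neuron to RNA per glial cell, and noise-free bulk profiles. It uses `scipy.optimize.nnls` as a stand-in for the NNLS approach named above and takes R² to be the squared Pearson correlation (an assumption about how concordance is computed).

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical reference signatures: 4 genes (rows) x 2 cell types (neuron, glia).
# Each column sums to 1, i.e. it is an expression profile per unit of RNA.
signature = np.array([
    [0.50, 0.05],
    [0.30, 0.05],
    [0.15, 0.40],
    [0.05, 0.50],
])
# Hypothetical assumption: a neuron carries 3x as much RNA as a glial cell.
rna_per_cell = np.array([3.0, 1.0])

def deconvolve(bulk, signature):
    """Estimate mixing proportions by non-negative least squares, rescaled to sum to 1."""
    coef, _ = nnls(signature, bulk)
    return coef / coef.sum()

neuron_fracs = np.linspace(0.2, 0.6, 5)  # hypothetical true neuronal cell fractions
estimates = []
for f in neuron_fracs:
    cells = np.array([f, 1.0 - f])
    rna = cells * rna_per_cell
    rna /= rna.sum()                  # each cell type's share of total RNA
    bulk = signature @ rna            # noise-free bulk profile is RNA-weighted
    estimates.append(deconvolve(bulk, signature)[0])
estimates = np.array(estimates)

rmse = np.sqrt(np.mean((estimates - neuron_fracs) ** 2))
r2 = np.corrcoef(estimates, neuron_fracs)[0, 1] ** 2
print(f"RMSE = {rmse:.3f}, R2 = {r2:.3f}")
```

Because the mixtures are noise-free, all of the bias comes from unequal RNA content per cell: NNLS recovers each cell type's share of total RNA, which overstates the neuronal cell fraction whenever neurons carry more RNA. The estimates still track the true fractions closely, reproducing the "concordant but biased" pattern in miniature.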


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation based on supervised learning have been proposed. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, thereby avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
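The rejection step with cell-type-specific confidence thresholds can be sketched as follows, assuming a classifier has already produced per-cell class probabilities and that one threshold per cell type is given. The labels, thresholds, and probabilities are hypothetical, the function name `classify_with_rejection` is illustrative, and how JIND actually learns its thresholds is not shown.

```python
import numpy as np

def classify_with_rejection(probs, thresholds, labels):
    """Assign each cell its most probable type, or 'Unassigned' when the top
    probability falls below the threshold for that predicted type."""
    best = probs.argmax(axis=1)                    # index of most probable type per cell
    top = probs[np.arange(probs.shape[0]), best]   # its probability
    return [labels[b] if p >= thresholds[b] else "Unassigned"
            for b, p in zip(best, top)]

labels = ["T cell", "B cell", "Monocyte"]          # hypothetical cell types
thresholds = np.array([0.9, 0.7, 0.6])             # hypothetical per-type thresholds
probs = np.array([                                 # hypothetical classifier output
    [0.95, 0.03, 0.02],
    [0.80, 0.15, 0.05],
    [0.10, 0.75, 0.15],
    [0.20, 0.15, 0.65],
])
calls = classify_with_rejection(probs, thresholds, labels)
print(calls)  # ['T cell', 'Unassigned', 'B cell', 'Monocyte']
```

The second cell is rejected even though 0.80 would pass the B cell or monocyte threshold: a per-type threshold lets a type that is easily confused with others demand higher confidence than a well-separated one.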


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Qingnan Liang ◽  
Rachayata Dharmat ◽  
Leah Owen ◽  
Akbar Shakoor ◽  
Yumei Li ◽  
...  

Single-cell RNA-seq is a powerful tool for decoding the heterogeneity of complex tissues by generating transcriptomic profiles of individual cells. Here, we report a single-nuclei RNA-seq (snRNA-seq) transcriptomic study of human retinal tissue, which is composed of multiple cell types with distinct functions. Six samples from three healthy donors are profiled, and high-quality RNA-seq data are obtained for 5873 single nuclei. All major retinal cell types are observed, and marker genes for each cell type are identified. Gene expression in the macular and peripheral retina is compared at the cell-type level. Furthermore, our dataset shows improved power for prioritizing genes associated with human retinal diseases compared to both mouse single-cell RNA-seq and human bulk RNA-seq results. In conclusion, we demonstrate that obtaining single cell transcriptomes from frozen human tissues can provide insight missed by either human bulk RNA-seq or animal models.


Author(s):  
Francisco Avila Cobos ◽  
José Alquicira-Hernandez ◽  
Joseph Powell ◽  
Pieter Mestdagh ◽  
Katleen De Preter

Many computational methods to infer cell type proportions from bulk transcriptomics data have been developed. Previous comparisons of these methods revealed that the choice of reference marker signatures is far more important than the method itself. However, a thorough evaluation of the combined impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the results is still lacking. Using different single-cell RNA-sequencing (scRNA-seq) datasets, we generated hundreds of pseudo-bulk mixtures to evaluate the combined impact of these factors on the deconvolution results. Along with methods to perform deconvolution of bulk RNA-seq data, we also included five methods specifically designed to infer the cell type composition of bulk data using scRNA-seq data as reference. Both bulk and single-cell deconvolution methods perform best when applied to data in linear scale, and the choice of normalization can have a dramatic impact on the performance of some, but not all, methods. Overall, single-cell methods perform comparably to the best-performing bulk methods, and bulk methods based on semi-supervised approaches showed higher error and lower correlation between the computed and the expected proportions. Moreover, failing to include in the reference cell types that are present in a mixture always led to substantially worse results, regardless of any of the previous choices. Taken together, we provide a thorough evaluation of the combined impact of the different factors affecting the computational deconvolution task across different datasets and propose general guidelines to maximize its performance.
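One plausible reason linear-scale data work best is that mixing is additive in linear space: a pseudo-bulk sample is the sum of its cells' counts, whereas the log of a mixture is not the weighted average of the logs of its components. The sketch below builds one pseudo-bulk mixture from a hypothetical toy count matrix using a simple with-replacement sampling scheme (an assumption; the study's exact procedure is not described here), then compares the two scales. The helper `pseudo_bulk` is hypothetical.

```python
import numpy as np

# Hypothetical single-cell counts: 3 genes (rows) x 6 cells (columns), two cell types.
counts = np.array([
    [10, 12, 8, 0, 1, 0],
    [0, 1, 0, 20, 18, 22],
    [5, 5, 5, 5, 5, 5],
])
cell_types = np.array(["A", "A", "A", "B", "B", "B"])

def pseudo_bulk(counts, cell_types, proportions, n_cells, rng):
    """Sample cells of each type (with replacement) in the requested proportions
    and sum their raw counts into a single bulk-like profile."""
    picked = []
    for ctype, prop in proportions.items():
        idx = np.flatnonzero(cell_types == ctype)
        picked.extend(rng.choice(idx, size=round(prop * n_cells), replace=True))
    return counts[:, picked].sum(axis=1)

rng = np.random.default_rng(0)
bulk = pseudo_bulk(counts, cell_types, {"A": 0.3, "B": 0.7}, n_cells=100, rng=rng)
print("pseudo-bulk counts:", bulk)

# Mean profile of each cell type in linear scale.
mu_a = counts[:, cell_types == "A"].mean(axis=1)
mu_b = counts[:, cell_types == "B"].mean(axis=1)
p = 0.3
linear_mix = p * mu_a + (1 - p) * mu_b                        # mixing in linear space
mix_of_logs = p * np.log1p(mu_a) + (1 - p) * np.log1p(mu_b)  # mixing after log transform
print("log of linear mixture:", np.log1p(linear_mix))
print("mixture of logs:      ", mix_of_logs)
```

The two printed vectors agree only for the third gene, whose expression is identical in both cell types; for the differentially expressed genes they diverge, so a linear mixing model fitted to log-scale values does not describe how the mixture was actually formed.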


2019 ◽  
Author(s):  
Kelly M. Bakulski ◽  
John F. Dou ◽  
Robert C. Thompson ◽  
Christopher Lee ◽  
Lauren Y. Middleton ◽  
...  

Background: Lead (Pb) exposure is ubiquitous and has permanent developmental effects on childhood intelligence and behavior and on adulthood risk of dementia. The hippocampus is a key brain region involved in learning and memory, and its cellular composition is highly heterogeneous. Pb acts on the hippocampus by altering gene expression, but the cell type-specific responses are unknown. Objective: To examine the effects of perinatal Pb treatment on adult hippocampal gene expression, at the level of individual cells, in mice. Methods: In mice perinatally exposed, from two weeks prior to mating through weaning, to either control water (n=4) or a human physiologically relevant level of Pb (32 ppm in maternal drinking water; n=4), we tested for gene expression and cellular differences in the hippocampus at 5 months of age. Single cell RNA-sequencing of 5,258 hippocampal cells on the 10x Genomics Chromium platform was used to 1) test for gene expression differences by treatment averaged across all cells; 2) compare cell cluster composition by treatment; and 3) test for gene expression and pathway differences by treatment within cell clusters. Results: Gene expression patterns revealed 12 cell clusters in the hippocampus, mapping to the major expected cell types (e.g. microglia, astrocytes, neurons, oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (P=4.4×10⁻²¹) in adult mice. Across all cells, differential gene expression analysis by Pb treatment revealed cluster marker genes. Within cell clusters, differential gene expression with Pb treatment (q<0.05) was observed in endothelial, microglial, pericyte, and astrocyte cells. Pathways up-regulated with Pb treatment were protein folding in microglia (P=3.4×10⁻⁹) and stress response in oligodendrocytes (P=3.2×10⁻⁵). Conclusion: Bulk tissue analysis may be confounded by changes in cell type composition and may obscure effects within vulnerable cell types.
This study serves as a biological reference for future single cell studies of toxicant or neuronal complications, with the ultimate aim of characterizing the molecular basis by which Pb influences cognition and behavior.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Travis S. Johnson ◽  
Shunian Xiang ◽  
Bryan R. Helm ◽  
Zachary B. Abrams ◽  
Peter Neidecker ◽  
...  

Single-cell RNA sequencing (scRNA-seq) resolves heterogeneous cell populations in tissues and helps to reveal single-cell level function and dynamics. In neuroscience, the scarcity of brain tissue is the bottleneck for such studies. Evidence shows that mouse and human share similar cell type gene markers. We hypothesized that scRNA-seq data from mouse brain tissue can be used to complement human data to infer cell type composition in human samples. Here, we supplement the cell type information in human scRNA-seq data with mouse data. The resulting data were used to infer the spatial cellular composition of 3702 human brain samples from the Allen Human Brain Atlas. We then mapped the cell types back to the corresponding brain regions. Most cell types were localized to the correct regions. We also compared the mapping results to those derived from neuronal nuclei locations. They were consistent after accounting for changes in neural connectivity between regions. Furthermore, we applied this approach to Alzheimer's disease (AD) brain data and successfully captured cell pattern changes in AD brains. We believe this integrative approach can solve the sample scarcity issue in neuroscience.


2018 ◽  
Author(s):  
Amir Alavi ◽  
Matthew Ruffalo ◽  
Aiyappa Parvangada ◽  
Zhilin Huang ◽  
Ziv Bar-Joseph

Single cell RNA-Seq (scRNA-seq) studies often profile upwards of thousands of cells in heterogeneous environments. Current methods for characterizing cells perform unsupervised analysis followed by assignment using a small set of known marker genes. Such approaches are limited to a few well-characterized cell types. To enable large-scale supervised characterization, we developed an automated pipeline to download, process, and annotate publicly available scRNA-seq datasets. We extended supervised neural networks to obtain efficient and accurate representations for scRNA-seq data. We applied our pipeline to analyze data from over 500 different studies with over 300 unique cell types and show that supervised methods greatly outperform unsupervised methods for cell type identification. A case study of neural degeneration data highlights the ability of these methods to identify differences between cell type distributions in healthy and diseased mice. We implemented a web server that compares new datasets to the collected data using fast matching methods in order to determine cell types, key genes, similar prior studies, and more.


2019 ◽  
Author(s):  
Roger Pique-Regi ◽  
Roberto Romero ◽  
Adi L.Tarca ◽  
Edward D. Sendler ◽  
Yi Xu ◽  
...  

More than 135 million births occur each year; yet the molecular underpinnings of human parturition in gestational tissues, and in particular the placenta, are still poorly understood. The placenta is a complex heterogeneous organ including cells of both maternal and fetal origin, and insults that disrupt the maternal-fetal dialogue could result in adverse pregnancy outcomes such as preterm birth. There is limited knowledge of the cell type composition and transcriptional activity of the placenta and its compartments during physiologic and pathologic parturition. To fill this knowledge gap, we used scRNA-seq to profile the placental villous tree, basal plate, and chorioamniotic membranes of women with or without labor at term and those with preterm labor. Significant differences in cell type composition and transcriptional profiles were found among placental compartments and across study groups. For the first time, two cell types were identified: 1) lymphatic endothelial decidual cells in the chorioamniotic membranes, and 2) non-proliferative interstitial cytotrophoblasts in the placental villi. Maternal macrophages from the chorioamniotic membranes displayed the largest differences in gene expression (e.g. NFKB1) in both processes of labor; yet specific gene expression changes were also detected in preterm labor. Importantly, several placental scRNA-seq transcriptional signatures were modulated with advancing gestation in the maternal circulation, and specific immune cell type signatures were increased with labor at term (NK-cell and activated T-cell) and with preterm labor (macrophage, monocyte, and activated T-cell).
Herein, we provide a catalogue of cell types and transcriptional profiles in the human placenta, shedding light on the molecular underpinnings and non-invasive prediction of physiologic and pathologic parturition.
One-sentence summary: The common molecular pathway of parturition for both term and preterm spontaneous labor is characterized using single cell gene expression analysis of the human placenta.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yin Zhang ◽  
Fei Wang

Background: As sequencing technology continues to mature, different laboratories and sequencing platforms have generated large amounts of single-cell transcriptome sequencing data for the same or different tissues. Because of batch effects and the high dimensionality of scRNA-seq data, downstream analysis often faces challenges. Although a number of algorithms and tools have been proposed for removing batch effects, current mainstream algorithms suffer from data overcorrection when the cell type composition varies greatly between batches. Results: In this paper, we propose a novel method named SSBER that uses biological prior knowledge to guide the correction, aiming to solve the problem of poor batch-effect correction when the cell type composition differs greatly between batches. Conclusions: SSBER effectively solves the above problems and outperforms other algorithms when the cell type structure or the distribution of cell populations varies considerably among batches, or when similar cell types exist across batches.

