DSAVE: Detection of misclassified cells in single-cell RNA-Seq data

PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243360
Author(s):  
Johan Gustafsson ◽  
Jonathan Robinson ◽  
Juan S. Inda-Díaz ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets.
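The two ideas named in this abstract, down-sampling cells to a common depth and scoring each cell with a log-likelihood, can be illustrated with a small Python sketch. This is not the DSAVE implementation: the function names, the multinomial model, the pseudocount and the toy counts are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def downsample_counts(counts, target_total, rng):
    """Draw target_total UMIs without replacement from one cell's gene counts."""
    if target_total > sum(counts):
        raise ValueError("target_total exceeds the cell's total count")
    # One pool entry per observed UMI, labelled by its gene index.
    pool = [gene for gene, c in enumerate(counts) for _ in range(c)]
    drawn = Counter(rng.sample(pool, target_total))
    return [drawn.get(gene, 0) for gene in range(len(counts))]


def mean_profile(cells, pseudocount=1e-3):
    """Gene probabilities pooled over all cells (pseudocount avoids log(0))."""
    n_genes = len(cells[0])
    sums = [sum(cell[g] for cell in cells) + pseudocount for g in range(n_genes)]
    total = sum(sums)
    return [s / total for s in sums]


def multinomial_log_likelihood(counts, profile):
    """Log-likelihood of one cell's counts under a multinomial gene profile."""
    ll = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, profile):
        ll += c * math.log(p) - math.lgamma(c + 1)
    return ll


rng = random.Random(0)
# Toy cluster (made-up counts): three similar cells with unequal depth
# and one divergent cell.
raw_cells = [[16, 4, 0], [7, 3, 0], [27, 3, 0], [0, 2, 18]]
depth = min(sum(cell) for cell in raw_cells)  # common depth for all cells
cells = [downsample_counts(cell, depth, rng) for cell in raw_cells]
profile = mean_profile(cells)
scores = [multinomial_log_likelihood(cell, profile) for cell in cells]
# The cell with the lowest log-likelihood is the candidate misclassified cell.
suspect = min(range(len(cells)), key=scores.__getitem__)
```

Down-sampling every cell to the same total count before scoring means that differences in log-likelihood reflect expression differences rather than sequencing depth, which is the motivation the abstract gives for the down-sampling step.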

2019 ◽  
Author(s):  
Dylan R. Farnsworth ◽  
Lauren Saunders ◽  
Adam C. Miller

The ability to define cell types and how they change during organogenesis is central to our understanding of animal development and human disease. Despite the crucial nature of this knowledge, we have yet to fully characterize all distinct cell types and the gene expression differences that generate cell types during development. To address this knowledge gap, we produced an Atlas using single-cell RNA-sequencing methods to investigate gene expression from the pharyngula to early larval stages in developing zebrafish. Our single-cell transcriptome Atlas encompasses transcriptional profiles from 44,102 cells across four days of development using duplicate experiments that confirmed high reproducibility. We annotated 220 identified clusters and highlighted several strategies for interrogating changes in gene expression associated with the development of zebrafish embryos at single-cell resolution. Furthermore, we highlight the power of this analysis to assign new cell-type or developmental stage-specific expression information to many genes, including those that are currently known only by sequence and/or that lack expression information altogether. The resulting Atlas is a resource for biologists to generate hypotheses for genetic (mutant) or functional analysis, to launch an effort to define the diversity of cell-types during zebrafish organogenesis, and to examine the transcriptional profiles that produce each cell type over developmental time.


2019 ◽  
Author(s):  
Monica Tambalo ◽  
Richard Mitter ◽  
David G. Wilkinson

Segmentation of the vertebrate hindbrain leads to the formation of rhombomeres, each with a distinct anteroposterior identity. Specialised boundary cells form at segment borders that act as a source or regulator of neuronal differentiation. In zebrafish, there is spatial patterning of neurogenesis in which non-neurogenic zones form at boundaries and segment centres, in part mediated by Fgf20 signaling. To further understand the control of neurogenesis, we have carried out single cell RNA sequencing of the zebrafish hindbrain at three different stages of patterning. Analyses of the data reveal known and novel markers of distinct hindbrain segments, of cell types along the dorsoventral axis, and of the transition of progenitors to neuronal differentiation. We find major shifts in the transcriptome of progenitors and of differentiating cells between the different stages analysed. Supervised clustering with markers of boundary cells and segment centres, together with RNA-seq analysis of Fgf-regulated genes, has revealed new candidate regulators of cell differentiation in the hindbrain. These data provide a valuable resource for functional investigations of the patterning of neurogenesis and the transition of progenitors to neuronal differentiation.


2017 ◽  
Author(s):  
Mohan T. Bolisetty ◽  
Michael L. Stitzel ◽  
Paul Robson

Advances in high-throughput single cell transcriptomics technologies have revolutionized the study of complex tissues. It is now possible to measure gene expression across thousands of individual cells to define cell types and states. While powerful computational and statistical frameworks are emerging to analyze these complex datasets, a gap exists between this data and a biologist’s insight. The CellView web application fills this gap by providing easy and intuitive exploration of single cell transcriptome data.


2021 ◽  
Author(s):  
Ming Yang ◽  
Benjamin R. Harrison ◽  
Daniel E.L. Promislow

Background: Along with specialized functions, cells of multicellular organisms also perform essential functions common to most if not all cells. Whether diverse cells do this by using the same set of genes, interacting in a fixed coordinated fashion to execute essential functions, remains a central question in biology. Single-cell RNA-sequencing (scRNA-seq) measures gene expression of individual cells, enabling researchers to discover gene expression patterns that contribute to the diversity of cell functions. Current analyses focus primarily on identifying differentially expressed genes across cells. However, patterns of co-expression between genes are probably more indicative of biological processes than are the expression of individual genes. Using single cell transcriptome data from the fly brain, here we focus on gene co-expression to search for a core cellular network.
Results: In this study, we constructed cell type-specific gene co-expression networks using single cell transcriptome data of brains from the fruit fly, Drosophila melanogaster. We detected a set of highly coordinated genes preserved across cell types in fly brains and defined this set as the core cellular network. This core is very small compared with cell type-specific gene co-expression networks and shows dense connectivity. Modules within this core are enriched for basic cellular functions, such as translation and ATP metabolic processes, and gene members of these modules have distinct evolutionary signatures.
Conclusions: Overall, we demonstrated that a core cellular network exists in diverse cell types of fly brains and this core exhibits unique topological, structural, functional and evolutionary properties.
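The notion of a core network preserved across cell types can be sketched as the intersection of per-cell-type co-expression edge sets. The Python below is a minimal illustration under stated assumptions: the abstract does not say which correlation measure or cutoff was used, so Pearson correlation, the 0.8 threshold, the placeholder gene names (`g1`, `g2`, `g3`) and the toy expression values are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpression_edges(expr, threshold=0.8):
    """Gene pairs whose absolute correlation across cells reaches the threshold."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    }


def core_network(networks):
    """Edges present in every cell type-specific network."""
    return set.intersection(*networks)


# Toy data (assumed): gene -> expression across four cells of one cell type.
type_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
type_b = {"g1": [5, 3, 4, 1], "g2": [10, 7, 8, 3], "g3": [1, 3, 4, 5]}

edges_a = coexpression_edges(type_a)
edges_b = coexpression_edges(type_b)
core = core_network([edges_a, edges_b])
```

In this toy case only the `g1`/`g2` edge survives in both cell types, so the core is smaller than either cell type-specific network, in line with the qualitative observation reported in the abstract.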


2019 ◽  
Author(s):  
Ying Hu ◽  
Mohini Ranganathan ◽  
Chang Shu ◽  
Xiaoyu Liang ◽  
Suhas Ganesh ◽  
...  

Delta 9-tetrahydrocannabinol (THC), the principal psychoactive constituent of cannabis, is also known to modulate immune response in peripheral cells. The mechanisms of THC’s effects on gene expression in human immune cells remain poorly understood. Combining a within-subject design with single cell transcriptome mapping, we report that administration of THC acutely alters gene expression in 15,973 human blood immune cells. Controlling for high inter-individual transcriptomic variability, we identified 294 transcriptome-wide significant genes among eight cell types, including 69 common genes and 225 cell-type specific genes affected by acute THC administration. These include genes involved not only in immune response and cytokine production but also in signal transduction, cell proliferation and apoptosis. We revealed distinct transcriptomic sub-clusters affected by THC in major immune cell types where THC perturbed cell type-specific intracellular gene expression correlations. Gene set enrichment analysis further supports the findings of THC’s common and cell-type specific effects on immune response and cell toxicity. We found that THC alters the correlation of the cannabinoid receptor gene, CNR2, with other genes in B cells, in which CNR2 showed the highest level of expression. This comprehensive cell-specific transcriptomic profiling identified novel genes regulated by THC and provides important insights into THC’s acute effects on immune function that may have important medical implications.


2021 ◽  
Author(s):  
Mariia Bilous ◽  
Loc Tran ◽  
Chiara Cianciaruso ◽  
Santiago J Carmona ◽  
Mikael J Pittet ◽  
...  

Single-cell RNA sequencing (scRNA-seq) technologies offer unique opportunities for exploring heterogeneous cell populations. However, in-depth single-cell transcriptomic characterization of complex tissues often requires profiling tens to hundreds of thousands of cells. Such large numbers of cells represent an important hurdle for downstream analyses, interpretation and visualization. Here we develop a network-based coarse-graining framework where highly similar cells are merged into super-cells. We demonstrate that super-cells not only preserve but often improve the results of downstream analyses including visualization, clustering, differential expression, cell type annotation, gene correlation, imputation, RNA velocity and data integration. By capitalizing on the redundancy inherent to scRNA-seq data, super-cells significantly facilitate and accelerate the construction and interpretation of single-cell atlases, as demonstrated by the integration of 1.46 million cells from COVID-19 patients in less than two hours on a standard desktop.
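To make the idea of merging highly similar cells concrete, here is a deliberately simplified Python stand-in. It is not the authors' coarse-graining framework: it assumes a cosine-similarity graph with a fixed threshold and merges connected components by summing counts, and the function names, the threshold and the toy counts are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_similar_cells(cells, threshold=0.95):
    """Group cells linked by cosine similarity >= threshold and sum their counts."""
    parent = list(range(len(cells)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of sufficiently similar cells (edges of the graph).
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if cosine(cells[i], cells[j]) >= threshold:
                parent[find(i)] = find(j)

    # Each connected component becomes one merged cell.
    groups = {}
    for i in range(len(cells)):
        groups.setdefault(find(i), []).append(i)
    members = sorted(groups.values())
    n_genes = len(cells[0])
    merged = [[sum(cells[i][g] for i in grp) for g in range(n_genes)] for grp in members]
    return members, merged


# Toy counts (assumed): cells 0, 1 and 4 resemble each other, as do cells 2 and 3.
cells = [[10, 0, 1], [9, 1, 0], [0, 8, 2], [1, 9, 1], [11, 0, 0]]
members, merged = merge_similar_cells(cells)
```

The all-pairs comparison is quadratic in the number of cells, so a sketch like this only conveys the merging step; handling the dataset sizes mentioned in the abstract would need a sparse neighbour graph instead.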


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Dylan Kotliar ◽  
Adrian Veres ◽  
M Aurel Nagy ◽  
Shervin Tabrizi ◽  
Eran Hodis ◽  
...  

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here, we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.


2021 ◽  
Author(s):  
Hanbyeol Kim ◽  
Joongho Lee ◽  
Keunsoo Kang ◽  
Seokhyun Yoon

Cell type identification is a key step in the downstream analysis of single-cell RNA-seq experiments. The indispensable information for this is gene expression, which is used to cluster cells, train the model and set rejection thresholds. The problem is that expression levels are subject to batch effects arising from different platforms and preprocessing. We present MarkerCount, which uses the number of markers expressed, regardless of their expression level, to initially identify cell types and then reassigns cell types on a cluster basis. MarkerCount works in both reference-based and marker-based modes: the latter uses only existing lists of markers, while the former requires a pre-annotated dataset to train the model. Its performance was evaluated and compared with existing identifiers, both marker-based and reference-based, that can be customized with publicly available datasets and marker databases. The results show that MarkerCount provides stable performance compared with other reference-based and marker-based cell type identifiers.
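The two-stage idea described here, counting detected markers regardless of expression level and then reassigning labels per cluster, can be sketched in a few lines of Python. This is an illustration under assumptions, not the MarkerCount code: the function names, the raw-count scoring rule, the majority vote and the toy marker lists and cells are hypothetical.

```python
from collections import Counter


def count_markers(cell_expr, markers):
    """Number of marker genes detected in a cell, ignoring expression level."""
    return sum(1 for gene in markers if cell_expr.get(gene, 0) > 0)


def identify_cell(cell_expr, marker_db):
    """Cell type whose marker list has the most genes detected in the cell."""
    return max(marker_db, key=lambda t: count_markers(cell_expr, marker_db[t]))


def reassign_by_cluster(labels, clusters):
    """Replace each cell's label with the majority label of its cluster."""
    majority = {}
    for cluster in set(clusters):
        votes = Counter(l for l, c in zip(labels, clusters) if c == cluster)
        majority[cluster] = votes.most_common(1)[0][0]
    return [majority[c] for c in clusters]


# Toy marker lists and cells (assumed for illustration).
marker_db = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["CD79A", "MS4A1", "CD19"],
}
cells = [
    {"CD3D": 5, "CD3E": 1, "CD79A": 1},
    {"CD3E": 40, "CD2": 2},
    {"CD79A": 1, "MS4A1": 1, "CD3D": 30},  # high CD3D level, but only one T marker
    {"MS4A1": 3, "CD19": 1},
    {"CD3D": 1, "CD79A": 9, "CD2": 1},     # two T markers detected at low level
]
clusters = [0, 0, 1, 1, 1]

initial = [identify_cell(cell, marker_db) for cell in cells]
final = reassign_by_cluster(initial, clusters)
```

Because only detection (count above zero) matters, the third cell is called a B cell despite its high CD3D level, and the last cell's initial T cell call is overridden by the majority of its cluster.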


2018 ◽  
Author(s):  
Dylan Kotliar ◽  
Adrian Veres ◽  
M. Aurel Nagy ◽  
Shervin Tabrizi ◽  
Eran Hodis ◽  
...  

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here we illustrate and enhance the use of matrix factorization as a solution to this problem. We show with simulations that a method that we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including the relative contribution of programs in each cell. Applied to published brain organoid and visual cortex scRNA-Seq datasets, cNMF refines the hierarchy of cell types and identifies both expected (e.g. cell cycle and hypoxia) and intriguing novel activity programs. We propose that one of the novel programs may reflect a neurosecretory phenotype and a second may underlie the formation of neuronal synapses. We make cNMF available to the community and illustrate how this approach can provide key insights into gene expression variation within and between cell types.


2020 ◽  
Author(s):  
Timothy J. Durham ◽  
Riza M. Daza ◽  
Louis Gevirtzman ◽  
Darren A. Cusanovich ◽  
William Stafford Noble ◽  
...  

Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.

