scClustViz – Single-cell RNAseq cluster assessment and visualization

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.
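
The idea of scoring a clustering by differential expression between clusters can be illustrated with a minimal Python sketch. It is not scClustViz's implementation (scClustViz is an R Shiny tool). The function names, the toy data, the mean-difference threshold used in place of a statistical test, and the choice to summarise each resolution by the minimum count over cluster pairs are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def count_de_genes(expr, labels, a, b, min_diff=1.0):
    """Count genes whose mean log-expression differs by >= min_diff
    between clusters a and b (a crude stand-in for a real DE test)."""
    cells_a = [i for i, lab in enumerate(labels) if lab == a]
    cells_b = [i for i, lab in enumerate(labels) if lab == b]
    n_de = 0
    for values in expr.values():
        mean_a = mean(values[i] for i in cells_a)
        mean_b = mean(values[i] for i in cells_b)
        if abs(mean_a - mean_b) >= min_diff:
            n_de += 1
    return n_de


def min_de_between_clusters(expr, labels, min_diff=1.0):
    """Smallest DE-gene count over all cluster pairs; a value near zero
    hints that two clusters may not be transcriptionally distinct."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return None
    return min(
        count_de_genes(expr, labels, a, b, min_diff)
        for a, b in combinations(clusters, 2)
    )


# Hypothetical toy data: gene -> log-expression per cell (6 cells).
expr = {
    "geneA": [5.0, 5.2, 4.9, 0.1, 0.0, 0.2],
    "geneB": [0.0, 0.1, 0.2, 4.0, 4.1, 3.9],
    "geneC": [2.0, 2.1, 1.9, 2.0, 2.2, 2.1],
}
coarse = [0, 0, 0, 1, 1, 1]  # two clusters
fine = [0, 0, 2, 1, 1, 1]    # higher resolution: cell 2 split off

for name, labels in [("coarse", coarse), ("fine", fine)]:
    print(name, min_de_between_clusters(expr, labels))
```

In this toy case the finer resolution creates a pair of clusters with no genes passing the threshold, which is the kind of signal that would argue for the coarser setting.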

F1000Research ◽

10.12688/f1000research.16198.2 ◽

2019 ◽

Vol 7 ◽

pp. 1522 ◽

Cited By ~ 8

Author(s):

Brendan T. Innes ◽

Gary D. Bader

Keyword(s):

Gene Expression ◽

Single Cell ◽

Clustering Algorithms ◽

Expression Patterns ◽

Software Tool ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Single Experiment

Revealing immune responses in the Mycobacterium avium subsp. paratuberculosis-infected THP-1 cells using single cell RNA-sequencing

PLoS ONE ◽

10.1371/journal.pone.0254194 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0254194

Author(s):

Hong-Tae Park ◽

Woo Bin Park ◽

Suji Kim ◽

Jong-Sung Lim ◽

Gyoungju Nah ◽

...

Keyword(s):

Crohn's Disease ◽

Single Cell ◽

Mycobacterium Avium ◽

Expression Patterns ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Cytokines And Chemokines

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, which is a chronic and debilitating disease in ruminants. MAP is also considered to be a possible cause of Crohn’s disease in humans. However, few studies have focused on the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells against MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that THP-1 cells were divided into seven different clusters in response to phorbol-12-myristate-13-acetate (PMA) treatment. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. From the results, we found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses against MAP infection. The responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, the Mreg cell type, a novel cell type differentiated from THP-1 cells, was discovered. Thus, it is suggested that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows a more detailed observation of the response to infection by each cell type.

Single-Cell Analysis of the Gene Expression Effects of Developmental Lead (Pb) Exposure on the Mouse Hippocampus

Toxicological Sciences ◽

10.1093/toxsci/kfaa069 ◽

2020 ◽

Vol 176 (2) ◽

pp. 396-409

Author(s):

Kelly M Bakulski ◽

John F Dou ◽

Robert C Thompson ◽

Christopher Lee ◽

Lauren Y Middleton ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Cell Types ◽

Cell Cluster ◽

Marker Genes ◽

Cell Type ◽

Cell Clusters ◽

Pb Exposure

Abstract: Lead (Pb) exposure is ubiquitous with permanent neurodevelopmental effects. The hippocampus brain region is involved in learning and memory with heterogeneous cellular composition. The hippocampus cell type-specific responses to Pb are unknown. The objective of this study is to examine perinatal Pb treatment effects on adult hippocampus gene expression, at the level of individual cells. In mice perinatally exposed to control water or a human physiologically relevant level (32 ppm in maternal drinking water) of Pb, 2 weeks prior to mating through weaning, we tested for hippocampus gene expression and cellular differences at 5 months of age. We sequenced RNA from 5258 hippocampal cells to (1) test for treatment gene expression differences averaged across all cells, (2) compare cell cluster composition by treatment, and (3) test for treatment gene expression and pathway differences within cell clusters. Gene expression patterns revealed 12 hippocampus cell clusters, mapping to major expected cell types (eg, microglia, astrocytes, neurons, and oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (p = 4.4 × 10^−21) in adult mice. Across all cells, Pb treatment was associated with expression of cell cluster marker genes. Within cell clusters, Pb treatment (q < 0.05) caused differential gene expression in endothelial, microglial, pericyte, and astrocyte cells. Pb treatment upregulated protein folding pathways in microglia (p = 3.4 × 10^−9) and stress response in oligodendrocytes (p = 3.2 × 10^−5). Bulk tissue analysis may be influenced by changes in cell type composition, obscuring effects within vulnerable cell types. This study serves as a biological reference for future single-cell toxicant studies, to ultimately characterize molecular effects on cognition and behavior.
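
Comparing cell cluster composition by treatment amounts to asking whether the fraction of cells in a cluster differs between groups. The abstract does not say which test the authors used. The sketch below assumes a pooled two-proportion z-test, and the cell counts are invented toy values, not data from the study.

```python
from math import erfc, sqrt


def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided pooled z-test for a difference between two proportions.

    k1/n1: cells in the cluster / total cells, group 1 (e.g. treated)
    k2/n2: the same for group 2 (e.g. control)
    Returns (z, p_value).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts: 300 of 1000 treated cells vs 200 of 1000 control
# cells fall in the cluster of interest.
z, p = two_proportion_z_test(300, 1000, 200, 1000)
print(f"z = {z:.2f}, p = {p:.2e}")
```

A test like this treats cells as independent observations, which ignores animal-to-animal variation; with few animals per group, a per-animal comparison may be the safer choice.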

Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

Bioinformatics ◽

10.1093/bioinformatics/btab257 ◽

2021 ◽

Author(s):

Yixuan Qiu ◽

Jiebiao Wang ◽

Jing Lei ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

Expression Patterns ◽

R Package ◽

Supplementary Information ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Correlation Pattern ◽

Tissue Samples ◽

Bulk Data

Abstract. Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation: We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information: Supplementary data are available at Bioinformatics online.
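
The intuition that markers of one cell type co-vary across bulk samples can be shown with a small Python sketch. This is not the markerpen algorithm. It is a deliberately simple stand-in that keeps a gene when its median Pearson correlation with the published seed markers exceeds a cutoff. The gene names, the expression values, the 0.8 cutoff, and the use of the median are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def refine_markers(bulk, seed_markers, min_corr=0.8):
    """Keep genes whose median correlation with the seed markers
    (excluding the gene itself) is at least min_corr."""
    refined = []
    for gene, values in bulk.items():
        others = [m for m in seed_markers if m != gene]
        score = median(pearson(values, bulk[m]) for m in others)
        if score >= min_corr:
            refined.append(gene)
    return sorted(refined)


# Hypothetical bulk expression across 5 tissue samples.
bulk = {
    "m1": [1, 3, 5, 7, 9],                   # seed, tracks cell-type proportion
    "m2": [2, 6, 10, 14, 18],                # seed, tracks cell-type proportion
    "m3": [1.5, 2.5, 5.5, 6.5, 9.5],         # seed, tracks cell-type proportion
    "bad_seed": [5, 1, 4, 2, 3],             # published marker that does not fit
    "candidate": [1.1, 2.9, 5.2, 6.8, 9.1],  # unlisted gene that co-varies
    "unrelated": [3, 3.5, 2.5, 3.2, 2.8],
}
seeds = ["m1", "m2", "m3", "bad_seed"]
print(refine_markers(bulk, seeds))
```

Here the poorly fitting seed is removed and the co-varying candidate is added, mirroring the "adding or removing genes from the list" step described in the abstract.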

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
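
Rejecting cells with cell-type-specific confidence thresholds can be sketched in a few lines of Python. This shows only the decision rule, not JIND's network or how it learns the thresholds. The function name, the "Unassigned" label, the probabilities, and the threshold values are hypothetical.

```python
def classify_with_rejection(probs, thresholds, reject_label="Unassigned"):
    """Return the most probable cell type, or reject_label when its
    probability is below that cell type's own confidence threshold."""
    best = max(probs, key=probs.get)
    if probs[best] >= thresholds[best]:
        return best
    return reject_label


# Hypothetical per-type thresholds: a stricter cutoff for a cell type
# that is easily confused with others.
thresholds = {"T cell": 0.6, "B cell": 0.6, "Progenitor": 0.9}

# Hypothetical classifier outputs for three cells.
cells = [
    {"T cell": 0.85, "B cell": 0.10, "Progenitor": 0.05},
    {"T cell": 0.10, "B cell": 0.10, "Progenitor": 0.80},
    {"T cell": 0.45, "B cell": 0.40, "Progenitor": 0.15},
]
for probs in cells:
    print(classify_with_rejection(probs, thresholds))
```

A per-type threshold lets a classifier stay permissive for well-separated cell types while demanding more confidence for types with similar transcriptomic profiles.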

A single cell brain atlas in human Alzheimer’s disease

10.1101/628347 ◽

2019 ◽

Cited By ~ 4

Author(s):

Alexandra Grubman ◽

Gabriel Chew ◽

John F. Ouyang ◽

Guizhi Sun ◽

Xin Yi Choo ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Single Cell ◽

Cell Fate ◽

Expression Patterns ◽

Cell Types ◽

Gene Expression Patterns ◽

Cell Type ◽

Web Resource ◽

Cell Type Specific

Abstract: Alzheimer’s disease (AD) is a heterogeneous disease that is largely dependent on the complex cellular microenvironment in the brain. This complexity impedes our understanding of how individual cell types contribute to disease progression and outcome. To characterize the molecular and functional cell diversity in the human AD brain we utilized single nuclei RNA-seq in AD and control patient brains in order to map the landscape of cellular heterogeneity in AD. We detail gene expression changes at the level of cells and cell subclusters, highlighting specific cellular contributions to global gene expression patterns between control and Alzheimer’s patient brains. We observed distinct cellular regulation of APOE which was repressed in oligodendrocyte progenitor cells (OPCs) and astrocyte AD subclusters, and highly enriched in a microglial AD subcluster. In addition, oligodendrocyte and microglia AD subclusters show discordant expression of APOE. Integration of transcription factor regulatory modules with downstream GWAS gene targets revealed subcluster-specific control of AD cell fate transitions. For example, this analysis uncovered that astrocyte diversity in AD was under the control of transcription factor EB (TFEB), a master regulator of lysosomal function and which initiated a regulatory cascade containing multiple AD GWAS genes. These results establish functional links between specific cellular sub-populations in AD, and provide new insights into the coordinated control of AD GWAS genes and their cell-type specific contribution to disease susceptibility. Finally, we created an interactive reference web resource which will facilitate brain and AD researchers to explore the molecular architecture of subtype and AD-specific cell identity, molecular and functional diversity at the single cell level.

Highlights:

We generated the first human single cell transcriptome in AD patient brains.

Our study unveiled 9 clusters of cell-type specific and common gene expression patterns between control and AD brains, including clusters of genes that present properties of different cell types (i.e. astrocytes and oligodendrocytes).

Our analyses also uncovered functionally specialized sub-cellular clusters: 5 microglial clusters, 8 astrocyte clusters, 6 neuronal clusters, 6 oligodendrocyte clusters, 4 OPC and 2 endothelial clusters, each enriched for specific ontological gene categories.

Our analyses found manifold AD GWAS genes specifically associated with one cell-type, and sets of AD GWAS genes co-ordinately and differentially regulated between different brain cell-types in AD sub-cellular clusters.

We mapped the regulatory landscape driving transcriptional changes in AD brain, and identified transcription factor networks which we predict to control cell fate transitions between control and AD sub-cellular clusters.

Finally, we provide an interactive web-resource that allows the user to further visualise and interrogate our dataset.

Data resource web interface: http://adsn.ddnetbio.com

Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

10.1101/2020.11.07.373043 ◽

2020 ◽

Author(s):

Yixuan Qiu ◽

Jiebiao Wang ◽

Jing Lei ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

Expression Patterns ◽

R Package ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Correlation Pattern ◽

Tissue Samples ◽

Bulk Data ◽

Tissue Marker

Abstract. Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation: We implement this method as an R package markerpen, hosted on https://github.com/yixuan/[email protected]

Comprehensive characterization of tissue-specific chromatin accessibility in L2 Caenorhabditis elegans nematodes

10.1101/2020.09.15.299123 ◽

2020 ◽

Author(s):

Timothy J. Durham ◽

Riza M. Daza ◽

Louis Gevirtzman ◽

Darren A. Cusanovich ◽

William Stafford Noble ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Cell Types ◽

Chromatin Accessibility ◽

Gene Expression Patterns ◽

Rna Seq ◽

Cell Type ◽

Tissue Specific ◽

C Elegans

Abstract: Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.

Investigating transcriptome-wide sex dimorphism by multi-level analysis of single-cell RNA sequencing data in ten mouse cell types

Biology of Sex Differences ◽

10.1186/s13293-020-00335-2 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Tianyuan Lu ◽

Jessica C. Mar

Keyword(s):

Gene Expression ◽

Single Cell ◽

Regulatory Networks ◽

Cell Types ◽

Marker Genes ◽

Biological Functions ◽

Sex Dimorphism ◽

Cell Type ◽

Transcriptional Regulatory ◽

Single Cell Rna Sequencing

Abstract. Background: It is a long established fact that sex is an important factor that influences the transcriptional regulatory processes of an organism. However, understanding sex-based differences in gene expression has been limited because existing studies typically sequence and analyze bulk tissue from female or male individuals. Such analyses average cell-specific gene expression levels where cell-to-cell variation can easily be concealed. We therefore sought to utilize data generated by the rapidly developing single cell RNA sequencing (scRNA-seq) technology to explore sex dimorphism and its functional consequences at the single cell level. Methods: Our study included scRNA-seq data of ten well-defined cell types from the brain and heart of female and male young adult mice in the publicly available tissue atlas dataset, Tabula Muris. We combined standard differential expression analysis with the identification of differential distributions in single cell transcriptomes to test for sex-based gene expression differences in each cell type. The marker genes that had sex-specific inter-cellular changes in gene expression formed the basis for further characterization of the cellular functions that were differentially regulated between the female and male cells. We also inferred activities of transcription factor-driven gene regulatory networks by leveraging knowledge of multidimensional protein-to-genome and protein-to-protein interactions and analyzed pathways that were potential modulators of sex differentiation and dimorphism. Results: For each cell type in this study, we identified marker genes with significantly different mean expression levels or inter-cellular distribution characteristics between female and male cells. These marker genes were enriched in pathways that were closely related to the biological functions of each cell type. We also identified sub-cell types that possibly carry out distinct biological functions that displayed discrepancies between female and male cells. Additionally, we found that while genes under differential transcriptional regulation exhibited strong cell type specificity, six core transcription factor families responsible for most sex-dimorphic transcriptional regulation activities were conserved across the cell types, including ASCL2, EGR, GABPA, KLF/SP, RXRα, and ZF. Conclusions: We explored novel gene expression-based biomarkers, functional cell group compositions, and transcriptional regulatory networks associated with sex dimorphism with a novel computational pipeline. Our findings indicated that sex dimorphism might be widespread across the transcriptomes of cell types, cell type-specific, and impactful for regulating cellular activities.

Single cell analysis of the effects of developmental lead (Pb) exposure on the hippocampus

10.1101/860403 ◽

2019 ◽

Author(s):

Kelly M. Bakulski ◽

John F. Dou ◽

Robert C. Thompson ◽

Christopher Lee ◽

Lauren Y. Middleton ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Differential Gene Expression ◽

Cell Types ◽

Marker Genes ◽

Cell Type ◽

Cell Clusters ◽

Differential Gene ◽

Pb Exposure ◽

And Behavior

Abstract. Background: Lead (Pb) exposure is ubiquitous and has permanent developmental effects on childhood intelligence and behavior and adulthood risk of dementia. The hippocampus is a key brain region involved in learning and memory, and its cellular composition is highly heterogeneous. Pb acts on the hippocampus by altering gene expression, but the cell type-specific responses are unknown. Objective: Examine the effects of perinatal Pb treatment on adult hippocampus gene expression, at the level of individual cells, in mice. Methods: In mice perinatally exposed to control water (n=4) or a human physiologically-relevant level (32 ppm in maternal drinking water) of Pb (n=4), two weeks prior to mating through weaning, we tested for gene expression and cellular differences in the hippocampus at 5 months of age. Analysis was performed using single cell RNA-sequencing of 5,258 cells from the hippocampus by 10x Genomics Chromium to 1) test for gene expression differences averaged across all cells by treatment; 2) compare cell cluster composition by treatment; and 3) test for gene expression and pathway differences within cell clusters by treatment. Results: Gene expression patterns revealed 12 cell clusters in the hippocampus, mapping to major expected cell types (e.g. microglia, astrocytes, neurons, oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (P=4.4×10^−21) in adult mice. Across all cells, differential gene expression analysis by Pb treatment revealed cluster marker genes. Within cell clusters, differential gene expression with Pb treatment (q<0.05) was observed in endothelial, microglial, pericyte, and astrocyte cells. Pathways up-regulated with Pb treatment were protein folding in microglia (P=3.4×10^−9) and stress response in oligodendrocytes (P=3.2×10^−5). Conclusion: Bulk tissue analysis may be confounded by changes in cell type composition and may obscure effects within vulnerable cell types. This study serves as a biological reference for future single cell studies of toxicant or neuronal complications, to ultimately characterize the molecular basis by which Pb influences cognition and behavior.
