Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

Abstract. Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type are nevertheless highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results: To capitalize on this pattern, we develop a new semi-supervised algorithm that detects marker genes by combining published information about likely marker genes with bulk transcriptome data. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation: We implement this method as an R package, markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information: Supplementary data are available at Bioinformatics online.
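The correlation argument above can be illustrated with a small simulation. The sketch below is not the markerpen algorithm; it is a plain correlation filter written for illustration, and the gene names, noise levels, and the 0.5 cutoff are all assumptions. Marker genes are simulated as proportional to a cell-type fraction that varies across bulk samples, and each gene is kept only if it correlates with the summed profile of the other published markers.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)
n_samples = 200
# Hypothetical fraction of the cell type of interest in each bulk sample.
proportion = [rng.uniform(0.1, 0.6) for _ in range(n_samples)]

def marker_gene(scale):
    # Expression driven by the cell-type fraction, plus small noise.
    return [scale * p + rng.gauss(0, 0.02) for p in proportion]

def unrelated_gene():
    # Expression independent of the cell-type fraction.
    return [rng.gauss(1.0, 0.1) for _ in range(n_samples)]

expr = {
    "seed1": marker_gene(2.0),
    "seed2": marker_gene(1.5),
    "seed_bad": unrelated_gene(),    # published, but not a true marker here
    "cand_marker": marker_gene(1.0),  # unpublished true marker
    "cand_noise": unrelated_gene(),
}
published = ["seed1", "seed2", "seed_bad"]
candidates = ["cand_marker", "cand_noise"]

def refine_markers(expr, published, candidates, min_r=0.5):
    """Keep genes that correlate with the summed profile of the other published markers."""
    kept = []
    for gene in published + candidates:
        reference = [g for g in published if g != gene]
        profile = [sum(expr[g][s] for g in reference) for s in range(n_samples)]
        if pearson(expr[gene], profile) >= min_r:
            kept.append(gene)
    return kept

refined = refine_markers(expr, published, candidates)
print(refined)
```

With this simulated data the filter drops the uncorrelated published gene and adds the correlated candidate, which mirrors the "adding or removing genes from the list" step described in the abstract.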

Revealing immune responses in the Mycobacterium avium subsp. paratuberculosis-infected THP-1 cells using single cell RNA-sequencing

PLoS ONE ◽

10.1371/journal.pone.0254194 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0254194

Author(s):

Hong-Tae Park ◽

Woo Bin Park ◽

Suji Kim ◽

Jong-Sung Lim ◽

Gyoungju Nah ◽

...

Keyword(s):

Crohn's Disease ◽

Single Cell ◽

Mycobacterium Avium ◽

Expression Patterns ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Cytokines And Chemokines

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, which is a chronic and debilitating disease in ruminants. MAP is also considered to be a possible cause of Crohn’s disease in humans. However, few studies have focused on the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells against MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that THP-1 cells were divided into seven different clusters in response to phorbol-12-myristate-13-acetate (PMA) treatment. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. From the results, we found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses against MAP infection. The responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, the Mreg cell type, a novel cell type differentiated from THP-1 cells, was discovered. Thus, it is suggested that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows a more detailed observation of the response to infection by each cell type.

scClustViz – Single-cell RNAseq cluster assessment and visualization

F1000Research ◽

10.12688/f1000research.16198.2 ◽

2019 ◽

Vol 7 ◽

pp. 1522 ◽

Cited By ~ 8

Author(s):

Brendan T. Innes ◽

Gary D. Bader

Keyword(s):

Gene Expression ◽

Single Cell ◽

Clustering Algorithms ◽

Expression Patterns ◽

Software Tool ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Single Experiment

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.
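The idea of using differential expression between clusters to judge a clustering can be sketched as follows. This is an illustrative toy, not scClustViz code: a crude mean-difference cutoff (`min_diff`, an assumed parameter) stands in for a proper statistical test, and the expression values are made up. A resolution at which some pair of clusters has no differentially expressed genes is a candidate for being over-split.

```python
from itertools import combinations
from statistics import mean

def de_gene_count(cells_a, cells_b, min_diff=1.0):
    """Count genes whose mean expression differs by at least min_diff."""
    n_genes = len(cells_a[0])
    count = 0
    for g in range(n_genes):
        diff = mean(c[g] for c in cells_a) - mean(c[g] for c in cells_b)
        if abs(diff) >= min_diff:
            count += 1
    return count

def min_pairwise_de(cells, labels, min_diff=1.0):
    """Smallest number of DE genes over all pairs of clusters."""
    clusters = {}
    for cell, lab in zip(cells, labels):
        clusters.setdefault(lab, []).append(cell)
    return min(
        de_gene_count(clusters[a], clusters[b], min_diff)
        for a, b in combinations(sorted(clusters), 2)
    )

# Rows are cells, columns are genes (hypothetical log-expression values).
cells = [
    [5.0, 0.1, 1.0],
    [5.2, 0.0, 1.1],
    [4.9, 0.2, 0.9],
    [0.1, 4.8, 1.0],
    [0.0, 5.1, 1.2],
    [0.2, 5.0, 0.8],
]
coarse = [0, 0, 0, 1, 1, 1]  # two clusters
fine = [0, 0, 2, 1, 1, 1]    # one cell split off into its own cluster

print(min_pairwise_de(cells, coarse))  # clusters differ in two genes
print(min_pairwise_de(cells, fine))    # the split-off cluster has no DE genes
```

Here the coarse labelling separates the cells by two genes, while the finer labelling produces a pair of clusters with no distinguishing gene, so the coarse resolution would be preferred under this metric.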

scClustViz – Single-cell RNAseq cluster assessment and visualization

F1000Research ◽

10.12688/f1000research.16198.1 ◽

2018 ◽

Vol 7 ◽

pp. 1522 ◽

Cited By ~ 6

Author(s):

Brendan T. Innes ◽

Gary D. Bader

Keyword(s):

Gene Expression ◽

Single Cell ◽

Clustering Algorithms ◽

Expression Patterns ◽

Software Tool ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Single Experiment

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
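Classification with cell-type-specific rejection thresholds can be sketched in a few lines. This is only an illustration of the rejection step, not JIND's implementation: the class probabilities would come from a trained classifier, and the threshold values, cell-type names, and the `classify_with_rejection` helper are hypothetical. How the thresholds are learned is not shown.

```python
def classify_with_rejection(probs, thresholds, unassigned="Unassigned"):
    """Return the most probable cell type, or reject if its probability is below that type's threshold."""
    label = max(probs, key=probs.get)
    return label if probs[label] >= thresholds[label] else unassigned

# Hypothetical per-type thresholds: a stricter cutoff for a type that is
# easily confused with a transcriptomically similar one.
thresholds = {"T cell": 0.6, "NK cell": 0.9, "B cell": 0.5}

cell_1 = {"T cell": 0.70, "NK cell": 0.25, "B cell": 0.05}
cell_2 = {"T cell": 0.15, "NK cell": 0.80, "B cell": 0.05}

print(classify_with_rejection(cell_1, thresholds))  # T cell
print(classify_with_rejection(cell_2, thresholds))  # Unassigned
```

The second cell is rejected even though its top probability is higher than that of the first, because the threshold depends on the predicted type.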

rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database

10.1101/2021.05.28.446161 ◽

2021 ◽

Author(s):

Daniel Osorio ◽

Marieke Lydia Kuijjer ◽

James J. Cai

Keyword(s):

Single Cell ◽

Cell Types ◽

R Package ◽

Rna Seq ◽

Cell Type ◽

Sequencing Data ◽

Single Experiment ◽

Tissue Samples ◽

Molecular Phenotypes ◽

Public Datasets

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several sample preprocessing steps to filter and collect the desired cells experimentally before sequencing. Data integration of multiple public single-cell experiments stands as a solution for this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust cell type transcriptome characterization. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first unbiased report of the fibrocyte transcriptome profile. We compared the transcriptomic profile of the fibrocytes against the fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available through the CRAN repositories https://CRAN.R-project.org/package=rPanglaoDB.

genesorteR: Feature Ranking in Clustered Single Cell Data

10.1101/676379 ◽

2019 ◽

Cited By ~ 5

Author(s):

Mahmoud M Ibrahim ◽

Rafael Kramann

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Large Cell ◽

R Package ◽

Marker Genes ◽

Data Sets ◽

Cell Type ◽

Cell Data

Abstract: Marker genes identified in single cell experiments are expected to be highly specific to a certain cell type and highly expressed in that cell type. Detecting a gene by differential expression analysis does not necessarily satisfy those two conditions and is typically computationally expensive for large cell numbers. Here we present genesorteR, an R package that ranks features in single cell data in a manner consistent with the expected definition of marker genes in experimental biology research. We benchmark genesorteR using various data sets and show that it is distinctly more accurate in large single cell data sets compared to other methods. genesorteR is orders of magnitude faster than current implementations of differential expression analysis methods, can operate on data containing millions of cells and is applicable to both single cell RNA-Seq and single cell ATAC-Seq data. genesorteR is available at https://github.com/mahmoudibrahim/genesorteR.
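The two conditions named above (a marker should be commonly detected in its cell type and rarely detected elsewhere) can be combined into a simple ranking score. The sketch below is a hypothetical score chosen for illustration and is not claimed to be the statistic genesorteR computes: for binarized detection data it multiplies the fraction of cluster cells expressing the gene by the fraction of expressing cells that belong to the cluster. The cell and gene names are made up.

```python
def rank_markers(cells, cluster):
    """Rank genes for one cluster by P(detected | cluster) * P(cluster | detected)."""
    genes = set().union(*(detected for _, detected in cells))
    n_cluster = sum(1 for lab, _ in cells if lab == cluster)
    scores = {}
    for gene in genes:
        n_detected = sum(1 for _, det in cells if gene in det)
        n_both = sum(1 for lab, det in cells if lab == cluster and gene in det)
        expressed_in_cluster = n_both / n_cluster
        specific_to_cluster = n_both / n_detected if n_detected else 0.0
        scores[gene] = expressed_in_cluster * specific_to_cluster
    # Highest score first; ties broken by gene name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Each cell: (cluster label, set of genes detected in that cell).
cells = [
    ("A", {"g1", "g3"}),
    ("A", {"g1", "g3"}),
    ("A", {"g1"}),
    ("B", {"g2", "g3"}),
    ("B", {"g2", "g3"}),
    ("B", {"g3"}),
]

ranking = rank_markers(cells, "A")
print(ranking)
```

In this toy data `g1` is detected in every cell of cluster A and nowhere else, so it ranks first; `g3` is widely detected but unspecific, and `g2` is never detected in cluster A.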

Single-Cell Analysis of the Gene Expression Effects of Developmental Lead (Pb) Exposure on the Mouse Hippocampus

Toxicological Sciences ◽

10.1093/toxsci/kfaa069 ◽

2020 ◽

Vol 176 (2) ◽

pp. 396-409

Author(s):

Kelly M Bakulski ◽

John F Dou ◽

Robert C Thompson ◽

Christopher Lee ◽

Lauren Y Middleton ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Cell Types ◽

Cell Cluster ◽

Marker Genes ◽

Cell Type ◽

Cell Clusters ◽

Pb Exposure

Abstract: Lead (Pb) exposure is ubiquitous with permanent neurodevelopmental effects. The hippocampus brain region is involved in learning and memory with heterogeneous cellular composition. The hippocampus cell type-specific responses to Pb are unknown. The objective of this study is to examine perinatal Pb treatment effects on adult hippocampus gene expression, at the level of individual cells. In mice perinatally exposed to control water or a human physiologically relevant level (32 ppm in maternal drinking water) of Pb, 2 weeks prior to mating through weaning, we tested for hippocampus gene expression and cellular differences at 5 months of age. We sequenced RNA from 5258 hippocampal cells to (1) test for treatment gene expression differences averaged across all cells, (2) compare cell cluster composition by treatment, and (3) test for treatment gene expression and pathway differences within cell clusters. Gene expression patterns revealed 12 hippocampus cell clusters, mapping to major expected cell types (eg, microglia, astrocytes, neurons, and oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (p = 4.4 × 10^−21) in adult mice. Across all cells, Pb treatment was associated with expression of cell cluster marker genes. Within cell clusters, Pb treatment (q < 0.05) caused differential gene expression in endothelial, microglial, pericyte, and astrocyte cells. Pb treatment upregulated protein folding pathways in microglia (p = 3.4 × 10^−9) and stress response in oligodendrocytes (p = 3.2 × 10^−5). Bulk tissue analysis may be influenced by changes in cell type composition, obscuring effects within vulnerable cell types. This study serves as a biological reference for future single-cell toxicant studies, to ultimately characterize molecular effects on cognition and behavior.

CellWalkR: An R Package for integrating single-cell and bulk data to resolve regulatory elements

10.1101/2021.02.23.432593 ◽

2021 ◽

Author(s):

Pawel F. Przytycki ◽

Katherine S. Pollard

Keyword(s):

Random Walk ◽

Single Cell ◽

R Package ◽

Regulatory Elements ◽

Random Walk Model ◽

Open Chromatin ◽

Cell Type ◽

Regulatory Regions ◽

Bulk Data ◽

Cell Type Specific

Abstract: While single-cell open chromatin (scATAC-seq) data allows for the identification of cell type-specific regulatory regions, it is much sparser than bulk data. CellWalkR is an R package that performs an integration of external labeling and bulk epigenetic data with scATAC-seq using a network-based random walk model to help overcome this sparsity. Outputs include cell type labels for individual cells and regulatory regions. Availability and implementation: CellWalkR is freely available as an R package under a GNU GPL-2.0 License, and can be accessed from https://github.com/PFPrzytycki/CellWalkR with an accompanying vignette for analyzing example data.
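A network-based random walk for labelling cells can be sketched on a toy graph. This is a generic random walk with restart, not CellWalkR's model: the graph, the edge weights, the restart probability, and the node names are all assumptions, and the integration of bulk epigenetic data is not modelled. Label nodes are connected to the cells they annotate, cells are connected to similar cells, and each cell is assigned the label whose walk visits it most often.

```python
def random_walk_with_restart(adjacency, start, restart=0.5, n_iter=100):
    """Approximate the stationary visiting probabilities of a walk that restarts at `start`."""
    prob = {node: 0.0 for node in adjacency}
    prob[start] = 1.0
    for _ in range(n_iter):
        nxt = {node: 0.0 for node in adjacency}
        for node, p in prob.items():
            neighbours = adjacency[node]
            total = sum(neighbours.values())
            # Spread the non-restarting mass over neighbours by edge weight.
            for nb, weight in neighbours.items():
                nxt[nb] += (1 - restart) * p * weight / total
        nxt[start] += restart
        prob = nxt
    return prob

# Hypothetical graph: two label nodes, four cells linked in a chain by similarity.
adjacency = {
    "label:L1": {"c1": 1.0},
    "c1": {"label:L1": 1.0, "c2": 1.0},
    "c2": {"c1": 1.0, "c3": 1.0},
    "c3": {"c2": 1.0, "c4": 1.0},
    "c4": {"c3": 1.0, "label:L2": 1.0},
    "label:L2": {"c4": 1.0},
}
labels = ["label:L1", "label:L2"]
cells = ["c1", "c2", "c3", "c4"]

influence = {lab: random_walk_with_restart(adjacency, lab) for lab in labels}
assigned = {c: max(labels, key=lambda lab: influence[lab][c]) for c in cells}
print(assigned)
```

Cells `c2` and `c3` carry no direct label edge, yet each receives the label of its nearer labelled neighbour, which illustrates how walking over the network can compensate for sparse per-cell evidence.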

Single-Cell Atlas of Adult Testis in Protogynous Hermaphroditic Orange-Spotted Grouper, Epinephelus coioides

International Journal of Molecular Sciences ◽

10.3390/ijms222212607 ◽

2021 ◽

Vol 22 (22) ◽

pp. 12607

Author(s):

Xi Wu ◽

Yang Yang ◽

Chaoyue Zhong ◽

Tong Wang ◽

Yanhong Deng ◽

...

Keyword(s):

Single Cell ◽

Germ Cells ◽

Expression Patterns ◽

Somatic Cells ◽

Human Testis ◽

Marker Genes ◽

Specific Marker ◽

Epinephelus Coioides ◽

Male Germ Cells ◽

Adult Testis

Spermatogenesis is a process of self-renewal and differentiation in spermatogonial stem cells. During this process, germ cells and somatic cells interact intricately to ensure long-term fertility and accurate genome propagation. Spermatogenesis has been intensely investigated in mammals but remains poorly understood in teleosts. Here, we performed single-cell RNA sequencing of ~9500 testicular cells from the male orange-spotted grouper. In the adult testis, we divided the cells into nine clusters and defined ten cell types, as compared with human testis data, including cell populations with characteristics of male germ cells and somatic cells, each of which expressed specific marker genes. We also identified and profiled the expression patterns of four marker genes (calr, eef1a, s100a1, vasa) in both the ovary and adult testis. Our data provide a blueprint of male germ cells and supporting somatic cells. Moreover, the cell markers are candidates that could be used for further cell identification.
