Discriminative feature of cells characterizes cell populations of interest by a small subset of genes

Organisms are composed of various cell types with specific states. To obtain a comprehensive understanding of the functions of organs and tissues, cell types have been classified and defined by identifying specific marker genes. Statistical tests, which often evaluate differences in the mean expression levels of genes, are critical for identifying marker genes. Differentially expressed gene (DEG)-based analysis has been the most frequently used method of this kind. However, as sample sizes increase, as in single-cell analysis, DEG-based analysis has faced difficulties associated with the inflation of P-values. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to DEG-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for discriminating a population of interest and variable selection to obtain a small subset of defining genes. Using artificial data, we demonstrated that DFC prioritized gene pairs with non-independent expression, and we showed that DFC enabled characterization of the muscle satellite/progenitor cell population. The results revealed that DFC captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population well. DFC may complement DEG-based methods for interpreting large data sets. DEG-based analysis uses lists of genes with differences in expression between groups, while DFC, which can be termed a discriminative approach, has potential applications in the task of cell characterization. With recent advances in the high-throughput analysis of single cells, discriminative methods can be applied effectively to cell characterization data such as scRNA-seq.
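The P-value issue mentioned in this abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes a two-sample z-test with equal group sizes and a known standard deviation, and the function name `two_sample_z_pvalue` and the effect size of 0.05 are hypothetical choices for illustration. It shows how a fixed, biologically negligible mean difference becomes "significant" once the number of cells is large.

```python
import math

def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided P-value of a two-sample z-test (equal group sizes, known sd)."""
    se = sd * math.sqrt(2.0 / n_per_group)      # standard error of the mean difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

# Same tiny effect (0.05 sd), increasing numbers of cells per group.
for n in (100, 1_000, 10_000, 100_000):
    print(n, two_sample_z_pvalue(0.05, 1.0, n))
```

The effect size never changes in this loop; only the sample size does, which is why a P-value threshold alone tends to return very long gene lists on single-cell data.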

Discriminative feature of cells characterizes cell populations of interest by a small subset of genes

10.1101/2021.03.12.435089 ◽

2021 ◽

Author(s):

Takeru Fujii ◽

Kazumitsu Maehara ◽

Masatoshi Fujita ◽

Yasuyuki Ohkawa

Keyword(s):

Gene Expression ◽

Statistical Methods ◽

Cell Population ◽

Differentially Expressed Gene ◽

Adaptive Lasso ◽

Differentially Expressed ◽

Small Subset ◽

Specific Gene ◽

Large Sample Size ◽

Discriminative Feature

Abstract: Statistical methods for detecting differences in individual gene expression are indispensable for understanding cell types. However, conventional statistical methods have faced difficulties associated with the inflation of P-values because of both the large sample size and selection bias introduced by exploratory data analysis such as single-cell transcriptomics. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to differentially expressed gene-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for the discrimination of a population of interest and variable selection to obtain a small subset of defining genes. Using artificial data, we demonstrated that DFC prioritized gene pairs with non-independent expression, and we showed that DFC enabled characterization of the muscle satellite cell population. The results revealed that DFC captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population well. DFC may complement differentially expressed gene-based methods for interpreting large data sets.
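A minimal sketch of logistic regression with an adaptive LASSO penalty, as described in this abstract, might look as follows. It is not the authors' implementation: the solver (proximal gradient descent), the ridge-based initial fit, the function names, and the values of `lam`, `gamma`, and `ridge` are all assumptions made for illustration, and NumPy is assumed to be available. The adaptive step gives each gene its own L1 weight, inversely proportional to the size of its initial coefficient, so weakly informative genes are penalized more heavily and drop out.

```python
import numpy as np

def fit_penalized_logistic(X, y, l1_weights, l2=0.0, n_iter=1000):
    """Proximal gradient descent for mean logistic loss + weighted L1 + ridge."""
    n, p = X.shape
    beta, intercept = np.zeros(p), 0.0
    X_aug = np.hstack([X, np.ones((n, 1))])
    # Step size from a Lipschitz bound on the gradient of the smooth part.
    step = 1.0 / (np.linalg.norm(X_aug, 2) ** 2 / (4.0 * n) + l2)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-np.clip(X @ beta + intercept, -30, 30)))
        resid = prob - y
        beta = beta - step * (X.T @ resid / n + l2 * beta)
        # Soft-thresholding sets small coefficients exactly to zero.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * l1_weights, 0.0)
        intercept -= step * resid.mean()        # the intercept is not penalized
    return beta, intercept

def adaptive_lasso_logistic(X, y, lam=0.02, gamma=1.0, ridge=0.1):
    """Stage 1: ridge fit for initial weights. Stage 2: weighted L1 fit."""
    p = X.shape[1]
    beta_init, _ = fit_penalized_logistic(X, y, np.zeros(p), l2=ridge)
    weights = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)
    return fit_penalized_logistic(X, y, lam * weights)

# Toy data (hypothetical): 400 cells, 20 genes, only genes 0 and 1 define the population.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = ((X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=400)) > 0).astype(float)
beta, intercept = adaptive_lasso_logistic(X, y)
print("selected genes:", np.flatnonzero(beta).tolist())
```

The nonzero entries of `beta` play the role of the small subset of defining genes; in practice `lam` would be chosen by cross-validation rather than fixed.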

Cross-laboratory analysis of brain cell type transcriptomes with applications to interpretation of bulk tissue data

10.1101/089219 ◽

2016 ◽

Cited By ~ 8

Author(s):

B. Ogan Mancarci ◽

Lilah Toker ◽

Shreejoy J Tripathy ◽

Brenna Li ◽

Brad Rocco ◽

...

Keyword(s):

Nervous System ◽

Cell Types ◽

Brain Cell ◽

Marker Genes ◽

Specific Gene ◽

Published Data ◽

Specific Marker ◽

Cell Type ◽

Cell Type Specific ◽

Bulk Tissue

Abstract: Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single cell RNA-sequencing studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at Neuroexpresso.org.

Significance Statement: Cell type markers are powerful tools in the study of the nervous system that help reveal properties of cell types and acquire additional information from large scale expression experiments. Despite their usefulness in the field, known marker genes for brain cell types are few in number. We present NeuroExpresso, a database of brain cell type specific gene expression profiles, and demonstrate the use of marker genes for acquiring cell type specific information from whole tissue expression. The database will prove itself as a useful resource for researchers aiming to reveal novel properties of the cell types and aid both laboratory and computational scientists to unravel the cell type specific components of brain disorders.
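The idea of summarizing marker gene expression in bulk tissue to compare relative cell type abundance across samples can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say how the expression is summarized, so the mean of per-gene z-scores used here is a stand-in, and the function name `marker_score` and the toy expression values are hypothetical.

```python
from statistics import mean, pstdev

def marker_score(expr, markers):
    """Mean per-gene z-score of the marker genes, one value per sample.

    expr maps gene name -> list of expression values (one per bulk sample).
    Genes with zero variance carry no information across samples and are skipped.
    """
    z_rows = []
    for gene in markers:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue
        z_rows.append([(v - mu) / sd for v in values])
    # Average the z-scores across marker genes, sample by sample.
    return [mean(column) for column in zip(*z_rows)]

# Toy bulk data (hypothetical): three samples, three marker genes of one cell type.
expr = {"GeneA": [1.0, 2.0, 3.0], "GeneB": [2.0, 4.0, 6.0], "GeneC": [5.0, 5.0, 5.0]}
print(marker_score(expr, ["GeneA", "GeneB", "GeneC"]))
```

The resulting scores are only comparable across samples for the same cell type; they are relative values, not cell proportions.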

scSorter: assigning cells to known cell types according to marker genes

Genome Biology ◽

10.1186/s13059-021-02281-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Hongyu Guo ◽

Jun Li

Keyword(s):

Real Data ◽

Cell Types ◽

Exact Expression ◽

Marker Genes ◽

Specific Marker ◽

Sequencing Data ◽

Reference Dataset ◽

Over Expression ◽

Higher Power ◽

Cell Type Specific

Abstract: On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on an observation that the expected over-expression of marker genes is often absent in a nonnegligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to express at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power compared to existing methods.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
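The rejection step described in this abstract (cell-type-specific confidence thresholds) can be sketched generically as follows. This is not the JIND API: the function name `assign_with_rejection`, the data layout, the threshold values, and the label "Unassigned" are assumptions for illustration, and the abstract does not say how JIND learns its thresholds, so here they are simply given.

```python
def assign_with_rejection(probs, thresholds, reject_label="Unassigned"):
    """Assign each cell its most probable type unless confidence is too low.

    probs: list of dicts, one per cell, mapping cell type -> predicted probability.
    thresholds: dict mapping cell type -> minimum probability required to accept.
    """
    labels = []
    for cell_probs in probs:
        best = max(cell_probs, key=cell_probs.get)   # most probable cell type
        accepted = cell_probs[best] >= thresholds[best]
        labels.append(best if accepted else reject_label)
    return labels

# Hypothetical classifier output for three cells and two cell types.
probs = [
    {"T cell": 0.90, "B cell": 0.10},
    {"T cell": 0.55, "B cell": 0.45},
    {"T cell": 0.30, "B cell": 0.70},
]
thresholds = {"T cell": 0.80, "B cell": 0.60}
print(assign_with_rejection(probs, thresholds))
```

Using a separate threshold per cell type lets closely related types, which are harder to separate, require more confidence than well-separated ones.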

Revealing immune responses in the Mycobacterium avium subsp. paratuberculosis-infected THP-1 cells using single cell RNA-sequencing

PLoS ONE ◽

10.1371/journal.pone.0254194 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0254194

Author(s):

Hong-Tae Park ◽

Woo Bin Park ◽

Suji Kim ◽

Jong-Sung Lim ◽

Gyoungju Nah ◽

...

Keyword(s):

Crohn's Disease ◽

Single Cell ◽

Mycobacterium Avium ◽

Expression Patterns ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Cytokines And Chemokines

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, which is a chronic and debilitating disease in ruminants. MAP is also considered to be a possible cause of Crohn’s disease in humans. However, few studies have focused on the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells against MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that THP-1 cells were divided into seven different clusters in response to phorbol-12-myristate-13-acetate (PMA) treatment. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. From the results, we found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses against MAP infection. The responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, the Mreg cell type, a novel cell type differentiated from THP-1 cells, was discovered. Thus, it is suggested that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows a more detailed observation of the response to infection by each cell type.

Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq

eLife ◽

10.7554/elife.63632 ◽

2021 ◽

Vol 10 ◽

Author(s):

Elliott Swanson ◽

Cara Lord ◽

Julian Reading ◽

Alexander T Heubeck ◽

Palak C Genge ◽

...

Keyword(s):

Gene Regulation ◽

Single Cell ◽

Human Peripheral Blood ◽

Single Cells ◽

Cell Types ◽

Chromatin Accessibility ◽

Specific Gene ◽

Test Case ◽

Cell Assays ◽

Paired Measurement

Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to signals, and human disease. Recent advances have allowed paired capture of protein abundance and transcriptomic state, but a lack of epigenetic information in these assays has left a missing link to gene regulation. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases signal-to-noise and allows paired measurement of cell surface markers and chromatin accessibility: integrated cellular indexing of chromatin landscape and epitopes, called ICICLE-seq. We extended this approach using a droplet-based multiomics platform to develop a trimodal assay that simultaneously measures transcriptomics (scRNA-seq), epitopes, and chromatin accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.

A computational method to aid the design and analysis of single cell RNA-seq experiments for cell type identification

10.1101/247114 ◽

2018 ◽

Cited By ~ 1

Author(s):

Douglas Abrams ◽

Parveen Kumar ◽

R. Krishna Murthy Karuturi ◽

Joshy George

Keyword(s):

Experimental Design ◽

Single Cell ◽

Single Cells ◽

Cell Types ◽

Cell Number ◽

Fold Change ◽

Computational Method ◽

Marker Genes ◽

Cell Type ◽

Estimate Sample Size

Abstract

Background: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies and analytical packages available to guide experimental design and to devise a suitable analysis procedure for cell type identification.

Results: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis and packaged it into an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose among a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying numbers of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment.

Conclusions: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or the SIMLR algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to be upregulated by a fold change of only up to 2, the choice of the single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.

The WNT receptor FZD7 contributes to self-renewal signaling of human embryonic stem cells

Biological Chemistry ◽

10.1515/bc.2008.108 ◽

2008 ◽

Vol 389 (7) ◽

Cited By ~ 42

Author(s):

Kai Melchior ◽

Jonathan Weiß ◽

Holm Zaehres ◽

Yong-mi Kim ◽

Carolyn Lutzko ◽

...

Keyword(s):

Es Cells ◽

Embryonic Stem ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Mrna Levels ◽

Es Cell ◽

Specific Expression ◽

Self Renewal ◽

Human Es Cells

Abstract: A number of recent studies identified nuclear factors that together have the unique ability to induce pluripotency in differentiated cell types. However, little is known about the factors that are needed to maintain human embryonic stem (ES) cells in an undifferentiated state. In a search for such requirements, we performed a comprehensive meta-analysis of publicly available SAGE and microarray data. The rationale for this analysis was to identify genes that are exclusively expressed in human ES cell lines compared to 30 differentiated tissue types. The WNT receptor FZD7 was found among the genes with an ES cell-specific expression profile in both SAGE and microarray analyses. Subsequent validation by quantitative RT-PCR and flow cytometry confirmed that FZD7 mRNA levels in human ES cells are up to 200-fold higher compared to differentiated cell types. shRNA-mediated knockdown of FZD7 in human ES cells induced dramatic changes in the morphology of ES cell colonies, perturbation of expression levels of germ layer-specific marker genes, and a rapid loss of expression of the ES cell-specific transcription factor OCT4. These findings identify the WNT receptor FZD7 as a novel ES cell-specific surface antigen with a likely important role in the maintenance of ES cell self-renewal capacity.

A cell atlas of the fly kidney

10.1101/2021.09.03.458871 ◽

2021 ◽

Author(s):

Jun Xu ◽

Yifang Liu ◽

Hongjie Li ◽

Alexander J. Tarashansky ◽

Colin H. Kalicki ◽

...

Keyword(s):

Kidney Disease ◽

Kidney Cell ◽

Single Cells ◽

Malpighian Tubule ◽

Cell Types ◽

Malpighian Tubules ◽

Disease Models ◽

Marker Genes ◽

Waste Products ◽

Renal Stem Cells

Like humans, insects rely on precise regulation of their internal environments to survive. The insect renal system consists of Malpighian tubules and nephrocytes that share similarities to the mammalian kidney. Studies of the Drosophila Malpighian tubules and nephrocytes have provided many insights into the excretion of waste products, stem cell regeneration, and protein reabsorption, and have served as human kidney disease models. Here, we analyzed single-nucleus RNA sequencing (snRNA-seq) data sets to characterize the cell types of the adult fly kidney. We identified 11 distinct clusters representing renal stem cells (RSCs), stellate cells (SCs), regionally specific principal cells (PCs), garland nephrocyte cells (GCs) and pericardial nephrocytes (PNs). Analyses of these clusters revealed many new interesting features. For example, we found a new, previously unrecognized cell cluster: lower segment PCs that express Esyt2. In addition, we found that the SC marker genes RhoGEF64c, Frq2, Prip and CG10939 regulate their unusual cell shape. Further, we identified transcription factors specific to each cluster and built a network of signaling pathways that are potentially involved in mediating cell-cell communication between Malpighian tubule cell types. Finally, cross-species analysis allowed us to match the fly kidney cell types to mouse kidney cell types and planarian protonephridia, knowledge that will help the generation of kidney disease models. To visualize this dataset, we provide a web-based resource for gene expression in single cells (https://www.flyrnai.org/scRNA/kidney/). Altogether, our study provides a comprehensive resource for addressing gene function in the fly kidney and future disease studies.

TEA-seq: a trimodal assay for integrated single cell measurement of transcription, epitopes, and chromatin accessibility

10.1101/2020.09.04.283887 ◽

2020 ◽

Cited By ~ 3

Author(s):

Elliott Swanson ◽

Cara Lord ◽

Julian Reading ◽

Alexander T. Heubeck ◽

Adam K. Savage ◽

...

Keyword(s):

Cell Surface ◽

Single Cell ◽

Human Peripheral Blood ◽

Signal To Noise Ratio ◽

Single Cells ◽

Cell Types ◽

Chromatin Accessibility ◽

Specific Gene ◽

Test Case ◽

Cell Assays

Abstract: Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to extracellular signals, and human disease states. scATAC-seq has been particularly challenging due to the large size of the human genome and processing artefacts resulting from DNA damage that are an inherent source of background signal. Downstream analysis and integration of scATAC-seq with other single-cell assays is complicated by the lack of clear phenotypic information linking chromatin state and cell type. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases the signal-to-noise ratio and allows simultaneous measurement of cell surface markers: Integrated Cellular Indexing of Chromatin Landscape and Epitopes (ICICLE-seq). We extended this approach using a droplet-based multiomics platform to develop a trimodal assay to simultaneously measure Transcriptomic state (scRNA-seq), cell surface Epitopes, and chromatin Accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.
