A machine learning method for the discovery of minimum marker gene combinations for cell-type identification from single-cell RNA sequencing

Abstract Patch-seq, combining patch-clamp electrophysiology with single-cell RNA-sequencing (scRNAseq), enables unprecedented single-cell access to a neuron’s transcriptomic, electrophysiological, and morphological features. Here, we present a systematic review and re-analysis of scRNAseq profiles from 4 recent patch-seq datasets, benchmarking these against analogous profiles from cellular-dissociation based scRNAseq. We found an increased likelihood for off-target cell-type mRNA contamination in patch-seq, likely due to the passage of the patch-pipette through the processes of adjacent cells. We also observed that patch-seq samples varied considerably in the amount of mRNA that could be extracted from each cell, strongly biasing the numbers of detectable genes. We present a straightforward marker gene-based approach for controlling for these artifacts and show that our method improves the correspondence between gene expression and electrophysiological features. Our analysis suggests that these technical confounds likely limit the interpretability of patch-seq based single-cell transcriptomes. However, we provide concrete recommendations for quality control steps that can be performed prior to costly RNA-sequencing to optimize the yield of high quality samples.

NS-Forest: A machine learning method for the objective identification of minimum marker gene combinations for cell type determination from single cell RNA sequencing

10.1101/2020.09.23.308932 ◽

2020 ◽

Author(s):

Brian Aevermann ◽

Yun Zhang ◽

Mark Novotny ◽

Trygve Bakken ◽

Jeremy Miller ◽

...

Keyword(s):

Machine Learning ◽

Single Cell ◽

Rna Sequencing ◽

Marker Gene ◽

Cell Types ◽

Biological Research ◽

Marker Genes ◽

Cell Type ◽

Type Identity ◽

Wide Range

Abstract Single cell genomics is rapidly advancing our knowledge of cell phenotypic types and states. Driven by single cell/nucleus RNA sequencing (scRNA-seq) data, comprehensive atlas projects covering a wide range of organisms and tissues are currently underway. As a result, it is critical that the cell transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by molecular drivers. Here we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the non-linear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that precisely capture the cell type identity represented in the complete scRNA-seq transcriptional profiles. The marker genes selected provide a barcode of the necessary and sufficient characteristics for semantic cell type definition and serve as useful tools for downstream biological investigation. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and non-coding RNAs in neuronal cell type identity.
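
The abstract above mentions a binary expression scoring approach but does not give its formula. As a rough illustration only, the sketch below shows one plausible way to score how exclusively a gene is expressed in a target cluster, by comparing per-cluster median expression. The function name `binary_score`, the input layout, and the formula itself are assumptions made for this example, not the NS-Forest implementation.

```python
from statistics import median


def binary_score(expr_by_cluster, target):
    """Score how exclusively one gene is expressed in the target cluster.

    expr_by_cluster maps a cluster label to that gene's non-negative
    expression values across the cells of the cluster (hypothetical layout).
    Returns a value in [0, 1]; 1 means the gene has median expression 0
    in every other cluster.
    """
    medians = {c: median(v) for c, v in expr_by_cluster.items()}
    target_median = medians[target]
    others = [c for c in medians if c != target]
    if target_median <= 0 or not others:
        return 0.0
    # Each non-target cluster contributes 1 when its median is zero and
    # 0 when its median reaches or exceeds the target median.
    total = sum(max(0.0, 1.0 - medians[c] / target_median) for c in others)
    return total / len(others)


# Toy data (invented for illustration): one gene, three clusters.
specific = {"A": [5, 6, 7], "B": [0, 0, 1], "C": [0, 0, 0]}
ubiquitous = {"A": [4, 4, 4], "B": [4, 4, 4], "C": [4, 4, 4]}
print(binary_score(specific, "A"))
print(binary_score(ubiquitous, "A"))
```

Under this assumed definition, a gene that is on in the target cluster and off elsewhere scores near 1, while a uniformly expressed gene scores 0, which is the "binary" on/off pattern a marker gene would ideally show.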

Single-cell RNA sequencing of cultured human endometrial CD140b+CD146+ perivascular cells highlights the importance of in vivo microenvironment

Stem Cell Research & Therapy ◽

10.1186/s13287-021-02354-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Dandan Cao ◽

Rachel W. S. Chan ◽

Ernest H. Y. Ng ◽

Kristina Gemzell-Danielsson ◽

William S. B. Yeung

Keyword(s):

Stem Cells ◽

Single Cell ◽

Rna Sequencing ◽

Marker Gene ◽

Cellular Heterogeneity ◽

Stromal Fibroblast ◽

Perivascular Cells ◽

Endometrial Cells ◽

Single Cell Rna Sequencing

Abstract Background Endometrial mesenchymal-like stromal/stem cells (eMSCs) have been proposed as adult stem cells contributing to endometrial regeneration. One set of perivascular markers (CD140b and CD146) has been widely used to enrich eMSCs. Although eMSCs are easily accessible for regenerative medicine and have long been studied, their cellular heterogeneity and their relationship to their primary counterparts remain largely unclear. Methods In this study, we applied 10x Genomics single-cell RNA sequencing (scRNA-seq) to cultured human CD140b+CD146+ endometrial perivascular cells (ePCs) from menstrual and secretory endometrium. We also analyzed publicly available scRNA-seq data of primary endometrium and performed transcriptome comparison between cultured ePCs and primary ePCs at the single-cell level. Results Transcriptomic expression-based clustering revealed limited heterogeneity within cultured menstrual and secretory ePCs. A main subpopulation and a small stress-induced subpopulation were identified in secretory and menstrual ePCs. Cell identity analysis demonstrated similar cellular composition in secretory and menstrual ePCs. Marker gene expression analysis showed that the main subpopulations identified from cultured secretory and menstrual ePCs simultaneously expressed genes marking mesenchymal stem cells (MSCs), perivascular cells, smooth muscle cells, and stromal fibroblasts. GO enrichment analysis revealed that genes upregulated in the main subpopulation were enriched in actin filament organization, cellular division, etc., while genes upregulated in the small subpopulation were enriched in extracellular matrix disassembly, stress response, etc. Comparison of the subpopulations of cultured ePCs with the publicly available primary endometrial cells showed that the main subpopulation identified from cultured ePCs was culture-unique, unlike primary ePCs or primary endometrial stromal fibroblast cells. Conclusion In summary, these data provide, for the first time, a single-cell atlas of the cultured human CD140b+CD146+ ePCs. The identification of a culture-unique, relatively homogeneous cell population of CD140b+CD146+ ePCs underscores the importance of the in vivo microenvironment in maintaining cellular identity.

Critical downstream analysis steps for single-cell RNA sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbab105 ◽

2021 ◽

Author(s):

Zilong Zhang ◽

Feifei Cui ◽

Chen Lin ◽

Lingling Zhao ◽

Chunyu Wang ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Noisy Data ◽

Single Cell Level ◽

Cell Type ◽

Sequencing Data ◽

Cell Level ◽

Bioinformatics Tool ◽

Single Cell Rna Sequencing ◽

Downstream Analysis

Abstract Single-cell RNA sequencing (scRNA-seq) has enabled us to study biological questions at the single-cell level. Currently, many analysis tools are available to better utilize these relatively noisy data. In this review, we summarize the most widely used methods for critical downstream analysis steps (i.e. clustering, trajectory inference, cell-type annotation and integrating datasets). The advantages and limitations are comprehensively discussed, and we provide suggestions for choosing proper methods in different situations. We hope this paper will be useful for scRNA-seq data analysts and bioinformatics tool developers.

Single-cell RNA sequencing of the mammalian pineal gland identifies two pinealocyte subtypes and cell type-specific daily patterns of gene expression

PLoS ONE ◽

10.1371/journal.pone.0205883 ◽

2018 ◽

Vol 13 (10) ◽

pp. e0205883 ◽

Cited By ~ 9

Author(s):

Joseph C. Mays ◽

Michael C. Kelly ◽

Steven L. Coon ◽

Lynne Holtzclaw ◽

Martin F. Rath ◽

...

Keyword(s):

Gene Expression ◽

Pineal Gland ◽

Single Cell ◽

Rna Sequencing ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Cell Type Specific ◽

Mammalian Pineal Gland ◽

Daily Patterns

New interpretable machine learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy

10.1101/702118 ◽

2019 ◽

Cited By ~ 3

Author(s):

Evan Greene ◽

Greg Finak ◽

Leonard A. D’Amico ◽

Nina Bhardwaj ◽

Candice D. Church ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

T Cell ◽

Single Cell ◽

Cancer Immunotherapy ◽

Effector Memory ◽

Machine Learning Method ◽

Learning Method ◽

Modeling Framework ◽

Interpretable Machine Learning

Abstract High-dimensional single-cell cytometry is routinely used to characterize patient responses to cancer immunotherapy and other treatments. This has produced a wealth of datasets ripe for exploration but whose biological and technical heterogeneity make them difficult to analyze with current tools. We introduce a new interpretable machine learning method for single-cell mass and flow cytometry studies, FAUST, that robustly performs unbiased cell population discovery and annotation. FAUST processes data on a per-sample basis and returns biologically interpretable cell phenotypes that can be compared across studies, making it well-suited for the analysis and integration of complex datasets. We demonstrate how FAUST can be used for candidate biomarker discovery and validation by applying it to a flow cytometry dataset from a Merkel cell carcinoma anti-PD-1 trial and discover new CD4+ and CD8+ effector-memory T cell correlates of outcome co-expressing PD-1, HLA-DR, and CD28. We then use FAUST to validate these correlates in an independent CyTOF dataset from a published metastatic melanoma trial. Importantly, existing state-of-the-art computational discovery approaches as well as prior manual analysis did not detect these or any other statistically significant T cell sub-populations associated with anti-PD-1 treatment in either dataset. We further validate our methodology by using FAUST to replicate the discovery of a previously reported myeloid correlate in a different published melanoma trial, and validate the correlate by identifying it de novo in two additional independent trials. FAUST’s phenotypic annotations can be used to perform cross-study data integration in the presence of heterogeneous data and diverse immunophenotyping staining panels, enabling hypothesis-driven inference about cell sub-population abundance through a multivariate modeling framework we call Phenotypic and Functional Differential Abundance (PFDA). We demonstrate this approach on data from myeloid and T cell panels across multiple trials. Together, these results establish FAUST as a powerful and versatile new approach for unbiased discovery in single-cell cytometry.

Automatic cell type identification methods for single-cell RNA sequencing

Computational and Structural Biotechnology Journal ◽

10.1016/j.csbj.2021.10.027 ◽

2021 ◽

Author(s):

Bingbing Xie ◽

Qin Jiang ◽

Antonio Mora ◽

Xuri Li

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Type ◽

Identification Methods ◽

Single Cell Rna Sequencing

Localization of migraine susceptibility genes in human brain by single-cell RNA sequencing

Cephalalgia ◽

10.1177/0333102418762476 ◽

2018 ◽

Vol 38 (13) ◽

pp. 1976-1983 ◽

Cited By ~ 5

Author(s):

William Renthal

Keyword(s):

Human Brain ◽

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Cell Types ◽

Susceptibility Genes ◽

Brain Cell ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Brain Cell Types

Background Migraine is a debilitating disorder characterized by severe headaches and associated neurological symptoms. A key challenge to understanding migraine has been the cellular complexity of the human brain and the multiple cell types implicated in its pathophysiology. The present study leverages recent advances in single-cell transcriptomics to localize the specific human brain cell types in which putative migraine susceptibility genes are expressed. Methods The cell-type specific expression of both familial and common migraine-associated genes was determined bioinformatically using data from 2,039 individual human brain cells across two published single-cell RNA sequencing datasets. Enrichment of migraine-associated genes was determined for each brain cell type. Results Analysis of single-brain cell RNA sequencing data from five major subtypes of cells in the human cortex (neurons, oligodendrocytes, astrocytes, microglia, and endothelial cells) indicates that over 40% of known migraine-associated genes are enriched in the expression profiles of a specific brain cell type. Further analysis of neuronal migraine-associated genes demonstrated that approximately 70% were significantly enriched in inhibitory neurons and 30% in excitatory neurons. Conclusions This study takes the next step in understanding the human brain cell types in which putative migraine susceptibility genes are expressed. Both familial and common migraine may arise from dysfunction of discrete cell types within the neurovascular unit, and localization of the affected cell type(s) in an individual patient may provide insight into their susceptibility to migraine.
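
The abstract above reports that enrichment was determined per cell type but does not name the statistical test. One common choice for gene-set enrichment of this kind is a one-sided hypergeometric test, sketched below. The choice of test, the function name `hypergeom_enrichment_p`, and the numbers in the usage lines are all assumptions for illustration and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric variable X.

    population: total number of genes considered
    successes:  genes in the set of interest (e.g. disease-associated genes)
    draws:      genes enriched in a given cell type
    observed:   overlap between the two sets
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass for every overlap at least as large as observed.
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers (invented): 10 genes in total, 5 in the gene set,
# 2 genes enriched in a cell type, both of which are in the gene set.
print(hypergeom_enrichment_p(10, 5, 2, 2))
```

A small p-value here would suggest that the overlap between the gene set and a cell type's enriched genes is larger than expected by chance; in practice the p-values across cell types would also need multiple-testing correction.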

EnTSSR: a weighted ensemble learning method to impute single-cell RNA sequencing data

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3110850 ◽

2021 ◽

pp. 1-1

Author(s):

Fan Lu ◽

Yilong Lin ◽

Chongbin Yuan ◽

Xiao-Fei Zhang ◽

Le Ou-Yang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Ensemble Learning ◽

Learning Method ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Transfer learning efficiently maps bone marrow cell types from mouse to human using single-cell RNA sequencing

Communications Biology ◽

10.1038/s42003-020-01463-6 ◽

2020 ◽

Vol 3 (1) ◽

Author(s):

Patrick S. Stumpf ◽

Xin Du ◽

Haruka Imanishi ◽

Yuya Kunisaki ◽

Yuichiro Semba ◽

...

Keyword(s):

Machine Learning ◽

Bone Marrow ◽

Single Cell ◽

Rna Sequencing ◽

Transfer Learning ◽

Biomedical Research ◽

Human Cell ◽

Cell Types ◽

Single Cell Rna Sequencing ◽

Using Data

Abstract Biomedical research often involves conducting experiments on model organisms in the anticipation that the biology learnt will transfer to humans. Previous comparative studies of mouse and human tissues were limited by the use of bulk-cell material. Here we show that transfer learning (the branch of machine learning that concerns passing information from one domain to another) can be used to efficiently map bone marrow biology between species, using data obtained from single-cell RNA sequencing. We first trained a multiclass logistic regression model to recognize different cell types in mouse bone marrow, achieving equivalent performance to more complex artificial neural networks. Furthermore, it was able to identify individual human bone marrow cells with 83% overall accuracy. However, some human cell types were not easily identified, indicating important differences in biology. When re-training the mouse classifier using data from human, fewer than 10 human cells of a given type were needed to accurately learn its representation. In some cases, human cell identities could be inferred directly from the mouse classifier via zero-shot learning. These results show how simple machine learning models can be used to reconstruct complex biology from limited data, with broad implications for biomedical research.
