scSorter: assigning cells to known cell types according to marker genes

Abstract: On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on the observation that the expected over-expression of marker genes is often absent in a non-negligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to be expressed at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power than existing methods.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
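
The rejection step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a classifier has already produced per-cell class probabilities. The function name `assign_with_rejection`, the example cell types, and the threshold values are hypothetical and are not taken from the JIND code base, where the thresholds are learned rather than set by hand.

```python
from typing import Dict, List


def assign_with_rejection(
    probs: List[Dict[str, float]],
    thresholds: Dict[str, float],
    rejected_label: str = "Unassigned",
) -> List[str]:
    """Label each cell with its most probable type, or reject it.

    A cell is rejected when the winning probability is below the
    threshold of the winning cell type (hand-picked values here).
    """
    labels = []
    for cell in probs:
        best_type = max(cell, key=cell.get)
        if cell[best_type] >= thresholds[best_type]:
            labels.append(best_type)
        else:
            labels.append(rejected_label)
    return labels


# Illustrative inputs only; not real data.
probs = [
    {"T cell": 0.92, "B cell": 0.08},
    {"T cell": 0.55, "B cell": 0.45},
    {"T cell": 0.30, "B cell": 0.70},
]
thresholds = {"T cell": 0.80, "B cell": 0.60}
print(assign_with_rejection(probs, thresholds))
# prints ['T cell', 'Unassigned', 'B cell']
```

Using one threshold per cell type, rather than a single global cutoff, lets a classifier be stricter for types that are easily confused with their neighbours.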

Decontamination of ambient RNA in single-cell RNA-seq with DecontX

10.1101/704015 ◽

2019 ◽

Cited By ~ 3

Author(s):

Shiyi Yang ◽

Sean E. Corbett ◽

Yusuke Koga ◽

Zhe Wang ◽

W. Evan Johnson ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Cellular Heterogeneity ◽

Marker Genes ◽

Specific Marker ◽

Aberrant Expression ◽

Cell Type Specific ◽

Hierarchical Bayesian Method ◽

Assess Quality

Abstract: Droplet-based microfluidic devices have become widely used to perform single-cell RNA sequencing (scRNA-seq) and discover novel cellular heterogeneity in complex biological systems. However, ambient RNA present in the cell suspension can be incorporated into these droplets and aberrantly counted along with a cell’s native mRNA. This results in cross-contamination of transcripts between different cell populations and can potentially decrease the precision of downstream analyses. We developed a novel hierarchical Bayesian method called DecontX to estimate and remove contamination in individual cells from scRNA-seq data. DecontX accurately predicted the proportion of contaminated counts in a mixture of mouse and human cells. Decontamination of PBMC datasets removed aberrant expression of cell-type-specific marker genes from other cell types and improved overall separation of cell clusters. In general, DecontX can be incorporated into scRNA-seq workflows to assess quality of dissociation protocols and improve downstream analyses.

A Fast Machine Learning Workflow for Rapid Phenotype Prediction from Whole Shotgun Metagenomes

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019434 ◽

2019 ◽

Vol 33 ◽

pp. 9434-9439 ◽

Cited By ~ 2

Author(s):

Anna Paola Carrieri ◽

Will PM Rowe ◽

Martyn Winn ◽

Edward O. Pyzer-Knapp

Keyword(s):

Machine Learning ◽

Precision Agriculture ◽

Genetic Material ◽

Real Data ◽

Biological Information ◽

Marker Genes ◽

Specific Marker ◽

Sequencing Data ◽

Phenotype Prediction ◽

Time Accuracy

Research on the microbiome is an emerging and crucial science that finds many applications in healthcare, food safety, precision agriculture and environmental studies. Huge amounts of DNA from microbial communities are being sequenced and analyzed by scientists interested in extracting meaningful biological information from this big data. Analyzing massive microbiome sequencing datasets, which embed the functions and interactions of thousands of different bacterial, fungal and viral species, is a significant computational challenge. Artificial intelligence has the potential for building predictive models that can provide insights for specific cutting-edge applications such as guiding diagnostics and developing personalised treatments, as well as maintaining soil health and fertility. Current machine learning workflows that predict traits of host organisms from their commensal microbiome do not take into account the whole genetic material constituting the microbiome, instead basing the analysis on specific marker genes. In this paper, we introduce what is, to the best of our knowledge, the first machine learning workflow that efficiently performs host phenotype prediction from whole shotgun metagenomes by computing similarity-preserving compact representations of the genetic material. Our workflow enables prediction tasks, such as classification and regression, from terabytes of raw sequencing data that do not necessitate any pre-processing through expensive bioinformatics pipelines. We compare the performance in terms of time, accuracy and uncertainty of predictions for four different classifiers. More precisely, we demonstrate that our ML workflow can efficiently classify real data with high accuracy, using examples from dog and human metagenomic studies, representing a step forward towards real-time diagnostics and a potential for cloud applications.
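
The idea of a "similarity-preserving compact representation" can be illustrated with a small Python sketch. It uses plain MinHash over k-mers as a stand-in; this is an assumption made for illustration and is not claimed to be the sketching scheme used in the paper. The function names, the k-mer length, and the number of hash functions are all hypothetical choices.

```python
import hashlib
from typing import List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def minhash_sketch(items: Set[str], num_hashes: int = 64) -> List[int]:
    """Keep the minimum hash value per salted hash function.

    `items` must be non-empty. The sketch has a fixed size
    regardless of how many k-mers the input contains.
    """
    sketch = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")  # one salt per hash function
        sketch.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return sketch


def estimate_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of matching positions approximates Jaccard similarity."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Toy sequences, for illustration only.
s1 = minhash_sketch(kmers("ACGTACGTTGCA", 4))
s2 = minhash_sketch(kmers("ACGTACGTTGCA", 4))
print(estimate_jaccard(s1, s2))  # identical inputs give 1.0
```

Because every sample is reduced to a fixed-length vector, sketches of this kind can be fed directly to a standard classifier without first assembling or aligning the reads.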

Cross-laboratory analysis of brain cell type transcriptomes with applications to interpretation of bulk tissue data

10.1101/089219 ◽

2016 ◽

Cited By ~ 8

Author(s):

B. Ogan Mancarci ◽

Lilah Toker ◽

Shreejoy J Tripathy ◽

Brenna Li ◽

Brad Rocco ◽

...

Keyword(s):

Nervous System ◽

Cell Types ◽

Brain Cell ◽

Marker Genes ◽

Specific Gene ◽

Published Data ◽

Specific Marker ◽

Cell Type ◽

Cell Type Specific ◽

Bulk Tissue

Abstract: Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single cell RNA-sequencing studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at Neuroexpresso.org.

Significance Statement: Cell type markers are powerful tools in the study of the nervous system that help reveal properties of cell types and acquire additional information from large scale expression experiments. Despite their usefulness in the field, known marker genes for brain cell types are few in number. We present NeuroExpresso, a database of brain cell type specific gene expression profiles, and demonstrate the use of marker genes for acquiring cell type specific information from whole tissue expression. The database will prove itself as a useful resource for researchers aiming to reveal novel properties of the cell types and aid both laboratory and computational scientists to unravel the cell type specific components of brain disorders.
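
As a rough illustration of "summarized expression of marker gene sets", the Python sketch below averages per-gene z-scores across a marker set to obtain one relative score per bulk sample. This averaging is a simplifying assumption made for illustration and may differ from the summarization used by the authors. The gene names, expression values, and the function name `marker_profile` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def marker_profile(
    expression: Dict[str, List[float]],
    markers: List[str],
) -> List[float]:
    """Return one relative abundance score per sample.

    `expression` maps gene name -> values across samples. Each marker
    is z-scored across samples so that highly expressed genes do not
    dominate, then the z-scores are averaged per sample.
    """
    z_scores = []
    for gene in markers:
        if gene not in expression:
            continue  # marker not measured in this dataset
        values = expression[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no relative information
        z_scores.append([(v - mu) / sd for v in values])
    if not z_scores:
        raise ValueError("no informative marker genes found")
    return [mean(column) for column in zip(*z_scores)]


# Made-up values for three bulk samples.
expression = {
    "marker_1": [1.0, 2.0, 3.0],
    "marker_2": [10.0, 20.0, 30.0],
    "other_gene": [5.0, 5.0, 5.0],
}
scores = marker_profile(expression, ["marker_1", "marker_2"])
print(scores)  # the third sample gets the highest score
```

The scores are only meaningful relative to each other across samples; they do not give absolute cell counts or proportions.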

Cytoplasmic, nuclear, membrane-bound and secreted [35S]methionine-labelled polypeptide pattern in differentiating fibroblast stem cells in vitro

Journal of Cell Science ◽

10.1242/jcs.92.2.231 ◽

1989 ◽

Vol 92 (2) ◽

pp. 231-239

Author(s):

P.I. Francz ◽

K. Bayreuther ◽

H.P. Rodemann

Keyword(s):

Protein Fraction ◽

Fibroblast Cell ◽

Cell Types ◽

Specific Marker ◽

Human Skin Fibroblast ◽

Nuclear Fraction ◽

Cell Type ◽

Marker Proteins ◽

Cell Type Specific ◽

Membrane Bound

Methods for the selective enrichment of various subpopulations of the human skin fibroblast cell line HH-8 have been developed. These methods permit the selection of homogeneous populations of the three mitotic fibroblast cell types MF I, II and III, and the four postmitotic cell types PMF IV, V, VI and VII. These seven cell types exhibit differentiation-dependent and cell-type-specific patterns of [35S]methionine-labelled polypeptides in total soluble cytoplasmic and nuclear proteins, in membrane-bound proteins, and in secreted proteins. In the differentiation sequence MF II - MF III - PMF IV - PMF V - PMF VI, 14 cell-type-specific marker proteins have been found in the cytoplasmic and nuclear fraction, 24 in the membrane-bound protein fraction, and 11 in the secreted protein fraction. Markers in spontaneously arising and experimentally selected or induced populations of a single fibroblast cell type were found to be identical.

Revealing immune responses in the Mycobacterium avium subsp. paratuberculosis-infected THP-1 cells using single cell RNA-sequencing

PLoS ONE ◽

10.1371/journal.pone.0254194 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0254194

Author(s):

Hong-Tae Park ◽

Woo Bin Park ◽

Suji Kim ◽

Jong-Sung Lim ◽

Gyoungju Nah ◽

...

Keyword(s):

Crohn's Disease ◽

Single Cell ◽

Mycobacterium Avium ◽

Expression Patterns ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Cytokines And Chemokines

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, which is a chronic and debilitating disease in ruminants. MAP is also considered to be a possible cause of Crohn’s disease in humans. However, few studies have focused on the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells against MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that THP-1 cells were divided into seven different clusters in response to phorbol-12-myristate-13-acetate (PMA) treatment. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. From the results, we found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses against MAP infection. The responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, the Mreg cell type, a novel cell type differentiated from THP-1 cells, was discovered. Thus, it is suggested that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows a more detailed observation of the response to infection by each cell type.

ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data

10.1101/426593 ◽

2018 ◽

Cited By ~ 2

Author(s):

Wennan Chang ◽

Changlin Wan ◽

Xiaoyu Lu ◽

Szu-wei Tu ◽

Yifan Sun ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Training Data ◽

Marker Genes ◽

Cell Detection ◽

Omics Data ◽

Deconvolution Method ◽

Cell Type ◽

Data Set ◽

Cell Type Specific

Abstract: We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issues of identifiability and robustness in the current tissue data deconvolution problem. ICTD provides substantially new capabilities for omics-data-based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell types and cell-type-specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD on real and single-cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Beyond these new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell-type-specific functions. The premise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.

Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbz096 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1581-1595 ◽

Cited By ~ 6

Author(s):

Xinlei Zhao ◽

Shuang Wu ◽

Nan Fang ◽

Xiao Sun ◽

Jue Fan

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Reference Data ◽

Predictive Accuracy ◽

Cell Types ◽

Superior Performance ◽

Marker Genes ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract: Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
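
The "simple ensemble voting" mentioned in the abstract can be sketched as a per-cell majority vote over the labels returned by several classifiers. This is a minimal Python illustration under stated assumptions: the tool names and labels are placeholders, and the tie-breaking rule (a tie for first place becomes 'unassigned') is an assumption rather than the rule used in the article.

```python
from collections import Counter
from typing import Dict, List


def ensemble_vote(
    predictions: Dict[str, List[str]],
    unassigned: str = "unassigned",
) -> List[str]:
    """Majority vote per cell across tools.

    `predictions` maps tool name -> predicted label per cell, with
    all lists in the same cell order. Ties for the top label are
    reported as `unassigned` (an assumed convention).
    """
    consensus = []
    for votes in zip(*predictions.values()):
        ranked = Counter(votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append(unassigned)
        else:
            consensus.append(ranked[0][0])
    return consensus


# Placeholder tools and labels, for illustration only.
predictions = {
    "tool_a": ["B", "T", "NK"],
    "tool_b": ["B", "T", "T"],
    "tool_c": ["B", "NK", "B"],
}
print(ensemble_vote(predictions))
# prints ['B', 'T', 'unassigned']
```

Treating ties as 'unassigned' keeps the ensemble conservative: a cell only receives a label when more tools agree on it than on any alternative.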

The WNT receptor FZD7 contributes to self-renewal signaling of human embryonic stem cells

Biological Chemistry ◽

10.1515/bc.2008.108 ◽

2008 ◽

Vol 389 (7) ◽

Cited By ~ 42

Author(s):

Kai Melchior ◽

Jonathan Weiß ◽

Holm Zaehres ◽

Yong-mi Kim ◽

Carolyn Lutzko ◽

...

Keyword(s):

Es Cells ◽

Embryonic Stem ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Mrna Levels ◽

Es Cell ◽

Specific Expression ◽

Self Renewal ◽

Human Es Cells

Abstract: A number of recent studies identified nuclear factors that together have the unique ability to induce pluripotency in differentiated cell types. However, little is known about the factors that are needed to maintain human embryonic stem (ES) cells in an undifferentiated state. In a search for such requirements, we performed a comprehensive meta-analysis of publicly available SAGE and microarray data. The rationale for this analysis was to identify genes that are exclusively expressed in human ES cell lines compared to 30 differentiated tissue types. The WNT receptor FZD7 was found among the genes with an ES cell-specific expression profile in both SAGE and microarray analyses. Subsequent validation by quantitative RT-PCR and flow cytometry confirmed that FZD7 mRNA levels in human ES cells are up to 200-fold higher compared to differentiated cell types. ShRNA-mediated knockdown of FZD7 in human ES cells induced dramatic changes in the morphology of ES cell colonies, perturbation of expression levels of germ layer-specific marker genes, and a rapid loss of expression of the ES cell-specific transcription factor OCT4. These findings identify the WNT receptor FZD7 as a novel ES cell-specific surface antigen with a likely important role in the maintenance of ES cell self-renewal capacity.

Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell Rna Sequencing ◽

Multiple Cell

Abstract: As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
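
The gamma-Poisson idea named in the abstract can be sketched in a few lines of Python with NumPy. This is not the Splatter API (Splatter is an R/Bioconductor package) and it omits the further steps a full simulator would include. The function name `simulate_counts` and all parameter names and default values are assumptions chosen for illustration.

```python
import numpy as np


def simulate_counts(
    n_genes: int,
    n_cells: int,
    mean_shape: float = 0.6,
    mean_rate: float = 0.3,
    dispersion: float = 0.1,
    seed: int = 0,
) -> np.ndarray:
    """Draw a genes-by-cells count matrix from a gamma-Poisson model."""
    rng = np.random.default_rng(seed)
    # Baseline mean expression of each gene, drawn from a gamma.
    gene_means = rng.gamma(shape=mean_shape, scale=1.0 / mean_rate, size=n_genes)
    # Cell-specific means: gamma noise around each gene mean, with
    # expected value gene_means and squared CV equal to `dispersion`.
    noise_shape = 1.0 / dispersion
    cell_means = rng.gamma(
        shape=noise_shape,
        scale=gene_means[:, None] / noise_shape,
        size=(n_genes, n_cells),
    )
    # Observed counts are Poisson draws around the cell-specific means.
    return rng.poisson(cell_means)


counts = simulate_counts(n_genes=50, n_cells=20)
print(counts.shape)  # prints (50, 20)
```

Mixing a Poisson over gamma-distributed means yields negative binomial counts, which is why this construction reproduces the overdispersion typically seen in single-cell count data.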
