scGate: marker-based purification of cell types from heterogeneous single-cell RNA-seq datasets

A common bioinformatics task in single-cell data analysis is to purify a cell type or cell population of interest from heterogeneous datasets. Here we present scGate, an algorithm that automatizes marker-based purification of specific cell populations, without requiring training data or reference gene expression profiles. scGate purifies a cell population of interest using a set of markers organized in a hierarchical structure, akin to gating strategies employed in flow cytometry. In our benchmark for blood-derived and tumor-infiltrating immune cells, scGate outperforms SingleR, a state-of-the-art classifier for single-cell data. scGate is implemented as an R package and integrated with the Seurat framework, providing an intuitive tool to isolate cell populations of interest from complex scRNA-seq datasets. Availability: R package source code and reproducible tutorials are available at https://github.com/carmonalab/scGate
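
The hierarchical gating idea described in this abstract can be illustrated with a minimal Python sketch. This is not the scGate API (scGate is an R package), and it does not reproduce its scoring method. The data layout, the mean-expression score, the threshold, and the function names `signature_score`, `passes_level`, and `gate` are all assumptions made for illustration.

```python
from statistics import mean

def signature_score(cell, genes):
    """Mean expression of the given genes in one cell (missing genes count as 0)."""
    return mean(cell.get(g, 0.0) for g in genes) if genes else 0.0

def passes_level(cell, level, threshold):
    """A cell passes when positive markers score high and negative markers score low."""
    pos_ok = signature_score(cell, level["positive"]) >= threshold
    neg_ok = signature_score(cell, level["negative"]) < threshold
    return pos_ok and neg_ok

def gate(cells, levels, threshold=1.0):
    """Apply gating levels in order; only cells that pass a level reach the next one."""
    kept = dict(cells)
    for level in levels:
        kept = {name: c for name, c in kept.items()
                if passes_level(c, level, threshold)}
    return sorted(kept)

# Toy data: hypothetical expression values, not taken from any real dataset.
cells = {
    "t_cell": {"PTPRC": 3.0, "CD3E": 2.5, "CD19": 0.0},
    "b_cell": {"PTPRC": 2.8, "CD3E": 0.0, "CD19": 2.9},
    "fibro":  {"PTPRC": 0.0, "CD3E": 0.0, "COL1A1": 4.0},
}
levels = [
    {"positive": ["PTPRC"], "negative": []},       # level 1: immune cells
    {"positive": ["CD3E"], "negative": ["CD19"]},  # level 2: T cells, excluding B cells
]
pure = gate(cells, levels)
```

Each level narrows the population kept by the previous one, which mirrors how successive gates are drawn in flow cytometry.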


ESCO: single cell expression simulation incorporating gene co-expression

10.1101/2020.10.20.347211 ◽ 2020

Author(s): Jinjin Tian ◽ Jiebiao Wang ◽ Kathryn Roeder

Keyword(s): Single Cell ◽ R Package ◽ Brain Cell ◽ Gene Interactions ◽ Cell Type ◽ Imputation Methods ◽ Biological Interest ◽ A Cell ◽ Cell Expression ◽ Cell Data

Abstract Motivation: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner. Results: Therefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data. Availability: The ESCO implementation is available as R package SplatterESCO (https://github.com/JINJINT/SplatterESCO).
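
The copula idea mentioned in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ESCO implementation: it uses a Gaussian copula for two genes with Poisson marginals, and the names `poisson_quantile`, `simulate_pair`, and `pearson`, as well as all parameter values, are hypothetical. The point is that the dependence (the correlation `rho`) is imposed separately from each gene's marginal count distribution.

```python
import math
import random
from statistics import NormalDist

def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def simulate_pair(n_cells, rho, lam_a, lam_b, seed=0):
    """Simulate counts for two genes whose dependence comes from a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    counts = []
    for _ in range(n_cells):
        # Step 1: correlated standard normals (2x2 Cholesky factor written out).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: map to uniforms; clamp below 1 so the quantile search terminates.
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        # Step 3: push the uniforms through each gene's marginal quantile function.
        counts.append((poisson_quantile(u1, lam_a), poisson_quantile(u2, lam_b)))
    return counts

def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

coexpressed = simulate_pair(2000, 0.8, 5.0, 8.0)
independent = simulate_pair(2000, 0.0, 5.0, 8.0)
```

Comparing `pearson(coexpressed)` with `pearson(independent)` shows the co-expression induced by the copula while both genes keep their Poisson marginals.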


High-throughput single-cell RNA-seq data imputation and characterization with surrogate-assisted automated deep learning

Briefings in Bioinformatics ◽ 10.1093/bib/bbab368 ◽ 2021

Author(s): Xiangtao Li ◽ Shaochuan Li ◽ Lei Huang ◽ Shixiong Zhang ◽ Ka-chun Wong

Keyword(s): Gene Expression ◽ Neural Networks ◽ Single Cell ◽ Deep Neural Networks ◽ Expression Profiles ◽ Marker Gene ◽ Gene Expression Profiles ◽ Underlying Mechanisms ◽ Cell Data ◽ Gene Expression Levels

Abstract Single-cell RNA sequencing (scRNA-seq) technologies have been heavily developed to probe gene expression profiles at single-cell resolution. Deep imputation methods have been proposed to address the related computational challenges (e.g. the gene sparsity in single-cell data). In particular, the neural architectures of those deep imputation models have been proven to be critical for performance. However, deep imputation architectures are difficult to design and tune for those without rich knowledge of deep neural networks and scRNA-seq. Therefore, Surrogate-assisted Evolutionary Deep Imputation Model (SEDIM) is proposed to automatically design the architectures of deep neural networks for imputing gene expression levels in scRNA-seq data without any manual tuning. Moreover, the proposed SEDIM constructs an offline surrogate model, which can accelerate the computational efficiency of the architectural search. Comprehensive studies show that SEDIM significantly improves the imputation and clustering performance compared with other benchmark methods. In addition, we also extensively explore the performance of SEDIM in other contexts and platforms including mass cytometry and metabolic profiling in a comprehensive manner. Marker gene detection, gene ontology enrichment and pathological analysis are conducted to provide novel insights into cell-type identification and the underlying mechanisms. The source code is available at https://github.com/li-shaochuan/SEDIM.


Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells

10.1101/538553 ◽ 2019 ◽ Cited By ~ 3

Author(s): Arnav Moudgil ◽ Michael N. Wilkinson ◽ Xuhua Chen ◽ June He ◽ Alex J. Cammack ◽ ...

Keyword(s): Gene Expression ◽ Transcription Factor ◽ Single Cell ◽ Binding Sites ◽ Expression Profiles ◽ Single Cells ◽ Gene Expression Profiles ◽ Cell Types ◽ Specific Cell

Abstract In situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.


Semi-soft Clustering of Single Cell Data

10.1101/285056 ◽ 2018

Author(s): Lingxue Zhu ◽ Jing Lei ◽ Bernie Devlin ◽ Kathryn Roeder

Keyword(s): Gene Expression ◽ Single Cell ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Pairwise Comparison ◽ Cell Types ◽ Intermediate Cell ◽ Soft Clustering ◽ Membership Matrix ◽ Cell Data

Abstract Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semi-soft clustering that can classify both pure and intermediate cell types from data on gene expression or protein abundance from individual cells. Called SOUP, for Semi-sOft clUstering with Pure cells, this novel algorithm reveals the clustering structure for both pure cells, which belong to one single cluster, and transitional cells with soft memberships. SOUP involves a two-step process: identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure the K cell types form in a similarity matrix, devised by pairwise comparison of the gene expression profiles of individual cells. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. SOUP is applicable to general clustering problems as well, as long as the unrestrictive modeling assumptions hold. The performance of SOUP is documented via extensive simulation studies. Using SOUP to analyze two single cell data sets from brain shows that it produces sensible and interpretable results.


Single cell network analysis with a mixture of Nested Effects Models

10.1101/258202 ◽ 2018

Author(s): Martin Pirkl ◽ Niko Beerenwinkel

Keyword(s): Single Cell ◽ New Technologies ◽ Single Cells ◽ R Package ◽ Supplementary Information ◽ Data Sets ◽ Cell Network ◽ A Cell ◽ Supplementary Material ◽ Cell Data

Abstract Motivation: New technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous. Results: We developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq. Availability: The mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/. Supplementary information: Supplementary data are available online.


Characterization of hormone-producing cell types in the teleost pituitary gland using single-cell RNA-seq

Scientific Data ◽ 10.1038/s41597-021-01058-8 ◽ 2021 ◽ Vol 8 (1)

Author(s): Khadeeja Siddique ◽ Eirill Ager-Wick ◽ Romain Fontaine ◽ Finn-Arne Weltzien ◽ Christiaan V. Henkel

Keyword(s): Single Cell ◽ Stress Responses ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Peptide Hormone ◽ Cell Types ◽ Specific Cell ◽ Endocrine Gland ◽ Hormone Production ◽ Male Adult

Abstract The pituitary is the vertebrate endocrine gland responsible for the production and secretion of several essential peptide hormones. These, in turn, control many aspects of an animal’s physiology and development, including growth, reproduction, homeostasis, metabolism, and stress responses. In teleost fish, each hormone is presumably produced by a specific cell type. However, key details on the regulation of, and communication between, these cell types remain to be resolved. We have therefore used single-cell sequencing to generate gene expression profiles for 2592 and 3804 individual cells from the pituitaries of female and male adult medaka (Oryzias latipes), respectively. Based on expression profile clustering, we define 15 and 16 distinct cell types in the female and male pituitary, respectively, of which ten are involved in the production of a single peptide hormone. Collectively, our data provide a high-quality reference for studies on pituitary biology and the regulation of hormone production, both in fish and in vertebrates in general.


ILoReg enables high-resolution cell population identification from single-cell RNA-seq data

10.1101/2020.01.20.912675 ◽ 2020

Author(s): Johannes Smolander ◽ Sini Junttila ◽ Mikko S Venäläinen ◽ Laura L Elo

Keyword(s): Feature Extraction ◽ High Resolution ◽ Single Cell ◽ Cell Population ◽ R Package ◽ High Dimensionality ◽ Cell Populations ◽ Rna Seq ◽ Extraction Step ◽ Resolution Cell

Abstract Single-cell RNA-seq allows researchers to identify cell populations based on unsupervised clustering of the transcriptome. However, subpopulations can have only subtle transcriptomic differences and the high dimensionality of the data makes their identification challenging. We introduce ILoReg (https://github.com/elolab/iloreg), an R package implementing a new cell population identification method that achieves high differentiation resolution through a probabilistic feature extraction step that is applied before clustering and visualization.


Venice: A New Algorithm for Finding Marker Genes in Single-Cell Transcriptomic Data

10.1101/2020.11.16.384479 ◽ 2020

Author(s): Hy Vuong ◽ Thao Truong ◽ Tan Phan ◽ Son Pham

Keyword(s): Single Cell ◽ Cell Population ◽ Cell Types ◽ Marker Genes ◽ Data Sets ◽ Interactive Analysis ◽ Expression Of Genes ◽ A Cell ◽ Definition Of ◽ Cell Data

Abstract Most widely used tools for finding marker genes in single cell data (SeuratT/NegBinom/Poisson, CellRanger, EdgeR, limmatrend) use a conventional definition of differentially expressed genes: genes with different mean expression values. However, in single-cell data, a cell population can be a mixture of many cell types/cell states, hence the mean expression of genes cannot represent the whole population. In addition, these tools assume that gene expression of a population belongs to a specific family of distributions. This assumption is often violated in single-cell data. In this work, we define marker genes of a cell population as genes that can be used to distinguish cells in the population from cells in other populations. Besides log-fold change, we devise a new metric to classify genes into up-regulated, down-regulated, and transitional states. In a benchmark for finding up-regulated and down-regulated genes, our tool outperforms all compared methods, including Seurat, ROTS, scDD, edgeR, MAST, limma, normal t-test, Wilcoxon and Kolmogorov–Smirnov test. Our method is much faster than all compared methods and therefore enables interactive analysis for large single-cell data sets in BioTuring Browser. The Venice algorithm is available within the Signac package: https://github.com/bioturing/signac.


netSmooth: Network-smoothing based imputation for single cell RNA-seq

F1000Research ◽ 10.12688/f1000research.13511.2 ◽ 2018 ◽ Vol 7 ◽ pp. 8 ◽ Cited By ~ 3

Author(s): Jonathan Ronen ◽ Altuna Akalin

Keyword(s): Single Cell ◽ Time Course ◽ Missing Values ◽ Cancer Genomics ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Covariance Structure ◽ R Package ◽ Rna Seq ◽ Distinct Cell

Single cell RNA-seq (scRNA-seq) experiments suffer from a range of characteristic technical biases, such as dropouts (zero or near zero counts) and high variance. Current analysis methods rely on imputing missing values by various means of local averaging or regression, often amplifying biases inherent in the data. We present netSmooth, a network-diffusion based method that uses priors for the covariance structure of gene expression profiles on scRNA-seq experiments in order to smooth expression values. We demonstrate that netSmooth improves clustering results of scRNA-seq experiments from distinct cell populations, time-course experiments, and cancer genomics. We provide an R package for our method, available at: https://github.com/BIMSBbioinfo/netSmooth.
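
Network-diffusion smoothing of the kind this abstract describes can be illustrated with a small Python sketch. It is not the netSmooth package API (netSmooth is an R package). It assumes one common form of diffusion, a random walk with restart over a column-normalized gene network. The function names, the toy three-gene network, and the value of `alpha` are hypothetical.

```python
def normalize_columns(adj):
    """Scale each column of a square adjacency matrix so that it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def smooth(expr, adj, alpha=0.5, n_iter=100):
    """Random walk with restart: f <- alpha * W f + (1 - alpha) * f0.

    expr is one cell's expression vector (one value per gene) and adj is the
    gene-gene network; alpha controls how much signal neighbours share.
    """
    w = normalize_columns(adj)
    n = len(expr)
    f = list(expr)
    for _ in range(n_iter):
        f = [alpha * sum(w[i][j] * f[j] for j in range(n)) + (1 - alpha) * expr[i]
             for i in range(n)]
    return f

# Toy network: a chain gene0 - gene1 - gene2, where gene1 suffered a dropout.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
expr = [4.0, 0.0, 4.0]
smoothed = smooth(expr, adj)
```

After smoothing, the dropout gene receives a nonzero value borrowed from its network neighbours, and because the matrix is column-normalized the total signal in the cell is preserved.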


Single-cell analysis reveals a nestin+ tendon stem/progenitor cell population with strong tenogenic potentiality

Science Advances ◽ 10.1126/sciadv.1600874 ◽ 2016 ◽ Vol 2 (11) ◽ pp. e1600874 ◽ Cited By ~ 42

Author(s): Zi Yin ◽ Jia-jie Hu ◽ Long Yang ◽ Ze-Feng Zheng ◽ Cheng-rui An ◽ ...

Keyword(s): Gene Expression ◽ Single Cell ◽ Cell Population ◽ Single Cell Analysis ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Cell Analysis ◽ Nestin Expression

The repair of injured tendons remains a formidable clinical challenge because of our limited understanding of tendon stem cells and the regulation of tenogenesis. With single-cell analysis to characterize the gene expression profiles of individual cells isolated from tendon tissue, a subpopulation of nestin+ tendon stem/progenitor cells (TSPCs) was identified within the tendon cell population. Using Gene Expression Omnibus datasets and immunofluorescence assays, we found that nestin expression was activated at specific stages of tendon development. Moreover, isolated nestin+ TSPCs exhibited superior tenogenic capacity compared to nestin− TSPCs. Knockdown of nestin expression in TSPCs suppressed their clonogenic capacity and reduced their tenogenic potential significantly both in vitro and in vivo. Hence, these findings provide new insights into the identification of subpopulations of TSPCs and illustrate the crucial roles of nestin in TSPC fate decisions and phenotype maintenance, which may assist in future therapeutic strategies to treat tendon disease.
