ProtAnno, an Automated Cell Type Annotation Tool for Single Cell Proteomics Data that Integrates Information from Multiple Reference Sources

Compared with sequencing-based global genomic profiling, cytometry labels targeted surface markers on millions of cells in parallel either by conjugated rare earth metal particles or Unique Molecular Identifier (UMI) barcodes. Correct annotation of these cells to specific cell types is a key step in the analysis of these data. However, there is no computational tool that automatically annotates single cell proteomics data for cell type inference. In this manuscript, we propose an automated single cell proteomics data annotation approach called ProtAnno to facilitate cell type assignments without laborious manual gating. ProtAnno is designed to incorporate information from annotated single cell RNA-seq (scRNA-seq), CITE-seq, and prior data knowledge (which can be imprecise) on biomarkers for different cell types. We have performed extensive simulations to demonstrate the accuracy and robustness of ProtAnno. For several single cell proteomics datasets that have been manually labeled, ProtAnno was able to correctly label most single cells. In summary, ProtAnno offers an accurate and robust tool to automate cell type annotations for large single cell proteomics datasets, and the analysis of such annotated cell types can offer valuable biological insights.

scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data

Genome Biology; DOI: 10.1186/s13059-019-1862-5; 2019; Vol 20 (1); Cited By ~ 31

Author(s): Jose Alquicira-Hernandez, Anuja Sathe, Hanlee P. Ji, Quan Nguyen, Joseph E. Powell

Keyword(s): Single Cell, Mononuclear Cells, Single Cells, Prediction Method, Cell Types, Pancreatic Tissue, Specific Cell, Cell Type, Learning Probability, Dimension Space

Abstract: Single-cell RNA sequencing has enabled the characterization of highly specific cell types in many tissues, as well as both primary and stem cell-derived cell lines. An important facet of these studies is the ability to identify the transcriptional signatures that define a cell type or state. In theory, this information can be used to classify an individual cell based on its transcriptional profile. Here, we present scPred, a new generalizable method that is able to provide highly accurate classification of single cells, using a combination of unbiased feature selection from a reduced-dimension space and a machine-learning probability-based prediction method. We apply scPred to scRNA-seq data from pancreatic tissue, mononuclear cells, colorectal tumor biopsies, and circulating dendritic cells and show that scPred is able to classify individual cells with high accuracy. The generalized method is available at https://github.com/powellgenomicslab/scPred/.

Genomic Architecture of Cells in Tissues (GeACT): Study of Human Mid-gestation Fetus

DOI: 10.1101/2020.04.12.038000; 2020

Author(s): Feng Tian, Fan Zhou, Xiang Li, Wenping Ma, Honggui Wu, ...

Keyword(s): Transcription Factors, Single Cell, Human Cell, Expression Profiles, Single Cells, Cell Types, List Type, Cell Type, Genomic Architecture, Gene Modules

Summary: By circumventing cellular heterogeneity, single cell omics have now been widely utilized for cell typing in human tissues, culminating in the undertaking of the Human Cell Atlas, aimed at characterizing all human cell types. However, more important is the probing of gene regulatory networks, underlying chromatin architecture, and critical transcription factors for each cell type. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic database that collectively addresses the above needs with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods of high detectability and precision. We exemplified GeACT by first studying representative organs in the human mid-gestation fetus. In particular, correlated gene modules (CGMs) are observed and found to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states, and found the key transcription factors for representative CGMs.

Highlights:

Genomic Architecture of Cells in Tissues (GeACT) data for human mid-gestation fetus

Determining correlated gene modules (CGMs) in different cell types by MALBAC-DT

Measuring chromatin open regions in single cells with high detectability by METATAC

Integrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM

A computational method to aid the design and analysis of single cell RNA-seq experiments for cell type identification

DOI: 10.1101/247114; 2018; Cited By ~ 1

Author(s): Douglas Abrams, Parveen Kumar, R. Krishna Murthy Karuturi, Joshy George

Keyword(s): Experimental Design, Single Cell, Single Cells, Cell Types, Cell Number, Fold Change, Computational Method, Marker Genes, Cell Type, Estimate Sample Size

Abstract

Background: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies or analytical packages available to guide experimental design and to devise a suitable analysis procedure for cell type identification.

Results: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis, and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying numbers of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment.

Conclusions: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or the SIMLR algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to be upregulated by a fold change of at most 2, the choice of single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.
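The sample-size question raised in this abstract (how many cells to profile for a given cell type rarity) can be illustrated with a simple binomial calculation. This is not SCEED's empirical procedure, which the abstract does not detail. It is a minimal sketch that assumes cells are sampled independently and that capturing some minimum number of cells of the rare type is enough; the function names and the `min_cells` and `confidence` parameters are hypothetical.

```python
from math import comb


def prob_at_least(n_cells: int, proportion: float, min_cells: int) -> float:
    """P(X >= min_cells) for X ~ Binomial(n_cells, proportion)."""
    # Sum the probabilities of capturing fewer than min_cells rare cells.
    below = sum(
        comb(n_cells, k) * proportion**k * (1 - proportion) ** (n_cells - k)
        for k in range(min_cells)
    )
    return 1.0 - below


def cells_needed(proportion: float, min_cells: int, confidence: float = 0.95) -> int:
    """Smallest number of profiled cells for which at least `min_cells` cells
    of a type with the given proportion are captured with probability
    >= `confidence` (assumed independent sampling)."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    n = min_cells
    # The capture probability grows with n, so a linear search suffices.
    while prob_at_least(n, proportion, min_cells) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    # Cells to profile so that a 1% cell type is seen at least once.
    print(cells_needed(proportion=0.01, min_cells=1))
```

A calculation like this only bounds how many rare cells are captured; whether a clustering algorithm then separates them depends on marker fold change and the other parameters the study varies.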

Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells

DOI: 10.1101/538553; 2019; Cited By ~ 3

Author(s): Arnav Moudgil, Michael N. Wilkinson, Xuhua Chen, June He, Alex J. Cammack, ...

Keyword(s): Gene Expression, Transcription Factor, Single Cell, Binding Sites, Expression Profiles, Single Cells, Gene Expression Profiles, Cell Types, Specific Cell

Abstract: In situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.

Computational approaches towards reducing contamination in single-cell RNA-seq data

DOI: 10.1101/2020.07.15.205062; 2020

Author(s): Siamak Yousefi, Hao Chen, Jesse F. Ingels, Melinda S. McCarty, Arthur G. Centeno, ...

Keyword(s): Single Cell, Single Cells, Real Life, Cell Types, Cell Capture, RNA Seq, Sequence Analyses, Cell Functions, Biological Interpretation, Different Cell Types

Summary: Single cell RNA sequencing has enabled quantification of single cells and identification of different cell types and subtypes as well as cell functions in different tissues. Single cell RNA sequence analyses assume acquired RNAs correspond to cells; however, RNAs from contamination within the input data are also captured by these assays. The sequencing of background contamination, as well as unwanted cells making their way to the final assay, potentially confound the correct biological interpretation of single cell transcriptomic data. Here we demonstrate two approaches to deal with background contamination as well as profiling of unwanted cells in the assays. We use three real-life datasets of whole-cell capture and nucleotide single-cell captures generated by Fluidigm and 10x technologies and show that these methods reduce the effect of contamination, strengthen clustering of cells, and improve biological interpretation.

Mapping multicellular programs from single-cell profiles

DOI: 10.1101/2020.08.11.245472; 2020

Author(s): Livnat Jerby-Arnon, Aviv Regev

Keyword(s): Single Cell, Spatial Data, Spatial Information, Single Cells, Cell Types, Risk Genes, Health And Disease, Different Cell Types, Cellular Components, Acting In Concert

Abstract: Tissue homeostasis relies on orchestrated multicellular circuits, where interactions between different cell types dynamically balance tissue function. While single-cell genomics identifies tissues’ cellular components, deciphering their coordinated action remains a major challenge. Here, we tackle this problem through a new framework of multicellular programs: combinations of distinct cellular programs in different cell types that are coordinated together in the tissue, thus forming a higher order functional unit at the tissue, rather than only cell, level. We develop the open-access DIALOGUE algorithm to systematically uncover such multicellular programs not only from spatial data, but even from tissue dissociated and profiled as single cells, e.g., by single-cell RNA-Seq. Tested on spatial transcriptomes from the mouse hypothalamus, DIALOGUE recovered spatial information, predicted the properties of a cell’s environment based only on its transcriptome, and identified multicellular programs that mark animal behavior. Applied to brain samples and colon biopsies profiled by scRNA-Seq, DIALOGUE identified multicellular configurations that mark Alzheimer’s disease and ulcerative colitis (UC), including a program spanning five cell types that is predictive of response to anti-TNF therapy in UC patients and enriched for UC risk genes from GWAS, each acting in different cell types, but all cells acting in concert. Taken together, our study provides a novel conceptual and methodological framework to unravel multicellular regulation in health and disease.

RNA splicing programs define tissue compartments and cell types at single cell resolution

DOI: 10.1101/2021.05.01.442281; 2021

Author(s): Julia Eve Olivieri, Roozbeh Dehghannasiri, Peter Wang, SoRi Jang, Antoine de Morree, ...

Keyword(s): Gene Expression, Single Cell, High Throughput, RNA Splicing, Single Cells, Cell Types, Mouse Lemur, Cell Type, Multiple Organs, Single Cell PCR

More than 95% of human genes are alternatively spliced. Yet, the extent to which splicing is regulated at single-cell resolution has remained controversial due to both available data and methods to interpret it. We apply the SpliZ, a new statistical approach that is agnostic to transcript annotation, to detect cell-type-specific regulated splicing in > 110K carefully annotated single cells from 12 human tissues. Using 10x data for discovery, 9.1% of genes with computable SpliZ scores are cell-type specifically spliced. These results are validated with RNA FISH, single cell PCR, and in high throughput with Smart-seq2. Regulated splicing is found in ubiquitously expressed genes such as actin light chain subunit MYL6 and ribosomal protein RPS24, which has an epithelial-specific microexon. 13% of the statistically most variable splice sites in cell-type specifically regulated genes are also most variable in mouse lemur or mouse. SpliZ analysis further reveals 170 genes with regulated splicing during sperm development, 10 of which are conserved in mouse and mouse lemur. The statistical properties of the SpliZ allow model-based identification of subpopulations within cells that are otherwise indistinguishable based on gene expression, illustrated by subpopulations of classical monocytes with stereotyped splicing, including an un-annotated exon, in SAT1, a diamine acetyltransferase. Together, this unsupervised and annotation-free analysis of differential splicing in ultra high throughput droplet-based sequencing of human cells across multiple organs establishes that splicing is regulated cell-type-specifically, independent of gene expression.

Phenotypic convergence in the brain: distinct transcription factors regulate common terminal neuronal characters

DOI: 10.1101/243113; 2018; Cited By ~ 2

Author(s): Nikos Konstantinides, Katarina Kapuralin, Chaimaa Fadil, Luendreo Barboza, Rahul Satija, ...

Keyword(s): Transcription Factors, Single Cell, Large Scale, Single Cells, Deep Understanding, Cell Types, Marker Genes, Cell Type, Functional Specification, Phenotypic Convergence

Summary: Transcription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.

Massively multiplex single-cell Hi-C

DOI: 10.1101/065052; 2016; Cited By ~ 4

Author(s): Vijay Ramani, Xinxian Deng, Kevin L Gunderson, Frank J Steemers, Christine M Disteche, ...

Keyword(s): Single Cell, Single Cells, Cell Types, Cell Heterogeneity, Proof Of Concept, Chromosome Conformation, Large Numbers, Conformational Properties, Novel Method, Different Cell Types

Abstract: We present combinatorial single cell Hi-C, a novel method that leverages combinatorial cellular indexing to measure chromosome conformation in large numbers of single cells. In this proof-of-concept, we generate and sequence combinatorial single cell Hi-C libraries for two mouse and four human cell types, comprising a total of 9,316 single cells across 5 experiments. We demonstrate the utility of single-cell Hi-C data in separating different cell types, identify previously uncharacterized cell-to-cell heterogeneity in the conformational properties of mammalian chromosomes, and demonstrate that combinatorial indexing is a generalizable molecular strategy for single-cell genomics.

Massively parallel single cell lineage tracing using CRISPR/Cas9 induced genetic scars

DOI: 10.1101/205971; 2017; Cited By ~ 6

Author(s): Bastiaan Spanjaard, Bo Hu, Nina Mitic, Jan Philipp Junker

Keyword(s): Single Cell, Computational Analysis, Systematic Approach, Single Cells, Cell Lineage, Transcriptome Profiling, Cell Types, Lineage Tracing, Lineage Trees, Different Cell Types

A key goal of developmental biology is to understand how a single cell transforms into a full-grown organism consisting of many different cell types. Single-cell RNA-sequencing (scRNA-seq) has become a widely used method due to its ability to identify all cell types in a tissue or organ in a systematic manner [1–3]. However, a major challenge is to organize the resulting taxonomy of cell types into lineage trees revealing the developmental origin of cells. Here, we present a strategy for simultaneous lineage tracing and transcriptome profiling in thousands of single cells. By combining scRNA-seq with computational analysis of lineage barcodes generated by genome editing of transgenic reporter genes, we reconstruct developmental lineage trees in zebrafish larvae and adult fish. In future analyses, LINNAEUS (LINeage tracing by Nuclease-Activated Editing of Ubiquitous Sequences) can be used as a systematic approach for identifying the lineage origin of novel cell types, or of known cell types under different conditions.
