Artificial-Cell-Type Aware Cell Type Classification in CITE-seq

2020 ◽  
Author(s):  
Qiuyu Lian ◽  
Hongyi Xin ◽  
Jianzhu Ma ◽  
Liza Konnikova ◽  
Wei Chen ◽  
...  

Abstract: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at the single-cell level, which brings accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types and complicate the automation of cell surface phenotyping. We propose CITE-sort, an artificial-cell-type aware surface marker clustering method for CITE-seq. CITE-sort is aware of and robust to multiplet-induced artificial cell types. We benchmarked CITE-sort with real and simulated CITE-seq datasets and compared CITE-sort against canonical clustering methods. We show that CITE-sort produces the best clustering performance across the board. CITE-sort not only accurately identifies real biological cell types but also consistently and reliably separates multiplet-induced artificial-cell-type droplet clusters from real biological-cell-type droplet clusters. In addition, CITE-sort organizes its clustering process with a binary tree, which facilitates easy interpretation and verification of its clustering result and simplifies cell type annotation with domain knowledge in CITE-seq.

2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i542-i550 ◽  
Author(s):  
Qiuyu Lian ◽  
Hongyi Xin ◽  
Jianzhu Ma ◽  
Liza Konnikova ◽  
Wei Chen ◽  
...  

Abstract: Motivation: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at the single-cell level, which brings accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types (ACT) and complicate the automation of cell surface phenotyping. Results: We propose CITE-sort, an artificial-cell-type aware surface marker clustering method for CITE-seq. CITE-sort is aware of and robust to multiplet-induced ACT. We benchmarked CITE-sort with real and simulated CITE-seq datasets and compared CITE-sort against canonical clustering methods. We show that CITE-sort produces the best clustering performance across the board. CITE-sort not only accurately identifies real biological cell types (BCT) but also consistently and reliably separates multiplet-induced artificial-cell-type droplet clusters from real BCT droplet clusters. In addition, CITE-sort organizes its clustering process with a binary tree, which facilitates easy interpretation and verification of its clustering result and simplifies cell-type annotation with domain knowledge in CITE-seq. Availability and implementation: http://github.com/QiuyuLian/CITE-sort. Supplementary information: Supplementary data are available at Bioinformatics online.
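To illustrate why multiplets can produce artificial cell types, consider a toy model in which a droplet is positive for a surface marker whenever any cell it contains is positive. The sketch below is illustrative only: the marker panel, the cell-type signatures and the function names are hypothetical, and it is not the CITE-sort algorithm.

```python
# Toy model: a droplet's marker signature is the union of the signatures
# of the cells it captured. All signatures below are hypothetical examples.
BIOLOGICAL_TYPES = {
    "CD4 T cell": frozenset({"CD3", "CD4"}),
    "CD8 T cell": frozenset({"CD3", "CD8"}),
    "B cell": frozenset({"CD19"}),
}

def droplet_signature(cell_types):
    """Return the set of markers seen in a droplet holding the given cells."""
    signature = frozenset()
    for name in cell_types:
        signature |= BIOLOGICAL_TYPES[name]
    return signature

def classify(signature):
    """Name the biological type matching the signature, else flag it as artificial."""
    for name, markers in BIOLOGICAL_TYPES.items():
        if markers == signature:
            return name
    return "artificial cell type (ACT)"

singlet = droplet_signature(["B cell"])
doublet = droplet_signature(["CD4 T cell", "B cell"])
print(classify(singlet))
print(classify(doublet))
```

In this toy model a doublet of two B cells still carries the B cell signature, so only multiplets that mix different cell types show up as a new marker combination.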


2017 ◽  
Author(s):  
Lihe Chen ◽  
Jae Wook Lee ◽  
Chung-Lin Chou ◽  
Anilkumar Nair ◽  
Maria Agustina Battistone ◽  
...  

Abstract: Prior RNA sequencing (RNA-Seq) studies have identified complete transcriptomes for most renal epithelial cell types. The exceptions are the cell types that make up the renal collecting duct, namely intercalated cells (ICs) and principal cells (PCs), which account for only a small fraction of the kidney mass but play critical physiological roles in the regulation of blood pressure, extracellular fluid volume and extracellular fluid composition. To enrich these cell types, we used fluorescence-activated cell sorting (FACS) that employed well-established lectin cell surface markers for PCs and type B ICs, as well as a newly identified cell surface marker for type A ICs, viz. c-Kit. Single-cell RNA-Seq using the IC- and PC-enriched populations as input enabled identification of complete transcriptomes of A-ICs, B-ICs and PCs. The data were used to create a freely accessible online gene-expression database for collecting duct cells. This database allowed identification of genes that are selectively expressed in each cell type, including cell-surface receptors, transcription factors, transporters and secreted proteins. The analysis also identified a small fraction of hybrid cells expressing both aquaporin-2 and either anion exchanger 1 or pendrin transcripts. In many cases, mRNAs for receptors and their ligands were identified in different cells (e.g. Notch2 chiefly in PCs vs Jag1 chiefly in ICs), suggesting signaling crosstalk among the three cell types. The identified patterns of gene expression among the three types of collecting duct cells provide a foundation for understanding physiological regulation and pathophysiology in the renal collecting duct.

Significance Statement: A long-term goal in mammalian biology is to identify the genes expressed in every cell type of the body. In kidney, the expressed genes ("transcriptome") of all epithelial cell types have already been identified with the exception of the cells that make up the renal collecting duct, responsible for regulation of blood pressure and body fluid composition. Here, a technique called "single-cell RNA-Seq" was used in mouse to identify transcriptomes for the major collecting-duct cell types: type A intercalated cells, type B intercalated cells and principal cells. The information was used to create a publicly accessible online resource. The data allowed identification of genes that are selectively expressed in each cell type, informative for cell-level understanding of physiology and pathophysiology.


2019 ◽  
Author(s):  
Hongyi Xin ◽  
Qi Yan ◽  
Yale Jiang ◽  
Qiuyu Lian ◽  
Jiadi Luo ◽  
...  

Abstract: Identifying and removing multiplets from downstream analysis is essential to improve the scalability and reliability of single-cell RNA sequencing (scRNA-seq). High multiplet rates create artificial cell types in the dataset. Sample barcoding, including the cell hashing technology and the MULTI-seq technology, enables analytical identification of a fraction of multiplets in an scRNA-seq dataset. We propose a Gaussian-mixture-model-based multiplet identification method, GMM-Demux. GMM-Demux accurately identifies and removes the sample-barcoding-detectable multiplets and estimates the percentage of sample-barcoding-undetectable multiplets in the remaining dataset. GMM-Demux describes the droplet formation process with an augmented binomial probabilistic model, and uses the model to authenticate cell types discovered from an scRNA-seq dataset. We conducted two cell-hashing experiments, collected a public cell-hashing dataset, and generated a simulated cell-hashing dataset. We compared the classification result of GMM-Demux against a state-of-the-art heuristic-based classifier. We show that GMM-Demux is more accurate, more stable, reduces the error rate by up to 69×, and is capable of reliably recognizing 9 multiplet-induced fake cell types and 8 real cell types in a PBMC scRNA-seq dataset.
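The distinction between sample-barcoding-detectable and undetectable multiplets can be illustrated with a simplified calculation. Assuming the two cells of a doublet are drawn independently from the pooled samples, the doublet is detectable only when the cells carry different sample barcodes. This sketch is not GMM-Demux's augmented binomial model; the function name and the independence assumption are illustrative.

```python
def detectable_doublet_fraction(sample_proportions):
    """Fraction of doublets whose two cells come from different samples.

    Assumes the two cells are drawn independently, with probabilities
    given by each sample's share of the pooled cells (an assumption of
    this sketch, not a statement about GMM-Demux).
    """
    total = sum(sample_proportions)
    shares = [p / total for p in sample_proportions]
    # Probability that both cells carry the same sample barcode.
    same_sample = sum(s * s for s in shares)
    return 1.0 - same_sample

for n in (2, 4, 8):
    frac = detectable_doublet_fraction([1.0] * n)
    print(f"{n} equally mixed samples: {frac:.3f} of doublets detectable")
```

Under these assumptions, pooling more samples raises the detectable share, but some same-sample doublets always remain, which is why the undetectable fraction has to be estimated rather than observed.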


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tian Tian ◽  
Jie Zhang ◽  
Xiang Lin ◽  
Zhi Wei ◽  
Hakon Hakonarson

Abstract: Clustering is a critical step in single-cell-based studies. Most existing methods support unsupervised clustering without the a priori exploitation of any domain knowledge. When confronted by the high dimensionality and pervasive dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. In such cases, the only recourse is for the user to manually and repeatedly tweak clustering parameters until acceptable clusters are found. Consequently, the path to obtaining biologically meaningful clusters can be ad hoc and laborious. Here we report a principled clustering method named scDCC that integrates domain knowledge into the clustering step. Experiments on various scRNA-seq datasets from thousands to tens of thousands of cells show that scDCC can significantly improve clustering performance, facilitating the interpretability of clusters and downstream analyses, such as cell type assignment.


2019 ◽  
Author(s):  
Florian Wagner

Abstract: Clustering of cells by cell type is arguably the most common and repetitive task encountered during the analysis of single-cell RNA-Seq data. However, as popular clustering methods operate largely independently of visualization techniques, the fine-tuning of clustering parameters can be unintuitive and time-consuming. Here, I propose Galapagos, a simple and effective clustering workflow based on t-SNE and DBSCAN that does not require a gene selection step. In practice, Galapagos only involves the fine-tuning of two parameters, which is straightforward, as clustering is performed directly on the t-SNE visualization results. Using peripheral blood mononuclear cells as a model tissue, I validate the effectiveness of Galapagos in different ways. First, I show that Galapagos generates clusters corresponding to all main cell types present. Then, I demonstrate that the t-SNE results are robust to parameter choices and initialization points. Next, I employ a simulation approach to show that clustering with Galapagos is accurate and robust to the high levels of technical noise present. Finally, to demonstrate Galapagos' accuracy on real data, I compare clustering results to true cell type identities established using CITE-Seq data. In this context, I also provide an example of the primary limitation of Galapagos, namely the difficulty of resolving related cell types in cases where t-SNE fails to clearly separate the cells. Galapagos helps to make clustering scRNA-Seq data more intuitive and reproducible, and can be implemented in most programming languages with only a few lines of code.
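The idea of clustering directly on a two-dimensional embedding can be sketched with a small pure-Python DBSCAN. This is a minimal illustration, not the Galapagos implementation: the coordinates are hand-written stand-ins for t-SNE output, and `eps` and `min_pts` are the standard DBSCAN parameters (whether they are the two parameters the abstract refers to is an assumption).

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels

# Hypothetical embedding: two tight groups plus one isolated point.
embedding = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (2.5, 9.0),
]
labels = dbscan(embedding, eps=0.3, min_pts=3)
print(labels)
```

Because the clusters are defined on the same coordinates that are plotted, the effect of changing `eps` or `min_pts` can be checked by eye on the embedding.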


1989 ◽  
Vol 92 (2) ◽  
pp. 231-239
Author(s):  
P.I. Francz ◽  
K. Bayreuther ◽  
H.P. Rodemann

Methods for the selective enrichment of various subpopulations of the human skin fibroblast cell line HH-8 have been developed. These methods permit the selection of homogeneous populations of the three mitotic fibroblast cell types MF I, II and III, and the four postmitotic cell types PMF IV, V, VI and VII. These seven cell types exhibit differentiation-dependent and cell-type-specific patterns of [35S]methionine-labelled polypeptides in total soluble cytoplasmic and nuclear proteins, in membrane-bound proteins, and in secreted proteins. In the differentiation sequence MF II - MF III - PMF IV - PMF V - PMF VI, 14 cell-type-specific marker proteins have been found in the cytoplasmic and nuclear fraction, 24 in the membrane-bound protein fraction, and 11 in the secreted protein fraction. Markers in spontaneously arising and experimentally selected or induced populations of a single fibroblast cell type were found to be identical.


2020 ◽  
Author(s):  
Feng Tian ◽  
Fan Zhou ◽  
Xiang Li ◽  
Wenping Ma ◽  
Honggui Wu ◽  
...  

Summary: By circumventing cellular heterogeneity, single-cell omics have now been widely utilized for cell typing in human tissues, culminating in the undertaking of the human cell atlas, aimed at characterizing all human cell types. However, more important is the probing of gene regulatory networks, underlying chromatin architecture and critical transcription factors for each cell type. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic database that collectively addresses the above needs with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods of high detectability and precision. We exemplified GeACT by first studying representative organs in human mid-gestation fetus. In particular, correlated gene modules (CGMs) are observed and found to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states, and found the key transcription factors for representative CGMs.

Highlights: Genomic Architecture of Cells in Tissues (GeACT) data for human mid-gestation fetus; determining correlated gene modules (CGMs) in different cell types by MALBAC-DT; measuring chromatin open regions in single cells with high detectability by METATAC; integrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Alexander J Tarashansky ◽  
Jacob M Musser ◽  
Margarita Khariton ◽  
Pengyang Li ◽  
Detlev Arendt ◽  
...  

Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning mouse to sponge, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
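The role of cell-type-specific confidence thresholds can be sketched as a classifier with a reject option. The example below is a minimal illustration under stated assumptions: the function name, the probabilities and the threshold values are made up, and it does not show how JIND learns its thresholds or its latent code.

```python
def classify_with_rejection(probabilities, thresholds, reject_label="Unassigned"):
    """Return the most probable cell type, or reject_label if confidence is too low.

    probabilities: dict mapping cell type -> predicted probability
    thresholds: dict mapping cell type -> minimum probability to accept
    """
    best_type = max(probabilities, key=probabilities.get)
    if probabilities[best_type] >= thresholds[best_type]:
        return best_type
    return reject_label

# Hypothetical per-type thresholds: a stricter cutoff for B cells than for NK cells.
thresholds = {"B cell": 0.90, "NK cell": 0.60, "Monocyte": 0.80}

confident = {"B cell": 0.95, "NK cell": 0.03, "Monocyte": 0.02}
ambiguous = {"B cell": 0.70, "NK cell": 0.25, "Monocyte": 0.05}
print(classify_with_rejection(confident, thresholds))
print(classify_with_rejection(ambiguous, thresholds))
```

A per-type threshold lets a hard-to-separate cell type demand more confidence than an easily recognized one, so the same probability of 0.70 can be accepted for one type and rejected for another.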


2021 ◽  
Author(s):  
Yun Zhang ◽  
Brian Aevermann ◽  
Rohan Gala ◽  
Richard H. Scheuermann

Reference cell type atlases powered by single-cell transcriptomic profiling technologies have become available to study cellular diversity at a granular level. We present FR-Match for matching query datasets to reference atlases with robust and accurate performance for identifying novel cell types and non-optimally clustered cell types in the query data. This approach shows excellent performance for cross-platform, cross-sample type, cross-tissue region, and cross-data modality cell type matching.

