Identifying cellular-to-phenotype associations by elucidating hierarchical relationships in high-dimensional cytometry data

Mapping Intimacies ◽

10.1101/2021.07.08.451609 ◽

2021 ◽

Author(s):

Adam S Chan ◽

Wei Jiang ◽

Emily Blyth ◽

Jean Yee Hwa Yang ◽

Ellis Patrick

Keyword(s):

Single Cell ◽

Case Studies ◽

Hierarchical Structure ◽

High Throughput ◽

Cell Types ◽

Unsupervised Clustering ◽

High Dimensional ◽

Cell Type ◽

Parent Population ◽

Do So

High-throughput single cell technologies hold the promise of discovering novel cellular relationships with disease and necessitate the use of effective analytical workflows. When manual gating is used to define cell types, the gating hierarchy can be used to identify cell types whose abundances change relative to a parent population. This strategy allows subtle changes to be observed that could be missed if small subsets were compared to all measured cells. However, typical analyses that employ unsupervised clustering overlook the valuable hierarchical structure present in cell type definitions by exclusively quantifying the proportions of cell type clusters relative to all cells. We present treekoR, a framework that facilitates multiple quantifications and comparisons of cell type proportions. Our results from twelve case studies reinforce the importance of quantifying proportions relative to parent populations in the analyses of cytometry data - as failing to do so can lead to missing important biological insights.
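
The contrast between the two quantifications described above can be illustrated with a small sketch. This is not treekoR's implementation: the hierarchy (`PARENT_OF`), the cluster names, and the cell counts are hypothetical values chosen for illustration, and the hierarchy is supplied by hand rather than derived from the data.

```python
def proportions(counts, parent_of):
    """Return {cluster: (share_of_all_cells, share_of_parent)} for one sample."""
    total = sum(counts.values())
    # Sum the cells of all clusters that share the same parent population.
    parent_totals = {}
    for cluster, n in counts.items():
        parent = parent_of[cluster]
        parent_totals[parent] = parent_totals.get(parent, 0) + n
    return {
        cluster: (n / total, n / parent_totals[parent_of[cluster]])
        for cluster, n in counts.items()
    }


# Hypothetical hierarchy and counts, invented for illustration only.
PARENT_OF = {"CD4 naive": "CD4 T", "CD4 memory": "CD4 T", "B": "B"}
control = {"CD4 naive": 50, "CD4 memory": 50, "B": 900}
disease = {"CD4 naive": 20, "CD4 memory": 80, "B": 900}

for name, sample in (("control", control), ("disease", disease)):
    of_all, of_parent = proportions(sample, PARENT_OF)["CD4 naive"]
    print(f"{name}: {of_all:.2f} of all cells, {of_parent:.2f} of CD4 T cells")
```

In this toy data the naive CD4 cluster moves from 0.05 to 0.02 of all cells, but from 0.50 to 0.20 of its parent population, which is the kind of shift that is easier to see relative to the parent.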


treekoR: identifying cellular-to-phenotype associations by elucidating hierarchical relationships in high-dimensional cytometry data

Genome Biology ◽

10.1186/s13059-021-02526-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Adam Chan ◽

Wei Jiang ◽

Emily Blyth ◽

Jean Yang ◽

Ellis Patrick

Keyword(s):

Single Cell ◽

Case Studies ◽

High Throughput ◽

Hierarchical Structures ◽

Cell Types ◽

Unsupervised Clustering ◽

High Dimensional ◽

Cell Type ◽

Clustering Techniques ◽

Do So

Abstract: High-throughput single-cell technologies hold the promise of discovering novel cellular relationships with disease. However, analytical workflows constructed for these technologies to associate cell proportions with disease often employ unsupervised clustering techniques that overlook the valuable hierarchical structures that have been used to define cell types. We present treekoR, a framework that empirically recapitulates these structures, facilitating multiple quantifications and comparisons of cell type proportions. Our results from twelve case studies reinforce the importance of quantifying proportions relative to parent populations in the analyses of cytometry data, as failing to do so can lead to missing important biological insights.


Machine Learning (ML) Can Successfully Support Microscopic Differential Counts of Peripheral Blood Smears in a High Throughput Hematology Laboratory

Blood ◽

10.1182/blood-2020-140215 ◽

2020 ◽

Vol 136 (Supplement 1) ◽

pp. 45-46

Author(s):

Christian Pohlkamp ◽

Kapil Jhalani ◽

Niroshan Nadarajah ◽

Inseok Heo ◽

William Wetton ◽

...

Keyword(s):

Machine Learning ◽

Single Cell ◽

High Throughput ◽

Peripheral Blood ◽

Cell Types ◽

Cell Type ◽

Hematological Neoplasms ◽

Blood Smears ◽

Peripheral Blood Smears ◽

Very High

Background: Cytomorphology is the gold standard for quick assessment of peripheral blood and bone marrow samples in hematological neoplasms. It is a broadly accepted method for orchestrating more specific diagnostics, including immunophenotyping or genetics. Inter-/intra-observer reproducibility of single cell classification is only 75 to 90%. Only a limited number of cells (100-500 cells/smear) is read in a time-consuming procedure. Machine learning (ML) is more reliable where human skills are limited, i.e. in handling large amounts of data or images. Here we tested ML to differentiate peripheral blood leukocytes in a high throughput hematology laboratory. Aim: To establish an ML-based cell classifier capable of identifying healthy and pathologic cells in digitalized peripheral blood smear scans at an accuracy competitive with or outperforming human expert level. Methods: We selected >2,600 smears out of our unique archive of >250,000 peripheral blood smears from hematological neoplasms. Depending on quality, we scanned up to 1,000 single cell images per smear. For image acquisition, a Metafer Scanning System (Zeiss Axio Imager.Z2 microscope, automatic slide feeder and automatic oiling device) from MetaSystems (Altlussheim, GER) was used. Areas of interest were defined by pre-scan in 10x magnification followed by high resolution scan in 40x to generate cell images for analysis. Average capture times for 300/500 cells were 3:43/4:37 min. We set up a supervised ML model using colour images (144x144 pixels) as input, outputting predicted probabilities of 21 predefined classes. We used ImageNet-pretrained Xception as our base model. We trained, evaluated and deployed the model using Amazon SageMaker on a subset of 82,974 images randomly selected from 514,183 cells captured and labelled for this study. 20 different cell types and one garbage class were classified. We included cell type categories reflecting the critical importance of detecting rare leukemia subtypes (e.g. APL). Numbers of images in the respective 21 classes ranged from 1,830 to 14,909 (median: 2,945). Minority classes were up-sampled to handle imbalances. Each picture was labelled by highly skilled technicians (median years practicing in this laboratory: 5) and two independent hematologists (median years at microscope: 20). Results: On a separate test set of 8,297 cells, our classifier was able to predict any of the five cell types occurring in the peripheral blood of healthy individuals (PMN, lymphocytes, monocytes, eosinophils, basophils) at very high median accuracy (97.0%). Median prediction accuracy of 15 rare or pathological cell types was 91.3%. For six critical pathological cell forms (myeloblasts, atypical/bilobulated promyelocytes in APL/APLv, hairy cells, lymphoma cells, plasma cells), median accuracy was 93.4% (sensitivity 93.8%). We saw a very high "T98 accuracy" for these cell types (98.5%), which is the accuracy of cell type predictions with prediction probability >0.98 (achieved in 2231/2417 cases), implying that critical cells predicted with probability <0.98 should be flagged for human expert validation with priority. For all 21 classes median accuracy was 91.7%. Accuracy was lower for cells representing consecutive steps of maturation, e.g. promyelo-/myelo-/metamyelocytes, reproducing inconsistencies from the human-built phenotypic classification system (see Fig.). Conclusions: We demonstrate an automated workflow using automatic microscopic cell capturing and ML-driven cell differentiation in samples of hematologic patients. Reproducibility, accuracy, sensitivity and specificity are above 90%, for many cell types above 98%. By flagging suspicious cells for human validation, this tool can support even experienced hematology professionals, especially in detecting rare cell types. Given an appropriate scanning speed, it clearly outperforms human investigators in terms of examination time and number of differentiated cells. An ML-based intelligence can make its skills accessible to hematology laboratories on site or after upload of scanned cell images, independent of time/location. A cloud-based infrastructure is available. A prospective head-to-head challenge between the ML-based classifier and human experts, comparing sensitivity and accuracy for detection of all cell classes in peripheral blood, will be run to prove suitability for routine use (NCT 4466059). Disclosures: Heo: AWS: Current Employment. Wetton: AWS: Current Employment. Drescher: MetaSystems: Current Employment. Hänselmann: MetaSystems: Current Employment. Lörch: MetaSystems: Current equity holder in private company.
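
The "T98 accuracy" defined in the abstract (accuracy restricted to predictions whose probability exceeds 0.98, with the remainder flagged for expert review) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the record layout, and the toy predictions are assumptions, as is the choice to flag a probability of exactly 0.98.

```python
def thresholded_accuracy(records, threshold=0.98):
    """Accuracy over confident predictions, plus counts of confident and flagged cells.

    records: iterable of (true_label, predicted_label, predicted_probability).
    """
    records = list(records)
    # Predictions strictly above the threshold count as confident.
    confident = [(t, p) for t, p, prob in records if prob > threshold]
    # Assumption: everything else, including prob == threshold, is flagged for review.
    flagged = len(records) - len(confident)
    if not confident:
        return None, 0, flagged
    accuracy = sum(t == p for t, p in confident) / len(confident)
    return accuracy, len(confident), flagged


# Toy predictions invented for illustration only.
toy = [
    ("myeloblast", "myeloblast", 0.99),
    ("hairy cell", "hairy cell", 0.995),
    ("plasma cell", "lymphoma cell", 0.99),
    ("myeloblast", "myeloblast", 0.70),
]
accuracy, n_confident, n_flagged = thresholded_accuracy(toy)
print(accuracy, n_confident, n_flagged)
```

Reporting the number of confident cases alongside the accuracy matters because the thresholded figure only describes the subset that passed the cut (2231 of 2417 cases in the abstract).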


RNA splicing programs define tissue compartments and cell types at single cell resolution

10.1101/2021.05.01.442281 ◽

2021 ◽

Author(s):

Julia Eve Olivieri ◽

Roozbeh Dehghannasiri ◽

Peter Wang ◽

SoRi Jang ◽

Antoine de Morree ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

High Throughput ◽

Rna Splicing ◽

Single Cells ◽

Cell Types ◽

Mouse Lemur ◽

Cell Type ◽

Multiple Organs ◽

Single Cell Pcr

More than 95% of human genes are alternatively spliced. Yet, the extent to which splicing is regulated at single-cell resolution has remained controversial, owing to limitations in both the available data and the methods used to interpret them. We apply the SpliZ, a new statistical approach that is agnostic to transcript annotation, to detect cell-type-specific regulated splicing in >110K carefully annotated single cells from 12 human tissues. Using 10x data for discovery, 9.1% of genes with computable SpliZ scores are cell-type specifically spliced. These results are validated with RNA FISH, single cell PCR, and in high throughput with Smart-seq2. Regulated splicing is found in ubiquitously expressed genes such as actin light chain subunit MYL6 and ribosomal protein RPS24, which has an epithelial-specific microexon. 13% of the statistically most variable splice sites in cell-type specifically regulated genes are also most variable in mouse lemur or mouse. SpliZ analysis further reveals 170 genes with regulated splicing during sperm development, 10 of which are conserved in mouse and mouse lemur. The statistical properties of the SpliZ allow model-based identification of subpopulations within cells that are otherwise indistinguishable based on gene expression, illustrated by subpopulations of classical monocytes with stereotyped splicing, including an un-annotated exon, in SAT1, a diamine acetyltransferase. Together, this unsupervised and annotation-free analysis of differential splicing in ultra high throughput droplet-based sequencing of human cells across multiple organs establishes that splicing is regulated cell-type-specifically, independent of gene expression.


scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04028-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Differentially Expressed Genes ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
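
One simple way to picture the first step, integrating an unsupervised and a supervised labelling, is to cross the two label vectors so that each cell receives a combined label. The sketch below is written in Python for illustration and is not the scConsensus R implementation: the function name, the `min_cells` parameter, and the fallback rule for small intersections are all assumptions, and the refinement step using differentially expressed genes is not shown.

```python
from collections import Counter


def cross_labels(unsupervised, supervised, min_cells=2):
    """Combine two per-cell labellings into one label per cell.

    Each cell is labelled with its (supervised, unsupervised) pair. Pairs that
    occur in fewer than min_cells cells fall back to the supervised label alone
    (an assumed rule, used here only to avoid tiny clusters).
    """
    pairs = list(zip(unsupervised, supervised))
    sizes = Counter(pairs)
    return [
        f"{s}|{u}" if sizes[(u, s)] >= min_cells else str(s)
        for u, s in pairs
    ]


# Toy labels invented for illustration only.
unsup = [0, 0, 0, 1, 1, 1]
sup = ["T", "T", "B", "B", "B", "B"]
print(cross_labels(unsup, sup))
```

Crossing the labellings keeps every distinction either method makes, which is why a later merging or refinement step is needed to remove splits that are not supported by the data.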


scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

10.1101/2020.04.22.056473 ◽

2020 ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Data Clusters

Clustering is a crucial step in the analysis of single-cell data. Clusters identified using unsupervised clustering are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering strategies have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. We present scConsensus, an R framework for generating a consensus clustering by (i) integrating the results from both unsupervised and supervised approaches and (ii) refining the consensus clusters using differentially expressed (DE) genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. scConsensus is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.


Strategies for Accurate Cell Type Identification in CODEX Multiplexed Imaging Data

Frontiers in Immunology ◽

10.3389/fimmu.2021.727626 ◽

2021 ◽

Vol 12 ◽

Author(s):

John W. Hickey ◽

Yuqi Tan ◽

Garry P. Nolan ◽

Yury Goltsev

Keyword(s):

Single Cell ◽

Cell Biology ◽

Clustering Algorithm ◽

Cell Types ◽

Unsupervised Clustering ◽

Cell Segmentation ◽

Imaging Data ◽

Cell Type ◽

Multiplexed Imaging ◽

Four Levels

Multiplexed imaging is a recently developed and powerful single-cell biology research tool. However, it presents new sources of technical noise that are distinct from other types of single-cell data, necessitating new practices for single-cell multiplexed imaging processing and analysis, particularly regarding cell-type identification. Here we created single-cell multiplexed imaging datasets by performing CODEX on four sections of the human colon (ascending, transverse, descending, and sigmoid) using a panel of 47 oligonucleotide-barcoded antibodies. After cell segmentation, we implemented five different normalization techniques crossed with four unsupervised clustering algorithms, resulting in 20 unique cell-type annotations for the same dataset. We generated two standard annotations: hand-gated cell types and cell types produced by over-clustering with spatial verification. We then compared these annotations at four levels of cell-type granularity. First, increasing cell-type granularity led to decreased labeling accuracy; therefore, subtle phenotype annotations should be avoided at the clustering step. Second, accuracy in cell-type identification varied more with normalization choice than with clustering algorithm. Third, unsupervised clustering better accounted for segmentation noise during cell-type annotation than hand-gating. Fourth, Z-score normalization was generally effective in mitigating the effects of noise from single-cell multiplexed imaging. Variation in cell-type identification will lead to significantly different results in downstream spatial analyses such as cellular neighborhood analysis; consequently, we also make recommendations for accurately assigning cell-type labels to CODEX multiplexed imaging.
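
Z-score normalization, the technique the abstract singles out, can be sketched in a few lines. This is a generic illustration rather than the authors' pipeline: it assumes a cells-by-markers matrix, normalizes each marker across cells, and uses the population standard deviation. The abstract does not state which axis or variance estimator was used.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Z-score each marker (column) across cells (rows) of a cells-by-markers matrix."""
    columns = list(zip(*matrix))
    stats = [(mean(col), pstdev(col)) for col in columns]
    # A constant marker has zero spread; map it to 0.0 instead of dividing by zero.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in matrix
    ]


# Toy intensities invented for illustration: 3 cells, 2 markers.
cells = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
for row in zscore_columns(cells):
    print([round(v, 3) for v in row])
```

After this transform every non-constant marker has mean 0 and unit variance, so markers with very different intensity ranges contribute on a comparable scale to any distance-based clustering that follows.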


Probabilistic cell type assignment of single-cell transcriptomic data reveals spatiotemporal microenvironment dynamics in human cancers

10.1101/521914 ◽

2019 ◽

Cited By ~ 6

Author(s):

Allen W Zhang ◽

Ciara O'Flanagan ◽

Elizabeth Chavez ◽

Jamie LP Lim ◽

Andrew McPherson ◽

...

Keyword(s):

Single Cell ◽

Temporal Dynamics ◽

Cell Types ◽

Large Datasets ◽

Unsupervised Clustering ◽

Marker Genes ◽

Immune Recognition ◽

Cell Type ◽

Type Assignment ◽

Existing Data

Single-cell RNA sequencing (scRNA-seq) has transformed biomedical research, enabling decomposition of complex tissues into disaggregated, functionally distinct cell types. For many applications, investigators wish to identify cell types with known marker genes. Typically, such cell type assignments are performed through unsupervised clustering followed by manual annotation based on these marker genes, or via "mapping" procedures to existing data. However, the manual interpretation required in the former case scales poorly to large datasets, which are also often prone to batch effects, while existing data for purified cell types must be available for the latter. Furthermore, unsupervised clustering can be error-prone, leading to under- and over-clustering of the cell types of interest. To overcome these issues we present CellAssign, a probabilistic model that leverages prior knowledge of cell type marker genes to annotate scRNA-seq data into pre-defined and de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while simultaneously controlling for batch and patient effects. We demonstrate the analytical advantages of CellAssign through extensive simulations and exemplify real-world utility to profile the spatial dynamics of high-grade serous ovarian cancer and the temporal dynamics of follicular lymphoma. Our analysis reveals subclonal malignant phenotypes and points towards an evolutionary interplay between immune and cancer cell populations with cancer cells escaping immune recognition.


Cell type prioritization in single-cell data

10.1101/2019.12.20.884916 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michael A. Skinnider ◽

Jordan W. Squair ◽

Claudia Kathe ◽

Mark A. Anderson ◽

Matthieu Gautier ◽

...

Keyword(s):

Single Cell ◽

Neural Circuits ◽

Cell Types ◽

Chromatin Accessibility ◽

High Dimensional ◽

Machine Learning Method ◽

Learning Method ◽

Rna Seq ◽

Cell Type ◽

Cell Data

We present a machine-learning method to prioritize the cell types most responsive to biological perturbations within high-dimensional single-cell data. We validate our method, Augur (https://github.com/neurorestore/Augur), on a compendium of single-cell RNA-seq, chromatin accessibility, and imaging transcriptomics datasets. We apply Augur to expose the neural circuits that enable walking after paralysis in response to spinal cord neurostimulation.


O3 High-dimensional analysis of tumor architecture predicts cancer immunotherapy response

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-itoc7.8 ◽

2020 ◽

Vol 8 (Suppl 2) ◽

pp. A4.2-A5

Author(s):

CM Schürch ◽

DJ Phillips ◽

M Matusiak ◽

B Rivero Gutierrez ◽

SS Bhate ◽

...

Keyword(s):

Single Cell ◽

Tumor Cells ◽

Immune Cells ◽

Cell Types ◽

Advisory Board ◽

High Dimensional ◽

Single Cell Level ◽

Cell Type ◽

Cell Level ◽

Research Grant

Background: Immunotherapies have induced long-lasting remissions in countless advanced-stage cancer patients, but many more patients have not benefitted. Therefore, novel predictive markers are needed to stratify patients before treatment and select those who will most likely benefit from immunotherapy, while avoiding potentially devastating adverse effects and high treatment costs for those who will not. We reasoned that thoroughly characterizing the architecture of the tumor microenvironment (TME) at the single-cell level by highly multiplexed tissue imaging should reveal novel spatial biomarkers of immunotherapy response. Materials and Methods: We used CODEX (CO-Detection by indEXing) highly multiplexed fluorescence microscopy to investigate the TME of cutaneous T cell lymphoma (CTCL) in samples from patients treated with pembrolizumab. 55 protein markers were visualized simultaneously using a tissue microarray of matched pre- and post-treatment skin biopsies from 7 pembrolizumab responders and 7 non-responders. After computational image processing and extraction of single-cell information, cell types were identified by unsupervised clustering followed by supervised curation, and cell-cell distances and ‘cellular neighborhoods’ were computed. We also performed RNA sequencing on laser-capture microdissected tissue microarray cores to extract cell-type specific gene expression profiles by CIBERSORTx analysis. Results: CODEX enabled the identification and characterization of malignant CD4+ tumor cells and reactive immune cells in the CTCL TME at the single-cell level, resulting in 21 different cell type clusters with spatial information. Cluster frequencies were not significantly different between responders and non-responders pre- and post-treatment. However, advanced computational analysis of the tumor architecture revealed cellular neighborhoods (CNs) that dynamically changed during pembrolizumab therapy and were correlated with response. Effector-type CNs enriched in tumor-infiltrating CD4+ T cells and dendritic cells were significantly increased after treatment in responders. In contrast, a regulatory T cell-enriched CN was significantly increased in non-responders before and after therapy. Furthermore, a spatial signature of cell-cell distances between tumor cells and effector/regulatory immune cells predicted therapy outcome. In addition, CIBERSORTx analysis revealed that tumor cells in responders, but not in non-responders, increased their expression of immune-activating genes. Conclusions: High-dimensional spatial analysis of CTCL tumors revealed a pre-existing immunosuppressive state in pembrolizumab non-responders. Thorough analysis of the TME therefore enables the discovery of novel spatial biomarkers in a concept that accounts for both cell type information and higher-order tumor architecture. Combining highly multiplexed microscopy with CIBERSORTx allows for the discovery of novel, predictive spatial biomarkers of immunotherapy response and will pave the way for future studies that functionally address these cell types and their interactions. Disclosure Information: C.M. Schürch: F. Consultant/Advisory Board; Modest; Enable Medicine, LLC. D.J. Phillips: None. M. Matusiak: None. B. Rivero Gutierrez: None. S.S. Bhate: None. G.L. Barlow: None. M.S. Khodadoust: B. Research Grant (principal investigator, collaborator or consultant and pending grants as well as grants already received); Significant; Corvus Pharmaceuticals. R. West: None. Y.H. Kim: B. Research Grant (principal investigator, collaborator or consultant and pending grants as well as grants already received); Significant; Merck, Horizon, Soligenix, miRagen, Forty Seven Inc., Neumedicine, Trillium, Galderma, Elorac. D. Speakers Bureau/Honoraria (speakers bureau, symposia, and expert witness); Significant; Innate Pharma, Eisai, Kyowa Hakko Kirin, Takeda, Seattle Genetics, Medivir, Portola Pharmaceuticals, Corvus Pharmaceuticals. G.P. Nolan: E. Ownership Interest (stock, stock options, patent or other intellectual property); Significant; Akoya Biosciences. F. Consultant/Advisory Board; Significant; Akoya Biosciences.


Conditional out-of-distribution generation for unpaired data using transfer VAE

Bioinformatics ◽

10.1093/bioinformatics/btaa800 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i610-i617

Author(s):

Mohammad Lotfollahi ◽

Mohsen Naghipourfar ◽

Fabian J Theis ◽

F Alexander Wolf

Keyword(s):

Single Cell ◽

Generative Models ◽

Response To Treatment ◽

High Dimensional ◽

Compact Representation ◽

Hair Color ◽

Great Success ◽

Cell Type ◽

Style Transfer ◽

Cell Type Specific

Motivation: While generative models have shown great success in sampling high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), their generation out-of-distribution poses fundamental problems due to the difficulty of learning a compact joint distribution across conditions. The canonical example of the conditional variational autoencoder (CVAE), for instance, does not explicitly relate conditions during training and, hence, has no explicit incentive to learn such a compact representation. Results: We overcome the limitation of the CVAE by matching distributions across conditions using maximum mean discrepancy in the decoder layer that follows the bottleneck. This introduces a strong regularization both for reconstructing samples within the same condition and for transforming samples across conditions, resulting in much improved generalization. As this amounts to solving a style-transfer problem, we refer to the model as transfer VAE (trVAE). Benchmarking trVAE on high-dimensional image and single-cell RNA-seq data, we demonstrate higher robustness and higher accuracy than existing approaches. We also show qualitatively improved predictions by tackling previously problematic minority classes and multiple conditions in the context of cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data. For generic tasks, we improve Pearson correlations of high-dimensional estimated means and variances with their ground truths from 0.89 to 0.97 and 0.75 to 0.87, respectively. We further demonstrate that trVAE learns cell-type-specific responses after perturbation and improves the prediction of most cell-type-specific genes by 65%. Availability and implementation: The trVAE implementation is available via github.com/theislab/trvae. The results of this article can be reproduced via github.com/theislab/trvae_reproducibility.
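
The maximum mean discrepancy (MMD) used to match distributions across conditions can be illustrated with a small, framework-free sketch. This is not the trVAE implementation: it computes the standard biased estimate of squared MMD with a single RBF kernel, and the kernel choice, the `gamma` value, and the toy samples are assumptions. In a model such as the one described, a term of this kind would be computed on a layer's activations for two conditions and added to the training loss.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


# Toy samples invented for illustration: two conditions in a 2-D feature space.
condition_a = [(0.0, 0.0), (1.0, 0.0)]
condition_b = [(3.0, 3.0), (4.0, 3.0)]
print(mmd2(condition_a, condition_a))  # identical samples
print(mmd2(condition_a, condition_b))  # well-separated samples
```

The estimate is near zero when the two samples coincide and grows as they separate, so minimizing it pulls the representations of different conditions toward a shared distribution.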
