VoPo leverages cellular heterogeneity for predictive modeling of single-cell data

2020
Vol 11 (1)
Author(s):
Natalie Stanley
Ina A. Stelzer
Amy S. Tsai
Ramin Fallahzadeh
Edward Ganio
...  
2021
Author(s):
Guangyuan Li
Song Baobao
H. L Grimes
V. B. Surya Prasath
Nathan L Salomonis

Hundreds of bioinformatics approaches now exist to define cellular heterogeneity from single-cell genomics data. Reconciling conflicts between diverse methods, algorithm settings, annotations, or modalities has the potential to clarify which populations are real and to establish reusable reference atlases. Here, we present a customizable computational strategy called scTriangulate, which leverages cooperative game theory to intelligently mix and match clustering solutions from different resolutions, algorithms, reference atlases, or multimodal measurements. This algorithm relies on a series of robust statistical metrics of cluster stability that work across molecular modalities to identify high-confidence integrated annotations. When applied to annotations from diverse competing cell atlas projects, this approach can resolve conflicts and assess the validity of controversial cell population predictions. Tested with scRNA-Seq, CITE-Seq (RNA + surface ADT), multiome (RNA + ATAC), and TEA-Seq (RNA + surface ADT + ATAC), this approach identifies highly stable and reproducible known and novel cell populations, while excluding clusters defined by technical artifacts (e.g., doublets). Importantly, we find that distinct cell populations are frequently defined by features from different modalities (RNA, ATAC, ADT) within the same assay, highlighting the importance of multimodal analysis in cluster determination. Because the approach is flexible, users can add new statistical metrics to alter the decision engine and adapt it to stability measures suited to different readouts of cellular activity.
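The idea of choosing, cell by cell, among competing clustering solutions according to a stability score can be illustrated with a deliberately simplified sketch. It is not scTriangulate's algorithm: it replaces the cooperative-game-theory allocation with a greedy per-cell choice, and it uses a single hypothetical stability metric (`reassign_score`, the fraction of a cluster's cells whose nearest centroid is their own cluster). The function names and toy data are illustrative assumptions.

```python
import numpy as np


def reassign_score(X, labels):
    """Per-cluster fraction of cells whose nearest centroid is their own cluster.

    Hypothetical toy stability metric, not one of scTriangulate's own metrics.
    """
    clusters = np.unique(labels)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in clusters])
    # Squared Euclidean distance from every cell to every cluster centroid.
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = clusters[dist.argmin(axis=1)]
    return {c: float((nearest[labels == c] == c).mean()) for c in clusters}


def pick_most_stable(X, annotations):
    """For each cell, keep the cluster label from whichever annotation scores highest.

    `annotations` maps an annotation name to a per-cell label array.
    Greedy per-cell selection, not a game-theoretic allocation.
    """
    scores = {name: reassign_score(X, labels) for name, labels in annotations.items()}
    chosen = []
    for i in range(X.shape[0]):
        best = max(annotations, key=lambda name: scores[name][annotations[name][i]])
        chosen.append(f"{best}:{annotations[best][i]}")
    return chosen


# Two well-separated groups of cells in a 2-D feature space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
annotations = {
    "alternating": np.array([0, 1, 0, 1, 0, 1, 0, 1]),  # ignores the structure
    "grouped": np.array([0, 0, 0, 0, 1, 1, 1, 1]),      # matches the structure
}
print(pick_most_stable(X, annotations))
# ['grouped:0', 'grouped:0', 'grouped:0', 'grouped:0', 'grouped:1', 'grouped:1', 'grouped:1', 'grouped:1']
```

A single metric like this one trivially rewards a one-cluster solution (every cell is then nearest to its own centroid), which is one reason an integration method benefits from combining several complementary stability metrics.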


2022
Author(s):
Meelad Amouzgar
David R Glass
Reema Baskar
Inna Averbukh
Samuel C Kimmey
...  

Single-cell technologies generate large, high-dimensional datasets encompassing a diversity of omics. Dimensionality reduction enables visualization of data by representing cells in two-dimensional plots that capture the structure and heterogeneity of the original dataset. Visualizations contribute to human understanding of data and are useful for guiding both quantitative and qualitative analysis of cellular relationships. Existing algorithms are typically unsupervised: they use only measured features to generate manifolds and disregard known biological labels such as cell type or experimental timepoint. Here, we repurpose the classification algorithm linear discriminant analysis (LDA) for supervised dimensionality reduction of single-cell data. LDA identifies linear combinations of predictors that optimally separate a priori classes, enabling users to tailor visualizations to separate specific aspects of cellular heterogeneity. We implement feature selection by hybrid subset selection (HSS) and demonstrate that this flexible, computationally efficient approach generates non-stochastic, interpretable axes amenable to diverse biological processes, such as differentiation over time and the cell cycle. We benchmark HSS-LDA against several popular dimensionality reduction algorithms and illustrate its utility and versatility for exploring single-cell mass cytometry, transcriptomics and chromatin accessibility data.
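The core of LDA as a supervised projection fits in a few lines: build within-class and between-class scatter matrices and keep the leading eigenvectors of S_w^-1 S_b as axes. The sketch below is plain Fisher LDA in NumPy only; it omits the hybrid subset selection step, and the function name `lda_projection` and the simulated data are illustrative assumptions. With C classes, at most C - 1 axes carry discriminative information.

```python
import numpy as np


def lda_projection(X, y, n_components=2):
    """Project X onto the leading Fisher discriminant axes for class labels y."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    s_within = np.zeros((n_features, n_features))
    s_between = np.zeros((n_features, n_features))
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        s_within += centered.T @ centered
        diff = (mean_c - overall_mean)[:, None]
        s_between += len(Xc) * (diff @ diff.T)
    # Solve S_w^{-1} S_b w = lambda w; pinv guards against a singular S_w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1]
    axes = eigvecs[:, order[:n_components]].real
    return X @ axes, axes


# Three simulated "cell types" in a 5-marker space.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0, 0], [5, 0, 0, 0, 0], [0, 5, 0, 0, 0]], dtype=float)
y = np.repeat([0, 1, 2], 30)
X = means[y] + rng.normal(size=(90, 5))
Z, axes = lda_projection(X, y)
print(Z.shape)  # (90, 2)
```

Because the axes come from a deterministic eigendecomposition rather than a random initialization, repeated runs give the same embedding, which matches the "non-stochastic" property described above.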


2018
Author(s):
Subarna Palit
Fabian J. Theis
Christina E. Zielinski

Recent advances in cytometry have radically changed single-cell proteomics by enabling a more accurate understanding of complex biological systems. Mass cytometry (CyTOF) provides simultaneous single-cell measurements that are crucial for understanding cellular heterogeneity and identifying novel cellular subsets. High-dimensional CyTOF data were traditionally analyzed by gating on bivariate dot plots, which are not only laborious, because complexity grows quadratically with the number of dimensions, but also biased by manual gating. This review discusses the impact of new analysis techniques that yield in-depth insights into the dynamics of immune regulation from static snapshot data, and aims to provide immunologists with tools to address the high dimensionality of their single-cell data.
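The quadratic growth behind manual gating can be made concrete: with d markers there are d(d - 1)/2 distinct marker pairs. A minimal sketch, assuming each pair is inspected once at a single gating level (real gating hierarchies revisit pairs within many subpopulations, so the actual effort is larger); the function name is illustrative.

```python
from math import comb


def n_bivariate_plots(n_markers: int) -> int:
    """Number of distinct marker pairs, i.e. 2-D dot plots needed to view every pair once."""
    return comb(n_markers, 2)  # = n_markers * (n_markers - 1) / 2


for d in (10, 20, 40):
    print(d, n_bivariate_plots(d))
# 10 45
# 20 190
# 40 780
```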


2022
Author(s):
Jiyuan Fang
Cliburn Chan
Kouros Owzar
Liuyang Wang
Diyuan Qin
...  

Single-cell RNA-sequencing (scRNA-seq) technology allows us to explore cellular heterogeneity in the transcriptome. Because most scRNA-seq data analyses begin with cell clustering, clustering accuracy considerably affects the validity of downstream analyses. Although many clustering methods have been developed, few tools are available to evaluate the clustering "goodness-of-fit" to the scRNA-seq data. In this paper, we propose a new Clustering Deviation Index (CDI) that measures the deviation of any clustering label set from the observed single-cell data. We conduct in silico and experimental scRNA-seq studies to show that CDI can select the optimal clustering label set. In particular, CDI also informs the optimal tuning parameters for any given clustering method and the correct number of cluster components.
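The abstract does not give CDI's formula, so the following is only an analog in the same goodness-of-fit spirit: each candidate label set is scored by the Bayesian information criterion (BIC) of a per-cluster, per-gene Poisson model, and lower is better. The Poisson model, the BIC penalty, the function name `poisson_bic`, and the simulated counts are all assumptions for illustration, not the published CDI.

```python
import numpy as np


def poisson_bic(counts, labels):
    """BIC of a per-cluster, per-gene Poisson model of a cells x genes count matrix.

    The log(x!) term is omitted because it depends only on the data, so it is
    identical for every candidate label set and cancels in comparisons.
    """
    n_cells, n_genes = counts.shape
    clusters = np.unique(labels)
    loglik = 0.0
    for c in clusters:
        Xc = counts[labels == c]
        mu = Xc.mean(axis=0)  # maximum-likelihood Poisson rate per gene
        with np.errstate(divide="ignore", invalid="ignore"):
            # x * log(mu) is 0 wherever mu == 0 (all counts are then 0).
            x_log_mu = np.where(mu > 0, Xc * np.log(mu), 0.0)
        loglik += (x_log_mu - mu).sum()
    n_params = len(clusters) * n_genes
    return -2.0 * loglik + n_params * np.log(n_cells)


# Simulated counts: two populations with mirrored expression programs.
rng = np.random.default_rng(1)
rates_a = np.r_[np.full(10, 2.0), np.full(10, 10.0)]
rates_b = rates_a[::-1]
counts = np.vstack([rng.poisson(rates_a, size=(100, 20)),
                    rng.poisson(rates_b, size=(100, 20))])
true_labels = np.repeat([0, 1], 100)
candidates = {
    "one cluster": np.zeros(200, dtype=int),
    "true": true_labels,
    "over-split": true_labels * 2 + np.arange(200) % 2,
    "random": rng.integers(0, 2, size=200),
}
for name, labels in candidates.items():
    print(name, round(poisson_bic(counts, labels), 1))
```

The penalty term is what lets such a score pick the number of clusters: over-splitting improves the likelihood slightly but pays for every extra per-gene rate.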


2021
Author(s):
Daisha Van Der Watt
Hannah Boekweg
Thy Truong
Amanda J Guise
Edward D Plowey
...  

Single-cell proteomics is an emerging subfield of proteomics with the potential to revolutionize our understanding of cellular heterogeneity and interactions. Recent efforts have largely focused on technological advances in sample preparation, chromatography and instrumentation that enable measuring the proteins present in these ultra-limited samples. Although advances in data acquisition have rapidly improved our ability to analyze single cells, the software pipelines used for data analysis were originally written for traditional bulk samples, and their performance on single-cell data has not been investigated. We benchmarked five popular peptide identification tools on single-cell proteomics data. We found that MetaMorpheus achieved the greatest number of peptide spectrum matches at a 1% false discovery rate. We also found that, depending on the tool, post-processing with machine learning can improve spectrum identification results by up to ∼40%. Although rescoring leads to a greater number of peptide spectrum matches, these new results are typically generated by third-party tools and cannot be used by the primary pipeline for quantification. Exploration of novel metrics for machine learning algorithms will continue to improve performance.
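Counting peptide spectrum matches (PSMs) "at a 1% false discovery rate" usually relies on the target-decoy strategy: decoy hits above a score cutoff estimate the number of false target hits. The sketch below shows that generic estimate with q-values. It is not the implementation of any benchmarked tool (tools differ in details such as adding one to the decoy count), and the function name and toy scores are hypothetical.

```python
import numpy as np


def psms_at_fdr(scores, is_decoy, fdr=0.01):
    """Count target PSMs accepted at a q-value threshold (target-decoy estimate)."""
    order = np.argsort(scores)[::-1]             # best-scoring PSM first
    decoy = np.asarray(is_decoy, dtype=bool)[order]
    n_decoy = np.cumsum(decoy)
    n_target = np.cumsum(~decoy)
    est_fdr = n_decoy / np.maximum(n_target, 1)  # decoys / targets above each cutoff
    # q-value: smallest estimated FDR at this score cutoff or any looser one.
    q_values = np.minimum.accumulate(est_fdr[::-1])[::-1]
    return int(((q_values <= fdr) & ~decoy).sum())


scores = np.array([10.0, 9.0, 8.0, 7.0, 6.0, 5.0])
is_decoy = [False, False, False, True, False, False]
print(psms_at_fdr(scores, is_decoy, fdr=0.01))  # 3
print(psms_at_fdr(scores, is_decoy, fdr=0.25))  # 5
```

Machine-learning rescoring fits into this picture by producing a better-separating score, so more targets rank above the first decoys and survive the same q-value cutoff.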


2020
Vol 36 (9)
pp. 2778-2786
Author(s):
Shobana V Stassen
Dickson M D Siu
Kelvin C M Lee
Joshua W K Ho
Hayden K H So
...  

Motivation: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods do not scale adequately to large datasets and therefore cannot uncover the complex cellular heterogeneity. Results: We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. Availability and implementation: https://github.com/ShobiStassen/PARC. Supplementary information: Supplementary data are available at Bioinformatics online.
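Graph-based clustering of the kind PARC performs starts from a k-nearest-neighbour (kNN) graph, removes weak edges, and then partitions the graph into communities. The sketch below loosely follows that pipeline with simplified stand-ins: brute-force kNN instead of approximate neighbour search, a Jaccard shared-neighbour filter as the only pruning step, and connected components instead of Leiden community detection. All function names, `k`, and `min_jaccard` are illustrative assumptions, not PARC's API.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of each cell's k nearest neighbours (brute force, O(n^2) memory)."""
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def jaccard_pruned_edges(neighbours, min_jaccard):
    """Keep kNN edges whose endpoints share enough neighbours (Jaccard similarity)."""
    sets = [set(row.tolist()) for row in neighbours]
    edges = set()
    for i, row in enumerate(neighbours):
        for j in row.tolist():
            shared = len(sets[i] & sets[j])
            union = len(sets[i] | sets[j])
            if shared / union >= min_jaccard:
                edges.add((min(i, j), max(i, j)))
    return edges


def connected_components(n_nodes, edges):
    """Union-find labelling; a crude stand-in for Leiden community detection."""
    parent = list(range(n_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    relabel = {}
    return np.array([relabel.setdefault(find(i), len(relabel)) for i in range(n_nodes)])


# Two far-apart simulated populations of 20 cells each.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 3)),
               rng.normal(50.0, 1.0, size=(20, 3))])
edges = jaccard_pruned_edges(knn_indices(X, k=5), min_jaccard=0.1)
labels = connected_components(len(X), edges)
print(labels)
```

The brute-force distance matrix is the part that breaks at the million-cell scale reported above, which is why fast approximate neighbour search is central to a scalable method of this type.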


2020
Vol 12 (5)
pp. 122-138
Author(s):
Mustafa Ozen
Tomasz Lipniacki
Andre Levchenko
Effat S Emamian
Ali Abdi

Characterizing how cells make decisions in response to received signals is important for understanding how cell fate is determined. The problem becomes multifaceted and complex when cellular heterogeneity and the dynamics of biochemical processes are taken into account. In this paper, we present a unified set of decision-theoretic, machine learning and statistical signal processing methods and metrics to model the precision of signaling decisions in the presence of uncertainty, using single-cell data. First, we characterize the erroneous decisions that may result from signaling processes and identify the false alarm and miss events associated with such decisions. Then, we present an optimal decision strategy that minimizes the total decision error probability. Additionally, we demonstrate how receiver operating characteristic (ROC) curves conveniently reveal the trade-off between false alarm and miss probabilities associated with different cell responses. Furthermore, we extend the framework to incorporate the dynamics of biochemical processes and reactions in a cell, using multi-time-point measurements and multidimensional outcome analysis and decision-making algorithms. The resulting multivariate signaling outcome modeling framework can be used to analyze several molecular species measured at the same or different time instants. We also show how the binary outcome analysis and decision-making approach can be extended to more than two possible outcomes. As a practical example, we apply these methods to single-cell data of PTEN, an important intracellular regulatory molecule in a p53 system, in wild-type and abnormal cells. The unified signaling outcome modeling framework presented here can be applied to organisms ranging from viruses, bacteria, yeast and lower metazoans to more complex organisms such as mammalian cells.
Ultimately, this signaling outcome modeling approach can be used to better understand the transition from physiological to pathological conditions such as inflammation, various cancers and autoimmune diseases.
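The optimal binary decision and its false alarm and miss probabilities can be worked out in closed form for the simplest case. The sketch assumes that a cell's measured response is Gaussian with mean mu0 when no signal is present (H0) and mu1 > mu0 when it is present (H1), with a shared standard deviation sigma and prior p0 for H0. Under these assumptions the threshold minimizing p0 P_FA + p1 P_miss is tau = (mu0 + mu1)/2 + sigma^2 ln(p0/p1)/(mu1 - mu0). The paper's framework is more general; the function names and parameter values here are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def optimal_threshold(mu0, mu1, sigma, p0=0.5):
    """Threshold minimising p0*P_FA + p1*P_miss for equal-variance Gaussian responses.

    Decide 'signal' (H1) when the measured response exceeds the threshold; assumes mu1 > mu0.
    """
    p1 = 1.0 - p0
    return 0.5 * (mu0 + mu1) + sigma**2 * math.log(p0 / p1) / (mu1 - mu0)


def error_probs(tau, mu0, mu1, sigma):
    p_false_alarm = 1.0 - norm_cdf((tau - mu0) / sigma)  # respond although no signal
    p_miss = norm_cdf((tau - mu1) / sigma)               # fail to respond to a signal
    return p_false_alarm, p_miss


def total_error(tau, mu0, mu1, sigma, p0=0.5):
    p_fa, p_m = error_probs(tau, mu0, mu1, sigma)
    return p0 * p_fa + (1.0 - p0) * p_m


def roc_points(mu0, mu1, sigma, thresholds):
    """(false-alarm, detection) pairs traced out by sweeping the threshold."""
    return [(fa, 1.0 - m) for fa, m in (error_probs(t, mu0, mu1, sigma) for t in thresholds)]


tau = optimal_threshold(mu0=0.0, mu1=2.0, sigma=1.0)
print(tau)                               # 1.0
print(error_probs(tau, 0.0, 2.0, 1.0))   # false alarm and miss are equal here
print(roc_points(0.0, 2.0, 1.0, [-1.0, 1.0, 3.0]))
```

With equal priors the optimal threshold sits midway between the two means; making the no-signal state more likely (larger p0) shifts it upward, trading fewer false alarms for more misses, which is exactly the trade-off an ROC curve displays.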


2019
pp. 1-10
Author(s):
Meghan C. Ferrall-Fairbanks
Markus Ball
Eric Padron
Philipp M. Altrock

PURPOSE Many cancers can be treated with targeted therapy. Almost inevitably, tumors develop resistance to targeted therapy, either from pre-existing resistance or by evolving new genotypes and traits. Intratumor heterogeneity serves as a reservoir for resistance, which often arises from the selection of minor cellular subclones. At the level of gene expression, clonal heterogeneity can only be revealed using high-dimensional single-cell methods. We propose using a general diversity index (GDI) to quantify heterogeneity on multiple scales and relate it to disease evolution. MATERIALS AND METHODS We focused on individual patient samples that were probed with single-cell RNA (scRNA) sequencing to describe heterogeneity. We developed a pipeline to analyze single-cell data via sample normalization, clustering, and mathematical interpretation using a generalized diversity measure, and we exemplify the utility of this platform using single-cell data. RESULTS We focused on three sources of patient scRNA sequencing data: two healthy bone marrow (BM) donors; two patients with acute myeloid leukemia (each sampled before and after BM transplantation; four samples of presorted lineages); and six patients with lung carcinoma with multiregion sampling. While healthy/normal samples scored low in diversity overall, GDI further quantified the ways in which these samples differed. Whereas the widely used Shannon diversity index sometimes reveals fewer differences, GDI exhibits differences in the number of potential key drivers or clonal richness. Comparison of pre- and post-BM transplantation acute myeloid leukemia samples did not reveal differences in heterogeneity, although biological differences may exist. CONCLUSION GDI can quantify cellular heterogeneity changes across a wide spectrum, even when standard measures, such as the Shannon index, do not. Our approach can be widely applied to quantify heterogeneity across samples and conditions.
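The abstract does not spell out the GDI formula. A standard way to generalize the Shannon index is the family of Hill numbers, D_q = (sum_i p_i^q)^(1/(1-q)), where p_i is the fraction of cells in cluster i; q = 0 gives richness (the number of clusters), the limit q = 1 gives exp(Shannon entropy), and q = 2 gives the inverse Simpson index. The sketch below assumes a measure of this form as a hypothetical stand-in; it is not confirmed to be the paper's exact definition.

```python
import math
from collections import Counter


def hill_number(labels, q):
    """Hill diversity of order q for a sample's cluster (or clone) labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    p = [c / n for c in counts.values()]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi**q for pi in p) ** (1.0 / (1.0 - q))


def diversity_profile(labels, qs=(0, 1, 2)):
    return {q: round(hill_number(labels, q), 3) for q in qs}


even = ["A", "B", "C", "D"] * 5        # four equally sized clusters
skewed = ["A"] * 17 + ["B", "C", "D"]  # same richness, one dominant cluster
print(diversity_profile(even))    # {0: 4.0, 1: 4.0, 2: 4.0}
print(diversity_profile(skewed))
```

The two samples have identical richness, yet their profiles diverge as q grows because higher orders weight dominant clusters more heavily; a profile over several q values can therefore separate samples that a single index summarizes similarly.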


2021
Vol 31 (10)
pp. 1728-1741
Author(s):
Benjamin J. Auerbach
Jian Hu
Muredach P. Reilly
Mingyao Li

The advent and rapid development of single-cell technologies have made it possible to study cellular heterogeneity at unprecedented resolution and scale. Cellular heterogeneity underlies phenotypic differences among individuals, and studying it is an important step toward understanding the molecular mechanisms of disease. Single-cell technologies offer opportunities to characterize cellular heterogeneity from different angles, but linking cellular heterogeneity to disease phenotypes requires careful computational analysis. In this article, we review current applications of single-cell methods in human disease studies and describe what existing studies have taught us so far about human genetic variation. As single-cell technologies become widely applicable in human disease studies, population-level studies have become a reality. We describe how to design these studies, particularly how to select study subjects, how to determine the number of cells to sequence per subject, and the sequencing depth needed per cell. We also discuss computational strategies for the analysis of single-cell data and describe how single-cell data can be integrated with bulk tissue data and with data generated from genome-wide association studies. Finally, we point out open problems and future research directions.
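One common back-of-the-envelope way to decide how many cells to sequence per subject (an assumption here, not necessarily the approach the review recommends) treats captured cells as independent draws, so the number of cells from a population with frequency f among n sequenced cells is Binomial(n, f). The sketch finds the smallest n that yields at least k cells of that population with a chosen confidence; the function names and defaults are hypothetical.

```python
from math import comb


def prob_at_least(k, n_cells, freq):
    """P(at least k cells of a population with frequency `freq` among n_cells), binomial model."""
    return 1.0 - sum(comb(n_cells, i) * freq**i * (1.0 - freq) ** (n_cells - i)
                     for i in range(k))


def min_cells_per_subject(freq, k=1, confidence=0.95, n_max=1_000_000):
    """Smallest number of sequenced cells giving at least k target cells with the stated confidence."""
    for n_cells in range(k, n_max + 1):
        if prob_at_least(k, n_cells, freq) >= confidence:
            return n_cells
    raise ValueError("confidence not reached within n_max cells")


print(min_cells_per_subject(0.01))        # 299
print(min_cells_per_subject(0.01, k=10))
```

For k = 1 this reduces to 1 - (1 - f)^n >= confidence; requiring several cells of the rare population, as is typically needed to characterize it, raises the per-subject budget substantially.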


2019
Author(s):
Shobana V. Stassen
Dickson M. D. Siu
Kelvin C. M. Lee
Joshua W. K. Ho
Hayden K. H. So
...  

Motivation: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods do not scale adequately to large datasets and therefore cannot uncover the complex cellular heterogeneity. Results: We introduce a highly scalable graph-based clustering algorithm, PARC (phenotyping by accelerated refined community-partitioning), for ultralarge-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM, and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1M cells within 13 minutes, compared with >2 hours for the next fastest graph-clustering algorithm, Phenograph. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. Availability and implementation: https://github.com/ShobiStassen/PARC

