Modeling and measurement of signaling outcomes affecting decision making in noisy intracellular networks using machine learning methods

2020 ◽  
Vol 12 (5) ◽  
pp. 122-138
Author(s):  
Mustafa Ozen ◽  
Tomasz Lipniacki ◽  
Andre Levchenko ◽  
Effat S Emamian ◽  
Ali Abdi

Abstract Characterization of decision-making in cells in response to received signals is of importance for understanding how cell fate is determined. The problem becomes multi-faceted and complex when we consider cellular heterogeneity and dynamics of biochemical processes. In this paper, we present a unified set of decision-theoretic, machine learning and statistical signal processing methods and metrics to model the precision of signaling decisions, in the presence of uncertainty, using single cell data. First, we introduce erroneous decisions that may result from signaling processes and identify false alarms and miss events associated with such decisions. Then, we present an optimal decision strategy which minimizes the total decision error probability. Additionally, we demonstrate how graphing receiver operating characteristic curves conveniently reveals the trade-off between false alarm and miss probabilities associated with different cell responses. Furthermore, we extend the introduced framework to incorporate the dynamics of biochemical processes and reactions in a cell, using multi-time point measurements and multi-dimensional outcome analysis and decision-making algorithms. The introduced multivariate signaling outcome modeling framework can be used to analyze several molecular species measured at the same or different time instants. We also show how the developed binary outcome analysis and decision-making approach can be extended to more than two possible outcomes. As an example and to show how the introduced methods can be used in practice, we apply them to single cell data of PTEN, an important intracellular regulatory molecule in a p53 system, in wild-type and abnormal cells. The unified signaling outcome modeling framework presented here can be applied to various organisms ranging from viruses, bacteria, yeast and lower metazoans to more complex organisms such as mammalian cells. 
Ultimately, this signaling outcome modeling approach can be utilized to better understand the transition from physiological to pathological conditions such as inflammation, various cancers and autoimmune diseases.
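The trade-off between false alarm and miss probabilities that the abstract describes can be illustrated with a small empirical sketch. This is not the authors' implementation: the function names, the sample values, and the rule "decide H1 when the measured level exceeds a threshold" are all assumptions made for illustration.

```python
# Minimal sketch (assumed setup, not the paper's code): two sets of
# single-cell measurements, one per hypothesis, and a threshold rule
# "decide H1 if x > threshold".

def error_rates(h0_samples, h1_samples, threshold):
    """Empirical false-alarm and miss rates for a given threshold."""
    # False alarm: H0 is true but the rule decides H1.
    p_false_alarm = sum(x > threshold for x in h0_samples) / len(h0_samples)
    # Miss: H1 is true but the rule decides H0.
    p_miss = sum(x <= threshold for x in h1_samples) / len(h1_samples)
    return p_false_alarm, p_miss


def roc_points(h0_samples, h1_samples):
    """Sweep the threshold over all observed values.

    Returns (P_false_alarm, P_detection) pairs, where
    P_detection = 1 - P_miss.
    """
    points = [(1.0, 1.0)]  # threshold below every sample: always decide H1
    for t in sorted(set(h0_samples) | set(h1_samples)):
        p_fa, p_miss = error_rates(h0_samples, h1_samples, t)
        points.append((p_fa, 1.0 - p_miss))
    return points


# Hypothetical measurements, invented for this example only.
h0 = [0.8, 1.0, 1.1, 1.3, 1.6]
h1 = [1.2, 1.5, 1.9, 2.1, 2.4]

for p_fa, p_d in roc_points(h0, h1):
    print(f"P_FA = {p_fa:.2f}, P_D = {p_d:.2f}")
```

Raising the threshold lowers the false alarm rate at the cost of more misses, which is the trade-off a receiver operating characteristic curve makes visible.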

2019 ◽  
Author(s):  
Mustafa Ozen ◽  
Tomasz Lipniacki ◽  
Andre Levchenko ◽  
Effat S. Emamian ◽  
Ali Abdi

Abstract Characterization of decision-making in a cell in response to received signals is of high importance for understanding how cell fate is determined. The problem becomes multi-faceted and complex when we consider cellular heterogeneity and dynamics of biochemical processes. In this paper, we present a unified set of decision-theoretic and statistical signal processing methods and metrics to model the precision of signaling decisions, given uncertainty, using single cell data. First, we introduce erroneous decisions that may result from signaling processes, and identify the false alarm and miss events associated with such decisions. Then, we present an optimal decision strategy which minimizes the total decision error probability. The optimal decision threshold or boundary is determined using the maximum likelihood principle, which chooses the hypothesis under which the data are most probable. Additionally, we demonstrate how graphing the receiver operating characteristic curve conveniently reveals the trade-off between false alarm and miss probabilities associated with different cell responses. Furthermore, we extend the introduced signaling outcome modeling framework to incorporate the dynamics of biochemical processes and reactions in a cell, using multi-time point measurements and multi-dimensional outcome analysis and decision-making algorithms. The introduced multivariate signaling outcome modeling framework can be used to analyze several molecular species measured at the same or different time instants. We also show how the developed binary outcome analysis and decision-making approach can be extended to include more than two possible outcomes. To show how the overall set of introduced models and methods can be used in practice, we apply them, as an example, to single cell data of an intracellular regulatory molecule called Phosphatase and Tensin homolog (PTEN) in a p53 system, in wild-type and abnormal (e.g., mutant) cells.
These molecules are involved in tumor suppression, cell cycle regulation and apoptosis. The unified signaling outcome modeling framework presented here can be applied to various organisms ranging from simple ones such as viruses, bacteria, yeast, and lower metazoans, to more complex organisms such as mammalian cells. Ultimately, this signaling outcome modeling approach can be useful for better understanding the transition from physiological to pathological conditions such as inflammation, various cancers and autoimmune diseases.

Brief Summary Cells are supposed to make correct decisions, i.e., respond properly to various signals and initiate certain cellular functions, based on the signals they receive from the surrounding environment. Due to signal transduction noise, signaling malfunctions or other factors, cells may respond differently to the same input signals, which may result in incorrect cell decisions. Modeling and quantification of decision-making processes and signaling outcomes in cells have emerged as important research areas in recent years. Here we present univariate and multivariate data-driven statistical models and methods for analyzing dynamic decision-making processes and signaling outcomes. Furthermore, we exemplify the methods using single cell data generated by a p53 system, in wild-type and abnormal cells.
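The maximum likelihood rule mentioned in the abstract, which chooses the hypothesis under which the data are most probable, can be sketched for the simplest case. The sketch assumes one measured quantity, Gaussian likelihoods under each hypothesis, and equally likely hypotheses. None of these assumptions, nor the function names and numbers, come from the paper.

```python
import math

# Minimal sketch under stated assumptions: a scalar measurement x that is
# Gaussian under each of two hypotheses H0 and H1.

def gaussian_pdf(x, mean, std):
    """Gaussian probability density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mean, std):
    """Gaussian cumulative distribution function at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ml_decision(x, mean0, std0, mean1, std1):
    """Return 1 if x is more probable under H1 than under H0, else 0."""
    return 1 if gaussian_pdf(x, mean1, std1) > gaussian_pdf(x, mean0, std0) else 0


def equal_variance_errors(mean0, mean1, std):
    """ML threshold and error probabilities for a shared std, mean1 > mean0.

    With equal variances the two densities cross at the midpoint of the
    means, so the ML rule reduces to "decide H1 if x > threshold".
    """
    threshold = 0.5 * (mean0 + mean1)
    p_false_alarm = 1.0 - gaussian_cdf(threshold, mean0, std)  # decide H1 under H0
    p_miss = gaussian_cdf(threshold, mean1, std)               # decide H0 under H1
    return threshold, p_false_alarm, p_miss


# Hypothetical parameters, chosen only to make the example concrete.
threshold, p_fa, p_miss = equal_variance_errors(mean0=1.0, mean1=3.0, std=1.0)
total_error = 0.5 * (p_fa + p_miss)  # assumes both hypotheses are equally likely
print(threshold, p_fa, p_miss, total_error)
```

Under the equal-prior assumption, the maximum likelihood threshold is also the one that minimizes the total error probability. With unequal priors or unequal variances the boundary shifts, and `ml_decision` still applies while the midpoint shortcut does not.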


2021 ◽  
Author(s):  
Daisha Van Der Watt ◽  
Hannah Boekweg ◽  
Thy Truong ◽  
Amanda J Guise ◽  
Edward D Plowey ◽  
...  

Abstract Single cell proteomics is an emerging sub-field within proteomics with the potential to revolutionize our understanding of cellular heterogeneity and interactions. Recent efforts have largely focused on technological advancements in sample preparation, chromatography and instrumentation to enable measuring proteins present in these ultra-limited samples. Although advancements in data acquisition have rapidly improved our ability to analyze single cells, the software pipelines used in data analysis were originally written for traditional bulk samples and their performance on single cell data has not been investigated. We benchmarked five popular peptide identification tools on single cell proteomics data. We found that MetaMorpheus achieved the greatest number of peptide spectrum matches at a 1% false discovery rate. Depending on the tool, we also find that post processing machine learning can improve spectrum identification results by up to ∼40%. Although rescoring leads to a greater number of peptide spectrum matches, these new results typically are generated by 3rd party tools and have no way of being utilized by the primary pipeline for quantification. Exploration of novel metrics for machine learning algorithms will continue to improve performance.
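The abstract reports counts of peptide spectrum matches (PSMs) at a 1% false discovery rate but does not say how that rate is estimated. One common approach is target-decoy counting, sketched below purely as an assumption. The function name, the `(score, is_decoy)` input format, and the simple decoys/targets estimate are illustrative and are not taken from the benchmarked tools.

```python
# Minimal sketch of target-decoy FDR filtering (an assumed method; the
# abstract does not specify how its 1% FDR was computed).

def accepted_targets_at_fdr(psms, alpha=0.01):
    """Count target PSMs accepted at the most permissive score cutoff
    whose estimated FDR (decoys / targets above the cutoff) is <= alpha.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    Tied scores are processed in arbitrary order in this simplified version.
    """
    targets = 0
    decoys = 0
    best = 0
    for _score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest accepted set that still meets the FDR bound.
        if targets > 0 and decoys / targets <= alpha:
            best = targets
    return best


# Hypothetical scored matches, invented for the example.
example = [(9.1, False), (8.7, False), (8.2, False), (7.9, True), (7.5, False)]
print(accepted_targets_at_fdr(example, alpha=0.34))
```

Rescoring with a machine learning post-processor changes the score ordering, which is why it can change how many matches pass the same cutoff.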


2019 ◽  
Author(s):  
Evan Greene ◽  
Greg Finak ◽  
Leonard A. D’Amico ◽  
Nina Bhardwaj ◽  
Candice D. Church ◽  
...  

Abstract High-dimensional single-cell cytometry is routinely used to characterize patient responses to cancer immunotherapy and other treatments. This has produced a wealth of datasets ripe for exploration but whose biological and technical heterogeneity make them difficult to analyze with current tools. We introduce a new interpretable machine learning method for single-cell mass and flow cytometry studies, FAUST, that robustly performs unbiased cell population discovery and annotation. FAUST processes data on a per-sample basis and returns biologically interpretable cell phenotypes that can be compared across studies, making it well-suited for the analysis and integration of complex datasets. We demonstrate how FAUST can be used for candidate biomarker discovery and validation by applying it to a flow cytometry dataset from a Merkel cell carcinoma anti-PD-1 trial and discover new CD4+ and CD8+ effector-memory T cell correlates of outcome co-expressing PD-1, HLA-DR, and CD28. We then use FAUST to validate these correlates in an independent CyTOF dataset from a published metastatic melanoma trial. Importantly, existing state-of-the-art computational discovery approaches as well as prior manual analysis did not detect these or any other statistically significant T cell sub-populations associated with anti-PD-1 treatment in either data set. We further validate our methodology by using FAUST to replicate the discovery of a previously reported myeloid correlate in a different published melanoma trial, and validate the correlate by identifying it de novo in two additional independent trials. FAUST’s phenotypic annotations can be used to perform cross-study data integration in the presence of heterogeneous data and diverse immunophenotyping staining panels, enabling hypothesis-driven inference about cell sub-population abundance through a multivariate modeling framework we call Phenotypic and Functional Differential Abundance (PFDA).
We demonstrate this approach on data from myeloid and T cell panels across multiple trials. Together, these results establish FAUST as a powerful and versatile new approach for unbiased discovery in single-cell cytometry.


2020 ◽  
Author(s):  
Etienne Becht ◽  
Daniel Tolstrup ◽  
Charles-Antoine Dutertre ◽  
Florent Ginhoux ◽  
Evan W. Newell ◽  
...  

Abstract Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell-types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of hundreds of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows comprehensive analysis of the cellular constituency of the steady-state murine lung and identification of novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single cell proteomics in complex tissues.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Natalie Stanley ◽  
Ina A. Stelzer ◽  
Amy S. Tsai ◽  
Ramin Fallahzadeh ◽  
Edward Ganio ◽  
...  

2017 ◽  
Vol 13 (4) ◽  
pp. e1005436 ◽  
Author(s):  
Iman Habibi ◽  
Raymond Cheong ◽  
Tomasz Lipniacki ◽  
Andre Levchenko ◽  
Effat S. Emamian ◽  
...  

2021 ◽  
Author(s):  
Guangyuan Li ◽  
Song Baobao ◽  
H. L Grimes ◽  
V. B. Surya Prasath ◽  
Nathan L Salomonis

Hundreds of bioinformatics approaches now exist to define cellular heterogeneity from single-cell genomics data. Reconciling conflicts between diverse methods, algorithm settings, annotations or modalities has the potential to clarify which populations are real and establish reusable reference atlases. Here, we present a customizable computational strategy called scTriangulate, which leverages cooperative game theory to intelligently mix-and-match clustering solutions from different resolutions, algorithms, reference atlases, or multi-modal measurements. This algorithm relies on a series of robust statistical metrics for cluster stability that work across molecular modalities to identify high-confidence integrated annotations. When applied to annotations from diverse competing cell atlas projects, this approach is able to resolve conflicts and determine the validity of controversial cell population predictions. Tested with scRNA-Seq, CITE-Seq (RNA + surface ADT), multiome (RNA + ATAC), and TEA-Seq (RNA + surface ADT + ATAC), this approach identifies highly stable and reproducible, known and novel cell populations, while excluding clusters defined by technical artifacts (i.e., doublets). Importantly, we find that distinct cell populations are frequently attributed with features from different modalities (RNA, ATAC, ADT) in the same assay, highlighting the importance of multimodal analysis in cluster determination. As it is flexible, this approach can be updated with new user-defined statistical metrics to alter the decision engine and customized to new measures of stability for different measures of cellular activity.


Author(s):  
Julian Schmitz ◽  
Oliver Hertel ◽  
Boris Yermakov ◽  
Thomas Noll ◽  
Alexander Grünberger

Scaling down bioproduction processes has become a major driving force for more accelerated and efficient process development over the last decades. Expensive and time-consuming processes in particular, such as the production of biopharmaceuticals with mammalian cell lines, benefit clearly from miniaturization, which offers higher parallelization and increased insights while decreasing experimental time and costs. Lately, novel microfluidic methods have been developed; in particular, microfluidic single-cell cultivation (MSCC) devices have proved valuable for miniaturizing the cultivation of mammalian cells. So far, growth characteristics of microfluidically cultivated cell lines have not been systematically compared to larger cultivation scales; however, validation of a miniaturization tool against initial cultivation scales is mandatory to prove its applicability for bioprocess development. Here, we systematically investigate growth, morphology, and eGFP production of CHO-K1 cells in different cultivation scales ranging from a microfluidic chip (230 nl) to a shake flask (125 ml) and laboratory-scale stirred tank bioreactor (2.0 L). Our study shows a high comparability regarding specific growth rates, cellular diameters, and eGFP production, which proves the feasibility of MSCC as a miniaturized cultivation tool for mammalian cell culture. In addition, we demonstrate that MSCC provides insights into cellular heterogeneity and single-cell dynamics concerning growth and production behavior which, when occurring in bioproduction processes, might severely affect process robustness.


2022 ◽  
Author(s):  
Meelad Amouzgar ◽  
David R Glass ◽  
Reema Baskar ◽  
Inna Averbukh ◽  
Samuel C Kimmey ◽  
...  

Single-cell technologies generate large, high-dimensional datasets encompassing a diversity of omics. Dimensionality reduction enables visualization of data by representing cells in two-dimensional plots that capture the structure and heterogeneity of the original dataset. Visualizations contribute to human understanding of data and are useful for guiding both quantitative and qualitative analysis of cellular relationships. Existing algorithms are typically unsupervised, utilizing only measured features to generate manifolds, disregarding known biological labels such as cell type or experimental timepoint. Here, we repurpose the classification algorithm, linear discriminant analysis (LDA), for supervised dimensionality reduction of single-cell data. LDA identifies linear combinations of predictors that optimally separate a priori classes, enabling users to tailor visualizations to separate specific aspects of cellular heterogeneity. We implement feature selection by hybrid subset selection (HSS) and demonstrate that this flexible, computationally-efficient approach generates non-stochastic, interpretable axes amenable to diverse biological processes, such as differentiation over time and cell cycle. We benchmark HSS-LDA against several popular dimensionality reduction algorithms and illustrate its utility and versatility for exploration of single-cell mass cytometry, transcriptomics and chromatin accessibility data.
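The idea that LDA finds linear combinations of predictors that separate a priori classes can be sketched for the smallest case: two classes and two features, using Fisher's discriminant direction. This is an illustrative simplification, not the HSS-LDA method. It omits feature selection and multi-class handling, and the function names and data points are invented for the example.

```python
# Minimal sketch: Fisher's linear discriminant for two classes in 2D,
# w = Sw^{-1} (mean_b - mean_a), written with the standard library only.

def mean2(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter2(points, mean):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about the mean."""
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - mean[0], y - mean[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy


def fisher_axis(class_a, class_b):
    """Direction that best separates the two labeled classes."""
    ma, mb = mean2(class_a), mean2(class_b)
    axx, axy, ayy = scatter2(class_a, ma)
    bxx, bxy, byy = scatter2(class_b, mb)
    # Within-class scatter Sw is the sum of the per-class scatter matrices.
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Apply the closed-form inverse of the symmetric 2x2 matrix Sw.
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)


def project(points, axis):
    """One-dimensional coordinates of the points along the axis."""
    return [p[0] * axis[0] + p[1] * axis[1] for p in points]


# Hypothetical two-feature measurements for two labeled cell groups.
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group_b = [(5.0, 6.0), (6.0, 5.0), (7.0, 7.0), (6.0, 8.0)]

axis = fisher_axis(group_a, group_b)
print(project(group_a, axis))
print(project(group_b, axis))
```

Because the axis is computed in closed form from the labels and the data, repeated runs give the same result, which matches the non-stochastic property the abstract highlights.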


Forecasting ◽  
2020 ◽  
Vol 2 (3) ◽  
pp. 267-283
Author(s):  
Alireza Rezazadeh

Predicting the outcome of sales opportunities is a core part of successful business management. Conventionally, undertaking this prediction has relied mostly on subjective human evaluations in the process of sales decision-making. In this paper, we addressed the problem of forecasting the outcome of Business to Business (B2B) sales by proposing a thorough data-driven Machine-Learning (ML) workflow on a cloud-based computing platform: Microsoft Azure Machine-Learning Service (Azure ML). This workflow consists of two pipelines: (1) An ML pipeline to train probabilistic predictive models on the historical sales opportunities data. In this pipeline, data is enriched with an extensive feature enhancement step and then used to train an ensemble of ML classification models in parallel. (2) A prediction pipeline to use the trained ML model and infer the likelihood of winning new sales opportunities along with calculating optimal decision boundaries. The effectiveness of the proposed workflow was evaluated on a real sales dataset of a major global B2B consulting firm. Our results implied that decision-making based on the ML predictions is more accurate and brings a higher monetary value.

