AbstractHyper spectral imaging, sensor networks, spatial multiplexed proteomics, and spatial transcriptomics assays is a representative subset of distinct technologies from diverse domains of science and engineering that share common data structures. The data in all these modalities consist of high-dimensional multivariate observations (m-dimensional feature space) collected at different spatial positions and therefore can be analyzed using similar computational methodologies. Furthermore, in many studies practitioners collect datasets consisting of multiple spatial assays of this type, each capturing such data from a single biological sample, patient, or hyper spectral image, etc. Each of these spatial assays could be characterized by several regions of interest (ROIs). The focus of this paper is on a particular application, imaging mass cytometry (IMC), which falls into this problem setup. To extract meaningful information from the multi-dimensional observations recorded at different ROIs across different assays, we propose to analyze such datasets using a two-step graph-based approach. We first construct for each ROI a graph representing the interactions between the m covariates and compute an m dimensional vector characterizing the steady state distribution among features. We then use all these m-dimensional vectors to construct a graph between the ROIs from all assays. This second graph is subjected to a nonlinear dimension reduction analysis, retrieving the intrinsic geometric representation of the ROIs. Such a representation provides the foundation for efficient and accurate organization of the different ROIs that correlates with their phenotypes. Theoretically, we show that when the ROIs have a particular bi-modal distribution, the new representation gives rise to a better distinction between the two modalities compared to the maximum a posteriori (MAP) estimator. We applied our method to predict the sensitivity to PD-1 axis blockers treatment of lung cancer subjects based on IMC data, achieving 92% accuracy. This serves as empirical evidence that the graph of graphs approach enables us to integrate multiple ROIs and the intra-relationships between the features at each ROI, giving rise to an informative representation that is strongly associated with the phenotypic state of the entire image. Importantly, this approach is applicable to other modalities such as spatial transcriptomics.Author summaryWe propose a two-step graph-based analyses for high-dimensional multiplexed datasets characterizing ROIs and their inter-relationships. The first step consists of extracting the steady state distribution of the random walk on the graph, which captures the mutual relations between the covariates of each ROI. The second step employs a nonlinear dimensionality reduction on the steady state distributions to construct a map that unravels the intrinsic geometric structure of the ROIs. We show theoretically that when the ROIs have a two-class structure, our method accentuates the distinction between the classes. Particularly, in a setting with Gaussian distribution it outperforms the MAP estimator, implying that the mutual relations between the covariates and spatial coordinates are well captured by the steady state distributions. We apply our method to imaging mass cytometry (IMC). Our analysis provides a representation that facilitates prediction of the sensitivity to PD-1 axis blockers treatment of lung cancer subjects. Particularly, our approach achieves state of the art results with accuracy of 92%.