Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer

Recent technological advances in domains such as biomedicine and health offer a wealth of large datasets for analysis. Part of this data pool comes from experimental studies that record many different features for each instance. The result is datasets of very high dimensionality with mixed data types, containing both numerical and categorical variables. Unsupervised learning has been shown to help with high-dimensional data, allowing the discovery of unknown patterns through clustering, visualization, dimensionality reduction, and in some cases their combination. This work highlights unsupervised learning methodologies for large-scale, high-dimensional data and shows the potential of a unified framework that combines the knowledge retrieved from clustering and visualization. The main purpose is to uncover hidden patterns in a high-dimensional mixed dataset, which we demonstrate by applying the framework to a complex, real-world dataset. The experimental analysis reveals notable information, indicating the usefulness of the methodological framework for similar high-dimensional, mixed, real-world applications.

Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings

DOI: 10.31219/osf.io/ujbrs, 2021

Author(s): Klaus Eckelt, Andreas Hinterreiter, Patrick Adelberger, Conny Walchshofer, Vaishali Dhanoa, ...

Keyword(s): High Dimensional Data, Visual Exploration, High Dimensional, Data Types, Structural Relationships, Or Groups, Analysis Workflow, Visual Approach, Real World Datasets, Low Dimensional

In this work, we propose an interactive visual approach for the exploration of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, are defining properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. The original high-dimensional data for single items, groups of items, or differences between connected items and groups is accessible through additional summary visualizations. We carefully tailored these summary and difference visualizations to the various data types and semantic contexts. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups, thereby creating graphs that represent visual data stories. We demonstrate the utility and potential impact of our approach by means of two use cases and multiple examples from various domains.

Integrated analysis of multiple high-dimensional data sets by joint rank-1 matrix approximations

2015 54th IEEE Conference on Decision and Control (CDC), DOI: 10.1109/cdc.2015.7402818, 2015, Cited By ~ 2

Author(s): Ashkan Zeinalzadeh, Tom Wenska, Gordon Okimoto

Keyword(s): High Dimensional Data, High Dimensional, Integrated Analysis, Data Sets, Matrix Approximations

The geometry of clinical labs and wellness states from deeply phenotyped humans

Nature Communications, DOI: 10.1038/s41467-021-23849-8, 2021, Vol 12 (1)

Author(s): Anat Zimmer, Yael Korem, Noa Rappaport, Tomasz Wilmanski, Priyanka Baloni, ...

Keyword(s): Longitudinal Data, High Dimensional Data, High Dimensional, Health State, Omics Data, Data Types, Disease Phenotypes, Health And Disease

Longitudinal multi-omics measurements are highly valuable in studying heterogeneity in health and disease phenotypes. We have collected longitudinal multi-omics data for thousands of people. To analyze, interpret, and visualize this extremely high-dimensional data, we use the Pareto Task Inference (ParTI) method. We find that the clinical labs data fall within a tetrahedron. We then use all other data types to characterize the four archetypes. We find that the tetrahedron comprises three wellness states, defining a triangular wellness plane, and one aberrant health state that captures what is common to movements away from wellness. We reveal the tradeoffs that shape the data and their hierarchy, and use longitudinal data to observe individual trajectories. We then demonstrate how movement on the tetrahedron can be used to detect unexpected trajectories, which might indicate transitions from health to disease and reveal abnormal conditions, even when all individual blood measurements are within the normal range.

Classification of high-dimensional data using the Sparse Matrix Transform

2010 IEEE International Conference on Image Processing, DOI: 10.1109/icip.2010.5652690, 2010, Cited By ~ 1

Author(s): Leonardo R. Bachega, Charles A. Bouman

Keyword(s): Sparse Matrix, High Dimensional Data, High Dimensional, Sparse Matrix Transform

stepwiseCM: An R Package for Stepwise Classification of Cancer Samples Using Multiple Heterogeneous Data Sets

Cancer Informatics, DOI: 10.4137/cin.s13075, 2014, Vol 13, pp. CIN.S13075

Author(s): Askar Obulkasim, Mark A van de Wiel

Keyword(s): Waiting Times, High Dimensional Data, R Package, Heterogeneous Data, The Other, High Dimensional, Data Sets, Classification Problems, Data Types, Crucial Difference

This paper presents the R/Bioconductor package stepwiseCM, which classifies cancer samples using two heterogeneous data sets in an efficient way. The algorithm is able to capture the distinct classification power of two given data types without actually combining them. The package suits classification problems where two different types of data sets on the same samples are available. One of these data types has measurements on all samples and the other has measurements on only some samples. The former is easy to collect and/or relatively cheap (e.g., clinical covariates) compared to the latter (high-dimensional data, e.g., gene expression). An additional application for which stepwiseCM has also proven useful is the combination of two high-dimensional data types, e.g., DNA copy number and mRNA expression. The package includes functions to project the neighborhood information in one data space onto the other, in order to identify a group of samples that are likely to benefit most from measuring the second type of covariates. The two heterogeneous data spaces are connected by indirect mapping. The crucial difference between the stepwise classification strategy implemented in this package and existing packages is that our approach aims to be cost-efficient by avoiding the measurement of additional covariates, which might be expensive or patient-unfriendly, for a potentially large subgroup of individuals. Moreover, in diagnosis, test results for these individuals would be available quickly, which may reduce waiting times and hence lower the patients' distress. The improvement described remedies the key limitations of existing packages and facilitates the use of the stepwiseCM package in diverse applications.
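
The cost-saving idea in this abstract can be illustrated with a minimal sketch. It is not the stepwiseCM algorithm or its API: the package selects samples by projecting neighborhood information between data spaces, whereas this sketch swaps in a simple confidence threshold on a cheap classifier's output. The function name `stepwise_classify`, the `margin` parameter, and the toy inputs are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def stepwise_classify(
    cheap_probs: Sequence[float],
    expensive_predict: Callable[[int], int],
    margin: float = 0.2,
) -> Tuple[List[int], List[int]]:
    """Return (labels, indices of samples that needed the expensive data).

    cheap_probs: class-1 probabilities from a classifier trained on the
        cheap data type (assumed to be available for every sample).
    expensive_predict: hypothetical callback that measures the second
        data type for sample i and returns its predicted label.
    margin: assumed confidence threshold; a stand-in for the package's
        neighborhood-based selection, not a reproduction of it.
    """
    labels: List[int] = []
    referred: List[int] = []
    for i, p in enumerate(cheap_probs):
        if abs(p - 0.5) >= margin:
            # Cheap data is decisive: skip the expensive measurement.
            labels.append(1 if p >= 0.5 else 0)
        else:
            # Ambiguous sample: measure the second data type.
            referred.append(i)
            labels.append(expensive_predict(i))
    return labels, referred


# Toy, made-up inputs for illustration only.
cheap_probs = [0.95, 0.55, 0.10, 0.40]
expensive_labels = {1: 1, 3: 0}
labels, referred = stepwise_classify(cheap_probs, lambda i: expensive_labels[i])
print(labels, referred)
```

In this toy run only the two ambiguous samples are referred for the second measurement, which is the kind of saving the abstract describes for a potentially large subgroup of individuals.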
