Dimensionality Reduction for Registration of High-Dimensional Data Sets

Due to the intricate relationship between different dimensions of high-dimensional data, subspace analysis is often conducted to decompose dimensions and give prominence to certain subsets of dimensions, i.e. subspaces. Exploring and comparing subspaces are important to reveal the underlying features of subspaces, as well as to portray the characteristics of individual dimensions. To date, most of the existing high-dimensional data exploration and analysis approaches rely on dimensionality reduction algorithms (e.g. principal component analysis and multi-dimensional scaling) to project high-dimensional data, or their subspaces, to two-dimensional space and employ scatterplots for visualization. However, the dimensionality reduction algorithms are sometimes difficult to fine-tune and scatterplots are not effective for comparative visualization, making subspace comparison hard to perform. In this article, we aggregate high-dimensional data or their subspaces by computing pair-wise distances between all data items and showing the distances with matrix visualizations to present the original high-dimensional data or subspaces. Our approach enables effective visual comparisons among subspaces, which allows users to further investigate the characteristics of individual dimensions by studying their behaviors in similar subspaces. Through subspace comparisons, we identify dominant, similar, and conforming dimensions in different subspace contexts of synthetic and real-world high-dimensional data sets. Additionally, we present a prototype that integrates parallel coordinates plot and matrix visualization for high-dimensional data exploration and incremental dimensionality analysis, which also allows users to further validate the dimension characterization results derived from the subspace comparisons.

Download Full-text

Quality-based guidance for exploratory dimensionality reduction

Information Visualization ◽

10.1177/1473871612460526 ◽

2012 ◽

Vol 12 (1) ◽

pp. 44-64 ◽

Cited By ~ 12

Author(s):

Sara Johansson Fernstad ◽

Jane Shaw ◽

Jimmy Johansson

Keyword(s):

Dimensionality Reduction ◽

High Dimensional Data ◽

Quality Metrics ◽

Exploratory Analysis ◽

High Dimensional ◽

Data Sets ◽

Data Set ◽

Bacterial Populations ◽

Reduction Methods ◽

Individual Variables

High-dimensional data sets containing hundreds of variables are difficult to explore, as traditional visualization methods often are unable to represent such data effectively. This is commonly addressed by employing dimensionality reduction prior to visualization. Numerous dimensionality reduction methods are available. However, few reduction approaches take the importance of several structures into account and few provide an overview of structures existing in the full high-dimensional data set. For exploratory analysis, as well as for many other tasks, several structures may be of interest. Exploration of the full high-dimensional data set without reduction may also be desirable. This paper presents flexible methods for exploratory analysis and interactive dimensionality reduction. Automated methods are employed to analyse the variables, using a range of quality metrics, providing one or more measures of ‘interestingness’ for individual variables. Through ranking, a single value of interestingness is obtained, based on several quality metrics, that is usable as a threshold for the most interesting variables. An interactive environment is presented in which the user is provided with many possibilities to explore and gain understanding of the high-dimensional data set. Guided by this, the analyst can explore the high-dimensional data set and interactively select a subset of the potentially most interesting variables, employing various methods for dimensionality reduction. The system is demonstrated through a use-case analysing data from a DNA sequence-based study of bacterial populations.

Download Full-text

An Advanced Mining Services in Predicting and Ranking User Vitality across Dynamic and High Dimensional Data Sets

SSRN Electronic Journal ◽

10.2139/ssrn.3395242 ◽

2019 ◽

Author(s):

Ch. Durga Bhavani ◽

Dr. A. Daveedu Raju ◽

Dr. V. Surya Narayana

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Data Sets

Download Full-text

Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Scientific Programming ◽

10.1155/2015/180214 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Sai Kiranmayee Samudrala ◽

Jaroslaw Zola ◽

Srinivas Aluru ◽

Baskar Ganapathysubramanian

Keyword(s):

Dimensionality Reduction ◽

Organic Solar Cells ◽

Large Scale ◽

Parallel Implementation ◽

High Dimensional Data ◽

Real Life ◽

Processing Parameters ◽

High Dimensional ◽

Morphology Evolution ◽

Reduction Techniques

Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

Download Full-text

Multi-Instance Dimensionality Reduction via Sparsity and Orthogonality

Neural Computation ◽

10.1162/neco_a_01140 ◽

2018 ◽

Vol 30 (12) ◽

pp. 3281-3308

Author(s):

Hong Zhu ◽

Li-Zhi Liao ◽

Michael K. Ng

Keyword(s):

Dimensionality Reduction ◽

Optimization Problem ◽

Augmented Lagrangian ◽

Main Idea ◽

Real Data ◽

Learning Performance ◽

High Dimensional ◽

Data Sets ◽

Outer Loop ◽

Orthogonality Constraints

We study a multi-instance (MI) learning dimensionality-reduction algorithm through sparsity and orthogonality, which is especially useful for high-dimensional MI data sets. We develop a novel algorithm to handle both sparsity and orthogonality constraints that existing methods do not handle well simultaneously. Our main idea is to formulate an optimization problem where the sparse term appears in the objective function and the orthogonality term is formed as a constraint. The resulting optimization problem can be solved by using approximate augmented Lagrangian iterations as the outer loop and inertial proximal alternating linearized minimization (iPALM) iterations as the inner loop. The main advantage of this method is that both sparsity and orthogonality can be satisfied in the proposed algorithm. We show the global convergence of the proposed iterative algorithm. We also demonstrate that the proposed algorithm can achieve high sparsity and orthogonality requirements, which are very important for dimensionality reduction. Experimental results on both synthetic and real data sets show that the proposed algorithm can obtain learning performance comparable to that of other tested MI learning algorithms.

Download Full-text

Hybrid random subsample classifier ensemble for high dimensional data sets

International Journal of Hybrid Intelligent Systems ◽

10.3233/his-2012-0149 ◽

2012 ◽

Vol 9 (2) ◽

pp. 91-103 ◽

Cited By ~ 1

Author(s):

Santhosh Pathical ◽

Gursel Serpen

Keyword(s):

High Dimensional Data ◽

Classifier Ensemble ◽

High Dimensional ◽

Data Sets

Download Full-text

Hierarchical Clustering of High-Dimensional Data Without Global Dimensionality Reduction

Lecture Notes in Computer Science - Foundations of Intelligent Systems ◽

10.1007/978-3-030-01851-1_23 ◽

2018 ◽

pp. 236-246

Author(s):

Ilari Kampman ◽

Tapio Elomaa

Keyword(s):

Dimensionality Reduction ◽

Hierarchical Clustering ◽

High Dimensional Data ◽

High Dimensional

Download Full-text

Fast Approximate Similarity Search in Extremely High-Dimensional Data Sets

21st International Conference on Data Engineering (ICDE'05) ◽

10.1109/icde.2005.66 ◽

2005 ◽

Cited By ~ 43

Author(s):

M.E. Houle ◽

Jun Sakuma

Keyword(s):

Similarity Search ◽

High Dimensional Data ◽

High Dimensional ◽

Data Sets ◽

Approximate Similarity Search ◽

Approximate Similarity

Download Full-text

Mahalanobis distance informed by clustering

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iay011 ◽

2018 ◽

Vol 8 (2) ◽

pp. 377-406

Author(s):

Almog Lahav ◽

Ronen Talmon ◽

Yuval Kluger

Keyword(s):

Mahalanobis Distance ◽

High Dimensional Data ◽

Hidden Variables ◽

Real Data ◽

Risk Groups ◽

High Dimensional ◽

Data Sets ◽

Kaplan Meier ◽

Data Points ◽

Survival Plot

Abstract A fundamental question in data analysis, machine learning and signal processing is how to compare between data points. The choice of the distance metric is specifically challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored, which is the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real data of gene expression for lung adenocarcinomas (lung cancer). By using the proposed metric we found a partition of subjects to risk groups with a good separation between their Kaplan–Meier survival plot.

Download Full-text

Unsupervised Text Feature Learning via Deep Variational Auto-encoder

Information Technology And Control ◽

10.5755/j01.itc.49.3.25918 ◽

2020 ◽

Vol 49 (3) ◽

pp. 421-437

Author(s):

Genggeng Liu ◽

Lin Xie ◽

Chi-Hua Chen

Keyword(s):

Dimensionality Reduction ◽

High Dimensional Data ◽

Image Data ◽

Original Data ◽

Feature Representation ◽

High Dimensional ◽

Learning To Learn ◽

Text Feature ◽

Reduction Methods ◽

Low Dimensional

Dimensionality reduction plays an important role in the data processing of machine learning and data mining, which makes the processing of high-dimensional data more efficient. Dimensionality reduction can extract the low-dimensional feature representation of high-dimensional data, and an effective dimensionality reduction method can not only extract most of the useful information of the original data, but also realize the function of removing useless noise. The dimensionality reduction methods can be applied to all types of data, especially image data. Although the supervised learning method has achieved good results in the application of dimensionality reduction, its performance depends on the number of labeled training samples. With the growing of information from internet, marking the data requires more resources and is more difficult. Therefore, using unsupervised learning to learn the feature of data has extremely important research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied in the text data, so that the high-dimensional feature to the low-dimensional feature becomes efficient and the low-dimensional feature can retain mainly information as much as possible. Low-dimensional feature obtained by different dimensionality reduction methods are used to compare with the dimensionality reduction results of variational auto-encoder (VAE), and the method can be significantly improved over other comparison methods.

Download Full-text