Treemap-Based Cluster Visualization and its Application to Text Data Analysis

This paper proposes a Treemap-based visualization for supporting cluster analysis of multi-dimensional data. Grasping the data distribution in a target dataset is important for tasks such as machine learning and cluster analysis. When dealing with multi-dimensional data such as statistical data and document datasets, dimensionality reduction algorithms are usually applied to project the original data to a lower-dimensional space. However, dimensionality reduction tends to lose the characteristics of the data in the original space. In particular, the border between different data groups may not be represented correctly in the lower-dimensional space. To overcome this problem, the proposed visualization method applies Fuzzy c-Means to the target data and visualizes the result with a Treemap on the basis of the highest and second-highest membership values. Visualizing information about not only the closest clusters but also the second-closest ones is expected to be useful for identifying objects around the border between different clusters, as well as for understanding the relationship between different clusters. A prototype interface is implemented, and its effectiveness is investigated through a user experiment on a news article dataset. As another kind of text data, a case study applying the method to a word embedding space is also shown.
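
As a rough illustration of the membership values this abstract refers to, the sketch below runs a textbook Fuzzy c-Means update with NumPy and then extracts, for each object, its highest and second-highest memberships. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fuzzy_c_means` and `top_two_clusters`, the fuzzifier `m=2.0`, and the synthetic two-blob data are all hypothetical choices made for this example, and the Treemap layout itself is not shown.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook Fuzzy c-Means; returns memberships U (n x c) and centers (c x d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def top_two_clusters(U):
    """Index and membership of the closest and second-closest cluster per object."""
    order = np.argsort(-U, axis=1)
    first, second = order[:, 0], order[:, 1]
    rows = np.arange(U.shape[0])
    return first, second, U[rows, first], U[rows, second]

# Synthetic data (assumption): two well-separated blobs plus one point between them.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.3, (20, 2)),
    rng.normal(5.0, 0.3, (20, 2)),
    [[2.5, 2.5]],
])
U, centers = fuzzy_c_means(X, c=2)
first, second, u1, u2 = top_two_clusters(U)
# A small gap between u1 and u2 flags an object near a cluster border.
print("border point memberships:", u1[-1], u2[-1])
```

Objects whose two largest memberships are close to each other are the ones a top-two display would place near a border, while objects deep inside a cluster have a highest membership near 1.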


Optimization of the multidimensional signal interpolator in a lower dimensional space

Computer Optics ◽

10.18287/2412-6179-2019-43-4-653-660 ◽

2019 ◽

Vol 43 (4) ◽

pp. 653-660 ◽

Cited By ~ 4

Author(s):

M.V. Gashnikov

Keyword(s):

Experimental Study ◽

Dimensional Space ◽

Three Dimensional ◽

Two Dimensional ◽

Multidimensional Signals ◽

Multidimensional Signal ◽

Signal Sample ◽

Lower Dimensional Space ◽

Lower Dimensional ◽

Selection Of

Adaptive multidimensional signal interpolators are developed. These interpolators take into account the presence and direction of boundaries of flat signal regions in each local neighborhood based on the automatic selection of the interpolating function for each signal sample. The selection of the interpolating function is performed by a parameterized rule, which is optimized in a parametric lower dimensional space. The dimension reduction is performed using rank filtering of local differences in the neighborhood of each signal sample. The interpolating functions of adaptive interpolators are written for the multidimensional, three-dimensional and two-dimensional cases. The use of adaptive interpolators in the problem of compression of multidimensional signals is also considered. Results of an experimental study of adaptive interpolators for real multidimensional signals of various types are presented.


GENETIC ALGORITHMS FOR MULTIDIMENSIONAL SCALING / GENETINIŲ ALGORITMŲ TAIKYMAS DAUGIAMATĖMS SKALĖMS

Mokslas - Lietuvos ateitis ◽

10.3846/mla.2015.781 ◽

2015 ◽

Vol 7 (3) ◽

pp. 275-279 ◽

Cited By ~ 2

Author(s):

Agnė Dzidolikaitė

Keyword(s):

Genetic Algorithm ◽

Genetic Algorithms ◽

Multidimensional Scaling ◽

Optimization Problem ◽

Dimensional Space ◽

Local Solution ◽

Multidimensional Data ◽

Global Optimization Problem ◽

Lower Dimensional Space ◽

Lower Dimensional

The paper analyzes the global optimization problem. To solve this problem, a multidimensional scaling algorithm is combined with a genetic algorithm. Using multidimensional scaling, we search for projections of multidimensional data in a lower-dimensional space while trying to preserve the dissimilarities of the analyzed set. Using genetic algorithms, we obtain not a single local solution but a whole population of optimal points. Different optimal points give different images. Looking at several images of the multidimensional data, an expert can notice some qualities of the given data. In the paper, a genetic algorithm is applied to multidimensional scaling, glass data is visualized, and certain qualities are noticed.

The global optimization problem is analyzed. It is defined as the optimization of a nonlinear objective function of continuous variables over a feasible region. Various algorithms are applied in optimization. Exact algorithms usually find an exact solution, but this can take a very long time. Often a good solution is wanted within an acceptable amount of time. In that case other, heuristic algorithms, also called heuristics, can be used. One such heuristic is the genetic algorithm, which copies the evolution that takes place in living nature. Evolutionary operators are used in constructing the algorithms: inheritance, mutation, selection and recombination. Genetic algorithms can find sufficiently good solutions to problems for which no exact algorithms exist. Genetic algorithms are also applicable to visualizing data with the multidimensional scaling method. With multidimensional scaling, projections of multidimensional data are sought in a space of fewer dimensions so as to preserve the similarities or dissimilarities of the analyzed set. With genetic algorithms, not a single local solution but a whole population of optima is obtained. Different optima correspond to different images. Seeing several variants of the multidimensional data, an expert can discern more properties of the data. In the article, a genetic algorithm is applied to multidimensional scaling. It is shown that the multidimensional scaling algorithm can be combined with a genetic algorithm and used to visualize multidimensional data.
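
The idea of combining multidimensional scaling with a genetic algorithm can be illustrated with a small NumPy sketch. It assumes the raw stress (the sum of squared differences between projected and original pairwise distances) as the objective, and uses elitist selection, uniform crossover per point, and Gaussian mutation. These operator choices, the function names, and the synthetic data are assumptions made for this example; the abstract does not specify the paper's actual operators or parameters.

```python
import numpy as np

def pairwise_distances(P):
    """Euclidean distance matrix between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def stress(Y, D):
    """Raw MDS stress: squared mismatch between projected and original distances."""
    d = pairwise_distances(Y)
    iu = np.triu_indices(D.shape[0], k=1)
    return float(((d[iu] - D[iu]) ** 2).sum())

def genetic_mds(D, dim=2, pop_size=40, generations=200, sigma=0.1, seed=0):
    """Evolve a population of candidate projections; returns them sorted by stress."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    # Each individual is one candidate configuration of n points in `dim` dimensions.
    pop = rng.normal(size=(pop_size, n, dim))
    for _ in range(generations):
        fitness = np.array([stress(ind, D) for ind in pop])
        order = np.argsort(fitness)
        # Selection: the better half survives unchanged (elitism).
        parents = pop[order[: pop_size // 2]]
        n_children = pop_size - len(parents)
        idx_a = rng.integers(0, len(parents), n_children)
        idx_b = rng.integers(0, len(parents), n_children)
        # Recombination: each point is inherited from one of the two parents.
        mask = rng.random((n_children, n, 1)) < 0.5
        children = np.where(mask, parents[idx_a], parents[idx_b])
        # Mutation: small Gaussian perturbation of every coordinate.
        children = children + rng.normal(scale=sigma, size=children.shape)
        pop = np.concatenate([parents, children])
    fitness = np.array([stress(ind, D) for ind in pop])
    order = np.argsort(fitness)
    return pop[order], fitness[order]

# Synthetic 5-dimensional data (assumption), projected to 2 dimensions.
X = np.random.default_rng(42).normal(size=(12, 5))
D = pairwise_distances(X)
population, fitness = genetic_mds(D)
print("best stress:", fitness[0])
```

Because the function returns the whole sorted population rather than a single winner, several low-stress configurations can be inspected side by side, which mirrors the abstract's point that different optima give different images of the same data.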


Interface-targeted seismic velocity estimation using machine learning

Geophysical Journal International ◽

10.1093/gji/ggz142 ◽

2019 ◽

Vol 218 (1) ◽

pp. 45-56 ◽

Cited By ~ 1

Author(s):

C Nur Schuba ◽

Jonathan P Schuba ◽

Gary G Gray ◽

Richard G Davy

Keyword(s):

Neural Network ◽

Regression Model ◽

Dimensional Space ◽

Seismic Profile ◽

Velocity Estimation ◽

Seismic Survey ◽

Wide Angle ◽

Seismic Velocities ◽

Lower Dimensional Space ◽

Lower Dimensional

SUMMARY We present a new approach to estimate 3-D seismic velocities along a target interface. This approach uses an artificial neural network trained with user-supplied geological and geophysical input features derived from both a 3-D seismic reflection volume and a 2-D wide-angle seismic profile that were acquired from the Galicia margin, offshore Spain. The S-reflector detachment fault was selected as the interface of interest. The neural network, in the form of a multilayer perceptron, was employed with an autoencoder and a regression layer. The autoencoder was trained using a set of input features from the 3-D reflection volume. This set of features included the reflection amplitude and instantaneous frequency at the interface of interest, time-thicknesses of overlying major layers and ratios of major layer time-thicknesses to the total time-depth of the interface. The regression model was trained to estimate the seismic velocities of the crystalline basement and mantle from these features. The ‘true’ velocities were obtained from an independent full-waveform inversion along a 2-D wide-angle seismic profile, contained within the 3-D data set. The autoencoder compressed the vector of inputs into a lower dimensional space, then the regression layer was trained in the lower dimensional space to estimate velocities above and below the targeted interface. This model was trained on 50 networks with different initializations. A total of 37 networks reached the minimum achievable error of 2 per cent. The low standard deviation (<300 m s⁻¹) between different networks and low errors on velocity estimations demonstrate that the input features were sufficient to capture variations in the velocity above and below the targeted S-reflector. This regression model was then applied to the 3-D reflection volume where velocities were predicted over an area of ∼400 km². This approach provides an alternative way to obtain velocities across a 3-D seismic survey from a deep non-reflective lithology (e.g. upper mantle), where conventional reflection velocity estimations can be unreliable.


Complex Moment-Based Supervised Eigenmap for Dimensionality Reduction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013910 ◽

2019 ◽

Vol 33 ◽

pp. 3910-3918 ◽

Cited By ~ 1

Author(s):

Akira Imakura ◽

Momo Matsuda ◽

Xiucai Ye ◽

Tetsuya Sakurai

Keyword(s):

Dimensionality Reduction ◽

Parallel Implementation ◽

Dimensional Space ◽

Recognition Performance ◽

Optimization Methods ◽

Original Data ◽

Dimensional Subspace ◽

Reduction Methods ◽

Low Dimensional ◽

Matrix Trace

Dimensionality reduction methods that project high-dimensional data to a low-dimensional space by matrix trace optimization are widely used for clustering and classification. The matrix trace optimization problem leads to an eigenvalue problem for constructing a low-dimensional subspace that preserves certain properties of the original data. However, most existing methods use only a few eigenvectors to construct the low-dimensional space, which may lose information that is useful for successful classification. Herein, to overcome this information loss, we propose a novel complex moment-based supervised eigenmap that includes multiple eigenvectors for dimensionality reduction. Furthermore, the proposed method provides a general formulation for matrix trace optimization methods to incorporate ridge regression, which models the linear dependency between covariate variables and univariate labels. To reduce the computational complexity, we also propose an efficient and parallel implementation of the proposed method. Numerical experiments indicate that the proposed method is competitive with existing dimensionality reduction methods in recognition performance. Additionally, the proposed method exhibits high parallel efficiency.
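
The link between matrix trace optimization and an eigenvalue problem can be seen in the simplest unsupervised case, PCA: maximizing tr(VᵀSV) over matrices V with orthonormal columns is solved by the eigenvectors of S with the largest eigenvalues. The NumPy sketch below illustrates only that standard fact; it is not the complex moment-based supervised eigenmap proposed in the abstract, and the function name `trace_optimal_projection` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def trace_optimal_projection(X, k):
    """Return V maximizing trace(V.T @ S @ V) subject to V.T @ V = I (plain PCA)."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]             # eigenvectors of the k largest eigenvalues
    return V, eigvals[::-1][:k], S

# Synthetic data (assumption): 200 samples with 6 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
V, top_vals, S = trace_optimal_projection(X, k=2)
best_trace = np.trace(V.T @ S @ V)

# Any other orthonormal basis of the same size gives a trace that is no larger.
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))
other_trace = np.trace(Q.T @ S @ Q)
print(best_trace, other_trace)
```

Supervised variants replace S with matrices built from label information and typically lead to a generalized eigenvalue problem, but the reduction from a trace objective to an eigenproblem follows the same pattern.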


Explaining three-dimensional dimensionality reduction plots

Information Visualization ◽

10.1177/1473871615600010 ◽

2015 ◽

Vol 15 (2) ◽

pp. 154-172 ◽

Cited By ~ 11

Author(s):

Danilo B Coimbra ◽

Rafael M Martins ◽

Tácito TAT Neves ◽

Alexandru C Telea ◽

Fernando V Paulovich

Keyword(s):

Dimensionality Reduction ◽

Dimensional Space ◽

Three Dimensional ◽

Original Data ◽

Reduction Technique ◽

High Dimensional ◽

Dimensionality Reduction Technique ◽

Visualization Techniques ◽

High Dimensional Datasets ◽

Three Dimensional Space

Understanding three-dimensional projections created by dimensionality reduction from high-variate datasets is very challenging. In particular, classical three-dimensional scatterplots used to display such projections do not explicitly show the relations between the projected points, the viewpoint used to visualize the projection, and the original data variables. To explore and explain such relations, we propose a set of interactive visualization techniques. First, we adapt and enhance biplots to show the data variables in the projected three-dimensional space. Next, we use a set of interactive bar chart legends to show the variables that are visible from a given viewpoint and to help users select an optimal viewpoint for examining a desired set of variables. Finally, we propose an interactive viewpoint legend that provides an overview of the information visible in a given three-dimensional projection from all possible viewpoints. Our techniques are simple to implement and can be applied to any dimensionality reduction technique. We demonstrate our techniques on the exploration of several real-world high-dimensional datasets.


Visualizing a multi-dimensional data set in a lower dimensional space

2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT) ◽

10.1109/icadiwt.2008.4664363 ◽

2008 ◽

Author(s):

Dong-Hun Seo ◽

Won Don Lee

Keyword(s):

Dimensional Space ◽

Data Set ◽

Lower Dimensional Space ◽

Lower Dimensional


The Accuracy of Fuzzy C-Means in Lower-Dimensional Space for Topic Detection

Lecture Notes in Computer Science - Smart Computing and Communication ◽

10.1007/978-3-030-05755-8_32 ◽

2018 ◽

pp. 321-334 ◽

Cited By ~ 2

Author(s):

Hendri Murfi

Keyword(s):

Dimensional Space ◽

Topic Detection ◽

Fuzzy C Means ◽

Lower Dimensional Space ◽

Lower Dimensional


Binary Linear Compression for Multi-label Classification

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/496 ◽

2017 ◽

Cited By ~ 5

Author(s):

Wen-Ji Zhou ◽

Yang Yu ◽

Min-Ling Zhang

Keyword(s):

Dimensional Space ◽

Classification Problem ◽

Complex Model ◽

Linear Mapping ◽

Regression Problem ◽

Low Dimensional ◽

The Relationship ◽

Lower Dimensional Space ◽

Lower Dimensional ◽

Linear Compression

In multi-label classification tasks, labels are commonly related to each other. It is well recognized that utilizing label relationships is essential to multi-label learning. One way to utilize label relationships is to map labels to a lower-dimensional space of uncorrelated labels, where the relationships can be encoded in the mapping. Previous linear mapping methods commonly result in regression subproblems in the lower-dimensional label space. In this paper, we show that mapping to a low-dimensional multi-label regression problem can be worse than mapping to a classification problem, since regression requires a more complex model than classification. We then propose the binary linear compression (BILC) method, which results in a binary label space and thus leads to classification subproblems. Experiments on several multi-label datasets show that employing classification in the embedded space results in much simpler models than regression, leading to smaller structural risk. The proposed methods are also shown to be superior to some state-of-the-art approaches.


TESTING THE RELATIONAL PERSPECTIVE MAP FOR VISUALIZATION OF MULTIDIMENSIONAL DATA

Technological and Economic Development of Economy ◽

10.3846/13928619.2006.9637756 ◽

2006 ◽

Vol 12 (4) ◽

pp. 289-294 ◽

Cited By ~ 1

Author(s):

Rasa Karbauskaitė ◽

Virginijus Marcinkevičius ◽

Gintautas Dzemyda

Keyword(s):

Dimensional Space ◽

Multidimensional Data ◽

Two Dimensional ◽

Relational Perspective ◽

Lower Dimensional Space ◽

Lower Dimensional ◽

Better Than ◽

Closed Plane

This paper deals with a method called the relational perspective map, which visualizes multidimensional data on a two‐dimensional closed plane. It tries to preserve the distances between the multidimensional data in the lower‐dimensional space. The most important feature of the relational perspective map, however, is its ability to visualize data in a non‐overlapping manner, so that it reveals small distances better than other known visualization methods. In this paper, the features of this method are explored experimentally and some disadvantages are noted. We propose a modification of the method that avoids them.


Deep Clustering with Self-supervision using Pairwise Data Similarities

10.36227/techrxiv.14852652.v2 ◽

2021 ◽

Author(s):

Mohammadreza Sadeghi ◽

Narges Armanfard

Keyword(s):

Dimensional Space ◽

Second Phase ◽

Similar Data ◽

Number Of Clusters ◽

Latent Space ◽

Benchmark Datasets ◽

Data Points ◽

Complex Cluster ◽

Lower Dimensional Space ◽

Lower Dimensional

Deep clustering incorporates embedding into clustering to find a lower-dimensional space appropriate for clustering. In this paper we propose a novel deep clustering framework with self-supervision using pairwise data similarities (DCSS). The proposed method consists of two successive phases. In the first phase, we propose to form hypersphere-like groups of similar data points, i.e. one hypersphere per cluster, employing an autoencoder that is trained using cluster-specific losses. The hyperspheres are formed in the autoencoder’s latent space. In the second phase, we propose to employ pairwise data similarities to create a K-dimensional space, where K is the number of clusters, that is capable of accommodating more complex cluster distributions and hence provides more accurate clustering performance. The autoencoder’s latent space obtained in the first phase is used as the input of the second phase. The effectiveness of both phases is demonstrated on seven benchmark datasets through a rigorous set of experiments.
