A comparative user study of visualization techniques for cluster analysis of multidimensional data sets

2020 ◽  
Vol 19 (4) ◽  
pp. 318-338 ◽  
Author(s):  
Elio Ventocilla ◽  
Maria Riveiro

This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, k, embedded in six multidimensional data sets. The selection of the techniques was based on their intended design, or use, for visually encoding data structures, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of k as given by participants when using different multidimensional projections; the accuracy of user estimations with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with k values given by a set of cluster quality measures; and whether there is a difference between experienced and novice users in terms of estimates and perceived usability. The results show that: dendrograms (from Ward's hierarchical clustering) are likely to lead to estimates of k that differ from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-distributed Stochastic Neighbor Embedding (t-SNE) is likely to lead to estimates that are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates that differ from those given by users using Ward and t-SNE; U-Matrices and reachability plots will likely have low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates that differ from those perceived by users using any of the assessed multidimensional projections. It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, influences the perceived usability.
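
As an editorial illustration of the quality-measure baseline mentioned above, the sketch below estimates k by maximizing the silhouette score and computes a 2D t-SNE projection of the same data. It assumes scikit-learn and synthetic blob data; the study's actual measures, projections, and data sets are not reproduced.

```python
# Sketch: estimating the number of clusters k with a cluster quality
# measure (silhouette), the kind of baseline the study compares against
# user estimates. Data and measures are illustrative stand-ins.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=0)

# Quality-measure estimate of k: pick the k that maximizes silhouette.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
k_hat = max(scores, key=scores.get)
print("quality-measure estimate of k:", k_hat)

# 2D t-SNE projection of the same data, as a user would inspect it.
emb = TSNE(n_components=2, random_state=0).fit_transform(X)
```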

2013 ◽  
Vol 1 (1) ◽  
pp. 7 ◽  
Author(s):  
Casimiro S. Munita ◽  
Lúcia P. Barroso ◽  
Paulo M.S. Oliveira

Several analytical techniques are often used in archaeometric studies, and when used in combination they can assess 30 or more elements. Multivariate statistical methods are frequently used to interpret archaeometric data, but their application can be problematic or difficult to interpret due to the large number of variables. In general, the analyst first measures several variables, many of which may turn out to be uninformative; this is naturally very time consuming and expensive. In subsequent studies the analyst may wish to measure fewer variables while minimizing the loss of essential information. Such multidimensional data sets must be closely examined to draw useful information. This paper aims to describe and illustrate a stopping rule for the identification of redundant variables and the selection of variable subsets that preserve the multivariate data structure, using Procrustes analysis to select those variables that are, in some sense, adequate for discrimination purposes. We provide an illustrative example of the procedure using a data set of 40 archaeological ceramic samples in which the concentrations of As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, and U were determined via instrumental neutron activation analysis (INAA). The results show that, for this data set, only eight variables (As, Cr, Fe, Hf, La, Nd, Sm, and Th) are required to interpret the data without substantial loss of information.
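
To make the procedure concrete, here is a minimal sketch of Procrustes-based backward elimination, assuming scipy and scikit-learn; the stand-in data, the PCA configuration, and the 0.05 stopping threshold are illustrative choices, not the authors' exact rule.

```python
# Sketch of Procrustes-based backward elimination: repeatedly drop the
# variable whose removal least disturbs the multivariate configuration.
import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

def config(X, n_comp=2):
    # 2D ordination of the (standardized) data as the reference shape.
    return PCA(n_components=n_comp).fit_transform(scale(X))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 13))            # stand-in for the 40-sample INAA data
cols = list("ABCDEFGHIJKLM")             # stand-in element labels

ref = config(X)
while len(cols) > 2:
    # Procrustes disparity of each candidate subset vs. the full configuration.
    disp = {c: procrustes(ref, config(np.delete(X, cols.index(c), axis=1)))[2]
            for c in cols}
    best = min(disp, key=disp.get)
    if disp[best] > 0.05:                # illustrative stopping threshold
        break
    X = np.delete(X, cols.index(best), axis=1)
    cols.remove(best)
print("retained variables:", cols)
```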


2002 ◽  
Vol 1 (3-4) ◽  
pp. 194-210 ◽  
Author(s):  
Matthew O Ward

Glyphs are graphical entities that convey one or more data values via attributes such as shape, size, color, and position. They have been widely used in the visualization of data and information, and are especially well suited for displaying complex, multivariate data sets. The placement or layout of glyphs on a display can communicate significant information regarding the data values themselves as well as relationships between data points, and a wide assortment of placement strategies has been developed to date. Methods range from simply using data dimensions as positional attributes to basing placement on implicit or explicit structure within the data set. This paper presents an overview of multivariate glyphs, a list of issues regarding the layout of glyphs, and a comprehensive taxonomy of placement strategies to assist the visualization designer in selecting the technique most suitable to his or her data and task. Examples, strengths, weaknesses, and design considerations are given for each category of technique. We conclude with some general guidelines for selecting a placement strategy, along with a brief description of some of our future research directions.
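
As a minimal illustration of the simplest strategy in the taxonomy, the sketch below uses two data dimensions as positional attributes and encodes two more in each glyph's size and color. It assumes matplotlib and random stand-in data; the structure-based and packed placement strategies are not shown.

```python
# Sketch of data-driven glyph placement: two data dimensions give each
# glyph its position, and two more are encoded as size and color.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.normal(size=(60, 4))          # 60 points, 4 variables

fig, ax = plt.subplots()
ax.scatter(data[:, 0], data[:, 1],                        # dims 0,1 -> position
           s=80 * (data[:, 2] - data[:, 2].min() + 0.1),  # dim 2 -> size
           c=data[:, 3], cmap="viridis",                  # dim 3 -> color
           marker="*")                                    # star glyph shape
ax.set_xlabel("variable 0"); ax.set_ylabel("variable 1")
plt.show()
```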


2008 ◽  
Vol 7 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Niklas Elmqvist ◽  
John Stasko ◽  
Philippas Tsigas

Supporting visual analytics of multiple large-scale multidimensional data sets requires a high degree of interactivity and user control beyond the conventional challenges of visualizing such data sets. We present the DataMeadow, a visual canvas providing rich interaction for constructing visual queries using graphical set representations called DataRoses. A DataRose is essentially a star plot of selected columns in a data set, displayed as a multivariate visualization with dynamic query sliders integrated into each axis. The purpose of the DataMeadow is to allow users to create advanced visual queries by iteratively selecting from and filtering the multidimensional data. Furthermore, the canvas provides a clear history of the analysis that can be annotated to facilitate dissemination of analytical results to stakeholders. A powerful direct manipulation interface allows for selection, filtering, and creation of sets, subsets, and data dependencies. We have evaluated our system using a qualitative expert review involving two visualization researchers. Results from this review are favorable for the new method.
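
A static stand-in for a DataRose, assuming matplotlib: a star plot of selected columns drawn after a slider-style range filter. The DataMeadow's interactive canvas, analysis history, and set operations are not reproduced here.

```python
# Sketch of a DataRose-like view: a star plot of selected columns, with a
# range filter applied before plotting, mimicking one dynamic query slider.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
data = rng.uniform(size=(200, 5))        # 200 rows, 5 selected columns

# Slider-style filter: keep rows whose column 0 lies in [0.4, 0.8].
subset = data[(data[:, 0] >= 0.4) & (data[:, 0] <= 0.8)]

angles = np.linspace(0, 2 * np.pi, data.shape[1], endpoint=False)
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for row in subset[:30]:                  # one closed polyline per row
    ax.plot(np.append(angles, angles[0]), np.append(row, row[0]), alpha=0.3)
plt.show()
```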


Author(s):  
Aleksandr Bondarev ◽ 
Vladimir Galaktionov

The paper considers tasks of visual analysis of multidimensional data sets of medical origin. For visual analysis, the approach of building elastic maps is used. Elastic maps serve as a means of mapping the original data points onto enclosed manifolds of lower dimensionality. By diminishing the elasticity parameters, one can design a map surface that approximates the multidimensional data set in question much more closely. To improve the results, a number of previously developed procedures are used: preliminary data filtering and removal of separated clusters (flotation). To solve the scalability problem, where the elastic map must be adjusted both to the region of condensation of data points and to separately located points of the data cloud, a quasi-Zoom approach is applied. Illustrations of applying elastic maps to various sets of medical data are presented.
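
For readers who want a concrete picture of the technique, here is a minimal elastic-map sketch in plain numpy: a regular grid of nodes is fitted by alternating nearest-node assignment with an exact solve of the quadratic elastic energy. The grid size and elasticity parameters are illustrative, and the paper's filtering, flotation, and quasi-Zoom procedures are not reproduced.

```python
# Minimal elastic-map sketch: data term + edge stretching + rib bending,
# minimized by alternating assignment and a linear solve for all nodes.
import numpy as np

def fit_elastic_map(X, grid=8, lam=0.05, mu=0.5, iters=20):
    n, d = X.shape
    g2 = grid * grid
    # Initialize nodes on the plane of the two leading principal components.
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    s = Xc @ Vt[:2].T
    u = np.linspace(-1, 1, grid)
    P, Q = np.meshgrid(u, u)
    Y = (X.mean(0)
         + np.outer(P.ravel() * 2 * s[:, 0].std(), Vt[0])
         + np.outer(Q.ravel() * 2 * s[:, 1].std(), Vt[1]))

    idx = lambda r, c: r * grid + c
    edges, ribs = [], []
    for r in range(grid):
        for c in range(grid):
            if c + 1 < grid: edges.append((idx(r, c), idx(r, c + 1)))
            if r + 1 < grid: edges.append((idx(r, c), idx(r + 1, c)))
            if c + 2 < grid: ribs.append((idx(r, c), idx(r, c + 1), idx(r, c + 2)))
            if r + 2 < grid: ribs.append((idx(r, c), idx(r + 1, c), idx(r + 2, c)))

    # Quadratic forms for stretching (graph Laplacian) and bending penalties.
    A_e = np.zeros((g2, g2))
    for j, k in edges:
        A_e[j, j] += 1; A_e[k, k] += 1; A_e[j, k] -= 1; A_e[k, j] -= 1
    A_r = np.zeros((g2, g2))
    for j, k, l in ribs:
        v = np.zeros(g2); v[j], v[k], v[l] = 1.0, -2.0, 1.0
        A_r += np.outer(v, v)

    for _ in range(iters):
        # (1) Assign each point to its nearest node.
        D2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        K = D2.argmin(1)
        # (2) Solve the quadratic energy for all node positions at once.
        counts = np.bincount(K, minlength=g2)
        B = np.zeros((g2, d))
        np.add.at(B, K, X)
        A = np.diag(counts / n) + lam * A_e + mu * A_r
        Y = np.linalg.solve(A + 1e-9 * np.eye(g2), B / n)
    return Y

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))
nodes = fit_elastic_map(X)
print(nodes.shape)                       # (64, 5) fitted map nodes
```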


Author(s):  
A. E. Bondarev

Abstract. The paper is devoted to problems of visual analysis of multidimensional data sets using an approach based on the construction of elastic maps. This approach is well suited for processing and visualizing multidimensional data sets. Elastic maps serve as a means of mapping the original data points onto enclosed manifolds of lower dimensionality. By diminishing the elasticity parameters, one can design a map surface that approximates the multidimensional data set in question much more closely. The points of the data set are then projected onto the map. Unfolding the designed map onto a flat plane gives insight into the structure of the multidimensional data set. The paper presents the results of applying elastic maps to visual analysis of multidimensional data sets of medical origin. Previously developed data processing procedures are applied to improve the results obtained: pre-filtering of data, removal of separated clusters (flotation), and quasi-Zoom.
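
Continuing the sketch above, the projection and unfolding steps can be illustrated as follows: each point is mapped to its nearest node and drawn at that node's internal (flat) grid coordinates. This reuses X and nodes from the previous sketch; the jitter is added only to keep coincident points visible.

```python
# Sketch of projection/unfolding: nearest elastic-map node per point,
# displayed in the map's internal 2D grid coordinates.
import numpy as np
import matplotlib.pyplot as plt

grid = 8                                 # must match the fitted map above
D2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
K = D2.argmin(1)                         # index of nearest node per point

rows, cols = K // grid, K % grid         # unfold: node index -> grid (r, c)
jitter = np.random.default_rng(4).uniform(-0.3, 0.3, size=(len(X), 2))
plt.scatter(cols + jitter[:, 0], rows + jitter[:, 1], s=8, alpha=0.5)
plt.xlabel("map column"); plt.ylabel("map row")
plt.show()
```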


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support, and then integrates these weights along the rows to obtain the support of every row. A data object having the largest support is chosen as the initial center, followed by finding further centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
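
A minimal sketch of the support-based selection described above, assuming numpy: value frequencies give each row a support score, the highest-support row seeds the centers, and the remaining centers maximize Hamming distance to those already chosen. The tie-breaking and exact weighting are illustrative.

```python
# Sketch: support-based initial centers for categorical k-modes-style
# clustering. Support of a row = summed frequency of its attribute values.
import numpy as np

def support_centers(X, k):
    n, m = X.shape
    row_support = np.zeros(n)
    for j in range(m):
        vals, counts = np.unique(X[:, j], return_counts=True)
        freq = dict(zip(vals, counts))
        row_support += np.array([freq[v] for v in X[:, j]])

    centers = [int(row_support.argmax())]  # first center: max support
    while len(centers) < k:
        # Hamming distance from every row to its nearest chosen center.
        d = np.min([(X != X[c]).sum(1) for c in centers], axis=0)
        centers.append(int(d.argmax()))    # farthest row becomes next center
    return X[centers]

X = np.array([["a", "x", "p"], ["a", "y", "p"], ["b", "x", "q"],
              ["b", "y", "q"], ["a", "x", "q"], ["c", "z", "r"]])
print(support_centers(X, 2))
```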


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
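
A hedged sketch of the landmark idea, assuming scikit-learn: a local-PCA residual score stands in for the paper's curvature-variation criterion, the highest-scoring points become landmarks, and an embedding fitted on the landmarks alone is extended to the full set, avoiding the full N×N analysis.

```python
# Sketch: landmark-based manifold learning with a curvature-like score.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.manifold import Isomap

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 10))          # stand-in for hyperspectral pixels

# Curvature proxy: variance left outside the best-fit local 2D plane.
nn = NearestNeighbors(n_neighbors=12).fit(X)
_, nbr = nn.kneighbors(X)
score = np.empty(len(X))
for i, idx in enumerate(nbr):
    local = X[idx] - X[idx].mean(0)
    s = np.linalg.svd(local, compute_uv=False)
    score[i] = (s[2:] ** 2).sum() / (s ** 2).sum()

landmarks = np.argsort(score)[-200:]     # keep the 200 "most curved" points
iso = Isomap(n_components=3).fit(X[landmarks])
emb = iso.transform(X)                   # extend the embedding to all points
print(emb.shape)
```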


Fractals ◽  
2001 ◽  
Vol 09 (01) ◽  
pp. 105-128 ◽  
Author(s):  
TAYFUN BABADAGLI ◽  
KAYHAN DEVELI

This paper presents an evaluation of the methods applied to calculate the fractal dimension of fracture surfaces. Variogram (applicable to 1D self-affine sets) and power spectral density analyses (applicable to 2D self-affine sets) are selected to calculate the fractal dimension of synthetic 2D data sets generated using fractional Brownian motion (fBm). The calculated values are then compared with the actual fractal dimensions assigned in the generation of the synthetic surfaces. The main factor considered is the size of the 2D data set (number of data points). The critical sample size that yields the best agreement between the calculated and actual values is defined for each method. Limitations and the proper use of each method are clarified after an extensive analysis. The two methods are also applied to synthetically and naturally developed fracture surfaces of different types of rocks. The methods yield inconsistent fractal dimensions for natural fracture surfaces, and the reasons for this are discussed. The anisotropy of the fractal dimension, which may link the fracturing mechanism to the multifractality of the fracture surfaces, is also addressed.
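
As a worked example of the variogram method on a 1D self-affine profile, assuming numpy: an fBm trace is synthesized spectrally with a known Hurst exponent H, the variogram's log-log slope recovers 2H, and the fractal dimension follows as D = 2 - H. The paper's 2D power-spectral analysis is not reproduced.

```python
# Sketch: recover an assigned Hurst exponent (and fractal dimension) from
# the variogram of a spectrally synthesized 1D fBm profile.
import numpy as np

rng = np.random.default_rng(6)
N, H_true = 4096, 0.7                    # profile length, assigned Hurst exponent

# Spectral synthesis: 1D fBm has power spectral density ~ f^-(2H+1).
f = np.fft.rfftfreq(N)[1:]
phase = rng.uniform(0, 2 * np.pi, len(f))
amp = f ** (-(2 * H_true + 1) / 2)
spec = np.concatenate([[0], amp * np.exp(1j * phase)])
z = np.fft.irfft(spec, n=N)

# Variogram: gamma(h) = mean (z(x+h) - z(x))^2 ~ h^(2H).
lags = np.arange(1, 65)
gamma = np.array([np.mean((z[h:] - z[:-h]) ** 2) for h in lags])
slope = np.polyfit(np.log(lags), np.log(gamma), 1)[0]
H_est = slope / 2
print(f"H estimated {H_est:.2f} vs assigned {H_true}; D = {2 - H_est:.2f}")
```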

