DataMeadow: A Visual Canvas for Analysis of Large-Scale Multivariate Data

2008 ◽  
Vol 7 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Niklas Elmqvist ◽  
John Stasko ◽  
Philippas Tsigas

Supporting visual analytics of multiple large-scale multidimensional data sets requires a high degree of interactivity and user control beyond the conventional challenges of visualizing such data sets. We present the DataMeadow, a visual canvas providing rich interaction for constructing visual queries using graphical set representations called DataRoses. A DataRose is essentially a starplot of selected columns in a data set, displayed as a multivariate visualization with dynamic query sliders integrated into each axis. The purpose of the DataMeadow is to allow users to create advanced visual queries by iteratively selecting and filtering the multidimensional data. Furthermore, the canvas provides a clear history of the analysis that can be annotated to facilitate dissemination of analytical results to stakeholders. A powerful direct manipulation interface allows for selection, filtering, and creation of sets, subsets, and data dependencies. We have evaluated our system using a qualitative expert review involving two visualization researchers. Results from this review are favorable for the new method.
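A minimal sketch of the DataRose idea, not the authors' implementation: selected columns drawn as a starplot with simple per-axis range filters standing in for the dynamic query sliders. The column names, data, and filter ranges below are illustrative assumptions.

```python
# Sketch of a DataRose-style starplot: selected columns on radial axes,
# with per-axis range filters acting as stand-ins for dynamic query sliders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.random((200, 4))                     # 200 rows, 4 selected columns
columns = ["price", "weight", "rating", "age"]  # hypothetical column names
filters = [(0.2, 1.0), (0.0, 0.8), (0.0, 1.0), (0.1, 0.9)]  # slider ranges

# Keep only rows that pass every axis filter (the "visual query").
mask = np.all([(data[:, i] >= lo) & (data[:, i] <= hi)
               for i, (lo, hi) in enumerate(filters)], axis=0)

angles = np.linspace(0, 2 * np.pi, len(columns), endpoint=False)
ax = plt.subplot(polar=True)
for row in data[mask]:
    ax.plot(np.append(angles, angles[0]), np.append(row, row[0]),
            color="steelblue", alpha=0.1)       # one polyline per passing row
ax.set_xticks(angles)
ax.set_xticklabels(columns)
plt.show()
```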

2013 ◽  
Vol 1 (1) ◽  
pp. 7 ◽  
Author(s):  
Casimiro S. Munita ◽  
Lúcia P. Barroso ◽  
Paulo M.S. Oliveira

Several analytical techniques are often used in archaeometric studies, and when used in combination, they can assess 30 or more elements. Multivariate statistical methods are frequently used to interpret archaeometric data, but their application can be problematic or difficult to interpret due to the large number of variables. In general, the analyst first measures several variables, many of which may turn out to be uninformative; this is naturally very time consuming and expensive. In subsequent studies the analyst may wish to measure fewer variables while minimizing the loss of essential information. Such multidimensional data sets must be closely examined to extract useful information. This paper aims to describe and illustrate a stopping rule for identifying redundant variables and selecting variable subsets that preserve the multivariate data structure using Procrustes analysis, retaining those variables that are, in some sense, adequate for discrimination purposes. We provide an illustrative example of the procedure using a data set of 40 archaeological ceramic samples in which the concentrations of As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, and U were determined via instrumental neutron activation analysis (INAA). The results showed that, for this data set, only eight variables (As, Cr, Fe, Hf, La, Nd, Sm, and Th) are required to interpret the data without substantial loss of information.
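A rough illustration of the underlying idea, not the paper's exact stopping rule: score a candidate variable subset by how closely its low-dimensional configuration matches that of the full data under Procrustes superimposition, using the Procrustes disparity as the mismatch measure. The synthetic data and subset size below are assumptions.

```python
# Sketch of Procrustes-based variable screening: compare the 2-D PCA
# configuration of a candidate column subset against that of the full data.
# Low disparity means the subset preserves the multivariate structure.
import numpy as np
from itertools import combinations
from scipy.spatial import procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 13))      # stand-in for 40 samples x 13 elements

full_config = PCA(n_components=2).fit_transform(X)

best = None
for subset in combinations(range(X.shape[1]), 8):  # all 8-variable subsets
    sub_config = PCA(n_components=2).fit_transform(X[:, subset])
    _, _, disparity = procrustes(full_config, sub_config)
    if best is None or disparity < best[0]:
        best = (disparity, subset)

print(f"best subset {best[1]} with Procrustes disparity {best[0]:.4f}")
```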


2020 ◽  
Vol 19 (4) ◽  
pp. 318-338 ◽  
Author(s):  
Elio Ventocilla ◽  
Maria Riveiro

This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, k, embedded in six multidimensional data sets. The selection of the techniques was based on their intended design, or use, for visually encoding data structures, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of k as given by participants when using different multidimensional projections; the accuracy of user estimations with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with k values given by a set of cluster quality measures; and whether there is a difference between experienced and novice users in terms of estimates and perceived usability. The results show that: dendrograms (from Ward's hierarchical clustering) are likely to lead to estimates of k that are different from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-distributed Stochastic Neighbor Embedding (t-SNE) is likely to lead to estimates which are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates which are different from those given by users using Ward and t-SNE; U-Matrices and reachability plots will likely have a low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates which are different from those perceived by users using any of the assessed multidimensional projections. It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, has an influence on the perceived usability.
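A minimal sketch of the two sources of estimates the study compares: a 2-D projection a user would inspect visually (t-SNE here) alongside a k chosen by a cluster quality measure (silhouette score over candidate k). The data set and parameter values are illustrative, not the study's materials.

```python
# Sketch: visual estimation of k from a projection vs. a quality-measure
# estimate of k, for side-by-side comparison.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = load_iris().data

# Projection for visual estimation of the number of clusters.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
plt.scatter(emb[:, 0], emb[:, 1], s=8)
plt.title("t-SNE embedding: how many clusters do you see?")
plt.show()

# Cluster-quality-measure estimate of k for comparison.
scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10,
                                        random_state=0).fit_predict(X))
          for k in range(2, 9)}
print("silhouette-preferred k:", max(scores, key=scores.get))
```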


2016 ◽  
Vol 39 (1) ◽  
pp. 112-126 ◽  
Author(s):  
Sharron L. Docherty ◽  
Allison Vorderstrasse ◽  
Debra Brandon ◽  
Constance Johnson

Nursing scientists have long been interested in complex, context-dependent questions addressing individual- and population-level challenges in health and illness. These critical questions require multilevel data (e.g., genetic, physiologic, biologic, behavioral, affective, and social). Advances in data-gathering methods have resulted in the collection of large sets of complex, multifaceted, and often non-comparable data. Scientific visualization is a powerful methodological tool for facilitating understanding of these multidimensional data sets. Our purpose is to demonstrate the utility of scientific visualization as a method for identifying associations, patterns, and trends in multidimensional data as exemplified in two studies. We describe a brief history of visual analysis, processes involved in scientific visualization, and opportunities and challenges in the use of visualization methods. Scientific visualization can play a crucial role in helping nurse scientists make sense of the structure and underlying patterns in their data to answer vital questions in the field.


2011 ◽  
Vol 16 (1) ◽  
pp. 273-285 ◽  
Author(s):  
Gintautas Dzemyda ◽  
Virginijus Marcinkevičius ◽  
Viktor Medvedev

In this paper, we present an approach to a web application (provided as a service) for data mining, oriented toward the visualization of multidimensional data. The paper focuses on visualization methods as tools for the visual presentation of large-scale multidimensional data sets. The proposed implementation takes a multidimensional data set as input and produces a visualization of that data set as output. It also supports different configuration parameters for the data mining methods used. Parallel computation is employed in the proposed implementation to run the algorithms simultaneously on different computers.
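A minimal sketch in the spirit of such a service, under stated assumptions: the client POSTs a multidimensional data set and receives 2-D coordinates from a projection method. Flask, the `/visualize` route, and MDS are stand-ins chosen for illustration; the paper's actual stack, methods, and parameters are not specified here.

```python
# Sketch of a visualization-as-a-service endpoint: CSV in, 2-D coordinates out.
import io
import numpy as np
from flask import Flask, request, jsonify
from sklearn.manifold import MDS

app = Flask(__name__)

@app.route("/visualize", methods=["POST"])
def visualize():
    # Expect a CSV body: one row per item, one column per dimension.
    X = np.loadtxt(io.StringIO(request.data.decode()), delimiter=",")
    # Configuration parameter passed by the client, with a default.
    n_components = int(request.args.get("n_components", 2))
    coords = MDS(n_components=n_components, random_state=0).fit_transform(X)
    return jsonify(coords.tolist())

if __name__ == "__main__":
    app.run()
```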


Author(s):  
Lior Shamir

Abstract. Several recent observations using large data sets of galaxies have shown a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. Both data sets exhibit a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence yields a dipole axis with statistical significance of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$ and is well within the $1\sigma$ error range of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.
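A rough sketch of the dipole-axis fit described above, under simplifying assumptions: for each candidate axis on a coarse sky grid, each galaxy's spin sign is regressed against the cosine of its angular distance to the axis, and the axis with the largest dipole amplitude is kept. The data are synthetic and the least-squares criterion is a simplification of the paper's statistics.

```python
# Sketch: fit galaxy spin asymmetry to cosine dependence over a grid of
# candidate dipole axes. Synthetic sky positions and spin signs throughout.
import numpy as np

rng = np.random.default_rng(2)
n = 8700
ra = rng.uniform(0, 2 * np.pi, n)       # right ascension (rad)
dec = np.arcsin(rng.uniform(-1, 1, n))  # declination (rad), uniform on sphere
spin = rng.choice([-1.0, 1.0], n)       # +1 clockwise, -1 counterclockwise

def unit(ra, dec):
    """Unit vector(s) on the celestial sphere."""
    return np.stack([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)], axis=-1)

gal = unit(ra, dec)
best = None
for ra0 in np.linspace(0, 2 * np.pi, 72, endpoint=False):   # 5-degree grid
    for dec0 in np.linspace(-np.pi / 2, np.pi / 2, 37):
        cosphi = gal @ unit(ra0, dec0)              # cos(angle to axis)
        d = spin @ cosphi / (cosphi @ cosphi)       # least-squares amplitude
        if best is None or abs(d) > abs(best[0]):
            best = (d, np.degrees(ra0), np.degrees(dec0))

print(f"dipole amplitude {best[0]:+.4f} at (RA={best[1]:.0f}, Dec={best[2]:.0f})")
```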


2015 ◽  
Vol 8 (1) ◽  
pp. 421-434 ◽  
Author(s):  
M. P. Jensen ◽  
T. Toto ◽  
D. Troyan ◽  
P. E. Ciesielski ◽  
D. Holdridge ◽  
...  

Abstract. The Midlatitude Continental Convective Clouds Experiment (MC3E) took place during the spring of 2011, centered in north-central Oklahoma, USA. The main goal of this field campaign was to capture the dynamical and microphysical characteristics of precipitating convective systems in the US Central Plains. A major component of the campaign was a six-site radiosonde array designed to capture the large-scale variability of the atmospheric state with the intent of deriving model forcing data sets. Over the course of the 46-day MC3E campaign, a total of 1362 radiosondes were launched from the enhanced sonde network. This manuscript provides details on the instrumentation used as part of the sounding array, the data processing activities, including quality checks and humidity bias corrections, and an analysis of the impacts of bias correction and algorithm assumptions on the determination of convective levels and indices. It is found that corrections for known radiosonde humidity biases and assumptions regarding the characteristics of the surface convective parcel result in significant differences in the derived values of convective levels and indices in many soundings. In addition, the impact of including the humidity corrections and quality controls on the thermodynamic profiles that are used in the derivation of a large-scale model forcing data set is investigated. The results show a significant impact on the derived large-scale vertical velocity field, illustrating the importance of addressing these humidity biases.
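A toy illustration of why humidity bias corrections propagate into derived thermodynamic quantities, under stated assumptions: a simple multiplicative dry-bias factor applied to a relative-humidity profile shifts the derived dewpoint, which in turn moves parcel-based convective levels. The 3% factor and the profile are illustrative stand-ins, not the MC3E correction scheme.

```python
# Toy sketch: a radiosonde humidity dry-bias correction and its effect on
# derived dewpoint, via the Magnus saturation-vapor-pressure formula.
import numpy as np

def dewpoint_c(temp_c, rh_percent):
    """Dewpoint (deg C) from temperature and RH using the Magnus formula."""
    a, b = 17.625, 243.04
    gamma = np.log(rh_percent / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

temp = np.array([25.0, 18.0, 10.0, 2.0])     # sample profile (deg C)
rh_raw = np.array([60.0, 55.0, 70.0, 80.0])  # reported RH (%)
rh_corr = np.clip(rh_raw * 1.03, 0, 100)     # assumed 3% dry-bias correction

print("dewpoint shift (deg C):",
      dewpoint_c(temp, rh_corr) - dewpoint_c(temp, rh_raw))
```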


2020 ◽  
Vol 223 (2) ◽  
pp. 1378-1397
Author(s):  
Rosemary A Renaut ◽  
Jarom D Hogue ◽  
Saeed Vatankhah ◽  
Shuang Liu

Summary. We discuss the focusing inversion of potential field data for the recovery of sparse subsurface structures from surface measurement data on a uniform grid. For the uniform grid, the model sensitivity matrices have a block Toeplitz Toeplitz block structure for each block of columns related to a fixed depth layer of the subsurface. Then, all forward operations with the sensitivity matrix, or its transpose, are performed using the 2-D fast Fourier transform. Simulations are provided to show that the implementation of the focusing inversion algorithm using the fast Fourier transform is efficient, and that the algorithm can be realized on standard desktop computers with sufficient memory for storage of volumes up to size n ≈ 10⁶. The linear systems of equations arising in the focusing inversion algorithm are solved using either Golub–Kahan bidiagonalization or randomized singular value decomposition algorithms. These two algorithms are contrasted for their efficiency when used to solve large-scale problems with respect to the sizes of the projected subspaces adopted for the solutions of the linear systems. The results confirm earlier studies that the randomized algorithms are to be preferred for the inversion of gravity data, and for data sets of size m it is sufficient to use projected spaces of size approximately m/8. For the inversion of magnetic data sets, we show that it is more efficient to use the Golub–Kahan bidiagonalization, and that it is again sufficient to use projected spaces of size approximately m/8. Simulations support the presented conclusions and are verified for the inversion of a magnetic data set obtained over the Wuskwatim Lake region in Manitoba, Canada.
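A minimal sketch of the FFT-based forward operation described above: a block Toeplitz Toeplitz block (BTTB) matrix-vector product computed by embedding the Toeplitz kernel in a circulant and applying 2-D FFTs. The kernel layout and names are assumptions for illustration, not the authors' code.

```python
# Sketch: BTTB matrix-vector product via circulant embedding and 2-D FFTs.
# Cost is O(mn log(mn)) instead of O((mn)^2) for the dense product.
import numpy as np

def bttb_matvec(kernel, x):
    """Multiply a BTTB matrix by a gridded vector using 2-D FFTs.

    kernel : (2m-1, 2n-1) array with kernel[i + m - 1, j + n - 1] holding
             the matrix entry for lag (i, j), i.e. t[i - i', j - j'].
    x      : (m, n) array, the model vector reshaped onto the grid.
    """
    m, n = x.shape
    # Roll so the zero lag sits at index (0, 0): the first column/row of
    # the circulant extension of the Toeplitz structure.
    c = np.roll(np.roll(kernel, -(m - 1), axis=0), -(n - 1), axis=1)
    # Zero-pad x to the circulant size and convolve via the FFT.
    xp = np.zeros_like(c, dtype=float)
    xp[:m, :n] = x
    y = np.fft.ifft2(np.fft.fft2(c) * np.fft.fft2(xp)).real
    return y[:m, :n]   # the leading (m, n) block is the Toeplitz product
```

Because the maximum lag is m-1 (and n-1), the circular convolution of period 2m-1 (and 2n-1) never aliases, so the leading block reproduces the exact linear (Toeplitz) product.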


2016 ◽  
Author(s):  
George Dimitriadis ◽  
Joana Neto ◽  
Adam R. Kampff

Abstract. Electrophysiology is entering the era of 'Big Data'. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, i.e. single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention [22, 23], but is still far from solved. Current methods still rely on human input and thus become infeasible as the size of the data sets grows exponentially. Here we introduce the t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction method [27] as a visualization tool in the spike sorting process. t-SNE embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low (usually two) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets both from hybrid [23] and paired juxtacellular/extracellular recordings [15]. We have released a graphical user interface (GUI) written in Python as a tool for the manual clustering of the t-SNE embedded spikes and as a tool for an informed overview and fast manual curation of results from other clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes. They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.
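A minimal sketch of the visualization step described above, not the released GUI: per-spike feature vectors (here, PCA components of waveforms) are embedded into 2-D with t-SNE and plotted for manual cluster delineation. The synthetic "waveforms" and parameter choices are illustrative assumptions.

```python
# Sketch: t-SNE embedding of spike features for manual cluster delineation.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)
# Synthetic "waveforms": three units with distinct mean shapes plus noise.
waveforms = np.concatenate([
    rng.normal(loc=mu, scale=0.5, size=(300, 40))
    for mu in (np.sin(np.linspace(0, np.pi, 40)) * s for s in (1.0, 2.0, 3.0))
])

features = PCA(n_components=10).fit_transform(waveforms)  # n-dim features
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], s=4)
plt.title("t-SNE of spike features: clusters ~ putative single units")
plt.show()
```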

