Hierarchical Exploration of Large Multivariate Data Sets

TV-MV Analytics: A visual analytics framework to explore time-varying multivariate data

Information Visualization ◽

10.1177/1473871619858937 ◽

2019 ◽

Vol 19 (1) ◽

pp. 3-23

Author(s):

Aurea Soriano-Vargas ◽

Bernd Hamann ◽

Maria Cristina F de Oliveira

Keyword(s):

Visual Analytics ◽

Visual Analysis ◽

Multivariate Data ◽

Visual Exploration ◽

Data Sets ◽

Time Varying ◽

Domain Experts ◽

Data Mining Algorithms ◽

Temporal Relationships ◽

Visualization Techniques

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies concerning the applicability of visualization techniques to obtain valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms, providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios by studying three use cases that were validated and discussed with domain experts.

Identifying VIV Vibration Modes by Use of the Empirical Orthogonal Functions Technique

21st International Conference on Offshore Mechanics and Arctic Engineering, Volume 1 ◽

10.1115/omae2002-28425 ◽

2002 ◽

Cited By ~ 3

Author(s):

Gudmund Kleiven

Keyword(s):

Model Test ◽

Vibration Mode ◽

Multivariate Data ◽

Empirical Orthogonal Functions ◽

Mode Shapes ◽

Data Sets ◽

Vibration Modes ◽

Ocean Engineering ◽

Orthogonal Functions ◽

Related Technique

The Empirical Orthogonal Functions (EOF) technique has been widely used by oceanographers and meteorologists, while the Singular Value Decomposition (SVD), a related technique, is frequently used in the statistics community. Another related technique, Principal Component Analysis (PCA), is used, for instance, in pattern recognition. The predominant application of these techniques is data compression of multivariate data sets, which also facilitates subsequent statistical analysis of such data sets. Within Ocean Engineering the EOF technique is not yet widely in use, although there are several areas where multivariate data sets occur and where the EOF technique could represent a supplementary analysis technique. Examples are oceanographic data, in particular current data. Furthermore, data sets of model- or full-scale data of loads and responses of slender bodies, such as pipelines and risers, are relevant examples. One attractive property of the EOF technique is that it does not require any a priori information on the physical system by which the data is generated. In the present paper a description of the EOF technique is given. Thereafter an example of the use of the EOF technique is presented. The example is an analysis of response data from a model test of a pipeline in a long free span exposed to current. The model test program was carried out in order to identify the occurrence of multi-mode vibrations and vibration mode amplitudes. In the present example the EOF technique demonstrates the capability of identifying predominant vibration modes of inline as well as cross-flow vibrations. Vibration mode shapes together with mode amplitudes and frequencies are also estimated. Although the present example is not sufficient for drawing conclusions about the applicability of the EOF technique on a general basis, the results of the present example demonstrate some of the potential of the technique.
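
The relationship between EOF analysis and the SVD mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the function name `eof_decompose`, the synthetic two-mode signal, and the sensor layout are assumptions made purely for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def eof_decompose(data):
    """EOF analysis of a (n_times, n_sensors) array via the SVD.

    Returns the spatial modes (one per row), the time-dependent mode
    amplitudes (one per column), and the variance fraction of each mode.
    """
    # Remove the time mean at each sensor so the modes describe fluctuations.
    anomalies = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    modes = vt                    # rows are orthonormal spatial patterns
    amplitudes = u * s            # column k is the time series of mode k
    variance_fraction = s**2 / np.sum(s**2)
    return modes, amplitudes, variance_fraction


# Hypothetical test signal: two sinusoidal shapes along a span of 21 sensors,
# oscillating at different frequencies and with different strengths.
x = np.linspace(0.0, 1.0, 21)
t = np.linspace(0.0, 10.0, 400)
shape1 = np.sin(np.pi * x)
shape2 = np.sin(2.0 * np.pi * x)
data = (3.0 * np.outer(np.sin(2.0 * np.pi * 1.0 * t), shape1)
        + 1.0 * np.outer(np.sin(2.0 * np.pi * 2.3 * t), shape2))

modes, amplitudes, variance_fraction = eof_decompose(data)
print("variance captured by the first two modes:", variance_fraction[:2].sum())
```

No physical model of the structure enters the computation; the modes come from the data alone, which is the property the abstract highlights. Mode frequencies could then be estimated from the columns of `amplitudes`, for example with a Fourier transform.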

An Information-Aware Framework for Exploring Multivariate Data Sets

IEEE Transactions on Visualization and Computer Graphics ◽

10.1109/tvcg.2013.133 ◽

2013 ◽

Vol 19 (12) ◽

pp. 2683-2692 ◽

Cited By ~ 36

Author(s):

Ayan Biswas ◽

Soumya Dutta ◽

Han-Wei Shen ◽

Jonathan Woodring

Keyword(s):

Multivariate Data ◽

Data Sets

Algorithms for the Visualization of Large and Multivariate Data Sets

Self-Organizing Neural Networks - Studies in Fuzziness and Soft Computing ◽

10.1007/978-3-7908-1810-9_8 ◽

2002 ◽

pp. 165-183 ◽

Cited By ~ 1

Author(s):

Friedhelm Schwenker ◽

Hans A. Kestler ◽

Günther Palm

Keyword(s):

Multivariate Data ◽

Data Sets

LCSS-Based Algorithm for Computing Multivariate Data Set Similarity: A Case Study of Real-Time WSN Data

Sensors ◽

10.3390/s19010166 ◽

2019 ◽

Vol 19 (1) ◽

pp. 166 ◽

Cited By ~ 2

Author(s):

Rahim Khan ◽

Ihsan Ali ◽

Saleh M. Altowaijri ◽

Muhammad Zakarya ◽

Atiq Ur Rahman ◽

...

Keyword(s):

Dynamic Programming ◽

DNA Analysis ◽

Multivariate Data ◽

Longest Common Subsequence ◽

Sensor Data ◽

Computational Time ◽

Data Sets ◽

Data Set ◽

Classical Dynamic ◽

Engineering Sciences

Multivariate data sets are common in various application areas, such as wireless sensor networks (WSNs) and DNA analysis. A robust mechanism is required to compute their similarity indexes regardless of the environment and problem domain. This study describes the usefulness of a non-metric-based approach (i.e., longest common subsequence) in computing similarity indexes. Several non-metric-based algorithms are available in the literature; the most robust and reliable one is the dynamic programming-based technique. However, dynamic programming-based techniques are considered inefficient, particularly in the context of multivariate data sets. Furthermore, the classical approaches are not powerful enough in scenarios with multivariate data sets or sensor data, or when the similarity indexes are extremely high or low. To address this issue, we propose an efficient algorithm to measure the similarity indexes of multivariate data sets using a non-metric-based methodology. The proposed algorithm performs exceptionally well on numerous multivariate data sets compared with the classical dynamic programming-based algorithms. The performance of the algorithms is evaluated on the basis of several benchmark data sets and a dynamic multivariate data set, which is obtained from a WSN deployed in the Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology. Our evaluation suggests that the proposed algorithm can be approximately 39.9% more efficient than its counterparts for various data sets in terms of computational time.
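
For orientation, the classical dynamic programming baseline that this abstract compares against can be sketched as follows. This is not the paper's proposed algorithm. The matching tolerance `eps`, the normalisation by the shorter sequence length, and the sample readings are assumptions chosen for illustration only.

```python
def lcss_length(a, b, eps=0.0):
    """Length of the longest common subsequence of two multivariate sequences.

    Each element is a tuple of numeric readings. Two elements match when every
    component differs by at most eps (an assumed matching rule).
    """
    def match(p, q):
        return all(abs(x - y) <= eps for x, y in zip(p, q))

    # Classical O(len(a) * len(b)) recurrence, keeping only two table rows.
    prev = [0] * (len(b) + 1)
    for p in a:
        curr = [0]
        for j, q in enumerate(b, start=1):
            if match(p, q):
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_index(a, b, eps=0.0):
    """LCSS length normalised by the shorter sequence (an assumed definition)."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps) / min(len(a), len(b))


# Hypothetical two-variable sensor readings, e.g. (temperature, humidity).
node_a = [(20.0, 45.0), (20.5, 44.0), (21.0, 43.5), (25.0, 40.0)]
node_b = [(20.1, 45.2), (21.1, 43.4), (30.0, 35.0), (25.1, 40.1)]
print(similarity_index(node_a, node_b, eps=0.25))
```

The nested loops visit every pair of elements, so the running time grows with the product of the two sequence lengths. That quadratic cost is the inefficiency the abstract attributes to dynamic programming-based techniques.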

Modelling phase shifts, peak shifts and peak width variations in spectral data sets: its value in multivariate data analysis

Analytica Chimica Acta ◽

10.1016/s0003-2670(00)01349-0 ◽

2001 ◽

Vol 432 (1) ◽

pp. 113-124 ◽

Cited By ~ 11

Author(s):

H. Witjes ◽

M. Pepers ◽

W.J. Melssen ◽

L.M.C. Buydens

Keyword(s):

Data Analysis ◽

Spectral Data ◽

Multivariate Data Analysis ◽

Multivariate Data ◽

Phase Shifts ◽

Data Sets ◽

Peak Width

Significance tests for unsupervised pattern discovery in large continuous multivariate data sets

Computational Statistics & Data Analysis ◽

10.1016/s0167-9473(03)00142-7 ◽

2004 ◽

Vol 46 (1) ◽

pp. 57-79 ◽

Cited By ~ 2

Author(s):

Richard J Bolton ◽

David J Hand ◽

Martin Crowder

Keyword(s):

Pattern Discovery ◽

Multivariate Data ◽

Data Sets ◽

Significance Tests

Systematically Exploring Associations among Multivariate Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6158 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6786-6794

Author(s):

Lifeng Zhang

Keyword(s):

Interaction Effect ◽

Functional Relationship ◽

Multivariate Data ◽

Coefficient Of Determination ◽

High Dimensional ◽

Data Sets ◽

Statistical Tool ◽

Wide Range ◽

Main Effect ◽

Data Points

Detecting relationships among multivariate data is often of great importance in the analysis of high-dimensional data sets, and has received growing attention for decades from both academic and industrial fields. In this study, we propose a statistical tool named the neighbor correlation coefficient (nCor), which is based on a new idea that measures the local continuity of the reordered data points to quantify the strength of the global association between variables. With sufficient sample size, the new method is able to capture a wide range of functional relationships, whether linear or nonlinear, bivariate or multivariate, main effect or interaction. The score of nCor roughly approximates the coefficient of determination (R²) of the data, which indicates the proportion of variance in one variable that is predictable from one or more other variables. On this basis, three nCor-based statistics are also proposed here to further characterize the intra- and inter-structures of the associations from the aspects of nonlinearity, interaction effect, and variable redundancy. The mechanisms of these measures are proved in theory and demonstrated with numerical analyses.

PeaGlyph: Glyph design for investigation of balanced data structures

Information Visualization ◽

10.1177/14738716211050602 ◽

2021 ◽

pp. 147387162110506

Author(s):

Kenan Koc ◽

Andrew Stephen McGough ◽

Sara Johansson Fernstad

Keyword(s):

Data Structures ◽

Group Formation ◽

Multivariate Data ◽

Good Alternative ◽

Initial Study ◽

Data Sets ◽

The Novel ◽

Domain Expertise ◽

Shape Characteristics ◽

Initial Results

For many data analysis tasks, such as the formation of well-balanced groups for a fair race or collaboration in learning settings, the balancing between data attributes is at least as important as the actual values of items. At the same time, comparison of values is implicitly desired for these tasks. Even with statistical methods available to measure the level of balance, human judgment and domain expertise play an important role in judging the level of balance, and whether the level of unbalance is acceptable or not. Accordingly, there is a need for techniques that improve decision-making in the context of group formation and that can be used as a visual complement to statistical analysis. This paper introduces a novel glyph-based visualization, PeaGlyph, which aims to support the understanding of balanced and unbalanced data structures, for instance by using a frequency format through countable marks and salient shape characteristics. The glyph was designed particularly for tasks relevant to the investigation of properties of balanced and unbalanced groups, such as looking up and comparing values. Glyph-based visualization methods provide flexible and useful abstractions for exploring and analyzing multivariate data sets. The PeaGlyph design was based on an initial study that compared four glyph visualization methods, including two base glyphs and their variations. The performance of the novel PeaGlyph was then compared to the best “performers” of the first study through evaluation. The initial results from the study are encouraging, and the proposed design may be a good alternative to the traditional glyphs for depicting multivariate data and allowing viewers to form an intuitive impression as to how balanced or unbalanced a set of objects is. Furthermore, a set of design considerations is discussed in the context of the design of the glyphs.
