Significance tests for unsupervised pattern discovery in large continuous multivariate data sets

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies concerning the applicability of visualization techniques to obtain valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios, by studying three use cases that were validated and discussed with domain experts.

Identifying VIV Vibration Modes by Use of the Empirical Orthogonal Functions Technique

21st International Conference on Offshore Mechanics and Arctic Engineering, Volume 1 ◽ 10.1115/omae2002-28425 ◽ 2002 ◽ Cited By ~ 3

Author(s): Gudmund Kleiven

Keyword(s): Model Test ◽ Vibration Mode ◽ Multivariate Data ◽ Empirical Orthogonal Functions ◽ Mode Shapes ◽ Data Sets ◽ Vibration Modes ◽ Ocean Engineering ◽ Orthogonal Functions ◽ Related Technique

The Empirical Orthogonal Functions (EOF) technique has been widely used by oceanographers and meteorologists, while the related Singular Value Decomposition (SVD) is frequently used in the statistics community. Another related technique, Principal Component Analysis (PCA), is used, for instance, in pattern recognition. The predominant application of these techniques is data compression of multivariate data sets, which also facilitates subsequent statistical analysis of such data sets. Within Ocean Engineering the EOF technique is not yet widely used, although there are several areas where multivariate data sets occur and where the EOF technique could serve as a supplementary analysis technique. Examples are oceanographic data, in particular current data. Further relevant examples are model- or full-scale data sets of loads and responses of slender bodies, such as pipelines and risers. One attractive property of the EOF technique is that it does not require any a priori information on the physical system by which the data are generated. The present paper first describes the EOF technique and then presents an example of its use: the analysis of response data from a model test of a pipeline in a long free span exposed to current. The model test program was carried out in order to identify the occurrence of multi-mode vibrations and vibration mode amplitudes. In this example the EOF technique proves capable of identifying the predominant vibration modes of in-line as well as cross-flow vibrations. Vibration mode shapes, together with mode amplitudes and frequencies, are also estimated. Although this example is not sufficient to draw conclusions on the general applicability of the EOF technique, its results demonstrate some of the potential of the technique.
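The relationship between EOF analysis and the SVD mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `eof_analysis`, the synthetic two-mode signal, and the sensor layout are assumptions chosen only to show how mode shapes, mode amplitudes, and explained-variance fractions fall out of an SVD of the mean-removed data matrix.

```python
import numpy as np

def eof_analysis(data, n_modes=2):
    """EOF decomposition of a (n_times, n_sensors) array via SVD.

    Returns the leading mode shapes (rows), their time-varying
    amplitudes (columns), and the fraction of variance per mode.
    """
    # Remove the time mean at each sensor so the modes describe fluctuations.
    anomalies = data - data.mean(axis=0)
    # Rows of vt are the spatial patterns (EOFs); u * s gives the amplitudes.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    mode_shapes = vt[:n_modes]
    amplitudes = u[:, :n_modes] * s[:n_modes]
    return mode_shapes, amplitudes, variance_fraction[:n_modes]

# Hypothetical test signal: two sinusoidal mode shapes along a span,
# each oscillating at its own frequency (values are illustrative only).
x = np.linspace(0.0, 1.0, 20)      # sensor positions along the span
t = np.linspace(0.0, 10.0, 500)    # sample times
signal = (np.outer(np.sin(2 * np.pi * 1.0 * t), np.sin(np.pi * x))
          + 0.3 * np.outer(np.sin(2 * np.pi * 2.5 * t), np.sin(2 * np.pi * x)))

shapes, amps, frac = eof_analysis(signal, n_modes=2)
```

Note that no physical model of the span enters the computation, which is the property the abstract highlights. The sign of each recovered mode shape is arbitrary, so comparisons with a reference shape should use the absolute value of the correlation.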

An Information-Aware Framework for Exploring Multivariate Data Sets

IEEE Transactions on Visualization and Computer Graphics ◽ 10.1109/tvcg.2013.133 ◽ 2013 ◽ Vol 19 (12) ◽ pp. 2683-2692 ◽ Cited By ~ 36

Author(s): Ayan Biswas ◽ Soumya Dutta ◽ Han-Wei Shen ◽ Jonathan Woodring

Keyword(s): Multivariate Data ◽ Data Sets

Hierarchical Exploration of Large Multivariate Data Sets

Data Visualization ◽ 10.1007/978-1-4615-1177-9_14 ◽ 2003 ◽ pp. 201-212 ◽ Cited By ~ 2

Author(s): Jing Yang ◽ Matthew O. Ward ◽ Elke A. Rundensteiner

Keyword(s): Multivariate Data ◽ Data Sets

A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets

Intelligent Data Analysis ◽ 10.3233/ida-2010-0456 ◽ 2011 ◽ Vol 15 (1) ◽ pp. 69-88 ◽ Cited By ~ 15

Author(s): Annalisa Appice ◽ Michelangelo Ceci ◽ Antonio Turi ◽ Donato Malerba

Keyword(s): Distributed Algorithm ◽ Pattern Discovery ◽ Large Data ◽ Large Data Sets ◽ Frequent Pattern ◽ Data Sets

Algorithms for the Visualization of Large and Multivariate Data Sets

Self-Organizing Neural Networks - Studies in Fuzziness and Soft Computing ◽ 10.1007/978-3-7908-1810-9_8 ◽ 2002 ◽ pp. 165-183 ◽ Cited By ~ 1

Author(s): Friedhelm Schwenker ◽ Hans A. Kestler ◽ Günther Palm

Keyword(s): Multivariate Data ◽ Data Sets

LCSS-Based Algorithm for Computing Multivariate Data Set Similarity: A Case Study of Real-Time WSN Data

Sensors ◽ 10.3390/s19010166 ◽ 2019 ◽ Vol 19 (1) ◽ pp. 166 ◽ Cited By ~ 2

Author(s): Rahim Khan ◽ Ihsan Ali ◽ Saleh M. Altowaijri ◽ Muhammad Zakarya ◽ Atiq Ur Rahman ◽ ...

Keyword(s): Dynamic Programming ◽ Dna Analysis ◽ Multivariate Data ◽ Longest Common Subsequence ◽ Sensor Data ◽ Computational Time ◽ Data Sets ◽ Data Set ◽ Classical Dynamic ◽ Engineering Sciences

Multivariate data sets are common in various application areas, such as wireless sensor networks (WSNs) and DNA analysis. A robust mechanism is required to compute their similarity indexes regardless of the environment and problem domain. This study describes the usefulness of a non-metric-based approach (i.e., longest common subsequence) in computing similarity indexes. Several non-metric-based algorithms are available in the literature; the most robust and reliable of these is the dynamic programming-based technique. However, dynamic programming-based techniques are considered inefficient, particularly in the context of multivariate data sets. Furthermore, the classical approaches are not powerful enough in scenarios with multivariate data sets or sensor data, or when the similarity indexes are extremely high or low. To address this issue, we propose an efficient algorithm to measure the similarity indexes of multivariate data sets using a non-metric-based methodology. The proposed algorithm performs exceptionally well on numerous multivariate data sets compared with the classical dynamic programming-based algorithms. The performance of the algorithms is evaluated on the basis of several benchmark data sets and a dynamic multivariate data set, which is obtained from a WSN deployed in the Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology. Our evaluation suggests that the proposed algorithm can be approximately 39.9% more efficient than its counterparts for various data sets in terms of computational time.
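For orientation, the classical dynamic programming baseline that the abstract compares against can be sketched as follows. This is not the authors' proposed algorithm, which the abstract does not specify. The names `samples_match`, `lcss_length`, and `lcss_similarity`, the tolerance `eps`, and the normalisation by the shorter sequence length are all assumptions made for illustration.

```python
def samples_match(p, q, eps):
    """Two multivariate samples match if every component differs by at most eps."""
    return len(p) == len(q) and all(abs(u - v) <= eps for u, v in zip(p, q))

def lcss_length(a, b, eps):
    """Length of the longest common subsequence of two sample sequences.

    Classical O(len(a) * len(b)) dynamic programme, keeping only one
    previous row of the table to limit memory use.
    """
    prev = [0] * (len(b) + 1)
    for sample_a in a:
        curr = [0]
        for j, sample_b in enumerate(b, start=1):
            if samples_match(sample_a, sample_b, eps):
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcss_similarity(a, b, eps):
    """Similarity index in [0, 1]: LCSS length over the shorter sequence length."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps) / min(len(a), len(b))

# Hypothetical two-variable readings (e.g. temperature, humidity).
series_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (4.0, 5.0)]
series_b = [(1.05, 2.0), (3.0, 4.02), (9.0, 9.0)]
similarity = lcss_similarity(series_a, series_b, eps=0.1)
```

The quadratic cost of filling the table is the inefficiency the abstract refers to; any faster method would need to avoid evaluating every pair of samples.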

Modelling phase shifts, peak shifts and peak width variations in spectral data sets: its value in multivariate data analysis

Analytica Chimica Acta ◽ 10.1016/s0003-2670(00)01349-0 ◽ 2001 ◽ Vol 432 (1) ◽ pp. 113-124 ◽ Cited By ~ 11

Author(s): H. Witjes ◽ M. Pepers ◽ W.J. Melssen ◽ L.M.C. Buydens

Keyword(s): Data Analysis ◽ Spectral Data ◽ Multivariate Data Analysis ◽ Multivariate Data ◽ Phase Shifts ◽ Data Sets ◽ Peak Width

Systematically Exploring Associations among Multivariate Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i04.6158 ◽ 2020 ◽ Vol 34 (04) ◽ pp. 6786-6794

Author(s): Lifeng Zhang

Keyword(s): Interaction Effect ◽ Functional Relationship ◽ Multivariate Data ◽ Coefficient Of Determination ◽ High Dimensional ◽ Data Sets ◽ Statistical Tool ◽ Wide Range ◽ Main Effect ◽ Data Points

Detecting relationships among multivariate data is often of great importance in the analysis of high-dimensional data sets, and has received growing attention for decades from both academic and industrial fields. In this study, we propose a statistical tool named the neighbor correlation coefficient (nCor), which is based on a new idea: measuring the local continuity of the reordered data points to quantify the strength of the global association between variables. With sufficient sample size, the new method is able to capture a wide range of functional relationships, whether linear or nonlinear, bivariate or multivariate, main effect or interaction. The nCor score roughly approximates the coefficient of determination (R²) of the data, that is, the proportion of variance in one variable that is predictable from one or more other variables. On this basis, three nCor-based statistics are also proposed to further characterize the intra- and inter-structures of the associations in terms of nonlinearity, interaction effect, and variable redundancy. The mechanisms of these measures are proved in theory and demonstrated with numerical analyses.
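The coefficient of determination that the nCor score is said to approximate can be computed as in the sketch below. This illustrates R² only and makes no attempt to reproduce nCor itself, whose construction is not given in the abstract. The helper name `r_squared`, the synthetic data, and the straight-line fit are assumptions made for illustration.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)      # variance left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variance around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical data: a linear relationship with a small amount of noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 2.0 * x + rng.normal(0.0, 0.1, 200)

# Least-squares line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, 1)
r2_linear = r_squared(y, slope * x + intercept)
```

A value near 1 means the predictor accounts for almost all of the variance in `y`, while predicting the mean of `y` everywhere gives 0.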

Empirical Analysis of Phylogenetic Quasi-Terraces

10.1101/810309 ◽ 2019

Author(s): Paula Breitling ◽ Alexandros Stamatakis ◽ Olga Chernomor ◽ Ben Bettisworth ◽ Lukasz Reszczynski

Keyword(s): Phylogenetic Tree ◽ Search Algorithms ◽ Data Sets ◽ Tree Search ◽ Significance Tests ◽ Analogous Structure ◽ Phylogenetic Studies ◽ Log Likelihood ◽ Tree Space ◽ Nearest Neighborhood

Terraces in phylogenetic tree space are, among other things, important for the design of tree space search strategies. While the phenomenon of phylogenetic terraces is already known for unlinked partition models on partitioned phylogenomic data sets, it has not yet been studied whether an analogous structure is present under linked and scaled partition models. To this end, we analyze aspects such as the log-likelihood distributions, likelihood-based significance tests, and nearest neighborhood interchanges on the trees residing on a terrace and compare their distributions among unlinked, linked, and scaled partition models. Our study shows that a terrace-like structure exists under linked and scaled partition models as well. We denote this phenomenon as a quasi-terrace. Therefore, quasi-terraces should be taken into account in the design of tree search algorithms as well as when reporting results on ‘the’ final tree topology in empirical phylogenetic studies.
