Significance tests for unsupervised pattern discovery in large continuous multivariate data sets

2004 ◽  
Vol 46 (1) ◽  
pp. 57-79 ◽  
Author(s):  
Richard J Bolton ◽  
David J Hand ◽  
Martin Crowder
2019 ◽  
Vol 19 (1) ◽  
pp. 3-23
Author(s):  
Aurea Soriano-Vargas ◽  
Bernd Hamann ◽  
Maria Cristina F de Oliveira

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies concerning the applicability of visualization techniques to obtain valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios, by studying three use cases that were validated and discussed with domain experts.


Author(s):  
Gudmund Kleiven

The Empirical Orthogonal Functions (EOF) technique has been widely used by oceanographers and meteorologists, while the Singular Value Decomposition (SVD), a related technique, is frequently used in the statistics community. Another related technique, Principal Component Analysis (PCA), is used, for instance, in pattern recognition. The predominant application of these techniques is the compression of multivariate data sets, which also facilitates subsequent statistical analysis of such data sets. Within ocean engineering the EOF technique is not yet widely used, although there are several areas where multivariate data sets occur and where the EOF technique could serve as a supplementary analysis technique. Examples are oceanographic data, in particular current data. Model- or full-scale data sets of loads and responses of slender bodies, such as pipelines and risers, are further relevant examples. One attractive property of the EOF technique is that it does not require any a priori information on the physical system that generates the data. In the present paper a description of the EOF technique is given, followed by an example of its use: the analysis of response data from a model test of a pipeline in a long free span exposed to current. The model test program was carried out in order to identify the occurrence of multi-mode vibrations and vibration mode amplitudes. In this example the EOF technique proves capable of identifying the predominant vibration modes of in-line as well as cross-flow vibrations. Vibration mode shapes together with mode amplitudes and frequencies are also estimated. Although the present example is not sufficient to draw general conclusions about the applicability of the EOF technique, its results demonstrate some of the potential of the technique.
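
The relationship between EOF analysis and the SVD mentioned above can be illustrated with a minimal sketch. It is not the paper's procedure: the function name `eof_analysis`, the synthetic two-mode "span response" data, and the chosen frequencies are all assumptions made for illustration. The sketch removes the time mean at each measurement location and takes the SVD of the resulting anomaly matrix, so the right singular vectors play the role of mode shapes and the scaled left singular vectors the role of mode amplitudes.

```python
import numpy as np


def eof_analysis(data, n_modes=2):
    """EOF decomposition of a (n_times, n_locations) array via the SVD.

    Returns (modes, amplitudes, variance_fraction, anomalies).
    """
    # Remove the time mean at every location so the SVD describes variability.
    anomalies = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    modes = vt[:n_modes]                       # spatial patterns (mode shapes)
    amplitudes = u[:, :n_modes] * s[:n_modes]  # time series of each mode
    variance_fraction = (s ** 2 / np.sum(s ** 2))[:n_modes]
    return modes, amplitudes, variance_fraction, anomalies


# Hypothetical test data: two sinusoidal mode shapes along a unit-length span,
# each oscillating at its own (assumed) frequency.
x = np.linspace(0.0, 1.0, 50)
t = np.linspace(0.0, 10.0, 400)
shape1 = np.sin(np.pi * x)
shape2 = np.sin(2.0 * np.pi * x)
response = (np.outer(np.sin(2.0 * np.pi * 1.0 * t), shape1)
            + 0.5 * np.outer(np.sin(2.0 * np.pi * 2.7 * t), shape2))

modes, amplitudes, var_frac, anomalies = eof_analysis(response, n_modes=2)
print("variance explained by the first two modes:", var_frac)
```

Because the synthetic data contain only two modes, the two leading EOFs account for essentially all of the variance; with measured data the variance fractions indicate how many modes are worth retaining.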


2013 ◽  
Vol 19 (12) ◽  
pp. 2683-2692 ◽  
Author(s):  
Ayan Biswas ◽  
Soumya Dutta ◽  
Han-Wei Shen ◽  
Jonathan Woodring

2003 ◽  
pp. 201-212 ◽  
Author(s):  
Jing Yang ◽  
Matthew O. Ward ◽  
Elke A. Rundensteiner

2011 ◽  
Vol 15 (1) ◽  
pp. 69-88 ◽  
Author(s):  
Annalisa Appice ◽  
Michelangelo Ceci ◽  
Antonio Turi ◽  
Donato Malerba

Sensors ◽  
2019 ◽  
Vol 19 (1) ◽  
pp. 166 ◽  
Author(s):  
Rahim Khan ◽  
Ihsan Ali ◽  
Saleh M. Altowaijri ◽  
Muhammad Zakarya ◽  
Atiq Ur Rahman ◽  
...  

Multivariate data sets are common in various application areas, such as wireless sensor networks (WSNs) and DNA analysis. A robust mechanism is required to compute their similarity indexes regardless of the environment and problem domain. This study describes the usefulness of a non-metric-based approach (i.e., longest common subsequence) in computing similarity indexes. Several non-metric-based algorithms are available in the literature; the most robust and reliable one is the dynamic programming-based technique. However, dynamic programming-based techniques are considered inefficient, particularly in the context of multivariate data sets. Furthermore, the classical approaches are not powerful enough in scenarios with multivariate data sets or sensor data, or when the similarity indexes are extremely high or low. To address this issue, we propose an efficient algorithm to measure the similarity indexes of multivariate data sets using a non-metric-based methodology. The proposed algorithm performs exceptionally well on numerous multivariate data sets compared with the classical dynamic programming-based algorithms. The performance of the algorithms is evaluated on the basis of several benchmark data sets and a dynamic multivariate data set, which is obtained from a WSN deployed in the Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology. Our evaluation suggests that the proposed algorithm can be approximately 39.9% more efficient than its counterparts for various data sets in terms of computational time.
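
For orientation, the classical dynamic programming baseline that the abstract compares against can be sketched as follows. This is the textbook longest common subsequence (LCS) recurrence, not the algorithm proposed in the paper. The function names, the treatment of each multivariate reading as a tuple, and the normalization by the longer sequence length are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Classical dynamic programming, O(len(a) * len(b)) time, keeping only
    the previous row of the table in memory.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Similarity index in [0, 1]; the normalization is an assumed convention."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


# Hypothetical multivariate sensor readings: (temperature, humidity) tuples.
node_a = [(20, 55), (21, 55), (22, 56), (22, 57)]
node_b = [(20, 55), (22, 56), (23, 57), (22, 57)]
print(lcs_similarity(node_a, node_b))
```

The quadratic running time of this recurrence is the inefficiency the abstract refers to: each pair of readings is compared once, which becomes costly for long multivariate streams.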


2020 ◽  
Vol 34 (04) ◽  
pp. 6786-6794
Author(s):  
Lifeng Zhang

Detecting relationships among multivariate data is often of great importance in the analysis of high-dimensional data sets, and has received growing attention for decades from both academic and industrial fields. In this study, we propose a statistical tool named the neighbor correlation coefficient (nCor), which is based on a new idea that measures the local continuity of the reordered data points to quantify the strength of the global association between variables. With sufficient sample size, the new method is able to capture a wide range of functional relationships, whether linear or nonlinear, bivariate or multivariate, main effect or interaction. The nCor score roughly approximates the coefficient of determination (R²) of the data, which is the proportion of variance in one variable that is predictable from one or more other variables. On this basis, three nCor-based statistics are also proposed here to further characterize the intra and inter structures of the associations in terms of nonlinearity, interaction effect, and variable redundancy. The mechanisms of these measures are proved in theory and demonstrated with numerical analyses.


2019 ◽  
Author(s):  
Paula Breitling ◽  
Alexandros Stamatakis ◽  
Olga Chernomor ◽  
Ben Bettisworth ◽  
Lukasz Reszczynski

Terraces in phylogenetic tree space are, among other things, important for the design of tree space search strategies. While the phenomenon of phylogenetic terraces is already known for unlinked partition models on partitioned phylogenomic data sets, it has not yet been studied whether an analogous structure is present under linked and scaled partition models. To this end, we analyze aspects such as the log-likelihood distributions, likelihood-based significance tests, and nearest neighborhood interchanges on the trees residing on a terrace and compare their distributions among unlinked, linked, and scaled partition models. Our study shows that there exists a terrace-like structure under linked and scaled partition models as well. We denote this phenomenon as quasi-terrace. Therefore, quasi-terraces should be taken into account in the design of tree search algorithms as well as when reporting results on ‘the’ final tree topology in empirical phylogenetic studies.

