Variable selection study using Procrustes analysis

2013 ◽  
Vol 1 (1) ◽  
pp. 7 ◽  
Author(s):  
Casimiro S. Munita ◽  
Lúcia P. Barroso ◽  
Paulo M.S. Oliveira

Archaeometric studies often combine several analytical techniques, which together can determine 30 or more elements. Multivariate statistical methods are frequently used to interpret archaeometric data, but the large number of variables can make them problematic to apply and their results difficult to interpret. In general, the analyst first measures many variables, a number of which may turn out to be uninformative; this is time consuming and expensive. In subsequent studies the analyst may wish to measure fewer variables while minimizing the loss of essential information. Such multidimensional data sets must be closely examined to extract useful information. This paper describes and illustrates a stopping rule for identifying redundant variables and selecting variable subsets that preserve the multivariate data structure, using Procrustes analysis to retain those variables that are, in some sense, adequate for discrimination purposes. We provide an illustrative example of the procedure using a data set of 40 archaeological ceramic samples in which the concentrations of As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, and U were determined by instrumental neutron activation analysis (INAA). The results showed that, for this data set, only eight variables (As, Cr, Fe, Hf, La, Nd, Sm, and Th) are required to interpret the data without substantial loss of information.
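The selection idea in this abstract can be sketched as a backward elimination driven by a Procrustes statistic. The abstract does not give the authors' algorithm or their stopping rule, so everything below is an assumption made for illustration: the function names, the standardization of the columns, the number of retained components `k`, and the simple "stop at `n_keep` variables" criterion. The statistic used is the common form M² = tr(YYᵀ + ZZᵀ − 2Σ), where Y holds the principal component scores of the full data, Z the scores of the reduced data, and Σ the singular values of ZᵀY.

```python
import numpy as np

def pca_scores(X, k):
    """Scores on the first k principal components of column-standardized X."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def procrustes_m2(Y, Z):
    """Residual sum of squares after optimally rotating Z onto Y."""
    sigma = np.linalg.svd(Z.T @ Y, compute_uv=False)
    return float(np.sum(Y ** 2) + np.sum(Z ** 2) - 2.0 * sigma.sum())

def backward_eliminate(X, k, n_keep):
    """Drop, one at a time, the variable whose removal distorts the
    k-dimensional configuration least. n_keep is a hypothetical stand-in
    for the paper's stopping rule and must be at least k."""
    keep = list(range(X.shape[1]))
    Y = pca_scores(X, k)  # reference configuration from all variables
    while len(keep) > n_keep:
        losses = []
        for j in keep:
            trial = [c for c in keep if c != j]
            losses.append((procrustes_m2(Y, pca_scores(X[:, trial], k)), j))
        _, drop = min(losses)  # smallest M^2 marks the most redundant variable
        keep.remove(drop)
    return keep
```

Called as `backward_eliminate(X, k=3, n_keep=8)` on a 40 × 13 concentration matrix, this returns the column indices of the eight retained variables; which variables survive depends on the data and on the assumed `k`.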

2008 ◽  
Vol 7 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Niklas Elmqvist ◽  
John Stasko ◽  
Philippas Tsigas

Supporting visual analytics of multiple large-scale multidimensional data sets requires a high degree of interactivity and user control beyond the conventional challenges of visualizing such data sets. We present the DataMeadow, a visual canvas providing rich interaction for constructing visual queries using graphical set representations called DataRoses. A DataRose is essentially a starplot of selected columns in a data set displayed as multivariate visualizations with dynamic query sliders integrated into each axis. The purpose of the DataMeadow is to allow users to create advanced visual queries by iteratively selecting and filtering into the multidimensional data. Furthermore, the canvas provides a clear history of the analysis that can be annotated to facilitate dissemination of analytical results to stakeholders. A powerful direct manipulation interface allows for selection, filtering, and creation of sets, subsets, and data dependencies. We have evaluated our system using a qualitative expert review involving two visualization researchers. Results from this review are favorable for the new method.


2020 ◽  
Vol 19 (4) ◽  
pp. 318-338 ◽  
Author(s):  
Elio Ventocilla ◽  
Maria Riveiro

This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, [Formula: see text], embedded in six multidimensional data sets. The selection of the techniques was based on their intended design, or use, for visually encoding data structures, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of [Formula: see text] as given by participants when using different multidimensional projections; the accuracy of user estimations with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with [Formula: see text] values given by a set of cluster quality measures; and whether there is a difference between experienced and novice users in terms of estimates and perceived usability. The results show that: dendrograms (from Ward’s hierarchical clustering) are likely to lead to estimates of [Formula: see text] that are different from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-Stochastic Neighbor Embedding is likely to lead to estimates which are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates which are different from those given by users using Ward and t-Stochastic Neighbor Embedding; U-Matrices and reachability plots will likely have a low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates which are different from those perceived by users using any of the assessed multidimensional projections. 
It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, has an influence on the perceived usability.


2014 ◽  
Vol 46 (3) ◽  
pp. 377-388 ◽  
Author(s):  
Matias Bonansea ◽  
Claudia Ledesma ◽  
Claudia Rodriguez ◽  
Lucio Pinotti

Water quality monitoring programs generate complex multidimensional data sets. In this study, multivariate statistical techniques were employed as an effective tool for the analysis and interpretation of these water quality data sets. Principal component analysis (PCA) and cluster analysis (CA) were applied to evaluate spatial and temporal variation of water quality in Río Tercero Reservoir (Argentina). Six sampling sites were surveyed each climatic season for 21 parameters during 2003–2010. The results revealed that PCA showed the existence of four significant principal components (PCs) which account for 96.7% of the total variance of the data set. The first PC was assigned to mineralization whereas the other PCs were built from variables indicative of pollution. Hierarchical CA grouped the six monitoring sites into three clusters and classified the different climatic seasons into two clusters based on similarities in water quality characteristics.


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra, but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio and is applied to various data sets of Raman spectra recorded from biological cells.
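The match-and-replace step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the spectra are assumed to share a wavenumber axis and a comparable intensity scale, and the spike threshold (a multiple `n_mad` of the median absolute deviation of the difference) is an assumed choice that the abstract does not specify.

```python
from statistics import mean, median

def normalized_covariance(a, b):
    """Covariance of a and b divided by the product of their standard deviations."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def remove_spikes(target, library, n_mad=8.0):
    """Replace positive outliers in target with values from its best match.

    library is a list of spectra of the same length as target.
    Returns the cleaned spectrum and the indices that were replaced.
    """
    # Pick the library spectrum most similar to the target.
    match = max(library, key=lambda s: normalized_covariance(target, s))
    diff = [t - m for t, m in zip(target, match)]
    med = median(diff)
    mad = median(abs(d - med) for d in diff) or 1e-12  # guard against zero spread
    cleaned, flagged = list(target), []
    for i, d in enumerate(diff):
        # Cosmic ray spikes are unidirectional, so only positive excursions count.
        if d - med > n_mad * mad:
            cleaned[i] = match[i]
            flagged.append(i)
    return cleaned, flagged
```

A robust spread estimate is used here so that the spike itself does not inflate the threshold; a real data set would likely also need intensity scaling between the target and its match before the comparison.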


1997 ◽  
Vol 3 (S2) ◽  
pp. 931-932 ◽  
Author(s):  
Ian M. Anderson ◽  
Jim Bentley

Recent developments in instrumentation and computing power have greatly improved the potential for quantitative imaging and analysis. For example, products are now commercially available that allow the practical acquisition of spectrum images, where an EELS or EDS spectrum can be acquired from a sequence of positions on the specimen. However, such data files typically contain megabytes of information and may be difficult to manipulate and analyze conveniently or systematically. A number of techniques are being explored for the purpose of analyzing these large data sets. Multivariate statistical analysis (MSA) provides a method for analyzing the raw data set as a whole. The basis of the MSA method has been outlined by Trebbia and Bonnet. MSA has a number of strengths relative to other methods of analysis. First, it is broadly applicable to any series of spectra or images. Applications include characterization of grain boundary segregation (position-), of channeling-enhanced microanalysis (orientation-), or of beam damage (time-variation of spectra).


2019 ◽  
Vol 2 (1) ◽  
pp. 223-251 ◽  
Author(s):  
Francesco Cutrale ◽  
Scott E. Fraser ◽  
Le A. Trinh

Embryonic development is highly complex and dynamic, requiring the coordination of numerous molecular and cellular events at precise times and places. Advances in imaging technology have made it possible to follow developmental processes at cellular, tissue, and organ levels over time as they take place in the intact embryo. Parallel innovations of in vivo probes permit imaging to report on molecular, physiological, and anatomical events of embryogenesis, but the resulting multidimensional data sets pose significant challenges for extracting knowledge. In this review, we discuss recent and emerging advances in imaging technologies, in vivo labeling, and data processing that offer the greatest potential for jointly deciphering the intricate cellular dynamics and the underlying molecular mechanisms. Our discussion of the emerging area of “image-omics” highlights both the challenges of data analysis and the promise of more fully embracing computation and data science for rapidly advancing our understanding of biology.

