Discarding Variables in a Principal Component Analysis. II: Real Data

Author(s):  
I. T. Jolliffe
2015 ◽  
Vol 18 ◽  
Author(s):  
Rubén Daniel Ledesma ◽  
Pedro Valero-Mora ◽  
Guillermo Macbeth

Exploratory Factor Analysis and Principal Component Analysis are two data analysis methods commonly used in psychological research. When applying these techniques, it is important to determine how many factors to retain. This decision is sometimes based on a visual inspection of the scree plot. However, the scree plot can be ambiguous and open to interpretation. This paper explores a number of graphical and computational improvements to the scree plot that make it more valid and informative. These enhancements are based on dynamic and interactive data visualization tools, and range from adding Parallel Analysis results to "linking" the scree plot with other graphics, such as factor-loadings plots. To illustrate the proposed improvements, we introduce and describe an example based on real data for which a principal component analysis is appropriate. We hope to provide better graphical tools to help researchers determine the number of factors to retain.
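As a rough illustration of what adding Parallel Analysis results to a scree plot involves, the sketch below compares the observed correlation-matrix eigenvalues with a reference curve obtained from random normal data of the same shape. It is a minimal Horn-style version and not the authors' tool: the function name `parallel_analysis`, the 95% quantile, the number of simulations, and the synthetic two-factor demo data are all assumptions made for the example.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, quantile=0.95, seed=0):
    """Compare observed eigenvalues with those of same-shaped random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Observed eigenvalues in descending order (the heights of the scree plot).
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    # Reference eigenvalues from uncorrelated normal noise of the same size.
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.quantile(sims, quantile, axis=0)
    # Retain leading components until one drops below its reference value.
    above = observed > threshold
    n_retain = p if above.all() else int(np.argmin(above))
    return observed, threshold, n_retain

# Hypothetical demo data: 8 variables driven by 2 independent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
X_demo = factors @ loadings + 0.6 * rng.standard_normal((500, 8))

observed, threshold, n_retain = parallel_analysis(X_demo)
print("observed eigenvalues:", np.round(observed, 2))
print("reference (95%):     ", np.round(threshold, 2))
print("components to retain:", n_retain)
```

Plotting `observed` and `threshold` on the same axes gives the scree plot with the Parallel Analysis curve overlaid; the retained components are those before the two curves cross.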


2010 ◽  
Vol 3 (5) ◽  
Author(s):  
Mario Bettenbühl ◽  
Claudia Paladini ◽  
Konstantin Mergenthaler ◽  
Reinhold Kliegl ◽  
Ralf Engbert ◽  
...  

During visual fixation on a target, humans perform miniature (or fixational) eye movements consisting of three components, i.e., tremor, drift, and microsaccades. Microsaccades are high-velocity components with small amplitudes within fixational eye movements. However, microsaccade shapes and statistical properties vary between individual observers. Here we show that microsaccades can be formally represented with two significant shapes, which we identified using the mathematical definition of singularities for the detection of the former in real data with the continuous wavelet transform. For characterization and model selection, we carried out a principal component analysis, which identified a step shape with an overshoot as the first component and a bump that regulates the overshoot as the second component. We conclude that microsaccades are singular events with an overshoot component which can be detected by the continuous wavelet transform.


Author(s):  
Duo Wang ◽  
Toshihisa Tanaka

Kernel principal component analysis (KPCA) is a kernelized version of principal component analysis (PCA). A kernel principal component is a superposition of kernel functions. Because the number of kernel functions equals the number of samples, each component is not a sparse representation. To sparsify the coefficients of this linear combination of kernel functions, two types of sparse kernel principal component are proposed in this paper. The method for solving the sparse problem comprises two steps: (a) we start with the Pythagorean theorem and derive an explicit regression expression of KPCA, and (b) two types of regularization, the $l_1$-norm or the $l_{2,1}$-norm, are added to the regression expression to obtain two different forms of sparsity, respectively. As the proposed objective function differs from that of elastic net-based sparse PCA (SPCA), the SPCA method cannot be directly applied to the proposed cost function. We show that the sparse representations are obtained through iterative optimization with the alternating direction method of multipliers. Experiments on toy examples and real data confirm the performance and effectiveness of the proposed method.
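To make the density issue concrete, the following sketch runs ordinary (non-sparse) KPCA and inspects the expansion coefficients: each component carries one coefficient per training sample, and in general none of them is zero. This is plain KPCA only, not the sparse method proposed in the paper; the RBF kernel, the `gamma` value, and the random demo data are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix between all pairs of rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca(X, n_components=2, gamma=1.0):
    """Standard KPCA: returns expansion coefficients and sample scores."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix, i.e. center the data in feature space.
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[order], eigvecs[:, order]
    # Scale so that each component has unit norm in feature space.
    alphas = V / np.sqrt(lam)
    scores = Kc @ alphas
    return alphas, scores

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((30, 2))
alphas, scores = kernel_pca(X_demo, n_components=2, gamma=0.5)

# One coefficient per sample and per component; essentially all are nonzero.
print("coefficient matrix shape:", alphas.shape)
print("fraction of nonzero coefficients:", np.mean(np.abs(alphas) > 1e-10))
```

A sparse variant would aim to drive many rows of `alphas` to zero, so that each component depends on only a few kernel functions.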


Crystals ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 581
Author(s):  
Dmitry Chernyshov ◽  
Iurii Dovgaliuk ◽  
Vadim Dyadkin ◽  
Wouter van Beek

We analyze the application of Principal Component Analysis (PCA) for untangling the main contributions to changing diffracted intensities upon variation of site occupancy and lattice dimensions induced by external stimuli. The information content of the PCA output consists of certain functions of Bragg angles (loadings) and their evolution characteristics that depend on external variables like pressure or temperature (scores). The physical meaning of the PCA output is to date not well understood. Therefore, in this paper, the intensity contributions are first derived analytically, then compared with the PCA components for model data; finally, PCA is applied to real data on isothermal gas uptake by the nanoporous framework γ-Mg(BH₄)₂. We show that, in close agreement with previous analysis of modulation diffraction, the variation of the intensity of Bragg lines and the displacements of their positions result in a series of PCA components. Every PCA-extracted component may be a mixture of terms carrying information on the average structure, the active sub-structure, and their cross-term. The rotational ambiguities, which are an inherent part of PCA extraction, are at the origin of the mixing. For the experimental case considered in the paper, the extraction of physically meaningful loadings and scores can only be achieved with a rotational correction. Finally, practical recommendations are given for non-blind applications of PCA to diffraction data, i.e., what boundary conditions to apply for the rotational correction.
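The rotational ambiguity mentioned above can be illustrated with a few lines of linear algebra: if scores and loadings are both multiplied by the same orthogonal matrix, the reconstructed data are unchanged, so the data alone cannot single out one physically meaningful pair. The sketch below uses random numbers rather than diffraction data, and the rotation angle is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a set of patterns: 50 measurements x 6 channels.
X = rng.standard_normal((50, 6)) @ rng.standard_normal((6, 6))
X = X - X.mean(axis=0)

# PCA via SVD, keeping two components.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
scores = U[:, :k] * S[:k]        # evolution with the external variable
loadings = Vt[:k].T              # profiles over the channels

# Rotate scores and loadings by the same (arbitrary) angle.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
scores_rot = scores @ R
loadings_rot = loadings @ R

recon = scores @ loadings.T
recon_rot = scores_rot @ loadings_rot.T
print("same reconstruction:", np.allclose(recon, recon_rot))
print("same loadings:      ", np.allclose(loadings, loadings_rot))
```

Both factorizations describe the rank-2 approximation equally well, which is why an external constraint (a rotational correction with suitable boundary conditions) is needed to pick the physically meaningful one.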


2021 ◽  
Author(s):  
Dashan Huang ◽  
Fuwei Jiang ◽  
Kunpeng Li ◽  
Guoshi Tong ◽  
Guofu Zhou

This paper proposes a novel supervised learning technique for forecasting: scaled principal component analysis (sPCA). The sPCA improves the traditional principal component analysis (PCA) by scaling each predictor with its predictive slope on the target to be forecasted. Unlike the PCA that maximizes the common variation of the predictors, the sPCA assigns more weight to those predictors with stronger forecasting power. In a general factor framework, we show that, under some appropriate conditions on data, the sPCA forecast beats the PCA forecast, and when these conditions break down, extensive simulations indicate that the sPCA still has a large chance to outperform the PCA. A real data example on macroeconomic forecasting shows that the sPCA has better performance in general.
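The scaling idea can be sketched in a few lines: regress the target on each predictor separately, multiply each predictor by its slope, and run ordinary PCA on the scaled panel. The version below is only a sketch under stated assumptions, not the authors' exact procedure: standardizing the predictors first, the function name `scaled_pca`, and the simulated data (one weak but relevant factor, one strong but irrelevant factor) are all choices made for the example.

```python
import numpy as np

def scaled_pca(X, y, n_components=1):
    """Scale each standardized predictor by its univariate slope on y, then PCA.

    X is (T, N); y is (T,) and assumed already aligned so that y[t] is the
    value to be forecast from X[t].
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    # Univariate OLS slope of y on each standardized predictor.
    slopes = Xs.T @ yc / np.sum(Xs**2, axis=0)
    Z = Xs * slopes
    # PCA of the scaled predictors via SVD (columns already have mean zero).
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return slopes, U[:, :n_components] * S[:n_components]

# Simulated example: factor f drives the target and 3 predictors,
# factor g drives 7 predictors but carries no information about the target.
rng = np.random.default_rng(0)
T = 400
f = rng.standard_normal(T)
g = rng.standard_normal(T)
X_demo = np.column_stack(
    [f + 0.5 * rng.standard_normal(T) for _ in range(3)]
    + [g + 0.5 * rng.standard_normal(T) for _ in range(7)]
)
y_demo = f + 0.5 * rng.standard_normal(T)

slopes, spca_factors = scaled_pca(X_demo, y_demo)
spca_factor = spca_factors[:, 0]

# Plain PCA on the standardized predictors, for comparison.
Xs_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
U0, S0, _ = np.linalg.svd(Xs_demo, full_matrices=False)
pca_factor = U0[:, 0] * S0[0]

corr_spca = abs(np.corrcoef(spca_factor, f)[0, 1])
corr_pca = abs(np.corrcoef(pca_factor, f)[0, 1])
print("|corr| with the relevant factor, sPCA:", round(corr_spca, 3))
print("|corr| with the relevant factor, PCA: ", round(corr_pca, 3))
```

In this setup plain PCA follows the factor that explains the most predictor variance, while the scaled version downweights predictors with slopes near zero and so tracks the factor that matters for the target.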


Author(s):  
Firas Shawkat Hamid

Multivariate data analysis includes principal component analysis, a common technique that converts a large number of correlated variables into a smaller number of uncorrelated components. When outliers are present, which can be detected in many ways, relying on the variance-covariance matrix leads to misleading results in the principal component analysis. Many phenomena consist of a large group of variables that are difficult to handle at first, and interpreting these variables becomes a complex process. Reducing them to a smaller set that is easier to handle is therefore the aspiration of every researcher working in principal component analysis or factor analysis. Because of technological development and the ability to communicate through simultaneous audio and video interaction, a multivariate data collection process was conducted in this research: the efficiency of e-learning was evaluated by analyzing real data using factor analysis with the Principal Component Analysis method. This is one of the techniques used to summarize and shorten data, here through the SPSS (Statistical Package for the Social Sciences) program. The subject of the paper thus also touches on the concept of data mining, which was then carried out using genetic algorithms in the latest version of the MATLAB simulation program, together with the Multiple Linear Regression procedure to rank the independent variables by calculating the weight of each independent variable.
Total results were obtained for the eigenvalues of the stored correlation matrix or the rotated factor matrix. The study required conducting the statistical analysis in the manner described, reducing the number of variables without losing much information about the original variables, with the aim of simplifying their understanding and revealing their structure and interpretation. In addition, a set of conclusions was reached and discussed in detail, along with important recommendations.


Author(s):  
Fayed Alshammri ◽  
Jiazhu Pan

This paper proposes an extension of principal component analysis to non-stationary multivariate time series data. A criterion for determining the number of final retained components is proposed. An advance correlation matrix is developed to evaluate dynamic relationships among the chosen components. The theoretical properties of the proposed method are given. Many simulation experiments show our approach performs well on both stationary and non-stationary data. Real data examples are also presented as illustrations. We develop four packages using the statistical software R that contain the needed functions to obtain and assess the results of the proposed method.


2012 ◽  
Vol 2012 ◽  
pp. 1-13 ◽  
Author(s):  
Shengkun Xie ◽  
Anna T. Lawniczak ◽  
Sridhar Krishnan ◽  
Pietro Lio

We introduce multiscale wavelet kernels to kernel principal component analysis (KPCA) to narrow down the search for the parameters required in the calculation of a kernel matrix. This new methodology incorporates multiscale methods into KPCA for transforming multiscale data. To illustrate the application of our proposed method and to investigate the robustness of the wavelet kernel in KPCA under different levels of signal-to-noise ratio and different types of wavelet kernel, we study a set of two-class clustered simulation data. We show that WKPCA is an effective feature extraction method for transforming a variety of multidimensional clustered data into data with a higher level of linearity among the data attributes, which improves the accuracy of simple linear classifiers. Based on the analysis of the simulation data sets, we observe that multiscale translation-invariant wavelet kernels for KPCA have enhanced performance in feature extraction. The application of the proposed method to real data is also addressed.


2018 ◽  
Vol 24 (1) ◽  
pp. 69-84
Author(s):  
Renato César dos Santos ◽  
Mauricio Galo ◽  
Vilma Mayumi Tachibana

Abstract: Classification is an important step in the extraction of geometric primitives from LiDAR data. Normally, it is applied to identify points sampled on geometric primitives of interest. Several studies in the literature have explored the use of eigenvalues to classify LiDAR points into different classes or structures, such as corner, edge, and plane. However, in some works the classes are defined assuming an ideal geometry, which can be affected by inadequate sampling and/or by the presence of noise when using real data. To overcome this limitation, this paper proposes the use of metrics based on eigenvalues together with the k-means method to carry out the classification. The concept of principal component analysis is used to obtain the eigenvalues and the derived metrics, while k-means is applied to cluster the roof points into two classes: edge and non-edge. To evaluate the proposed method, four test areas with different levels of complexity were selected. From the qualitative and quantitative analyses, it could be concluded that the proposed classification procedure gave satisfactory results, with completeness and correctness above 92% for the non-edge class, and between 61% and 98% for the edge class.
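The general pipeline described here (local eigenvalues, a derived metric, then two-class k-means) can be sketched as follows. The specific metric used in the paper is not given in the abstract, so the ratio of the two largest local eigenvalues is an assumption chosen for illustration, as are the neighborhood size `k`, the synthetic flat square "roof", and the helper names.

```python
import numpy as np

def eigen_ratio_feature(points, k=25):
    """Ratio of the two largest eigenvalues of each point's k-NN covariance."""
    # Brute-force pairwise squared distances; fine for small point clouds.
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    neighbors = np.argsort(d2, axis=1)[:, :k]   # includes the point itself
    feats = np.empty(len(points))
    for i, idx in enumerate(neighbors):
        cov = np.cov(points[idx], rowvar=False)
        lam = np.linalg.eigvalsh(cov)[::-1]     # descending eigenvalues
        feats[i] = lam[1] / lam[0]
    return feats

def two_means_1d(values, n_iter=50):
    """Plain Lloyd iterations for k-means with 2 clusters on a 1-D feature."""
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        centers = np.array([values[labels == c].mean() for c in (0, 1)])
    return labels, centers

# Synthetic roof: points on a nearly flat unit square with small height noise.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1.0, size=(800, 2))
z = 0.002 * rng.standard_normal(800)
points = np.column_stack([xy, z])

feats = eigen_ratio_feature(points, k=25)
labels, centers = two_means_1d(feats)

# Cluster 0 starts from the smallest ratio: one-sided neighborhoods (edges).
border_dist = np.minimum(np.minimum(xy[:, 0], 1 - xy[:, 0]),
                         np.minimum(xy[:, 1], 1 - xy[:, 1]))
print("mean distance to border, cluster 0:", border_dist[labels == 0].mean())
print("mean distance to border, cluster 1:", border_dist[labels == 1].mean())
```

Near an edge the neighborhood is cut off on one side, which lowers the second eigenvalue relative to the first; clustering the ratio therefore tends to separate edge-like from interior points without fixing a threshold from an ideal geometry.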

