Quantifying the estimation error of principal component vectors

2019 ◽  
Vol 9 (3) ◽  
pp. 657-675
Author(s):  
Raphael Hauser ◽  
Jüri Lember ◽  
Heinrich Matzinger ◽  
Raul Kangro

Principal component analysis (PCA) is an important pattern recognition and dimensionality reduction tool in many applications. Principal components are computed as eigenvectors of a maximum likelihood covariance estimate $\widehat{\varSigma}$ that approximates a population covariance $\varSigma$, and these eigenvectors are often used to extract structural information about the variables (or attributes) of the studied population. Since PCA is based on the eigendecomposition of the proxy covariance $\widehat{\varSigma}$ rather than the ground-truth $\varSigma$, it is important to understand the approximation error in each individual eigenvector as a function of the number of available samples. The combination of recent results of Koltchinskii & Lounici (2017, Bernoulli, 23, 110–133) and Yu et al. (2015, Biometrika, 102, 315–323) yields such bounds. In the present paper we sharpen these bounds and show that eigenvectors can often be reconstructed to a required accuracy from a sample whose size is of strictly smaller order.
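
The effect studied here can be probed numerically. The following sketch (illustrative only, not the authors' code) draws samples from a synthetic population covariance $\varSigma$ with a well-separated leading eigenvalue and measures how the angle between the leading eigenvectors of $\widehat{\varSigma}$ and $\varSigma$ shrinks as the sample size grows; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50
# Population covariance with a well-separated leading eigenvalue.
eigvals = np.concatenate(([10.0], np.linspace(1.0, 0.1, p - 1)))
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Sigma = Q @ np.diag(eigvals) @ Q.T
v_true = Q[:, 0]  # leading population eigenvector

for n in [100, 1000, 10000]:
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Sigma_hat = X.T @ X / n  # maximum likelihood covariance estimate
    w, V = np.linalg.eigh(Sigma_hat)  # eigenvalues in ascending order
    v_hat = V[:, -1]  # leading sample eigenvector
    # Sine of the angle between the two eigenvectors (sign-invariant).
    err = np.sqrt(1.0 - min(1.0, abs(v_true @ v_hat)) ** 2)
    print(f"n={n:6d}  sin(angle)={err:.4f}")
```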

2019 ◽  
Vol 11 (10) ◽  
pp. 1219 ◽  
Author(s):  
Lan Zhang ◽  
Hongjun Su ◽  
Jingwei Shen

Dimensionality reduction (DR) is an important preprocessing step in hyperspectral image applications. In this paper, a superpixelwise kernel principal component analysis (SuperKPCA) method for DR is proposed, which performs kernel principal component analysis (KPCA) on each homogeneous region to fully exploit KPCA's ability to acquire nonlinear features. Moreover, the DR results obtained with different fundamental images (the first principal components obtained by principal component analysis (PCA), KPCA, and minimum noise fraction (MNF)) are compared. Extensive experiments show that when 5, 10, 20, and 30 samples from each class are selected, for the Indian Pines, Pavia University, and Salinas datasets: (1) when the most suitable fundamental image is selected, the classification accuracy obtained by SuperKPCA increases by 0.06%–0.74%, 3.88%–4.37%, and 0.39%–4.85%, respectively, compared with SuperPCA, which performs PCA on each homogeneous region; (2) the DR results obtained with different first principal components are different and complementary. By fusing the multiscale classification results obtained with different first principal components, the classification accuracy increases by 0.54%–2.68%, 0.12%–1.10%, and 0.01%–0.08%, respectively, compared with the method based only on the most suitable fundamental image.
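
A minimal sketch of the superpixelwise idea, assuming a hyperspectral cube and a precomputed segmentation into homogeneous regions (the segmentation itself, e.g. derived from a fundamental image, is outside this snippet); the function name and parameters are hypothetical, and scikit-learn's KernelPCA stands in for the paper's implementation:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def superkpca(cube, segments, n_components=3, gamma=1.0):
    """Apply KPCA independently within each homogeneous region.

    cube: (H, W, B) hyperspectral image; segments: (H, W) integer
    superpixel labels. Assumes each region contains at least
    n_components pixels. Returns an (H, W, n_components) reduced cube.
    """
    H, W, B = cube.shape
    out = np.zeros((H, W, n_components))
    for label in np.unique(segments):
        mask = segments == label
        pixels = cube[mask]  # (n_pixels, B) spectra of one region
        kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)
        out[mask] = kpca.fit_transform(pixels)  # nonlinear features per region
    return out
```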


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1499-1506 ◽  
Author(s):  
Yangwu Zhang ◽  
Guohe Li ◽  
Heng Zong

Dimensionality reduction, including feature extraction and selection, is one of the key steps in text classification. In this paper, we propose a mixed dimensionality reduction method that combines principal component analysis with component selection. Principal component analysis is a feature extraction method, but not all of its components contribute to classification, because the PCA objective is not a form of discriminant analysis (see, e.g., Jolliffe, 2002). In this context, we present a component selection function that returns the components useful for classification, based on performance indicators measured on different subsets of the components. Compared with traditional feature selection methods, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
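
One way to realize such a selection function is to score candidate components by the cross-validated accuracy of an SVM trained on them. The greedy rule below is a hypothetical illustration of the idea, not the paper's exact procedure:

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def select_components(X, y, max_components=50, cv=5):
    """Keep a principal component only if it improves CV accuracy.

    Returns the indices of the selected components (in the PCA basis)
    and the best cross-validated score reached. Illustrative sketch.
    """
    Z = PCA(n_components=max_components).fit_transform(X)
    selected, best = [], 0.0
    for j in range(max_components):
        trial = selected + [j]
        score = cross_val_score(LinearSVC(), Z[:, trial], y, cv=cv).mean()
        if score > best:  # the component helps classification
            selected, best = trial, score
    return selected, best
```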


2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Mohammad Amin Shayegan ◽  
Saeed Aghabozorgi ◽  
Ram Gopal Raj

Dimensionality reduction (feature selection) is an important step in pattern recognition systems. Although there are various conventional approaches to feature selection, such as principal component analysis, random projection, and linear discriminant analysis, selecting optimal, effective, and robust features is usually a difficult task. In this paper, a new two-stage approach to dimensionality reduction is proposed. The method is based on one-dimensional and two-dimensional spectrum diagrams of the standard deviation and minimum-to-maximum distributions of the initial feature vector elements. The proposed algorithm is validated in an OCR application using two large standard handwritten OCR benchmark datasets, MNIST and Hoda. Initially, a 133-element feature vector was assembled from the features most commonly proposed in the literature. The size of this initial feature vector was then reduced to 59.40% (79 elements) for the MNIST dataset and to 43.61% (58 elements) for the Hoda dataset, respectively. Meanwhile, the accuracy of the OCR system improved by 2.95% on the MNIST dataset and by 4.71% on the Hoda dataset. The achieved results show an improvement in the precision of the system in comparison with the rival approaches, principal component analysis and random projection. The proposed technique can also be useful for generating decision rules in a pattern recognition system using rule-based classifiers.
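
The spread statistics at the heart of the method can be illustrated with a much-simplified filter: rank each feature by its standard deviation and its minimum-to-maximum range, and keep only features whose spread exceeds given quantiles. This is a hypothetical stand-in for the paper's 1D/2D spectrum diagrams, with made-up threshold parameters:

```python
import numpy as np

def spectrum_filter(X, std_quantile=0.4, range_quantile=0.4):
    """Keep features whose spread statistics exceed the given quantiles.

    X: (n_samples, n_features) feature matrix. Features with a low
    standard deviation or a narrow min-to-max range vary little across
    samples and carry little discriminative information. Sketch only.
    """
    stds = X.std(axis=0)
    ranges = X.max(axis=0) - X.min(axis=0)
    keep = (stds >= np.quantile(stds, std_quantile)) & \
           (ranges >= np.quantile(ranges, range_quantile))
    return np.flatnonzero(keep)

# Usage: reduce a 133-element feature matrix to the retained columns.
# X_reduced = X[:, spectrum_filter(X)]
```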


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248896
Author(s):  
Nico Migenda ◽  
Ralf Möller ◽  
Wolfram Schenck

“Principal Component Analysis” (PCA) is an established linear technique for dimensionality reduction. It performs an orthonormal transformation to replace possibly correlated variables with a smaller set of linearly independent variables, the so-called principal components, which capture a large portion of the data variance. The problem of finding the optimal number of principal components has been widely studied for offline PCA. However, when working with streaming data, the optimal number changes continuously, which requires updating both the principal components and the dimensionality in every timestep. While the continuous update of the principal components is widely studied, the available algorithms for dimensionality adjustment in neural network-based and incremental PCA are limited to increments of one. Existing approaches therefore cannot account for abrupt changes in the presented data. The contribution of this work is to enable continuous dimensionality adjustment by an arbitrary number in neural network-based PCA, without the necessity to learn all principal components. A novel algorithm is presented that utilizes several PCA characteristics to adaptively update the optimal number of principal components. A precise estimation of the required dimensionality reduces the computational effort while ensuring that the desired amount of variance is retained. The computational complexity of the proposed algorithm is investigated, and it is benchmarked in an experimental study against other neural network-based and incremental PCA approaches, where it produces highly competitive results.
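
The core of such a dimensionality adjustment can be illustrated with a variance-ratio criterion: after each streaming update of the eigenvalue estimates, choose the smallest number of components whose eigenvalue mass reaches a target fraction of the total variance. The sketch below shows only that criterion, under assumed inputs; the paper's algorithm uses further PCA characteristics:

```python
import numpy as np

def adjust_dimensionality(eigvals, total_var, target=0.95):
    """Smallest k whose retained eigenvalue mass reaches the target ratio.

    eigvals: current eigenvalue estimates in descending order;
    total_var: running estimate of the total data variance. On streaming
    data both change every timestep, so k may jump by more than one.
    """
    cum = np.cumsum(eigvals) / total_var
    k = int(np.searchsorted(cum, target)) + 1
    return min(k, len(eigvals))

# Example: adjust_dimensionality(np.array([5.0, 2.0, 1.0, 0.5]), 8.5,
#                                target=0.9) returns 3.
```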


Author(s):  
Guang-Ho Cha

Principal component analysis (PCA) is an important tool in many areas, including data reduction and interpretation, information retrieval, and image processing. Kernel PCA has recently been proposed as a nonlinear extension of the popular PCA. The basic idea is to first map the input space into a feature space via a nonlinear map and then compute the principal components in that feature space. This paper illustrates the potential of kernel PCA for dimensionality reduction and feature extraction in multimedia retrieval. Using Gaussian kernels, the principal components were computed in the feature space of an image data set and used as new dimensions to approximate image features. Extensive experimental results show that kernel PCA performs better than linear PCA with respect to both retrieval quality and retrieval precision in content-based image retrieval.

Keywords: principal component analysis, kernel principal component analysis, multimedia retrieval, dimensionality reduction, image retrieval
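
A minimal retrieval pipeline along these lines, using scikit-learn's KernelPCA with an RBF (Gaussian) kernel; the feature matrix, kernel width, and dimensionality are placeholder values, not the paper's settings:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import NearestNeighbors

# Hypothetical image feature matrix: one row per image.
rng = np.random.default_rng(0)
features = rng.standard_normal((500, 128))

# Gaussian-kernel PCA: reduced coordinates approximate the image features.
kpca = KernelPCA(n_components=16, kernel="rbf", gamma=0.01)
embedded = kpca.fit_transform(features)

# Content-based retrieval: nearest neighbours in the KPCA space.
index = NearestNeighbors(n_neighbors=5).fit(embedded)
query = kpca.transform(features[:1])   # project a query image's features
distances, ids = index.kneighbors(query)
print(ids[0])  # indices of the five most similar images
```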


2006 ◽  
Vol 1 (1) ◽  
Author(s):  
K. Katayama ◽  
K. Kimijima ◽  
O. Yamanaka ◽  
A. Nagaiwa ◽  
Y. Ono

This paper proposes a method of stormwater inflow prediction that uses radar rainfall data as the input to a prediction model constructed by system identification. The aim of the proposal is to construct a compact system by reducing the dimension of the input data. Principal component analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using radar rainfall data and inflow data acquired from a combined sewer system. The study reveals that a few principal components of the radar rainfall data can serve as appropriate input variables to the stormwater inflow prediction model. Consequently, we establish a procedure for stormwater inflow prediction using a few principal components of radar rainfall data.
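
A toy version of this pipeline, with synthetic stand-ins for the radar rainfall grids and the inflow measurements (plain linear regression replaces the identified model here, purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Synthetic data: each row is one flattened radar rainfall grid;
# inflow is the measured stormwater inflow at the same timestep.
rng = np.random.default_rng(1)
rain = rng.random((2000, 40 * 40))            # (timesteps, grid cells)
inflow = rain.sum(axis=1) + rng.normal(0, 5, size=2000)

# Compress each rainfall frame to a few principal components.
pca = PCA(n_components=3).fit(rain)
scores = pca.transform(rain)

# Simple surrogate for the identified model: one-step-ahead prediction
# of inflow from the principal component scores.
model = LinearRegression().fit(scores[:-1], inflow[1:])
print("R^2:", model.score(scores[:-1], inflow[1:]))
```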


2017 ◽  
Vol 921 (3) ◽  
pp. 24-29 ◽  
Author(s):  
S.I. Lesnykh ◽  
A.K. Cherkashin

The proposed integral mapping procedure is based on calculating evaluation functions from integral indicators (II) that take the local geographical environment into account, so that geosystems in the same state but in different environments receive different estimates. The II are computed by applying principal component analysis to the forest database, which allows the weight of each indicator (attribute) to be reflected in the II. The final II value equals the difference between the first principal component (the condition of the geosystem) and the second (the condition of the environmental background). The evaluation functions for the various integral mapping problems are calculated from this value. Environmental sources of variability are excluded from the final II value, which makes it possible to find an invariant evaluation function and to determine its coefficients. Concepts and functions from reliability theory are used to produce evaluation maps of the functioning hazard and stability of geosystems.
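
The stated construction, II equal to the first principal component minus the second, can be sketched directly on a standardized attribute table; the function and its inputs are hypothetical placeholders for the forest database:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def integral_indicator(attributes):
    """II = first PC score minus second PC score for each geosystem.

    attributes: (n_sites, n_indicators) rows of forest-database
    attributes. Standardization lets the PCA loadings act as the
    weights of the individual indicators. Sketch of the construction.
    """
    Z = StandardScaler().fit_transform(attributes)
    scores = PCA(n_components=2).fit_transform(Z)
    # First PC: condition of the geosystem; second PC: condition of
    # the environmental background.
    return scores[:, 0] - scores[:, 1]
```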


2012 ◽  
Vol 195-196 ◽  
pp. 402-406
Author(s):  
Xue Qin Chen ◽  
Rui Ping Wang

Classifying the electrocardiogram (ECG) into different pathophysiological categories is a complex pattern recognition task that has been attempted with many methods. This paper discusses a principal component analysis (PCA) method for extracting heartbeat features, and a new classification method that calculates the error between a test heartbeat and its reconstruction. Training and testing heartbeats are taken from the MIT-BIH Arrhythmia Database, from which 8 types of arrhythmia signals are selected. The true positive rate (TPR) is 83%.
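
Reconstruction-error classification of this kind can be sketched as follows: fit one PCA subspace per arrhythmia class, reconstruct a test beat from each subspace, and assign the class whose reconstruction error is smallest. The data layout and parameters below are assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_class_models(beats_by_class, n_components=8):
    """One PCA subspace per class; beats_by_class maps a class label
    to an (n_beats, n_samples) array of training heartbeats."""
    return {label: PCA(n_components=n_components).fit(beats)
            for label, beats in beats_by_class.items()}

def classify(beat, models):
    """Assign the beat to the class whose PCA subspace reconstructs
    it with the smallest error."""
    errors = {}
    for label, pca in models.items():
        coded = pca.transform(beat.reshape(1, -1))
        recon = pca.inverse_transform(coded).ravel()
        errors[label] = np.linalg.norm(beat - recon)
    return min(errors, key=errors.get)
```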

