The use of principal components and univariate charts to control multivariate processes

2008 ◽  
Vol 28 (1) ◽  
pp. 173-196 ◽  
Author(s):  
Marcela A. G. Machado ◽  
Antonio F. B. Costa

In this article, we evaluate the performance of the T² chart based on the principal components (PC chart) and of the simultaneous univariate control charts based on the original variables (SU charts) or on the principal components (SUPC charts). The main reason to consider the PC chart lies in its reduction of dimensionality. However, depending on the disturbance and on the way the original variables are related, the PC chart is very slow in signaling, except when all variables are negatively correlated and the principal component is wisely selected. Comparing the SU, the SUPC and the T² charts, we conclude that the SU charts (SUPC charts) have a better overall performance when the variables are positively (negatively) correlated. We also develop the expression for the power of two S² charts designed for monitoring the covariance matrix. These joint S² charts are, in the majority of cases, more efficient than the generalized variance chart.
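As a rough illustration of the chart statistics being compared, the sketch below computes a Hotelling-type T² value from the first k principal components of in-control reference data; the function and variable names are hypothetical, and the control limit (omitted here) would in practice be set from the in-control run-length distribution.

```python
# Minimal sketch (not the authors' code): a T^2-type statistic computed on
# the first k principal components, as used by a PC-based chart. X0 is
# hypothetical in-control reference data; x_new is a new observation vector.
import numpy as np

def pc_t2(X0, x_new, k=1):
    """T^2 of a new observation projected onto the first k principal components."""
    mu = X0.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X0, rowvar=False))
    order = np.argsort(eigvals)[::-1]               # largest eigenvalues first
    lam, V = eigvals[order][:k], eigvecs[:, order[:k]]
    scores = (x_new - mu) @ V                       # principal component scores
    return float(np.sum(scores**2 / lam))           # sum of score_j^2 / lambda_j

rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0, 0], [[1.0, -0.7], [-0.7, 1.0]], size=200)
print(pc_t2(X0, np.array([1.5, -1.6]), k=1))        # large values signal a shift
```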

2009 ◽  
Vol 29 (3) ◽  
pp. 547-562 ◽  
Author(s):  
Marcela A. G. Machado ◽  
Antonio F. B. Costa ◽  
Fernando A. E. Claro

The T² chart and the generalized variance |S| chart are the usual tools for monitoring the mean vector and the covariance matrix of multivariate processes. The main drawback of these charts is the difficulty of obtaining and interpreting the values of their monitoring statistics. In this paper, we study control charts for monitoring bivariate processes that require only the computation of sample means (the ZMAX chart) for monitoring the mean vector, sample variances (the VMAX chart) for monitoring the covariance matrix, or both sample means and sample variances (the MCMAX chart) for the joint control of the mean vector and the covariance matrix.
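A minimal sketch of max-type statistics in the spirit of the ZMAX and VMAX charts follows; the standardization and the bivariate example are illustrative assumptions rather than the authors' exact design.

```python
# Illustrative max-type statistics for a bivariate sample: ZMAX takes the
# largest absolute standardized sample mean, VMAX the largest sample variance
# relative to its in-control value. Control limits (not shown) would be set
# from the in-control distribution, e.g. by simulation.
import numpy as np

def zmax(sample, mu0, sigma0):
    n = sample.shape[0]
    z = (sample.mean(axis=0) - mu0) / (sigma0 / np.sqrt(n))
    return float(np.max(np.abs(z)))

def vmax(sample, sigma0):
    s2 = sample.var(axis=0, ddof=1)                 # sample variances
    return float(np.max(s2 / sigma0**2))

rng = np.random.default_rng(1)
sample = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=5)
print(zmax(sample, mu0=np.zeros(2), sigma0=np.ones(2)))
print(vmax(sample, sigma0=np.ones(2)))
```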


2019 ◽  
Vol 11 (10) ◽  
pp. 1219 ◽  
Author(s):  
Lan Zhang ◽  
Hongjun Su ◽  
Jingwei Shen

Dimensionality reduction (DR) is an important preprocessing step in hyperspectral image applications. In this paper, a superpixelwise kernel principal component analysis (SuperKPCA) method for DR is proposed that performs kernel principal component analysis (KPCA) on each homogeneous region, in order to fully exploit KPCA's ability to acquire nonlinear features. Moreover, for the proposed method, the differences among the DR results obtained from different fundamental images (the first principal components obtained by principal component analysis (PCA), KPCA, and minimum noise fraction (MNF)) are compared. Extensive experiments show that when 5, 10, 20, and 30 samples from each class are selected, for the Indian Pines, Pavia University, and Salinas datasets: (1) when the most suitable fundamental image is selected, the classification accuracy obtained by SuperKPCA can be increased by 0.06%–0.74%, 3.88%–4.37%, and 0.39%–4.85%, respectively, when compared with SuperPCA, which performs PCA on each homogeneous region; (2) the DR results obtained from different first principal components are different and complementary. By fusing the multiscale classification results obtained from different first principal components, the classification accuracy can be increased by 0.54%–2.68%, 0.12%–1.10%, and 0.01%–0.08%, respectively, when compared with the method based only on the most suitable fundamental image.
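The region-wise idea can be sketched as follows: kernel PCA is fitted separately on the pixels of each homogeneous region of a hyperspectral cube. The segmentation (e.g., superpixels computed on a fundamental image) is assumed to be given; `segments` is a hypothetical label map, and the kernel parameters are placeholders.

```python
# Sketch of region-wise kernel PCA: fit KPCA on the pixels of each region.
import numpy as np
from sklearn.decomposition import KernelPCA

def region_wise_kpca(cube, segments, n_components=3, gamma=1.0):
    """cube: (H, W, B) hyperspectral image; segments: (H, W) region labels."""
    H, W, B = cube.shape
    out = np.zeros((H, W, n_components))
    for label in np.unique(segments):
        mask = segments == label
        pixels = cube[mask]                          # (n_pixels, B)
        k = min(n_components, len(pixels))           # KPCA needs k <= n_pixels
        kpca = KernelPCA(n_components=k, kernel="rbf", gamma=gamma)
        out[mask, :k] = kpca.fit_transform(pixels)   # nonlinear features per region
    return out
```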


Blood ◽  
2005 ◽  
Vol 106 (11) ◽  
pp. 4249-4249
Author(s):  
Mario-Antoine Dicato ◽  
Garry Mahon

The human genome has been estimated to contain tens of thousands of genes. Of these, the promoters have been experimentally verified for almost two thousand. We have examined the DNA sequences just upstream of the transcription start site, a region which includes the TATA box. Genetic control sites, such as promoters, often have a characteristic consensus sequence, but the variation about a given consensus sequence has received little attention. Sequence variations may be related to functional differences amongst the control sites. Principal components analysis was chosen because of its generality and the variety of phenomena it reveals. Promoter sequences were considered because of the large number available and their importance in gene expression. The sequences of the 1977 promoters recognised by human RNA polymerase II were obtained from the Eukaryotic Promoter Database. Many of these promoters are of interest in oncology, and the database includes sequences for growth factors (e.g. GM-CSF, interleukins), oncogenes and tumour viruses, among others. Sub-sequences of 25 bases centred on position −13 relative to the transcription start site were extracted. Two bits were used to encode each base (a=11, c=00, g=10 and t=01) and the covariance matrix of the resulting 50 variables was determined. The eigenvalues and eigenvectors of the covariance matrix were calculated. All calculations were carried out by computer using MS-Excel and SYSTAT 11. The eigenvalues of the covariance matrix ranged from 0.571 down to 0.133. The eigenvectors were used to calculate principal components; thus 50 more or less correlated variables were transformed into 50 uncorrelated variables with the same total variance. The sequences were sorted according to the principal components to reveal which features were associated with the most variation amongst the sequences. When the covariances among the coded sequences were calculated, many associations were found: for example, a purine at position 15 was associated with a purine at position 16, and a purine at position 19 with a G or C at position 20. Although these correlations were individually not especially strong, together they were a notable feature of the set of sequences. The consensus sequence was observed to be agggg ggggg ggc(g/c)c ggggg gcgcc. A principal components analysis enabled the identification of the promoters which differed most (in opposite directions) from the consensus sequence, taking account of the correlations. Nearly all the elements of the first eigenvector were of alternating sign; thus the first principal component separated promoters rich in G from those rich in T. Almost all elements of the second eigenvector were positive, so the second principal component distinguished promoters rich in A from those rich in C. There was a remarkable concentration of promoters from genes for interleukins or IL repressors with large values of the second principal component: IL1A, IL2, IL4, IL6-2, IL2RA1, IL2RA2 and IL8RB were in positions 160, 43, 14, 158, 131, 101 and 158 (out of 1977), respectively. The variation in the sequence of promoters about their consensus sequence is thus seen not to be random but to display detectable patterns. Correlations were frequent within the promoter sequences considered here; in the absence of correlations, all the eigenvalues would have been equal. The major principal components separated promoters with markedly different sequences; it is to be expected that the other principal components would yield further separations.
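The encoding and eigen-analysis described above can be reproduced in outline; the sketch below follows the stated 2-bit code and 25-base windows, but uses made-up stand-in sequences rather than the Eukaryotic Promoter Database entries.

```python
# 2-bit encoding of bases (a=11, c=00, g=10, t=01): a 25-base window becomes
# 50 binary variables; principal components come from the eigendecomposition
# of their covariance matrix. The sequences below are hypothetical stand-ins.
import numpy as np

CODE = {"a": (1, 1), "c": (0, 0), "g": (1, 0), "t": (0, 1)}

def encode(seq):
    return [bit for base in seq for bit in CODE[base]]

seqs = ["agcta" * 5, "ggggg" * 5, "ttacg" * 5, "gcgcc" * 5]
X = np.array([encode(s) for s in seqs], dtype=float)      # (n_seqs, 50)

cov = np.cov(X, rowvar=False)                             # 50 x 50
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                         # descending variance
scores = (X - X.mean(axis=0)) @ eigvecs[:, order]         # PC scores per sequence
```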


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1499-1506 ◽  
Author(s):  
Yangwu Zhang ◽  
Guohe Li ◽  
Heng Zong

Dimensionality reduction, including feature extraction and selection, is one of the key points for text classification. In this paper, we propose a mixed method of dimensionality reduction constructed from principal components analysis and a selection of components. Principal components analysis is a method of feature extraction, but not all of the components contribute to classification, because the PCA objective is not a form of discriminant analysis (see, e.g., Jolliffe, 2002). In this context, we present a component-selection function, which returns the components useful for classification based on performance indicators computed over different subsets of the components. Compared to traditional methods of feature selection, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
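One plausible form of such a selection function is a greedy forward search over components scored by cross-validated SVM accuracy; the sketch below is an assumption about the general scheme, not the authors' exact procedure, and uses a stock dataset as a stand-in for text features.

```python
# Greedy selection of principal components by cross-validated SVM accuracy.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                  # stand-in feature matrix
Z = PCA(n_components=20).fit_transform(X)            # candidate components

selected, best = [], 0.0
for j in range(Z.shape[1]):
    trial = selected + [j]
    score = cross_val_score(SVC(), Z[:, trial], y, cv=3).mean()
    if score > best:                                 # keep component j only if it helps
        selected, best = trial, score
print(selected, round(best, 3))
```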


2012 ◽  
Vol 51 (No. 6) ◽  
pp. 244-255 ◽  
Author(s):  
M. Křepela ◽  
D. Zahradník ◽  
J. Sequens

The paper shows the possibility of using Bookstein coordinates for stem shape studies. Bookstein coordinates are simplified to stem shape diameters, for which tests of multidimensional normality, variance-covariance matrix homogeneity and equality of mean shape vectors, as well as principal component calculations, are carried out on the sample plots Doubravčice 1 and Štíhlice. Principal components are also calculated for Procrustes tangent coordinates and presented in graphs, and the plots are compared. The Doubravčice 1 and Štíhlice plots differ especially in age (70 and 30 years), while they do not differ in tree class representation.
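For readers unfamiliar with the construction, Bookstein coordinates register each configuration of landmarks so that two chosen baseline points map to (0, 0) and (1, 0); the remaining transformed landmarks are the shape variables. The sketch below is a generic illustration with made-up stem-profile landmarks, not the paper's data.

```python
# Bookstein registration: translate, rotate and scale a landmark configuration
# so the two baseline landmarks land on (0, 0) and (1, 0).
import numpy as np

def bookstein(landmarks, base=(0, 1)):
    """landmarks: (k, 2) array; base: indices of the two baseline landmarks."""
    p0, p1 = landmarks[base[0]], landmarks[base[1]]
    v = p1 - p0
    angle = np.arctan2(v[1], v[0])
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])                  # rotation by -angle
    return (landmarks - p0) @ R.T / np.hypot(*v)

stem = np.array([[0.00, 0.0], [0.05, 10.0], [0.30, 5.0], [0.25, 2.0]])
print(bookstein(stem))                               # rows: registered landmarks
```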


2016 ◽  
Vol 9 (1) ◽  
pp. 10 ◽  
Author(s):  
Marjaana Lindeman ◽  
Jari Lipsanen

Mentalizing (i.e., theory of mind) is a much studied construct, but the way different forms of mentalizing are related to each other is poorly understood. In this study (N = 369), we examined the dimensionality of mentalizing by addressing several forms of cognitive and affective empathy, practical mentalizing skills (i.e., understanding figurative language and social etiquette), and metacognition. The results of principal component analysis showed that sixteen mentalizing variables could be reduced to four principal components, namely affective empathy, social skills, self-insight, and views about the nature of beliefs. The components were unrelated, suggesting that they are independent aspects of mentalizing. No general mentalizing factor or overall empathy factor was found, indicating that mentalizing is a non-hierarchical profile construct.


2019 ◽  
Vol 9 (3) ◽  
pp. 657-675
Author(s):  
Raphael Hauser ◽  
Jüri Lember ◽  
Heinrich Matzinger ◽  
Raul Kangro

Principal component analysis (PCA) is an important pattern recognition and dimensionality reduction tool in many applications. Principal components are computed as eigenvectors of a maximum likelihood covariance $\widehat{\varSigma}$ that approximates a population covariance $\varSigma$, and these eigenvectors are often used to extract structural information about the variables (or attributes) of the studied population. Since PCA is based on the eigendecomposition of the proxy covariance $\widehat{\varSigma}$ rather than the ground-truth $\varSigma$, it is important to understand the approximation error in each individual eigenvector as a function of the number of available samples. The combination of recent results of Koltchinskii & Lounici (2017, Bernoulli, 23, 110–133) and Yu et al. (2015, Biometrika, 102, 315–323) yields such bounds. In the present paper we sharpen these bounds and show that eigenvectors can often be reconstructed to a required accuracy from a sample of strictly smaller size order.
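To indicate the flavor of the bounds involved (paraphrased here from the cited works, with constants and regularity conditions simplified), the Davis–Kahan-type inequality of Yu et al. bounds the angle between the $j$-th sample and population eigenvectors by the spectral gap,

$$\sin\angle\big(\widehat{v}_j, v_j\big)\;\le\;\frac{2\,\lVert \widehat{\varSigma}-\varSigma\rVert_{\mathrm{op}}}{\min\big(\lambda_{j-1}-\lambda_j,\;\lambda_j-\lambda_{j+1}\big)},$$

while Koltchinskii & Lounici control the numerator in expectation through the effective rank $r(\varSigma)=\operatorname{tr}(\varSigma)/\lVert\varSigma\rVert_{\mathrm{op}}$, roughly $\mathbb{E}\,\lVert\widehat{\varSigma}-\varSigma\rVert_{\mathrm{op}}\lesssim\lVert\varSigma\rVert_{\mathrm{op}}\big(\sqrt{r(\varSigma)/n}+r(\varSigma)/n\big)$ for $n$ Gaussian samples.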


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248896
Author(s):  
Nico Migenda ◽  
Ralf Möller ◽  
Wolfram Schenck

“Principal Component Analysis” (PCA) is an established linear technique for dimensionality reduction. It performs an orthonormal transformation to replace possibly correlated variables with a smaller set of linearly independent variables, the so-called principal components, which capture a large portion of the data variance. The problem of finding the optimal number of principal components has been widely studied for offline PCA. However, when working with streaming data, the optimal number changes continuously, which requires updating both the principal components and the dimensionality at every timestep. While the continuous update of the principal components is widely studied, the available algorithms for dimensionality adjustment in neural network-based and incremental PCA are limited to increments of one, so existing approaches cannot account for abrupt changes in the presented data. The contribution of this work is to enable continuous dimensionality adjustment by an arbitrary number in neural network-based PCA, without the necessity of learning all principal components. A novel algorithm is presented that utilizes several PCA characteristics to adaptively update the optimal number of principal components. A precise estimation of the required dimensionality reduces the computational effort while ensuring that the desired amount of variance is kept. The computational complexity of the proposed algorithm is investigated, and it is benchmarked in an experimental study against other neural network-based and incremental PCA approaches, where it produces highly competitive results.
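As a minimal illustration of dimensionality adjustment by an arbitrary amount (this shows only the selection step, not the paper's algorithm), one can pick at each timestep the smallest number of components whose current eigenvalue estimates retain a target fraction of the variance:

```python
# Choose the smallest k whose estimated eigenvalues keep the desired
# fraction of total variance; k may jump by more than one between timesteps.
import numpy as np

def adjust_dimensionality(eigval_estimates, keep=0.85):
    lam = np.sort(np.asarray(eigval_estimates, dtype=float))[::-1]
    ratio = np.cumsum(lam) / lam.sum()               # cumulative variance share
    return int(np.searchsorted(ratio, keep) + 1)

print(adjust_dimensionality([5.0, 3.0, 0.5, 0.3, 0.2]))   # -> 2 (steep spectrum)
print(adjust_dimensionality([2.0, 1.9, 1.8, 1.7, 0.1]))   # -> 4 (flat spectrum)
```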


Author(s):  
Guang-Ho Cha

Principal component analysis (PCA) is an important tool in many areas including data reduction and interpretation, information retrieval, image processing, and so on. Kernel PCA has recently been proposed as a nonlinear extension of the popular PCA. The basic idea is to first map the input space into a feature space via a nonlinear map and then compute the principal components in that feature space. This paper illustrates the potential of kernel PCA for dimensionality reduction and feature extraction in multimedia retrieval. Using Gaussian kernels, the principal components were computed in the feature space of an image data set and used as new dimensions to approximate image features. Extensive experimental results show that kernel PCA performs better than linear PCA with respect to both retrieval quality and retrieval precision in content-based image retrieval.
Keywords: principal component analysis, kernel principal component analysis, multimedia retrieval, dimensionality reduction, image retrieval
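A bare-bones version of this retrieval pipeline might look as follows; the dataset, kernel width, and number of components are placeholders rather than the paper's settings.

```python
# Gaussian-kernel PCA as a retrieval index: embed images into kernel
# principal components and rank them by distance to an embedded query.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

X, _ = load_digits(return_X_y=True)                  # stand-in image features
kpca = KernelPCA(n_components=16, kernel="rbf", gamma=1e-3)
Z = kpca.fit_transform(X)                            # nonlinear low-dim features

query = kpca.transform(X[:1])                        # embed a query image
dists = np.linalg.norm(Z - query, axis=1)
print(np.argsort(dists)[:5])                         # indices of nearest images
```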


Author(s):  
L .N. Eeti ◽  
K. M. Buddhiraju ◽  
A. Bhattacharya

In the remote sensing community, Principal Component Analysis (PCA) is widely utilized for dimensionality reduction in order to deal with high spectral-dimension data. However, dimensionality reduction through PCA results in some loss of spectral information. Analysis of an Earth scene based on the first few principal component bands/channels introduces error into the classification, particularly since dimensionality reduction in PCA does not consider classification accuracy as a requirement. The present research work explores a different approach, a Multi-Classifier System (MCS)/ensemble classification, to analyse high spectral-dimension satellite remote sensing data from the WorldView-2 sensor. It examines the utility of MCS in landuse-landcover (LULC) classification without discarding any channel, i.e. avoiding loss of information by utilizing all of the available spectral channels. It also presents a comparative study of classification results obtained using only principal components with a single classifier and using all the original spectral channels in an MCS. The comparative study in the present work demonstrates that utilizing all channels in an MCS of five Artificial Neural Network classifiers outperforms a single Artificial Neural Network classifier that uses only the first three principal components.
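The two pipelines compared above can be sketched side by side; classifiers, data, and hyperparameters below are placeholders standing in for the WorldView-2 experiment.

```python
# (a) a single ANN on the first three principal components versus
# (b) a majority-vote ensemble of five ANNs using all input channels.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)                  # stand-in for band values
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

single = make_pipeline(PCA(n_components=3),
                       MLPClassifier(max_iter=500, random_state=0))
ensemble = VotingClassifier(
    [(f"ann{i}", MLPClassifier(max_iter=500, random_state=i)) for i in range(5)],
    voting="hard")                                   # majority vote

print(single.fit(Xtr, ytr).score(Xte, yte))          # PCA(3) + one ANN
print(ensemble.fit(Xtr, ytr).score(Xte, yte))        # all channels + five ANNs
```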

