Convex Formulations for Fair Principal Component Analysis

Author(s):  
Matt Olfat ◽  
Anil Aswani

Though there is a growing literature on fairness for supervised learning, incorporating fairness into unsupervised learning has been less well-studied. This paper studies fairness in the context of principal component analysis (PCA). We first define fairness for dimensionality reduction, and our definition can be interpreted as saying a reduction is fair if information about a protected class (e.g., race or gender) cannot be inferred from the dimensionality-reduced data points. Next, we develop convex optimization formulations that can improve the fairness (with respect to our definition) of PCA and kernel PCA. These formulations are semidefinite programs, and we demonstrate their effectiveness using several datasets. We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates.
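The fairness criterion described here, that the protected class should not be inferable from the reduced data, can be illustrated with a small numpy sketch. The synthetic data, the group-mean shift, and the "fair" variant (which simply drops the group-aligned axis before projecting) are illustrative assumptions only; the paper's actual method solves a semidefinite program, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two protected groups whose means differ along one axis
# (illustrative only; this is not the paper's SDP formulation).
n, d = 200, 5
z = rng.integers(0, 2, size=n)               # protected attribute
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * z                           # group information lives in axis 0

def pca_project(X, k):
    """Project centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def group_gap(P, z):
    """Distance between group means in the reduced space -- a crude
    proxy for how easily z could be inferred from the projection."""
    return np.linalg.norm(P[z == 0].mean(axis=0) - P[z == 1].mean(axis=0))

P_std = pca_project(X, 2)
# A 'fair' projection would be constrained directly; here we only mimic
# the effect by removing the group-aligned axis before projecting.
P_fair = pca_project(X[:, 1:], 2)

print(group_gap(P_std, z), group_gap(P_fair, z))
```

Standard PCA keeps the group-separating direction (it carries the most variance), so the group gap in its projection is large, while the constrained variant drives it toward zero.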

2007 ◽  
Vol 04 (01) ◽  
pp. 15-26 ◽  
Author(s):  
XIUQING WANG ◽  
ZENG-GUANG HOU ◽  
LONG CHENG ◽  
MIN TAN ◽  
FEI ZHU

The ability to recognize and interpret complex environments is very important for a real autonomous robot. A new scene analysis method for mobile robots, using kernel principal component analysis (kernel PCA) based on multi-sonar-ranger data fusion, is put forward. The principles of classification by principal component analysis (PCA), kernel PCA, and the BP neural network (NN) approach for extracting the eigenvectors with the largest k eigenvalues are introduced briefly. Next, the details of the PCA, kernel PCA, and BP NN methods applied to corridor scene analysis and classification for mobile robots based on sonar data are discussed, and the experimental results of those methods are given. In addition, a corridor-scene classifier based on a BP NN is discussed. The experimental results of PCA, kernel PCA, and the BP-NN-based methods are compared, and the robustness of those methods is also analyzed. The following conclusions are drawn: in corridor scene classification, kernel PCA has an advantage over ordinary PCA, and the approaches based on BP NNs can also achieve satisfactory results. The robustness of kernel PCA is better than that of the BP-NN-based methods.
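The overall pipeline (kernel PCA features followed by a simple classifier) can be sketched in numpy. The two-class "scene" vectors, the kernel width, and the nearest-centroid classifier below are assumptions for illustration only; the paper uses real multi-sonar range profiles and a BP neural network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for sonar range profiles of two corridor scene types
X0 = rng.normal(loc=1.0, scale=0.1, size=(30, 8))   # e.g. open corridor
X1 = rng.normal(loc=2.0, scale=0.1, size=(30, 8))   # e.g. blocked corridor
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_pca(X, k, gamma=0.5):
    """Kernel PCA via eigendecomposition of the double-centered Gram matrix."""
    K = rbf_kernel(X, X, gamma)
    n = len(K)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:k]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 1e-12))

F = kernel_pca(X, 2)
# Nearest-centroid classification in the kernel-PCA feature space
c0, c1 = F[y == 0].mean(0), F[y == 1].mean(0)
pred = (np.linalg.norm(F - c1, axis=1) < np.linalg.norm(F - c0, axis=1)).astype(int)
print((pred == y).mean())
```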


2010 ◽  
Vol 3 (5) ◽  
Author(s):  
Mario Bettenbühl ◽  
Claudia Paladini ◽  
Konstantin Mergenthaler ◽  
Reinhold Kliegl ◽  
Ralf Engbert ◽  
...  

During visual fixation on a target, humans perform miniature (or fixational) eye movements consisting of three components, i.e., tremor, drift, and microsaccades. Microsaccades are high velocity components with small amplitudes within fixational eye movements. However, microsaccade shapes and statistical properties vary between individual observers. Here we show that microsaccades can be formally represented by two significant shapes, which we identified using the mathematical definition of singularities and detected in real data with the continuous wavelet transform. For characterization and model selection, we carried out a principal component analysis, which identified a step shape with an overshoot as the first component and a bump that regulates the overshoot as the second. We conclude that microsaccades are singular events with an overshoot component which can be detected by the continuous wavelet transform.


2010 ◽  
pp. 171-193
Author(s):  
Sean Eom

This chapter describes the FACTOR procedure. The first section of the chapter begins with the definition of factor analysis: a family of statistical techniques whose common objective is to represent a set of variables in terms of a smaller number of hypothetical variables (factors). ACA uses principal component analysis to group authors into several categories with similar lines of research. We also present several different approaches to preparing datasets, including manual data input, the INFILE statement, and permanent datasets. We discuss each of the key SAS statements, including DATA, INPUT, CARDS, PROC, and RUN. In addition, we examine several options that specify the following: the method for extracting factors, the number of factors, the rotation method, and output display options.
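As a rough Python analogue of what the chapter does with PROC FACTOR (this is not SAS, and the data are synthetic), one can extract principal components from a correlation matrix and group variables by their dominant loadings:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two latent "lines of research" and six observed co-citation profiles.
# The different noise levels keep the two leading eigenvalues distinct.
f1, f2 = rng.normal(size=(2, 500))
cols = [f1 + 0.3 * rng.normal(size=500) for _ in range(3)] \
     + [f2 + 0.7 * rng.normal(size=500) for _ in range(3)]
X = np.column_stack(cols)

R = np.corrcoef(X, rowvar=False)        # analysis starts from correlations
w, V = np.linalg.eigh(R)
order = np.argsort(w)[::-1]
loadings = V[:, order[:2]] * np.sqrt(w[order[:2]])

# Assign each variable to the factor on which it loads most heavily
groups = np.argmax(np.abs(loadings), axis=1)
print(groups)
```

The first three variables fall into one group and the last three into the other, mirroring how ACA groups authors with similar lines of research.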


2006 ◽  
Vol 06 (01) ◽  
pp. L17-L28 ◽  
Author(s):  
JOSÉ MANUEL LÓPEZ-ALONSO ◽  
JAVIER ALDA

Principal Component Analysis (PCA) has been applied to the characterization of 1/f noise. Applying PCA to 1/f noise requires the definition of a stochastic multidimensional variable whose components describe the temporal evolution of the phenomenon sampled at regular time intervals. In this paper we analyze the conditions on the number of observations and on the dimension of the multidimensional random variable that are necessary to use the PCA method in a sound manner. We have tested the obtained conditions on simulated and experimental data sets from imaging optical systems. The results can be extended to other fields where this kind of noise is relevant.
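The setup can be sketched in numpy: treat regularly sampled time points as the dimensions of the multidimensional variable and independent realizations as the observations, keeping many more observations than dimensions. The frequency-domain shaping used to simulate 1/f noise is a common shortcut, not necessarily the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(4)

def one_over_f(n_obs, n_dim, alpha=1.0):
    """Generate realizations of 1/f^alpha noise by shaping white noise
    in the frequency domain (a common simulation shortcut)."""
    freqs = np.fft.rfftfreq(n_dim)
    amp = np.zeros_like(freqs)
    amp[1:] = freqs[1:] ** (-alpha / 2)        # power spectrum ~ 1/f^alpha
    spec = amp * (rng.normal(size=(n_obs, len(freqs)))
                  + 1j * rng.normal(size=(n_obs, len(freqs))))
    return np.fft.irfft(spec, n=n_dim, axis=1)

# The number of observations should comfortably exceed the dimension
# of the multidimensional variable for the eigenvalues to be reliable.
n_obs, n_dim = 2000, 64
X = one_over_f(n_obs, n_dim)
Xc = X - X.mean(axis=0)
w = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
print(w[0] / w.sum())   # low-frequency components dominate the spectrum
```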


2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Hamidullah Binol

Classification is one of the most challenging tasks of remotely sensed data processing, particularly for hyperspectral imaging (HSI). Dimension reduction is widely applied as a preprocessing step for classification; however, dimension reduction using conventional methods does not always guarantee a high classification rate. Principal component analysis (PCA) and its nonlinear version, kernel PCA (KPCA), are known as traditional dimension reduction algorithms. In a previous work, a variant of KPCA, denoted Adaptive KPCA (A-KPCA), was suggested to obtain a robust unsupervised feature representation for HSI. That technique employs several KPCAs simultaneously, with different candidate kernels, to obtain better feature points from each applied KPCA. Nevertheless, A-KPCA neglects the influence of the subkernels by employing an unweighted combination. Furthermore, if there is at least one weak kernel in the set, the classification performance may be reduced significantly. To address these problems, in this paper we propose an Ensemble Learning (EL) based multiple kernel PCA (M-KPCA) strategy. M-KPCA constructs a weighted combination of kernels with high discriminative ability from a predetermined set of base kernels and then extracts features in an unsupervised fashion. The experiments on two different AVIRIS hyperspectral data sets show that the proposed algorithm can achieve a satisfactory feature extraction performance on real data.
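The weighted-combination idea can be sketched in numpy. Kernel-target alignment is used here as a stand-in for the paper's ensemble-learning weighting, and the data are synthetic; only the overall shape of the pipeline (weight the base kernels, combine, then extract kernel-PCA features) is shown.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two-class toy data standing in for hyperspectral pixels
X = np.vstack([rng.normal(0, 1, (40, 10)), rng.normal(1.5, 1, (40, 10))])
y = np.array([0] * 40 + [1] * 40)

def rbf(X, gamma):
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def alignment(K, y):
    """Kernel-target alignment: a simple proxy for a kernel's
    discriminative ability (the paper's EL weighting differs)."""
    Y = np.outer(2 * y - 1, 2 * y - 1).astype(float)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

# Candidate base kernels, including a deliberately weak one (gamma=100)
gammas = [0.01, 0.1, 1.0, 100.0]
kernels = [rbf(X, g) for g in gammas]
w = np.array([max(alignment(K, y), 0) for K in kernels])
w /= w.sum()                                     # weak kernels get low weight

K = sum(wi * Ki for wi, Ki in zip(w, kernels))   # weighted combination
n = len(K)
J = np.eye(n) - 1 / n                            # centering matrix
Kc = J @ K @ J
vals, vecs = np.linalg.eigh(Kc)
F = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0))  # top-2 features
print(w.round(2))
```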


1998 ◽  
Vol 25 (6) ◽  
pp. 1050-1058 ◽  
Author(s):  
T O Siew-Yan-Yu ◽  
J Rousselle ◽  
G Jacques ◽  
V.-T.-V. Nguyen

A definition of homogeneous regions in terms of precipitation regime is achieved by the use of principal component analysis (PCA). The method has been shown to be a reliable regionalization tool even though it was applied to a territory showing rather complex physiography and high precipitation variation. Results based on the application of PCA to the interstation correlation matrix of precipitation have indicated four distinct homogeneous regions. These regional patterns can be explained by the orographic effect and by the circulation of air masses within the study region. Key words: homogeneous regions, rainfall, principal component analysis, orographic effect.
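The regionalization step can be sketched in numpy on synthetic station series (the paper used real precipitation records and found four regions; two suffice to show the mechanics). Stations whose leading loadings share a sign fall in the same homogeneous region.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic monthly precipitation at 8 stations in two regions whose
# regimes are negatively related (illustrative assumption only).
s = rng.normal(size=(2, 240))
sigA = s[0]
sigB = -0.6 * s[0] + 0.8 * s[1]           # unit variance, anticorrelated with sigA
P = np.stack([sig + 0.4 * rng.normal(size=240)
              for sig in [sigA] * 4 + [sigB] * 4], axis=1)

R = np.corrcoef(P, rowvar=False)          # interstation correlation matrix
w, V = np.linalg.eigh(R)
pc1 = V[:, np.argmax(w)]                  # leading loading pattern
groups = (pc1 > 0).astype(int)            # loading sign delineates the regions
print(groups)
```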


2018 ◽  
Author(s):  
Toni Bakhtiar

Kernel Principal Component Analysis (Kernel PCA) is a generalization of ordinary PCA which maps the original data into a high-dimensional feature space. The mapping is expected to address the issues of nonlinearity among variables and separation among classes in the original data space. The key problem in the use of kernel PCA is the estimation of the parameter of the kernel function, for which there is so far no clear guidance; parameter selection largely depends on the judgment of the researcher. This study exploited the Gaussian kernel function and focused on the ability of kernel PCA to visualize the separation of classified data. Assessment was based on the misclassification rate obtained by Fisher linear discriminant analysis of the first two principal components. The results suggest that, when the kernel parameter is selected in the interval between the closest and the furthest distances among the objects of the original data, the visualization given by kernel PCA is better than that of ordinary PCA.
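The selection rule can be made concrete with a small numpy sketch: outside the interval between the closest and the furthest interobject distances, the Gaussian kernel degenerates (all values near 0 or near 1) and kernel PCA has no geometry to work with. The data and the specific midpoint choice below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 4))                      # stand-in data

d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
off = d[~np.eye(50, dtype=bool)]

# Choose the Gaussian parameter inside the interval between the closest
# and furthest interobject distances, per the selection rule above.
sigma_ok = 0.5 * (off.min() + off.max())
sigma_lo = 0.01 * off.min()                       # far below the interval
sigma_hi = 100 * off.max()                        # far above the interval

def offdiag_spread(sigma):
    """Spread of off-diagonal Gaussian kernel values; a degenerate kernel
    (all ~0 or all ~1) carries no usable structure for kernel PCA."""
    K = np.exp(-d ** 2 / (2 * sigma ** 2))
    return K[~np.eye(50, dtype=bool)].std()

print(offdiag_spread(sigma_lo), offdiag_spread(sigma_ok), offdiag_spread(sigma_hi))
```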


2021 ◽  
Author(s):  
Hocine Bendjama ◽  
Salah BOUHOUCHE ◽  
Salim AOUABDI ◽  
Jürgen BAST

The monitoring of casting quality is very important for ensuring the safe operation of casting processes. In this paper, in order to improve the accurate detection of casting defects, a combined method based on Principal Component Analysis (PCA) and the Self-Organizing Map (SOM) is presented. The proposed method reduces the dimensionality of the original data by projecting it onto a smaller subspace through PCA, and uses Hotelling's T2 and Q statistics as essential features for characterizing the process behavior. The SOM is used to improve the separation between casting defects; it computes a metric-distance-based similarity using the T2 and Q (T2Q) statistics as input. A comparative study between the conventional SOM, the SOM with reduced data, and the SOM with selected features is carried out. The proposed method is used to identify the running conditions of the low-pressure lost foam casting process. The monitoring results indicate that the SOM based on T2Q feature vectors outperforms the conventional SOM and the SOM based on reduced data.
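The T2 and Q statistics can be sketched in numpy (synthetic data; the SOM stage and the paper's actual process signals are omitted). T2 measures variation inside the retained PCA subspace; Q measures the residual outside it, which is what a broken correlation between variables inflates.

```python
import numpy as np

rng = np.random.default_rng(8)

# Normal-operation training data (stand-in for casting process signals)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)     # two correlated variables

mu = X.mean(axis=0)
Xc = X - mu
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                              # retained components
P = Vt[:k].T                                       # loadings
lam = (s[:k] ** 2) / (len(X) - 1)                  # component variances

def t2_q(x):
    """Hotelling's T2 (variation inside the PCA model) and Q
    (squared residual outside the model) for one sample."""
    xc = x - mu
    t = P.T @ xc
    t2 = float((t ** 2 / lam).sum())
    resid = xc - P @ t
    q = float(resid @ resid)
    return t2, q

normal = rng.normal(size=6)
normal[1] = normal[0]                               # respects the correlation
fault = normal.copy()
fault[1] = normal[0] + 5.0                          # breaks the correlation

print(t2_q(normal), t2_q(fault))
```

The faulty sample violates the learned correlation structure, so its Q statistic jumps well above that of the normal sample.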


2006 ◽  
Vol 38 (2) ◽  
pp. 299-319 ◽  
Author(s):  
Stephan Huckemann ◽  
Herbert Ziezold

Classical principal component analysis on manifolds, for example on Kendall's shape spaces, is carried out in the tangent space of a Euclidean mean equipped with a Euclidean metric. We propose a method of principal component analysis for Riemannian manifolds based on geodesics of the intrinsic metric, and provide a numerical implementation in the case of spheres. This method allows us, for example, to compare principal component geodesics of different data samples. In order to determine principal component geodesics, we show that in general, owing to curvature, the principal component geodesics do not pass through the intrinsic mean. As a consequence, means other than the intrinsic mean are considered, allowing for several choices of definition of geodesic variance. In conclusion we apply our method to the space of planar triangular shapes and compare our findings with those of standard Euclidean principal component analysis.
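For contrast with the intrinsic method proposed here, the classical tangent-space construction that the paper starts from can be sketched in numpy for the sphere: lift the data to the tangent space at the (extrinsic) mean direction with the log map, then run ordinary PCA there. The sample data and base point are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)

def log_map(p, q):
    """Riemannian log map on the unit sphere: tangent vector at p toward q."""
    w = q - (p @ q) * p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(p @ q, -1, 1)) * w / nw

# Data scattered around the north pole of S^2 via the exponential map
p0 = np.array([0.0, 0.0, 1.0])
pts = []
for _ in range(100):
    v = 0.3 * rng.normal(size=2)                        # tangent coordinates
    t = np.array([v[0], v[1], 0.0])
    nt = np.linalg.norm(t)
    pts.append(np.cos(nt) * p0 + np.sin(nt) * t / nt)   # exp map at p0
pts = np.array(pts)

# Classical tangent-space PCA: lift to the tangent space at the mean
# direction, then do ordinary Euclidean PCA there.
mean_dir = pts.mean(axis=0)
mean_dir /= np.linalg.norm(mean_dir)
T = np.array([log_map(mean_dir, q) for q in pts])
Tc = T - T.mean(axis=0)
_, s, Vt = np.linalg.svd(Tc, full_matrices=False)
print(s ** 2 / (s ** 2).sum())
```

The tangent vectors lie in a two-dimensional plane orthogonal to the base point, so the third variance ratio vanishes; the geodesic method in the paper replaces this flat approximation with geodesics of the intrinsic metric.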

