The Uncertainty and Robustness of the Principal Component Analysis as a Tool for the Dimensionality Reduction

Solid State Phenomena ◽

10.4028/www.scientific.net/ssp.235.1 ◽

2015 ◽

Vol 235 ◽

pp. 1-8

Author(s):

Jacek Pietraszek ◽

Ewa Skrzypczak-Pietraszek

Keyword(s):

Experimental Data ◽

Principal Component Analysis ◽

Dimensionality Reduction ◽

Confidence Intervals ◽

Dimensional Space ◽

Experimental Studies ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

Data Points

Experimental studies very often lead to datasets with a large number of noted attributes (observed properties) and a relatively small number of records (observed objects). Classical analysis cannot explain the recorded attributes in the form of regression relationships because there are too few data points. One method for filtering out unimportant attributes is an approach known as ‘dimensionality reduction’. A well-known example of such an approach is principal component analysis (PCA), which transforms the data from the high-dimensional space to a space of fewer dimensions and provides heuristics for selecting the smallest necessary number of dimensions. The authors used this technique successfully in their previous investigations, but a question arose: is PCA robust and stable? This paper tries to answer this question by re-sampling experimental data and observing empirical confidence intervals of the parameters used to make decisions in PCA heuristics.
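The re-sampling idea in this abstract can be illustrated with a small sketch. The listing does not give the paper's exact procedure, so everything here is an assumption: a nonparametric bootstrap over records, the explained-variance ratio as the monitored PCA parameter, percentile intervals, and synthetic data with 20 records and 8 attributes. The function names are hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Covariance eigenvalues of X, normalised so that they sum to 1."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred data are proportional
    # to the eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def bootstrap_pca_intervals(X, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for each explained-variance ratio.

    Records (rows) are re-sampled with replacement; this is an assumed
    scheme, not necessarily the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = min(X.shape)  # number of singular values returned by the SVD
    ratios = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        ratios[b] = explained_variance_ratio(X[idx])
    lower = np.quantile(ratios, alpha / 2, axis=0)
    upper = np.quantile(ratios, 1 - alpha / 2, axis=0)
    return lower, upper

# Synthetic stand-in for experimental data: few records, several attributes,
# driven by two latent factors plus noise (all values are illustrative).
rng = np.random.default_rng(42)
latent = rng.normal(size=(20, 2))
loadings = rng.normal(size=(2, 8))
X = latent @ loadings + 0.3 * rng.normal(size=(20, 8))

point = explained_variance_ratio(X)
lower, upper = bootstrap_pca_intervals(X)
for i, (p, lo, hi) in enumerate(zip(point, lower, upper), start=1):
    print(f"PC{i}: ratio={p:.3f}  95% interval=[{lo:.3f}, {hi:.3f}]")
```

Wide or overlapping intervals for neighbouring components would suggest that a heuristic based on these ratios, such as a cumulative-variance threshold, is not stable for the dataset at hand.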

Enhanced Supervised Principal Component Analysis for Cancer Classification

Iraqi Journal of Science ◽

10.24996/ijs.2021.62.4.28 ◽

2021 ◽

pp. 1321-1333

Author(s):

Ghadeer JM Mahdi ◽

Bayda A. Kalaf ◽

Mundher A. Khaleel

Keyword(s):

Principal Component Analysis ◽

Gradient Descent ◽

Dimensional Space ◽

Principal Component ◽

Component Analysis ◽

Large Datasets ◽

Stochastic Gradient Descent ◽

High Dimensional ◽

Excellent Method ◽

Blue Cell

In this paper, a new hybridization of supervised principal component analysis (SPCA) and stochastic gradient descent techniques, called SGD-SPCA, is proposed for real large datasets that have a small number of samples in high-dimensional space. SGD-SPCA is intended to become an important tool that can be used to diagnose and treat cancer accurately. When large datasets require many parameters, SGD-SPCA is an excellent method, and it can easily update the parameters when a new observation arrives. Two cancer datasets are used: the first is for leukemia and the second is for small round blue cell tumors. Simulated datasets are also used to compare principal component analysis (PCA), SPCA, and SGD-SPCA. The results show that SGD-SPCA is more efficient than other existing methods.

Principal Component Analysis Considering Weights Based on Dissimilarity of Objects in High Dimensional Space

Intelligent Engineering Systems Through Artificial Neural Networks, Volume 17 ◽

10.1115/1.802655.paper45 ◽

2007 ◽

pp. 291-296

Keyword(s):

Principal Component Analysis ◽

Dimensional Space ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

High Dimensional Space

Performance Analysis of Dimensionality Reduction Techniques in the Context of Clustering

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2019.8.s3.2084 ◽

2019 ◽

Vol 8 (S3) ◽

pp. 66-71

Author(s):

T. Sudha ◽

P. Nagendra Kumar

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

High Dimensional Data ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

Reduction Techniques ◽

Dimensionality Reduction Techniques ◽

Low Dimensional ◽

Probabilistic Principal Component Analysis

Data mining is one of the major areas of research, and clustering is one of its main functionalities. High dimensionality is one of the main issues in clustering, and dimensionality reduction can be used as a solution to this problem. The present work makes a comparative study of two dimensionality reduction techniques, t-distributed stochastic neighbour embedding and probabilistic principal component analysis, in the context of clustering. High dimensional data have been reduced to low dimensional data using both techniques. Cluster analysis has been performed on the high dimensional data as well as on the low dimensional data sets obtained through each technique, with a varying number of clusters. Mean squared error, time, and space have been considered as parameters for comparison. The results show that the time taken to convert the high dimensional data into low dimensional data using probabilistic principal component analysis is higher than the time taken using t-distributed stochastic neighbour embedding. The space required by the data set reduced through probabilistic principal component analysis is less than the storage space required by the data set reduced through t-distributed stochastic neighbour embedding.

A kernel Principal Component Analysis (kPCA) Digest with a New Backward Mapping (pre-image reconstruction) Strategy

10.21203/rs.3.rs-126052/v1 ◽

2020 ◽

Author(s):

Alberto García-González ◽

Antonio Huerta ◽

Sergio Zlotnik ◽

Pedro Díez

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Dimensional Space ◽

Principal Component ◽

Linear Structure ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Nonlinear Dimensionality Reduction ◽

Dimensional Manifold ◽

Low Dimensional

Methodologies for multidimensionality reduction aim at discovering low-dimensional manifolds in which the data lie. Principal Component Analysis (PCA) is very effective if the data have linear structure, but it fails to identify a possible dimensionality reduction if the data belong to a nonlinear low-dimensional manifold. For nonlinear dimensionality reduction, kernel Principal Component Analysis (kPCA) is appreciated because of its simplicity and ease of implementation. The paper provides a concise review of the main ideas of PCA and kPCA, trying to collect in a single document aspects that are often dispersed. Moreover, a strategy to map the reduced-dimension representation back into the original high-dimensional space is devised, based on the minimization of a discrepancy functional.
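As a rough illustration of the forward kPCA step mentioned in this abstract, the following is a minimal sketch of the textbook construction: build a kernel matrix, centre it in feature space, and take its leading eigenvectors. The RBF kernel, the `gamma` value, the function name, and the two-circle test data are assumptions made for the example. The paper's backward mapping (pre-image reconstruction) is not reproduced here, since the abstract does not specify it.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project training points onto the leading kernel principal components.

    Uses an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); the kernel
    choice and gamma are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances, clipped against round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)

    # Centre the kernel matrix, i.e. centre the data in feature space.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one

    # eigh returns ascending eigenvalues; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Coordinates of the training points along each kernel component.
    Z = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return Z, eigvals

# Two noisy concentric circles: data on a nonlinear low-dimensional manifold.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.concatenate([np.full(50, 1.0), np.full(50, 3.0)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += 0.05 * rng.normal(size=X.shape)

Z, eigvals = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print("leading eigenvalues of the centred kernel matrix:", eigvals)
```

Whether the leading components separate the two circles depends on `gamma`, so in practice the kernel parameter has to be tuned for the data.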

Anomaly Detection in Data with Extremely High Dimensional Space via Online Oversampling Principal Component Analysis

IOSR Journal of Computer Engineering ◽

10.9790/0661-16376773 ◽

2014 ◽

Vol 16 (3) ◽

pp. 67-73 ◽

Cited By ~ 2

Author(s):

Swapnil S. Raut ◽

Sachin N. Deshmukh

Keyword(s):

Principal Component Analysis ◽

Anomaly Detection ◽

Dimensional Space ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

High Dimensional Space

Analysis of bath motion in MM-SQC dynamics via dimensionality reduction approach: Principal component analysis

The Journal of Chemical Physics ◽

10.1063/5.0039743 ◽

2021 ◽

Vol 154 (9) ◽

pp. 094122

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Reduction Approach

Low-temperature thermal hydrolysis of sludge prior to anaerobic digestion: principal component analysis (PCA) of experimental data

Data in Brief ◽

10.1016/j.dib.2021.107323 ◽

2021 ◽

pp. 107323

Author(s):

Mohamed N.A. Meshref ◽

Seyed Mohammad Mirsoleimani Azizi ◽

Wafa Dastyar ◽

Rasha Maal-Bared ◽

Bipro Ranjan Dhar

Keyword(s):

Experimental Data ◽

Principal Component Analysis ◽

Anaerobic Digestion ◽

Low Temperature ◽

Principal Component ◽

Component Analysis ◽

Thermal Hydrolysis ◽

Hydrolysis Of

Characterization of fretting wear experiments on spline couplings by principal component analysis

Proceedings of the Institution of Mechanical Engineers Part J Journal of Engineering Tribology ◽

10.1177/1350650116682162 ◽

2016 ◽

Vol 231 (7) ◽

pp. 860-868 ◽

Cited By ~ 2

Author(s):

Waqar Qureshi ◽

Francesca Cura ◽

Andrea Mura

Keyword(s):

Experimental Data ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Fretting Wear ◽

Aircraft Industry ◽

Surface Movement ◽

Noisy Measurements ◽

Quasi Static Process

Fretting wear is a quasi-static process in which repeated relative surface movement of components results in wear and fatigue. Fretting wear is quite significant in the case of spline couplings which are frequently used in the aircraft industry to transfer torque and power. Fretting wear depends on materials, pressure distribution, torque, rotational speeds, lubrication, surface finish, misalignment between spline shafts, etc. The presence of so many factors makes it difficult to conduct experiments for better models of fretting wear and it is the case whenever a mathematical model is sought from experimental data which is prone to noisy measurements, outliers and redundant variables. This work develops a principal component analysis based method, using a criterion which is insensitive to outliers, to realize a better design and interpret experiments on fretting wear. The proposed method can be extended to other cases too.

Physical-oriented and machine learning-based emission modeling in a diesel compression ignition engine: Dimensionality reduction and regression

International Journal of Engine Research ◽

10.1177/14680874211070736 ◽

2022 ◽

pp. 146808742110707

Author(s):

Aran Mohammad ◽

Reza Rezaei ◽

Christopher Hayduk ◽

Thaddaeus Delebinski ◽

Saeid Shahpouri ◽

...

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Factor Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Support Vector ◽

Emission Models ◽

Emission Modeling

The development of internal combustion engines is affected by exhaust gas emissions legislation and the striving to increase performance. This demands engine-out emission models that can be used for engine optimization for real driving emission controls. The prediction capability of physically and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user and can lead to improved accuracy as the number of inputs increases. At the same time, irrelevant inputs become more probable; these have a low functional relation to the emissions and can lead to overfitting. Alternatively, data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary measured test bench data points from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and the feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining the accuracy. The features selected using the lasso algorithm provide more accurate learning of the regression models than the features extracted through principal component analysis and factor analysis. This results in test errors RMSETe for modeling NOx, CO, HC, and soot emissions of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN, respectively.

Dimensionality Reduction with Principal Component Analysis

Mathematics for Machine Learning ◽

10.1017/9781108679930.012 ◽

2020 ◽

pp. 286-313

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis
