The Uncertainty and Robustness of the Principal Component Analysis as a Tool for the Dimensionality Reduction

2015 ◽  
Vol 235 ◽  
pp. 1-8
Author(s):  
Jacek Pietraszek ◽  
Ewa Skrzypczak-Pietraszek

Experimental studies very often lead to datasets with a large number of recorded attributes (observed properties) and a relatively small number of records (observed objects). Classical analysis cannot explain the recorded attributes in the form of regression relationships because there are too few data points. One method for filtering out unimportant attributes is the approach known as 'dimensionality reduction'. A well-known example of this approach is principal component analysis (PCA), which transforms the data from the high-dimensional space to a space of fewer dimensions and provides heuristics for selecting the smallest necessary number of dimensions. The authors used this technique successfully in their previous investigations, but a question arose: is PCA robust and stable? This paper tries to answer this question by re-sampling experimental data and observing empirical confidence intervals of the parameters used to make decisions in the PCA heuristics.
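A minimal sketch of the re-sampling idea described in this abstract, assuming bootstrap resampling over records and the explained-variance ratio as the decision parameter; the dataset and sizes are illustrative, not the authors' data:

```python
# Bootstrap re-sampling of PCA to obtain empirical confidence intervals for the
# explained-variance ratios that drive the usual "how many components?" heuristics.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))          # few records, many attributes (illustrative)

n_boot = 1000
ratios = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, len(X), size=len(X))   # resample records with replacement
    ratios[b] = PCA().fit(X[idx]).explained_variance_ratio_

# 95% empirical confidence interval for each component's explained-variance ratio
lo, hi = np.percentile(ratios, [2.5, 97.5], axis=0)
for k, (l, h) in enumerate(zip(lo, hi), start=1):
    print(f"PC{k}: [{l:.3f}, {h:.3f}]")
```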

2021 ◽  
pp. 1321-1333
Author(s):  
Ghadeer JM Mahdi ◽  
Bayda A. Kalaf ◽  
Mundher A. Khaleel

In this paper, a new hybridization of supervised principal component analysis (SPCA) and stochastic gradient descent, called SGD-SPCA, is proposed for real large datasets that have a small number of samples in a high-dimensional space. SGD-SPCA is intended as a tool for accurate cancer diagnosis and treatment. For large datasets that require many parameters, SGD-SPCA is well suited because it can easily update the parameters when a new observation arrives. Two cancer datasets are used: the first for leukemia and the second for small round blue cell tumors. Simulated datasets are also used to compare principal component analysis (PCA), SPCA, and SGD-SPCA. The results show that SGD-SPCA is more efficient than the other existing methods.
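The abstract does not spell out the update rule, so the following is only a hedged illustration of how PCA-like components can be refreshed by stochastic gradient steps as new observations arrive (a generic Oja-style subspace update, not the authors' SGD-SPCA; data and sizes are synthetic):

```python
# Oja-style stochastic gradient update of a projection matrix: one small update
# per incoming sample instead of recomputing a full eigendecomposition.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))          # toy data stream, illustrative sizes
k, lr = 3, 0.01                         # number of components, learning rate
W = rng.normal(scale=0.1, size=(20, k)) # projection matrix to be learned

for x in X:                             # one SGD step per incoming observation
    y = W.T @ x                         # project the sample onto current components
    W += lr * (np.outer(x, y) - W @ np.outer(y, y))  # Oja subspace rule
    W, _ = np.linalg.qr(W)              # re-orthonormalize for numerical stability

print("Learned subspace shape:", W.shape)
```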


2019 ◽  
Vol 8 (S3) ◽  
pp. 66-71
Author(s):  
T. Sudha ◽  
P. Nagendra Kumar

Data mining is one of the major areas of research, and clustering is one of its main functionalities. High dimensionality is one of the main issues in clustering, and dimensionality reduction can be used as a solution to this problem. The present work makes a comparative study of dimensionality reduction techniques, namely t-distributed stochastic neighbour embedding (t-SNE) and probabilistic principal component analysis (PPCA), in the context of clustering. High-dimensional data have been reduced to low-dimensional data using these two techniques, and cluster analysis has been performed on the high-dimensional data as well as on the low-dimensional data sets obtained through t-SNE and PPCA, with a varying number of clusters. Mean squared error, time, and space have been considered as parameters for comparison. The results show that the time taken to convert the high-dimensional data into low-dimensional data using PPCA is higher than the time taken using t-SNE, whereas the storage space required by the data set reduced through PPCA is less than that required by the data set reduced through t-SNE.
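A minimal sketch of the comparison set-up described above, on synthetic data rather than the authors' datasets; scikit-learn's PCA is used here as a stand-in for probabilistic PCA, and only reduction time and embedding size are reported:

```python
# Reduce high-dimensional data with t-SNE and with a PCA stand-in for PPCA,
# cluster both embeddings with k-means, and compare the reduction cost.
import time
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=500, n_features=50, centers=4, random_state=0)

t0 = time.perf_counter()
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
t_tsne = time.perf_counter() - t0

t0 = time.perf_counter()
X_ppca = PCA(n_components=2).fit_transform(X)   # PPCA stand-in
t_ppca = time.perf_counter() - t0

for name, Z, t in [("t-SNE", X_tsne, t_tsne), ("PCA/PPCA", X_ppca, t_ppca)]:
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
    print(f"{name}: reduction took {t:.2f} s, embedding size = {Z.nbytes} bytes")
```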


2020 ◽  
Author(s):  
Alberto García-González ◽  
Antonio Huerta ◽  
Sergio Zlotnik ◽  
Pedro Díez

Methodologies for dimensionality reduction aim at discovering the low-dimensional manifolds in which the data range. Principal Component Analysis (PCA) is very effective if the data have a linear structure, but it fails to identify a possible dimensionality reduction if the data belong to a nonlinear low-dimensional manifold. For nonlinear dimensionality reduction, kernel Principal Component Analysis (kPCA) is appreciated because of its simplicity and ease of implementation. The paper provides a concise review of the main ideas of PCA and kPCA, trying to collect in a single document aspects that are often dispersed. Moreover, a strategy to map the reduced dimensions back into the original high-dimensional space is also devised, based on the minimization of a discrepancy functional.
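A short sketch of the PCA/kPCA contrast and the back-mapping step, assuming a toy nonlinear dataset; scikit-learn's learned inverse transform stands in for the paper's discrepancy-minimizing map, which is not reproduced here:

```python
# Compare linear PCA and kernel PCA on a nonlinear 1-D manifold, then map the
# reduced coordinates back to the original space and measure reconstruction error.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

lin = PCA(n_components=1).fit(X)
kpca = KernelPCA(n_components=1, kernel="rbf", gamma=10.0,
                 fit_inverse_transform=True).fit(X)

X_lin = lin.inverse_transform(lin.transform(X))     # linear reconstruction
X_kpc = kpca.inverse_transform(kpca.transform(X))   # approximate pre-image

print("PCA reconstruction error :", np.mean((X - X_lin) ** 2))
print("kPCA reconstruction error:", np.mean((X - X_kpc) ** 2))
```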


Data in Brief ◽  
2021 ◽  
pp. 107323
Author(s):  
Mohamed N.A. Meshref ◽  
Seyed Mohammad Mirsoleimani Azizi ◽  
Wafa Dastyar ◽  
Rasha Maal-Bared ◽  
Bipro Ranjan Dhar

Author(s):  
Waqar Qureshi ◽  
Francesca Cura ◽  
Andrea Mura

Fretting wear is a quasi-static process in which repeated relative surface movement of components results in wear and fatigue. Fretting wear is quite significant in spline couplings, which are frequently used in the aircraft industry to transfer torque and power. Fretting wear depends on materials, pressure distribution, torque, rotational speed, lubrication, surface finish, misalignment between spline shafts, and other factors. The presence of so many factors makes it difficult to conduct experiments that yield better models of fretting wear, as is the case whenever a mathematical model is sought from experimental data prone to noisy measurements, outliers, and redundant variables. This work develops a principal component analysis based method, using a criterion that is insensitive to outliers, to better design and interpret experiments on fretting wear. The proposed method can be extended to other cases as well.
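A hedged sketch of one way such an outlier-insensitive criterion could look in practice (median/IQR scaling before PCA); this is not the authors' criterion, and the variables are placeholders for fretting-wear factors:

```python
# Robustly scale noisy measurements before PCA so that a few gross outliers do
# not dominate the extracted components.
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 6))        # placeholders: pressure, torque, speed, ...
X[::10] += 8.0                      # inject a few gross outliers

Z = RobustScaler().fit_transform(X) # median/IQR scaling instead of mean/std
pca = PCA().fit(Z)
print("Explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
```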


2022 ◽  
pp. 146808742110707
Author(s):  
Aran Mohammad ◽  
Reza Rezaei ◽  
Christopher Hayduk ◽  
Thaddaeus Delebinski ◽  
Saeid Shahpouri ◽  
...  

The development of internal combustion engines is shaped by exhaust gas emissions legislation and the drive to increase performance. This calls for engine-out emission models that can be used for engine optimization and real driving emissions control. The prediction capability of physics-based and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user; accuracy can improve as the number of inputs grows, but irrelevant inputs, with only a weak functional relation to the emissions, then become more likely and can lead to overfitting. Alternatively, data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and the linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are then used for regression with a support vector machine and a feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using these methods, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining accuracy. The features selected with the lasso algorithm allow the regression models to learn more accurately than the features extracted through principal component analysis and factor analysis, resulting in test errors (RMSE) of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN for the NOx, CO, HC, and soot emission models, respectively.
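An illustrative sketch of the select-then-regress pipeline described above, on synthetic data (the 772 engine measurements and 37 variables are not available here); lasso selects the relevant inputs and a support vector machine performs the regression:

```python
# Lasso-based feature selection followed by SVR regression, evaluated by test RMSE.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=772, n_features=37, n_informative=15,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(
    StandardScaler(),
    SelectFromModel(Lasso(alpha=0.1)),   # keep only features with non-zero weights
    SVR(kernel="rbf", C=10.0),
)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"Test RMSE: {rmse:.2f}")
```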

