Dimensionality Reduction with Sparse Locality for Principal Component Analysis

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Pei Heng Li ◽  
Taeho Lee ◽  
Hee Yong Youn

Various dimensionality reduction (DR) schemes have been developed for projecting high-dimensional data into low-dimensional representation. The existing schemes usually preserve either only the global structure or local structure of the original data, but not both. To resolve this issue, a scheme called sparse locality for principal component analysis (SLPCA) is proposed. In order to effectively consider the trade-off between the complexity and efficiency, a robust L2,p-norm-based principal component analysis (R2P-PCA) is introduced for global DR, while sparse representation-based locality preserving projection (SR-LPP) is used for local DR. Sparse representation is also employed to construct the weighted matrix of the samples. Being parameter-free, this allows the construction of an intrinsic graph more robust against the noise. In addition, simultaneous learning of projection matrix and sparse similarity matrix is possible. Experimental results demonstrate that the proposed scheme consistently outperforms the existing schemes in terms of clustering accuracy and data reconstruction error.

2019 ◽  
Vol 8 (S3) ◽  
pp. 66-71
Author(s):  
T. Sudha ◽  
P. Nagendra Kumar

Data mining is a major area of research, and clustering is one of its main functionalities. High dimensionality is one of the main obstacles to clustering, and dimensionality reduction can be used to address it. The present work is a comparative study of two dimensionality reduction techniques, t-distributed stochastic neighbour embedding and probabilistic principal component analysis, in the context of clustering. High-dimensional data were reduced to low-dimensional data using each technique. Cluster analysis was then performed on the high-dimensional data as well as on the low-dimensional data sets obtained through each technique, with a varying number of clusters. Mean squared error, time, and space were used as the comparison criteria. The results show that converting the high-dimensional data into low-dimensional data takes longer with probabilistic principal component analysis than with t-distributed stochastic neighbour embedding. The data set reduced through probabilistic principal component analysis requires less storage space than the data set reduced through t-distributed stochastic neighbour embedding.


2020 ◽  
Author(s):  
Alberto García-González ◽  
Antonio Huerta ◽  
Sergio Zlotnik ◽  
Pedro Díez

Methodologies for multidimensionality reduction aim at discovering low-dimensional manifolds on which the data lie. Principal Component Analysis (PCA) is very effective if the data have a linear structure, but it fails to identify a possible dimensionality reduction if the data belong to a nonlinear low-dimensional manifold. For nonlinear dimensionality reduction, kernel Principal Component Analysis (kPCA) is appreciated because of its simplicity and ease of implementation. The paper provides a concise review of the main ideas of PCA and kPCA, collecting in a single document aspects that are often dispersed. Moreover, a strategy to map the reduced dimension back into the original high-dimensional space is devised, based on the minimization of a discrepancy functional.


Author(s):  
Maryam Abedini ◽  
Horriyeh Haddad ◽  
Marzieh Faridi Masouleh ◽  
Asadollah Shahbahrami

This study proposes an image denoising algorithm based on sparse representation and Principal Component Analysis (PCA). The proposed algorithm includes the following steps. First, the noisy image is divided into overlapped [Formula: see text] blocks. Second, the discrete cosine transform is applied as a dictionary for the sparse representation of the vectors created by the overlapped blocks. To calculate the sparse vector, the orthogonal matching pursuit algorithm is used. Then, the dictionary is updated by means of the PCA algorithm to achieve the sparsest representation of vectors. Since the signal energy, unlike the noise energy, is concentrated on a small dataset by transforming into the PCA domain, the signal and noise can be well distinguished. The proposed algorithm was implemented in a MATLAB environment and its performance was evaluated on some standard grayscale images under different levels of standard deviations of white Gaussian noise by means of peak signal-to-noise ratio, structural similarity indexes, and visual effects. The experimental results demonstrate that the proposed denoising algorithm achieves significant improvement compared to dual-tree complex discrete wavelet transform and K-singular value decomposition image denoising methods. It also obtains competitive results with the block-matching and 3D filtering method, which is the current state-of-the-art for image denoising.
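One of the evaluation metrics named above, peak signal-to-noise ratio, can be illustrated with a minimal sketch. It uses the standard definition PSNR = 10·log10(peak² / MSE). The function names `mse` and `psnr`, the nested-list image layout, and the default peak value of 255 (8-bit grayscale) are assumptions made for this illustration and are not taken from the paper.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit pixel values."""
    err = mse(reference, test)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / err)


# Hypothetical 2x2 images: every pixel of the "denoised" image is off by 1.
clean = [[10, 20], [30, 40]]
denoised = [[11, 19], [31, 41]]
print(psnr(clean, denoised))
```

A higher PSNR means the denoised image is closer to the clean reference; the structural similarity index mentioned in the abstract is a separate metric and is not covered by this sketch.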


2022 ◽  
pp. 146808742110707
Author(s):  
Aran Mohammad ◽  
Reza Rezaei ◽  
Christopher Hayduk ◽  
Thaddaeus Delebinski ◽  
Saeid Shahpouri ◽  
...  

The development of internal combustion engines is shaped by exhaust gas emissions legislation and by the drive to increase performance. This calls for engine-out emission models that can be used to optimize engines for real driving emission controls. The prediction capability of physical and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user; accuracy can improve as the number of inputs increases. However, irrelevant inputs then become more probable. These have a weak functional relation to the emissions and can lead to overfitting. Alternatively, data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and the linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and the feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining the accuracy. The features selected using the lasso algorithm yield more accurate regression models than the features extracted through principal component analysis and factor analysis. The resulting test errors (RMSETe) for modeling NOx, CO, HC, and soot emissions are 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN, respectively.


Author(s):  
Ade Jamal ◽  
Annisa Handayani ◽  
Ali Akbar Septiandri ◽  
Endang Ripmiatin ◽  
Yunus Effendi

Breast cancer is the most important cause of death among women. Predicting breast cancer at an early stage provides a greater possibility of a cure. This requires a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, namely Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, namely K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. This paper presents a comparison among four models, formed by combining the two dimensionality reduction methods with the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity, and specificity metrics computed from the confusion matrices. The experimental results indicate that the K-Means method, which is not usually used for dimensionality reduction, can perform well compared to the popular Principal Component Analysis.
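The three comparison metrics named above follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function names, the label encoding (1 = malignant, 0 = benign), and the toy labels are assumptions for illustration only and do not reproduce the paper's data or results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)
print(classification_metrics(*counts))
```

In a screening setting, sensitivity is usually read first, since a false negative corresponds to a missed malignant tumor.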


2014 ◽  
Vol 556-562 ◽  
pp. 4317-4320
Author(s):  
Qiang Zhang ◽  
Li Ping Liu ◽  
Chao Liu

As a zero-emission mode of transportation, Electric Vehicles (EV) are coming into use in our daily lives in increasing numbers. The EV charging station is an important component of the Smart Grid, which is now facing the challenges of big data. This paper presents a data compression and reconstruction method based on Principal Component Analysis (PCA). The data reconstruction error, measured as the Normalized Absolute Percent Error (NAPE), is taken into consideration to balance the compression ratio against the data reconstruction quality. Using simulated data, the effectiveness of data compression and reconstruction for EV charging stations is verified.
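The general idea of PCA-based compression and reconstruction can be sketched as follows: center the data, keep the scores on the top k principal directions, and rebuild an approximation from the mean, the directions, and the scores. This is a minimal pure-Python sketch under stated assumptions, not the paper's implementation. All function names are hypothetical, the components are found by power iteration with deflation, and `normalized_abs_error` is a stand-in error measure because the abstract does not give the NAPE formula.

```python
import math
import random


def mean_center(rows):
    """Return the column means and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return mean, [[r[j] - mean[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=200, seed=0):
    """Top-k eigenvectors by power iteration with deflation.

    Assumes k does not exceed the number of directions with nonzero variance.
    """
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy; deflation modifies it
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        for i in range(d):  # remove the direction just found
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps


def pca_compress(rows, k):
    """Compressed form: column means, k components, and k scores per row."""
    mean, centered = mean_center(rows)
    comps = top_components(covariance(centered), k)
    scores = [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in centered]
    return mean, comps, scores


def pca_reconstruct(mean, comps, scores):
    """Rebuild approximate rows from the compressed form."""
    return [[mean[j] + sum(s[i] * comps[i][j] for i in range(len(comps)))
             for j in range(len(mean))] for s in scores]


def normalized_abs_error(original, restored):
    """Stand-in error: summed absolute error over summed absolute signal."""
    num = sum(abs(a - b) for ro, rr in zip(original, restored) for a, b in zip(ro, rr))
    den = sum(abs(a) for ro in original for a in ro)
    return num / den


# Hypothetical data: three perfectly correlated channels, so one component suffices.
rows = [[float(t), 2.0 * t, 3.0 * t + 1.0] for t in range(10)]
mean, comps, scores = pca_compress(rows, k=1)
print(normalized_abs_error(rows, pca_reconstruct(mean, comps, scores)))
```

Lowering k raises the compression ratio but generally raises the reconstruction error as well, which is the balance the abstract describes.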

