Metode ROBPCA (Robust Principal Component Analysis) dan Clara (Clustering Large Area) pada Data dengan Outlier

2020 ◽  
Vol 13 (2) ◽  
pp. 11
Author(s):  
Bekti Endar Susilowati ◽  
Pardomuan Robinson Sihombing

Principal Component Analysis (PCA) is a multivariate analysis technique used to replace the original variables with a small number of principal components without losing much information. In other words, it is used to explain the underlying variance-covariance structure of a large set of variables through a few linear combinations of those variables. PCA is strongly affected by the presence of outliers because it is based on the covariance matrix, which is sensitive to outliers. This analysis therefore uses a PCA that is robust to outliers, namely ROBPCA, or Hubert's PCA. The resulting principal components are then used as input for cluster analysis with the Clara (Clustering Large Area) method. Clara is a k-medoids method that is robust to outliers and well suited to large data sets. In a case study of the variables that make up the happiness index, based on The World Happiness Report 2018, the Clara method with Manhattan distance achieved its best Overall Average Silhouette Width with 5 clusters.
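The abstract selects the number of clusters by the overall average silhouette width under Manhattan distance. The sketch below shows one way that criterion can be computed for a given labeling. It is not the authors' code: the function names and the toy points are hypothetical, and a singleton cluster is assigned a width of 0 by convention.

```python
from collections import defaultdict


def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def average_silhouette_width(points, labels):
    """Mean silhouette width over all points, using Manhattan distance."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention (assumed here): a singleton cluster gets width 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(manhattan(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(manhattan(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = average_silhouette_width(points, [0, 0, 0, 1, 1, 1])
bad = average_silhouette_width(points, [0, 1, 0, 1, 0, 1])
```

In a study like the one described, this score would be computed for each candidate number of clusters and the clustering with the highest value kept.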

2016 ◽  
Vol 26 (2) ◽  
Author(s):  
Peter Filzmoser

In this paper we introduce a statistical method which can be used in combination with principal component analysis or factor analysis. Certain variables of a large data set which are of interest can be selected in order to calculate loadings and scores of these variables. We describe how the remaining variables of the data set can be presented in the previously extracted factor space. Furthermore, a possibility for the representation of the results is shown which is helpful for the interpretation.


2020 ◽  
Vol 12 (12) ◽  
pp. 1916 ◽  
Author(s):  
Christofer Schwartz ◽  
Lucas P. Ramos ◽  
Leonardo T. Duarte ◽  
Marcelo da S. Pinho ◽  
Mats I. Pettersson ◽  
...  

This paper addresses the use of a data analysis tool, known as robust principal component analysis (RPCA), in the context of change detection (CD) in ultrawideband (UWB) very high-frequency (VHF) synthetic aperture radar (SAR) images. The method considers image pairs of the same scene acquired at different time instants. The CD method aims to maximize the probability of detection (PD) and minimize the false alarm rate (FAR). This aim constitutes a multiobjective optimization problem, since maximizing the probability of detection generally implies an increase in the number of false alarms. Accordingly, varying the RPCA regularization parameter changes PD as a function of FAR, tracing out the receiver operating characteristic (ROC) curve. To evaluate the proposed method, the CARABAS-II data set was considered. The experimental results show that RPCA via principal component pursuit (PCP) can provide a good trade-off between PD and FAR. A comparison between the results obtained with the proposed method and a classical CD algorithm based on the likelihood ratio test highlights the pros and cons of the proposed method.
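As a rough illustration of RPCA via principal component pursuit, the sketch below splits a matrix M into a low-rank part L and a sparse part S with a standard augmented-Lagrangian iteration. It is a generic textbook-style version, not the authors' implementation. The function names, the default values of `lam` and `mu`, and the stopping rule are assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def soft_threshold(X, tau):
    """Shrink every entry of X toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Apply soft thresholding to the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Approximately solve min ||L||_* + lam * ||S||_1 subject to L + S = M."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # common default (assumption)
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # common step-size heuristic (assumption)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

In a change detection setting, the nonzero entries of S would be read as candidate changes. Raising `lam` makes S sparser, so sweeping it is one plausible way to trade detections against false alarms, which is the kind of PD versus FAR variation the abstract describes.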


2017 ◽  
Vol 727 ◽  
pp. 447-449 ◽  
Author(s):  
Jun Dai ◽  
Hua Yan ◽  
Jian Jian Yang ◽  
Jun Jun Guo

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.


Energies ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 196 ◽  
Author(s):  
Lihui Zhang ◽  
Riletu Ge ◽  
Jianxue Chai

China’s energy consumption is closely tied to global climate issues, and the scale of energy consumption, peak energy consumption, and consumption investment are all a focus of national attention. To forecast China’s energy consumption accurately, this article first selected GDP, population, industrial structure and energy consumption structure, energy intensity, total imports and exports, fixed asset investment, energy efficiency, urbanization, the level of consumption, and fixed investment in the energy industry as a preliminary set of factors. Second, we corrected the traditional principal component analysis (PCA) algorithm from the perspective of eliminating “bad points”, judging a “bad point” sample based on signal reconstruction ideas. On this basis, we put forward a robust principal component analysis (RPCA) algorithm and chose the first five principal components as the main factors affecting energy consumption, including GDP, population, industrial structure and energy consumption structure, and urbanization. Then, we applied the Tabu search (TS) algorithm to the least squares support vector machine (LSSVM) optimized by the particle swarm optimization (PSO) algorithm to forecast China’s energy consumption. We collected data from 1996 to 2010 as the training set and from 2010 to 2016 as the test set. For comparison, the sample data was also input into the LSSVM algorithm and the PSO-LSSVM algorithm. We used statistical indicators including the coefficient of determination (R2), the root mean square error (RMSE), and the mean radial error (MRE) to compare the training results of the three forecasting models, which demonstrated that the proposed TS-PSO-LSSVM forecasting model had higher prediction accuracy, generalization ability, and training speed. Finally, the TS-PSO-LSSVM forecasting model was applied to forecast the energy consumption of China from 2017 to 2030.
According to the predictions, China’s energy consumption increases gradually from 2017 to 2030 and will break through 6000 million tons in 2030. However, the growth rate is gradually slowing, and China’s energy consumption economy will shift to a state of diminishing returns around 2026, which suggests that China should put more emphasis on the field of energy investment.
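Two of the indicators named in the abstract, R2 and RMSE, have standard definitions that can be sketched in a few lines. The function names below are hypothetical and the code is not taken from the paper. MRE is left out because the abstract does not define it.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all observed values are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect forecast gives an RMSE of 0 and an R2 of 1, while a forecast that always predicts the observed mean gives an R2 of 0.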


2020 ◽  
Vol 5 (5) ◽  
Author(s):  
Isabel Scherl ◽  
Benjamin Strom ◽  
Jessica K. Shang ◽  
Owen Williams ◽  
Brian L. Polagye ◽  
...  
