Selecting the number of components in principal component analysis using cross-validation approximations

Human face recognition is an important research area, and many recent applications rely on it, in both commercial settings and law enforcement. Face recognition is a biometric system that identifies a person from the features of their face with high accuracy, and it can be applied in security systems. Many methods can be used for face recognition in security applications; this article discusses two of them, Two-Dimensional Principal Component Analysis and Kernel Fisher Discriminant Analysis, each with K-Nearest Neighbor as the classifier. Both methods are evaluated with cross-validation. Results from earlier studies show that face recognition with Two-Dimensional Principal Component Analysis reached an accuracy of 88.73% under 5-fold cross-validation and 89.25% under 2-fold cross-validation, while Kernel Fisher Discriminant Analysis reached an average accuracy of 83.10% under 2-fold cross-validation.
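The evaluation protocol described in this abstract (a nearest-neighbour classifier scored by k-fold cross-validation) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the interleaved fold assignment, the Euclidean distance, and the toy feature vectors are all assumptions, and the features merely stand in for whatever projections a method such as 2DPCA or KFDA would produce.

```python
import math
from collections import Counter


def k_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds interleaved folds."""
    return [list(range(fold, n_samples, n_folds)) for fold in range(n_folds)]


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(features, labels, n_folds, k=1):
    """Mean held-out accuracy over n_folds splits (assumes n_folds <= n_samples)."""
    fold_scores = []
    for test_idx in k_fold_indices(len(features), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, features[i], k) == labels[i]
            for i in test_idx
        )
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / n_folds


# Hypothetical 2-D feature vectors for two well-separated classes.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(cross_validated_accuracy(features, labels, n_folds=2))
```

With real data, samples should be shuffled or stratified before splitting so that every training fold contains every class; the interleaved split here only works because the toy data is arranged to allow it.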

Cross-validation methods in principal component analysis: A comparison

Statistical Methods & Applications ◽

10.1007/bf02511446 ◽

2002 ◽

Vol 11 (1) ◽

pp. 71-82 ◽

Cited By ~ 69

Author(s):

Giancarlo Diana ◽

Chiara Tommasi

Keyword(s):

Principal Component Analysis ◽

Cross Validation ◽

Principal Component ◽

Component Analysis ◽

Validation Methods

IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier

BMC Bioinformatics ◽

10.1186/s12859-021-04104-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Rong Zhu ◽

Yong Wang ◽

Jin-Xing Liu ◽

Ling-Yun Dai

Keyword(s):

Principal Component Analysis ◽

Random Forest ◽

Cross Validation ◽

Search Algorithm ◽

Principal Component ◽

Component Analysis ◽

Biological Data ◽

Learning Technology ◽

Disease Associations ◽

Fold Cross Validation

Background: Identifying lncRNA-disease associations not only helps to better comprehend the underlying mechanisms of various human diseases at the lncRNA level but also speeds up the identification of potential biomarkers for disease diagnoses, treatments, prognoses, and drug response predictions. However, as the amount of archived biological data continues to grow, it has become increasingly difficult to detect potential human lncRNA-disease associations from these enormous biological datasets using traditional biological experimental methods. Consequently, developing new and effective computational methods to predict potential human lncRNA diseases is essential.

Results: Using a combination of incremental principal component analysis (IPCA) and random forest (RF) algorithms and by integrating multiple similarity matrices, we propose a new algorithm (IPCARF) based on integrated machine learning technology for predicting lncRNA-disease associations. First, we used two different models to compute a semantic similarity matrix of diseases from a directed acyclic graph of diseases. Second, a characteristic vector for each lncRNA-disease pair is obtained by integrating disease similarity, lncRNA similarity, and Gaussian nuclear similarity. Then, the best feature subspace is obtained by applying IPCA to decrease the dimension of the original feature set. Finally, we train an RF model to predict potential lncRNA-disease associations. The experimental results show that the IPCARF algorithm effectively improves the AUC metric when predicting potential lncRNA-disease associations. Before the parameter optimization procedure, the AUC value predicted by the IPCARF algorithm under 10-fold cross-validation reached 0.8529; after selecting the optimal parameters using the grid search algorithm, the predicted AUC of the IPCARF algorithm reached 0.8611.

Conclusions: We compared IPCARF with the existing LRLSLDA, LRLSLDA-LNCSIM, TPGLDA, NPCMF, and ncPred prediction methods, which have shown excellent performance in predicting lncRNA-disease associations. The compared results of 10-fold cross-validation procedures show that the predictions of the IPCARF method are better than those of the other compared methods.

Cross-validation methods in principal component analysis: A comparison

Statistical Methods & Applications ◽

10.1007/s102600200026 ◽

2002 ◽

Vol 11 (1) ◽

pp. 71-82

Author(s):

Giancarlo Diana ◽

Chiara Tommasi

Keyword(s):

Principal Component Analysis ◽

Cross Validation ◽

Principal Component ◽

Component Analysis ◽

Validation Methods

Cross-Validatory Choice of the Number of Components From a Principal Component Analysis

Technometrics ◽

10.1080/00401706.1982.10487712 ◽

1982 ◽

Vol 24 (1) ◽

pp. 73-77 ◽

Cited By ~ 190

Author(s):

H. T. Eastment ◽

W. J. Krzanowski

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Number Of Components

Cross-Validation in Principal Component Analysis

Biometrics ◽

10.2307/2531996 ◽

1987 ◽

Vol 43 (3) ◽

pp. 575 ◽

Cited By ~ 125

Author(s):

W. J. Krzanowski

Keyword(s):

Principal Component Analysis ◽

Cross Validation ◽

Principal Component ◽

Component Analysis

MFPCA: Multiscale Functional Principal Component Analysis

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014320 ◽

2019 ◽

Vol 33 ◽

pp. 4320-4327 ◽

Cited By ~ 1

Author(s):

Zhenhua Lin ◽

Hongtu Zhu

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

Principal Components ◽

Functional Data ◽

Principal Component ◽

Component Analysis ◽

Functional Principal Component Analysis ◽

Number Of Components ◽

Small Variance ◽

Functional Principal Component

We consider the problem of performing dimension reduction on heteroscedastic functional data, where the scale of the variance differs across the domain. The aim of this paper is to propose a novel multiscale functional principal component analysis (MFPCA) approach to address this heteroscedasticity. The key ideas of MFPCA are to partition the whole domain into several subdomains according to the scale of variance, and then to conduct the usual functional principal component analysis (FPCA) on each individual subdomain. Both theoretically and numerically, we show that MFPCA can capture features in areas of low variance without estimating high-order principal components, leading to an overall improvement in dimension-reduction performance for heteroscedastic functional data. In contrast, traditional FPCA prioritizes performance on the subdomain with larger variance and requires a practically prohibitive number of components to characterize data in the region with relatively small variance.
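The partitioning idea in this abstract can be illustrated with a small sketch. The abstract does not state how subdomains are chosen, so the rule below (thresholding the pointwise sample variance of curves observed on a common grid) is purely an assumption, as are the function names and the toy curves. The per-subdomain FPCA step is not implemented here.

```python
from statistics import variance


def pointwise_variance(curves):
    """Sample variance across curves at each grid point."""
    return [variance(values) for values in zip(*curves)]


def partition_by_variance(variances, threshold):
    """Group consecutive grid indices into (start, stop, is_high) segments.

    stop is exclusive; is_high tells whether the segment's variance
    exceeds threshold.
    """
    segments = []
    start = 0
    for i in range(1, len(variances) + 1):
        at_end = i == len(variances)
        if at_end or (variances[i] > threshold) != (variances[start] > threshold):
            segments.append((start, i, variances[start] > threshold))
            start = i
    return segments


# Hypothetical curves: nearly flat on the first two grid points,
# highly variable on the last two.
curves = [
    [0.00, 0.10, 10.0, 12.0],
    [0.10, 0.00, -10.0, -11.0],
    [0.05, 0.05, 0.0, 1.0],
]

segments = partition_by_variance(pointwise_variance(curves), threshold=1.0)
print(segments)
```

Each resulting segment would then be analysed separately, so that low-variance regions are not dominated by the high-variance ones.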

A decision procedure for determining the number of components in principal component analysis

Journal of Statistical Planning and Inference ◽

10.1016/0378-3758(92)90107-4 ◽

1992 ◽

Vol 30 (1) ◽

pp. 63-71 ◽

Cited By ~ 8

Author(s):

Deng-Yuan Huang ◽

Sheng-Tsaing Tseng

Keyword(s):

Principal Component Analysis ◽

Decision Procedure ◽

Principal Component ◽

Component Analysis ◽

Number Of Components

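Deteksi Penyakit Kanker Payudara dengan Seleksi Fitur berbasis Principal Component Analysis dan Random Forest [Breast Cancer Detection with Feature Selection Based on Principal Component Analysis and Random Forest]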

Jurnal Infortech ◽

10.31294/infortech.v2i1.8079 ◽

2020 ◽

Vol 2 (1) ◽

pp. 96-101

Author(s):

Ahmad Fauzi ◽

Riki Supriyadi ◽

Nurlaelatul Maulidah

Keyword(s):

Breast Cancer ◽

Principal Component Analysis ◽

Random Forest ◽

Cross Validation ◽

Principal Component ◽

Component Analysis ◽

Data Set ◽

Fold Cross Validation

Abstract - Screening is an early-detection effort to identify diseases or abnormalities that are not yet clinically apparent, using particular tests, examinations, or procedures. It can be used to quickly distinguish people who appear healthy but actually have a disorder. The main aim of this study is to improve classification performance in breast cancer diagnosis by applying feature selection to several classification algorithms. The study uses the Breast Cancer Coimbra Data Set. Feature selection based on principal component analysis is paired with several classification algorithms and methods, such as LogitBoost, Bagging, and Random Forest. The study uses 10-fold cross-validation as its evaluation method. The results show that feature selection based on principal component analysis improved classification performance significantly when paired with Random Forest and LogitBoost; Random Forest performed best, with an accuracy of 79.3103% and an AUC of 0.843. Keywords: Feature Selection, PCA, Breast Cancer, Screening, Random Forest

Identification of Rainfall Patterns on Hydrological Simulation Using Robust Principal Component Analysis

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v11.i3.pp1162-1167 ◽

2018 ◽

Vol 11 (3) ◽

pp. 1162 ◽

Cited By ~ 1

Author(s):

S.M. Shaharudin ◽

N. Ahmad ◽

N.H. Zainuddin ◽

N.S. Mohamed

Keyword(s):

Principal Component Analysis ◽

Simulated Data ◽

Principal Component ◽

Breakdown Point ◽

Component Analysis ◽

Data Matrix ◽

Robust PCA ◽

Data Set ◽

Number Of Components ◽

Rainfall Patterns

A robust dimension reduction method in Principal Component Analysis (PCA) was used to rectify the issue of unbalanced clusters in rainfall patterns caused by the skewed nature of rainfall data. A robust measure in PCA using Tukey's biweight correlation to downweight observations was introduced, and an optimum breakdown point for extracting the number of components with this approach is proposed. A simulated data matrix that mimicked the real data set was used to determine an appropriate breakdown point for robust PCA and to compare the performance of both approaches. The simulated data indicated that a breakdown point of 70% cumulative percentage of variance gave a good balance in extracting the number of components. The results showed a more significant and substantial improvement with the robust PCA than with PCA based on Pearson correlation, in terms of the average number of clusters obtained and cluster quality.
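The cumulative-percentage-of-variance cutoff mentioned in this abstract can be shown with a short sketch. It covers only the cutoff rule, not the Tukey's biweight correlation step, and the function name and example eigenvalues are assumptions made for illustration.

```python
def components_for_variance(eigenvalues, threshold=0.70):
    """Return the smallest k whose leading eigenvalues reach
    `threshold` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a correlation matrix (not from the paper).
example_eigenvalues = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]
print(components_for_variance(example_eigenvalues, threshold=0.70))  # prints 3
```

Here the first two components explain 65% of the variance and the first three explain 80%, so three components are the fewest that reach the 70% cutoff.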
