Clustering by principal component analysis with Gaussian kernel in high-dimension, low-sample-size settings

2021 ◽  
pp. 104779
Author(s):  
Yugo Nakayama ◽  
Kazuyoshi Yata ◽  
Makoto Aoshima
Author(s):  
Xianrui Wang ◽  
Guoxin Zhao ◽  
Yu Liu ◽  
Shujie Yang ◽  
...  

To handle uncertainties in industrial processes, interval kernel principal component analysis (IKPCA) has been proposed based on symbolic data analysis. However, experiments show that IKPCA performs worse than other algorithms. To improve it, interval ensemble kernel principal component analysis (IEKPCA) is proposed. By optimizing the width parameters of the Gaussian kernel function, IEKPCA yields better performance. Ensemble learning is incorporated into IEKPCA to build submodels with different width parameters. However, the multiple submodels produce a large number of results, which complicates the algorithm. To simplify it, a Bayesian decision rule is used to convert each submodel result into a fault probability, and the final result is obtained via a weighting strategy. To verify the method, IEKPCA is applied to the Tennessee Eastman (TE) process. Its false alarm rate, fault detection rate, accuracy, and other indicators are compared with those of other algorithms. The results show that IEKPCA improves the accuracy of uncertain nonlinear process monitoring.
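
A minimal sketch of the ensemble idea described above, under illustrative assumptions: several Gaussian-kernel KPCA submodels with different width parameters each yield a reconstruction-error monitoring statistic, each statistic is converted to a fault probability in a Bayesian fashion, and the probabilities are combined by weighting. The data, control limits, and fusion rule below are placeholders rather than the authors' exact formulation, and scikit-learn's KernelPCA stands in for the interval KPCA submodels.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))          # assumed normal-operation data
X_test = rng.normal(size=(50, 10)) + 0.8      # assumed shifted (faulty) data

gammas = [0.01, 0.1, 1.0]                     # candidate Gaussian kernel widths
prior_fault = 0.01                            # assumed prior fault probability

def recon_error(model, X):
    """Squared reconstruction error in input space for each sample."""
    Z = model.transform(X)
    X_hat = model.inverse_transform(Z)
    return np.sum((X - X_hat) ** 2, axis=1)

probs = []
for gamma in gammas:
    kpca = KernelPCA(n_components=5, kernel="rbf", gamma=gamma,
                     fit_inverse_transform=True).fit(X_train)
    limit = np.percentile(recon_error(kpca, X_train), 99)   # control limit
    stat = recon_error(kpca, X_test)
    # Bayesian-style conversion of the statistic into a fault probability
    p_normal = np.exp(-stat / limit)
    p_fault = np.exp(-limit / np.maximum(stat, 1e-12))
    post = (p_fault * prior_fault) / (p_fault * prior_fault +
                                      p_normal * (1 - prior_fault))
    probs.append(post)

# Equal-weight fusion of the submodel probabilities (the paper's weighting
# strategy could replace this simple mean)
fused = np.mean(probs, axis=0)
print("fraction flagged as faulty:", np.mean(fused > 0.5))
```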


2003 ◽  
Vol 1 (2-3) ◽  
pp. 151-156 ◽  
Author(s):  
R. L. Sapra ◽  
S. K. Lal

We suggest a diversity-dependent strategy, based on Principal Component Analysis, for selecting distinct accessions/parents for breeding from a soybean germplasm collection comprising 463 lines, characterized and evaluated for ten qualitative and eight quantitative traits. A sample of six accessions included all three states, namely low, medium and high, of the individual quantitative traits, while a sample of 16–19 accessions included all the 60–64 distinct states of the qualitative as well as quantitative traits. Under certain assumptions, the paper also develops an expression for estimating the size of a target population for capturing maximum variability in a sample of three accessions.
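
As a rough illustration of a PCA-guided, diversity-dependent selection (not the authors' exact procedure), the sketch below projects standardized trait data onto leading principal components and greedily picks accessions that maximize the minimum distance to those already chosen. The synthetic trait matrix and the greedy max-min rule are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
traits = rng.normal(size=(463, 18))           # assumed 463 accessions x 18 traits

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(traits))

def select_diverse(scores, k):
    """Greedy max-min selection in principal-component space."""
    # start with the accession farthest from the centroid
    chosen = [int(np.argmax(np.linalg.norm(scores - scores.mean(0), axis=1)))]
    while len(chosen) < k:
        # distance of every accession to its nearest already-chosen accession
        d = np.min(np.linalg.norm(scores[:, None, :] - scores[chosen][None, :, :],
                                  axis=2), axis=1)
        chosen.append(int(np.argmax(d)))
    return chosen

print(select_diverse(scores, 6))              # e.g. a sample of six accessions
```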


2020 ◽  
Vol 49 (3) ◽  
pp. 330001
Author(s):  
王昕 Xin WANG ◽  
康哲铭 Zhe-ming KANG ◽  
刘龙 Long LIU ◽  
范贤光 Xian-guang FAN

2021 ◽  
Vol 12 (4) ◽  
pp. 255
Author(s):  
Shuna Jiang ◽  
Qi Li ◽  
Rui Gan ◽  
Weirong Chen

To solve the problem of water management subsystem fault diagnosis in a proton exchange membrane fuel cell (PEMFC) system, a novel approach based on a learning vector quantization neural network (LVQNN) and kernel principal component analysis (KPCA) is proposed. In this approach, KPCA is used to process strongly coupled, high-dimensional fault data, reducing the dimensionality and extracting new low-dimensional fault features. The LVQNN then performs fault recognition on the extracted feature data. The effectiveness of the proposed method is validated using experimental data from the PEMFC power system. Results show that the proposed method can quickly and accurately diagnose the three health states, namely the normal state, water flooding failure and membrane drying failure, with a recognition accuracy of up to 96.93%. Therefore, the method is suitable for processing high-dimensional, large-volume fault data and provides a reference for the application of fault diagnosis to the water management subsystem of the PEMFC.
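
A minimal sketch of the two-stage pipeline described above: KPCA compresses high-dimensional, strongly coupled fault data into a few features, and a prototype-based classifier assigns one of the three health states. scikit-learn provides no LVQ network, so NearestCentroid serves here as a simple prototype-based stand-in, and the data are synthetic placeholders.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import NearestCentroid
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 30))                # assumed high-dimensional sensor data
y = np.repeat([0, 1, 2], 200)                 # normal / flooding / membrane drying
X[y == 1] += 1.0                              # crude class separation for the demo
X[y == 2] -= 1.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# KPCA feature extraction followed by prototype-based classification
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.05).fit(X_tr)
clf = NearestCentroid().fit(kpca.transform(X_tr), y_tr)
print("accuracy:", clf.score(kpca.transform(X_te), y_te))
```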


2018 ◽  
Author(s):  
Toni Bakhtiar

Kernel Principal Component Analysis (Kernel PCA) is a generalization of ordinary PCA that maps the original data into a high-dimensional feature space. The mapping is expected to address nonlinearity among variables and poor separation among classes in the original data space. The key problem in using kernel PCA is the estimation of the kernel function parameter, for which there is still no clear guidance, so that parameter selection is largely left to the researcher's judgment. This study exploited the Gaussian kernel function and focused on the ability of kernel PCA to visualize the separation of classified data. Assessments were based on the misclassification rate obtained by Fisher's linear discriminant analysis on the first two principal components. The results suggest that, for visualization, kernel PCA with the parameter selected in the interval between the closest and the furthest distances among the objects of the original data performs better than ordinary PCA.
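
A minimal sketch of the assessment described above, under illustrative assumptions: the Gaussian kernel width is taken inside the interval between the closest and the furthest pairwise distances of the data (here simply the midpoint), and class separation on the first two kernel principal components is scored by linear discriminant analysis, with the Iris data standing in for the study's data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics.pairwise import euclidean_distances

X, y = load_iris(return_X_y=True)
d = euclidean_distances(X)
d_min, d_max = d[d > 0].min(), d.max()        # closest and furthest distances

sigma = 0.5 * (d_min + d_max)                 # a parameter inside [d_min, d_max]
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0 / (2 * sigma ** 2))
Z = kpca.fit_transform(X)                     # first two kernel principal components

lda = LinearDiscriminantAnalysis().fit(Z, y)
print("misclassification rate:", 1 - lda.score(Z, y))
```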


Author(s):  
Shazlyn Milleana Shaharudin ◽  
Norhaiza Ahmad ◽  
Siti Mariana Che Mat Nor

This paper presents a modified correlation in principal component analysis (PCA) for selecting the number of clusters when identifying rainfall patterns. PCA-guided clustering is widely used for high-dimensional data, especially for identifying the spatial distribution patterns of daily torrential rainfall. Typically, rainfall patterns for climatological investigation are identified from a T-mode Pearson correlation matrix, from which the retained relative variance is extracted. However, rainfall data in Peninsular Malaysia are skewed towards higher values, with purely positive observations. Consequently, Pearson-correlation-based PCA on such rainfall data can distort the cluster partitions and produce exceptionally uneven clusters in a high-dimensional space. In this study, to resolve the unbalanced-cluster problem caused by the skewed character of the data, a robust dimension reduction method in PCA was employed: a robust PCA based on Tukey's biweight correlation, which downweights extreme observations, with an optimal breakdown point used to determine the number of PCA components. The results show a substantial improvement of the robust PCA over Pearson-correlation-based PCA in terms of the average number of clusters obtained, with a cumulative variance of 70% at a breakdown point of 0.4.
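
A minimal sketch of the robust alternative described above, under illustrative assumptions: a Tukey biweight (midcorrelation-style) correlation matrix downweights extreme observations and is eigendecomposed in place of the Pearson correlation matrix, retaining components up to a 70% cumulative-variance target. The synthetic rainfall matrix and the tuning constant c = 9 are assumptions, and the paper's breakdown-point tuning is not reproduced here.

```python
import numpy as np

def biweight_corr(X, c=9.0):
    """Pairwise Tukey biweight (midcorrelation-style) correlation of the columns of X."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-12
    u = (X - med) / (c * mad)
    w = np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)   # zero weight for outliers
    A = (X - med) * w                                     # weighted, centered columns
    A /= np.sqrt(np.sum(A ** 2, axis=0))
    return A.T @ A

rng = np.random.default_rng(3)
rain = rng.gamma(shape=1.5, scale=10.0, size=(365, 40))   # skewed daily rainfall, 40 stations

R = biweight_corr(rain)
eigval = np.linalg.eigvalsh(R)[::-1]                      # eigenvalues, descending
cum = np.cumsum(eigval) / eigval.sum()
k = int(np.searchsorted(cum, 0.70)) + 1                   # components reaching ~70% variance
print("components retained:", k)
```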

