Structural classification based correlation and its application to principal component analysis for high-dimension low-sample size data

We suggest a diversity-dependent strategy, based on Principal Component Analysis, for selecting distinct accessions/parents for breeding from a soybean germplasm collection comprising 463 lines, characterized and evaluated for 10 qualitative and eight quantitative traits. A sample of six accessions included all three states (low, medium and high) of the individual quantitative traits, while a sample of 16–19 accessions included all the 60–64 distinct states of qualitative as well as quantitative traits. Under certain assumptions, the paper also develops an expression for estimating the size of a target population for capturing maximum variability in a sample of three accessions.

Whole Brain Atrophy and Sample Size Estimate via Iterative Principal Component Analysis for Twelve-month Alzheimer's Disease Trials

Neuroscience and Biomedical Engineering ◽

10.2174/2213385211301010007 ◽

2013 ◽

Vol 1 (1) ◽

pp. 40-47 ◽

Cited By ~ 3

Author(s):

Napatkamon Ayutyanont ◽

Kewei Chen ◽

Adam S. Fleisher ◽

Jessica B.S. Langbaum ◽

Cole Reschke ◽

...

Keyword(s):

Alzheimer's Disease ◽

Principal Component Analysis ◽

Sample Size ◽

Brain Atrophy ◽

Principal Component ◽

Component Analysis ◽

Whole Brain ◽

Size Estimate ◽

Sample Size Estimate

Fault Diagnosis for PEMFC Water Management Subsystem Based on Learning Vector Quantization Neural Network and Kernel Principal Component Analysis

World Electric Vehicle Journal ◽

10.3390/wevj12040255 ◽

2021 ◽

Vol 12 (4) ◽

pp. 255

Author(s):

Shuna Jiang ◽

Qi Li ◽

Rui Gan ◽

Weirong Chen

Keyword(s):

Neural Network ◽

Water Management ◽

Principal Component Analysis ◽

Fault Diagnosis ◽

Vector Quantization ◽

High Dimension ◽

Principal Component ◽

Component Analysis ◽

Learning Vector Quantization ◽

Kernel Principal Component Analysis

To solve the problem of water management subsystem fault diagnosis in a proton exchange membrane fuel cell (PEMFC) system, a novel approach based on a learning vector quantization neural network (LVQNN) and kernel principal component analysis (KPCA) is proposed. In the proposed approach, KPCA is used to process strongly coupled, high-dimensional fault data, reducing the data dimension and extracting new low-dimensional fault feature data. The LVQNN is then used to carry out fault recognition on the fault feature data. The effectiveness of the proposed fault diagnosis method is validated using experimental data from the PEMFC power system. Results show that the proposed method can quickly and accurately diagnose the three health states (normal state, water flooding failure and membrane dry failure), with a recognition accuracy of up to 96.93%. The method is therefore suitable for processing high-dimensional fault data in large quantities, and provides a reference for water management subsystem fault diagnosis in PEMFC applications.
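The dimension-reduction step described in this abstract can be sketched as follows. This is only a minimal illustration of kernel PCA with an RBF kernel, written with NumPy; the kernel choice, the `gamma` value, the function name `rbf_kernel_pca` and the synthetic data are all assumptions and are not taken from the paper. The LVQNN classification stage is not shown.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components.

    Hypothetical helper: the RBF kernel and gamma are illustrative choices.
    """
    # Pairwise squared Euclidean distances between samples
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)

    # Centre the kernel matrix in feature space
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one

    # eigh returns eigenvalues in ascending order, so reverse before slicing
    vals, vecs = np.linalg.eigh(Kc)
    vals = vals[::-1][:n_components]
    vecs = vecs[:, ::-1][:, :n_components]

    # Scores of the training samples: unit eigenvector scaled by sqrt(eigenvalue)
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Synthetic stand-in for high-dimensional fault data (not the PEMFC measurements)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Z = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)
```

The low-dimensional scores `Z` would then be the input features for whatever classifier follows.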

Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

International Statistical Review ◽

10.1111/insr.12220 ◽

2017 ◽

Vol 86 (1) ◽

pp. 29-50 ◽

Cited By ~ 20

Author(s):

Hervé Cardot ◽

David Degras

Keyword(s):

Principal Component Analysis ◽

High Dimension ◽

Principal Component ◽

Component Analysis

A modified correlation in principal component analysis for torrential rainfall patterns identification

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v9.i4.pp655-661 ◽

2020 ◽

Vol 9 (4) ◽

pp. 655

Author(s):

Shazlyn Milleana Shaharudin ◽

Norhaiza Ahmad ◽

Siti Mariana Che Mat Nor

Keyword(s):

Principal Component Analysis ◽

High Dimension ◽

Pearson Correlation ◽

Principal Component ◽

Breakdown Point ◽

Distribution Patterns ◽

Component Analysis ◽

Peninsular Malaysia ◽

Torrential Rainfall ◽

Rainfall Patterns

This paper presents a modified correlation in principal component analysis (PCA) for selecting the number of clusters when identifying rainfall patterns. Clustering guided by PCA is widely used for high-dimensional data, especially for identifying the spatial distribution patterns of daily torrential rainfall. A common method of identifying rainfall patterns in climatological studies uses a T-mode Pearson correlation matrix to extract the relative variance retained. However, rainfall data for Peninsular Malaysia are positive-valued and skewed toward higher values. PCA based on Pearson correlation can therefore distort the cluster partitions and produce markedly uneven clusters in a high-dimensional space. To resolve the problem of unbalanced clusters caused by the skewed character of the data, this study employs a robust dimension reduction method in PCA: a robust measure based on Tukey’s biweight correlation downweights outlying observations, with the optimal breakdown point used to obtain the number of PCA components. The robust PCA showed a substantial improvement over Pearson-correlation-based PCA in terms of the average number of clusters obtained, and gave a cumulative variance percentage of 70% at a breakdown point of 0.4.
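To make the idea of replacing Pearson correlation with a biweight-based correlation concrete, here is a rough sketch. It uses the biweight midcorrelation (with the conventional tuning constant 9), which is a related but swapped-in estimator and not necessarily the one used in the paper; it has no breakdown-point tuning. The function names, the 70% threshold usage and the synthetic skewed data are assumptions for illustration only.

```python
import numpy as np

def biweight_midcorrelation(x, y, c=9.0):
    """Biweight midcorrelation of two 1-D arrays (assumes non-zero MAD)."""
    def _weighted_deviation(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med))
        u = (v - med) / (c * mad)
        # Tukey biweight: observations with |u| >= 1 get zero weight
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        a = (v - med) * w
        return a / np.sqrt(np.sum(a**2))
    return float(np.sum(_weighted_deviation(x) * _weighted_deviation(y)))

def biweight_corr_matrix(X):
    """Pairwise biweight midcorrelation of the columns of X."""
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = biweight_midcorrelation(X[:, i], X[:, j])
    return R

# Synthetic positive, right-skewed data (not the Peninsular Malaysia rainfall data)
rng = np.random.default_rng(1)
shared = rng.gamma(2.0, 1.0, size=(150, 1))
X = shared + rng.gamma(2.0, 1.0, size=(150, 6))

R = biweight_corr_matrix(X)
eigvals = np.linalg.eigvalsh(R)[::-1]            # descending order
cum = np.cumsum(eigvals) / np.sum(eigvals)
# Smallest number of components whose cumulative proportion reaches 70%
k70 = int(np.searchsorted(cum, 0.70) + 1)
print(k70)
```

A matrix assembled pairwise like this is not guaranteed to be positive semidefinite, so small negative eigenvalues are possible on other data.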

Rice Odours’ Readings Investigation Using Principal Component Analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.29.13803 ◽

2018 ◽

Vol 7 (2.29) ◽

pp. 488

Author(s):

Nurul Aini Abdul Wahab ◽

Shamshuritawati Sharif

Keyword(s):

Principal Component Analysis ◽

High Dimension ◽

Electronic Nose ◽

Principal Component ◽

Component Analysis ◽

The Other ◽

Total Variance ◽

Scree Plot ◽

Cumulative Proportion ◽

Principle Component

The use of electronic nose (e-nose) devices together with principal component analysis can help classify 16 different rices by type. An e-nose generally has more than one hole to capture the odour of rice. For example, the portable e-nose called Insniff has 10 holes (or variables). This yields a high-dimensional dataset with interdependencies among the variables under study. This study therefore investigates rice odour readings to identify the most important variables contributing to them. Principal component analysis (PCA) is implemented (1) to determine the components that best represent all 10 variables, in order to eliminate the interdependency problem, and (2) to identify which variables are important and influential for the newly formed principal components (PCs). The PCA results suggest retaining the first two principal components, based on three assessments: Kaiser’s criterion (eigenvalue larger than 1), the cumulative proportion of total variance, and the scree plot. These two principal components explain 89% of the total variance. Sensor 1 (0.931) and sensor 2 (0.966) are the two variables that contribute most to PC1, while for PC2 the highest contribution is from sensor 8 (0.828). This study demonstrates that PCA is effective for investigating rice odour readings.
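As an illustration of two of the retention rules mentioned in this abstract (Kaiser's criterion and the cumulative proportion of total variance), the sketch below runs PCA on a correlation matrix with NumPy. The data are synthetic, generated from two latent factors, and stand in for 10 sensor variables; they are not the Insniff readings, and the helper names are hypothetical.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = samples, columns = variables)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: correlations between each variable and each component
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings

def kaiser_components(eigvals):
    """Number of components with eigenvalue larger than 1."""
    return int(np.sum(eigvals > 1.0))

def cumulative_proportion(eigvals):
    """Cumulative share of total variance explained by the leading components."""
    return np.cumsum(eigvals) / np.sum(eigvals)

# Synthetic stand-in: 10 variables driven by two latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.zeros((2, 10))
weights[0, :6] = 1.0      # variables 1-6 follow the first factor
weights[1, 6:] = 1.0      # variables 7-10 follow the second factor
X = latent @ weights + 0.3 * rng.normal(size=(200, 10))

eigvals, loadings = pca_correlation(X)
k = kaiser_components(eigvals)
cum = cumulative_proportion(eigvals)
print(k, cum[:k])
# Variable with the largest absolute loading on each retained component
print(np.argmax(np.abs(loadings[:, :k]), axis=0))
```

Reading the largest absolute loadings per component is one simple way to see which variables dominate each PC, in the spirit of the sensor contributions reported above.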

Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure

Ekológia (Bratislava) ◽

10.1515/eko-2016-0014 ◽

2016 ◽

Vol 35 (2) ◽

pp. 173-190 ◽

Cited By ~ 13

Author(s):

S. Shahid Shaukat ◽

Toqeer Ahmed Rao ◽

Moazzam A. Khan

Keyword(s):

Principal Component Analysis ◽

Sample Size ◽

Principal Component ◽

Component Analysis ◽

Small Sample ◽

Environmental Data ◽

Data Matrix ◽

Data Sets ◽

Data Set ◽

The Impact

In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix pertaining to water quality variables (p = 22) of a small data set comprising 55 samples (stations from which water samples were collected). Because data sets in ecology and environmental sciences are invariably small owing to the high cost of collecting and analysing samples, we restricted our study to relatively small sample sizes. We focused on comparing the first 6 eigenvectors and the first 10 eigenvalues. Data sets were compared using agglomerative cluster analysis with Ward’s method, which does not require any stringent distributional assumptions.
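The resampling design described here can be sketched roughly as below. The sketch assumes rows are drawn with replacement and that PCA is run on the correlation matrix; neither detail is confirmed by the abstract. The 55 × 22 matrix is random synthetic data rather than the water quality data set, and `bootstrap_eigenvalues` is a hypothetical helper. The cluster-analysis comparison is not shown.

```python
import numpy as np

def bootstrap_eigenvalues(X, sample_size, n_boot=100, n_eig=10, seed=0):
    """Leading PCA eigenvalues for n_boot bootstrap samples of a given size."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = np.empty((n_boot, n_eig))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=sample_size)   # rows drawn with replacement
        R = np.corrcoef(X[idx], rowvar=False)
        eig = np.linalg.eigvalsh(R)[::-1]            # descending order
        out[b] = eig[:n_eig]
    return out

# Synthetic stand-in with the same shape as the data described (55 stations, p = 22)
rng = np.random.default_rng(42)
X = rng.normal(size=(55, 22))

results = {N: bootstrap_eigenvalues(X, N) for N in (20, 30, 40, 50)}
for N, eig in results.items():
    # Mean and spread of the first eigenvalue across the 100 bootstrap samples
    print(N, eig[:, 0].mean(), eig[:, 0].std())
```

Comparing the mean and spread of each eigenvalue across sample sizes gives a simple view of how stable the eigenstructure is as N shrinks.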

Estimation of Knee Movement from Surface EMG Using Random Forest with Principal Component Analysis

Electronics ◽

10.3390/electronics9010043 ◽

2019 ◽

Vol 9 (1) ◽

pp. 43 ◽

Cited By ~ 5

Author(s):

Zhong Li ◽

Xiaorong Guan ◽

Kaifan Zou ◽

Cheng Xu

Keyword(s):

Principal Component Analysis ◽

Random Forest ◽

Sample Size ◽

Back Propagation ◽

Principal Component ◽

Component Analysis ◽

Surface EMG ◽

Joint Movement ◽

Estimation Model ◽

Knee Movement

To study the relationship between surface electromyography (sEMG) and joint movement, and to provide reliable reference information for exoskeleton control, the sEMG and the corresponding movement of the knee were measured during normal walking in adults. After processing the experimental data, an estimation model for knee movement from sEMG was established using the novel method of random forest with principal component analysis (RFPCA). The influence of the sample size and of the previous sEMG data on the prediction efficiency was analyzed. The estimation model was not sensitive to the sample size once the number of samples reached a certain value, and the results for different amounts of previous sEMG data showed that the prediction accuracy of the estimation models did not always improve as the number of input features increased. Comparison with an estimation model based on a back propagation neural network with principal component analysis (BPPCA) showed that RFPCA was suitable for all participants in the experiment and required less execution time; its root mean square error was around 5°, lower than that of BPPCA, whose errors varied from 7° to 25°. It was therefore concluded that the RFPCA method for estimating knee movement from sEMG is feasible and could be used for motion analysis and exoskeleton control.