Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure

AbstractIn this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from environmental data matrix pertaining to water quality variables (p = 22) of a small data set comprising of 55 samples (stations from where water samples were collected). Because in ecology and environmental sciences the data sets are invariably small owing to high cost of collection and analysis of samples, we restricted our study to relatively small sample sizes. We focused attention on comparison of first 6 eigenvectors and first 10 eigenvalues. Data sets were compared using agglomerative cluster analysis using Ward’s method that does not require any stringent distributional assumptions.

Download Full-text

Evaluation of the Aging Behavior of High Density Polyethylene in Thermal Oxidative Environment by Principal Component Analysis

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.727.447 ◽

2017 ◽

Vol 727 ◽

pp. 447-449 ◽

Cited By ~ 1

Author(s):

Jun Dai ◽

Hua Yan ◽

Jian Jian Yang ◽

Jun Jun Guo

Keyword(s):

Principal Component Analysis ◽

Impact Strength ◽

High Density Polyethylene ◽

Tensile Modulus ◽

Principal Component ◽

Component Analysis ◽

High Density ◽

Aging Behavior ◽

Data Set ◽

The Impact

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.

Download Full-text

Separation of the daily quiet variation from the geomagnetic field observations with the principal component analysis

10.5194/egusphere-egu2020-3423 ◽

2020 ◽

Author(s):

Anna Morozova ◽

Rania Rebbah ◽

M. Alexandra Pais

Keyword(s):

Principal Component Analysis ◽

Geomagnetic Field ◽

Daily Variation ◽

Extraction Procedure ◽

Principal Component ◽

Component Analysis ◽

Activity Level ◽

Data Series ◽

Data Sets ◽

Data Set

<p>Geomagnetic field (GMF) variations from external sources are classified as regular diurnal or occurring during periods of disturbances. The most significant regular variations are the quiet solar daily variation (Sq) and the disturbance daily variation (SD). These variations have well recognized daily cycles and need to be accounted for before the analysis of the disturbed field. Preliminary analysis of the GMF variations shows that the principal component analysis (PCA) is a useful tool for extraction of regular variations of GMF; however the requirements to the data set length, geomagnetic activity level etc. need to be established.</p><p>Here we present preliminary results of the PCA-based Sq extraction procedure based on the analysis of the Coimbra Geomagnetic Observatory (COI) measurements of the geomagnetic field components H, X, Y and Z between 2007 and 2015. The PCA-based Sq curves are compared with the standard ones obtained using 5 IQD per month. PCA was applied to data sets of different length: either 1 month-long data set for one of 2007-2015 years or data series for the same month but from different years (2007-2015) combined together. For most of the analyzed years the first PCA mode (PC1) was identified as SD variation and the second mode (PC2) was identified as Sq variation.</p>

Download Full-text

Combining multiway principal component analysis (MPCA) and clustering for efficient data mining of historical data sets of SBR processes

Water Science & Technology ◽

10.2166/wst.2008.143 ◽

2008 ◽

Vol 57 (10) ◽

pp. 1659-1666 ◽

Cited By ~ 30

Author(s):

Kris Villez ◽

Magda Ruiz ◽

Gürkan Sin ◽

Joan Colomer ◽

Christian Rosén ◽

...

Keyword(s):

Principal Component Analysis ◽

Clustering Algorithm ◽

Process Analysis ◽

Principal Component ◽

Component Analysis ◽

Data Sets ◽

Process Data ◽

Nitrogen And Phosphorus ◽

Online Data ◽

Data Set

A methodology based on Principal Component Analysis (PCA) and clustering is evaluated for process monitoring and process analysis of a pilot-scale SBR removing nitrogen and phosphorus. The first step of this method is to build a multi-way PCA (MPCA) model using the historical process data. In the second step, the principal scores and the Q-statistics resulting from the MPCA model are fed to the LAMDA clustering algorithm. This procedure is iterated twice. The first iteration provides an efficient and effective discrimination between normal and abnormal operational conditions. The second iteration of the procedure allowed a clear-cut discrimination of applied operational changes in the SBR history. Important to add is that this procedure helped identifying some changes in the process behaviour, which would not have been possible, had we only relied on visually inspecting this online data set of the SBR (which is traditionally the case in practice). Hence the PCA based clustering methodology is a promising tool to efficiently interpret and analyse the SBR process behaviour using large historical online data sets.

Download Full-text

Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2016-0066 ◽

2017 ◽

Vol 16 (3) ◽

Author(s):

Shofiqul Islam ◽

Sonia Anand ◽

Jemila Hamid ◽

Lehana Thabane ◽

Joseph Beyene

Keyword(s):

Principal Component Analysis ◽

Data Integration ◽

Principal Components ◽

Mirna Expression ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Data Sets ◽

Data Set ◽

Multiple Data Sets

AbstractLinear principal component analysis (PCA) is a widely used approach to reduce the dimension of gene or miRNA expression data sets. This method relies on the linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be optimal. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods towards data integration and death classification. We also compare these methods using a real data set with gene and miRNA expression of lung cancer patients. First few kernel principal components show poor performance compared to the linear principal components in this occasion. Reducing dimensions using linear PCA and a logistic regression model for classification seems to be adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to an improved classification accuracy for the outcome.

Download Full-text

THE IMPACT OF REAL ESTATE INVESTMENTS ON THE PERFORMANCE OF THE ENTITIES LISTED AT THE BVB

The Annals of the University of Oradea. Economic Sciences ◽

10.47535/1991auoes30(1)019 ◽

2021 ◽

Vol 30 (30 (1)) ◽

pp. 177-186

Author(s):

Silviu Cornel Virgil Chiriac

Keyword(s):

Principal Component Analysis ◽

Real Estate ◽

Financial Performance ◽

Descriptive Analysis ◽

Composite Index ◽

Principal Component ◽

Component Analysis ◽

Multidimensional Data ◽

Data Set ◽

The Impact

The current paper is part of a wider study which aims at identifying the determining factors of the performances of the entities in the real estate field and the setting up of a composite index of the companies’ performances based on a sample of 29 companies listed at the BVB Bucharest (Bucharest Stock Exchange) in the year 2019 using one of the multidimensional data analysis techniques, the principal component analysis. The descriptive analysis, the principal component analysis for setting up the composite index of the companies performances were applied within the study in order to highlight the most important companies from the point of view of the financial performance. The descriptive analysis of the data set highlights the overview within the companies selected for analysis. The study aims at building a synthetic indicator that will show the financial performance of the companies selected based on 9 financial indicators using the principal component analysis PCA. The 9 indicators considered for the analysis were selected based on specialised articles and they are: ROA – return on assets, which reflect the company’s capacity of using its assets productively, ROE – return on equity, which measures the efficiency of use of the stockholders’ capitals, rotation of total assets, general liquidity ratio, general solvency ratio, general dent-to-equity level, net profit margin, gross return of portfolio.

Download Full-text

Visualizing the Complexity of the Athlete-Monitoring Cycle Through Principal-Component Analysis

International Journal of Sports Physiology and Performance ◽

10.1123/ijspp.2019-0045 ◽

2019 ◽

Vol 14 (9) ◽

pp. 1304-1310 ◽

Cited By ~ 2

Author(s):

Dan Weaving ◽

Clive Beggs ◽

Nicholas Dalton-Barron ◽

Ben Jones ◽

Grant Abt

Keyword(s):

Principal Component Analysis ◽

Multivariate Data ◽

Principal Component ◽

Component Analysis ◽

Data Matrix ◽

Data Sets ◽

Complex Data ◽

Great Loss ◽

Athlete Monitoring ◽

Multiple Variables

Purpose: To discuss the use of principal-component analysis (PCA) as a dimension-reduction and visualization tool to assist in decision making and communication when analyzing complex multivariate data sets associated with the training of athletes. Conclusions: Using PCA, it is possible to transform a data matrix into a set of orthogonal composite variables called principal components (PCs), with each PC being a linear weighted combination of the observed variables and with all PCs uncorrelated to each other. The benefit of transforming the data using PCA is that the first few PCs generally capture the majority of the information (ie, variance) contained in the observed data, with the first PC accounting for the highest amount of variance and each subsequent PC capturing less of the total information. Consequently, through PCA, it is possible to visualize complex data sets containing multiple variables on simple 2D scatterplots without any great loss of information, thereby making it much easier to convey complex information to coaches. In the future, athlete-monitoring companies should integrate PCA into their client packages to better support practitioners trying to overcome the challenges associated with multivariate data analysis and interpretation. In the interim, the authors present here an overview of PCA and associated R code to assist practitioners working in the field to integrate PCA into their athlete-monitoring process.

Download Full-text

Identification of Rainfall Patterns on Hydrological Simulation Using Robust Principal Component Analysis

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v11.i3.pp1162-1167 ◽

2018 ◽

Vol 11 (3) ◽

pp. 1162 ◽

Cited By ~ 1

Author(s):

S.M. Shaharudin ◽

N. Ahmad ◽

N.H. Zainuddin ◽

N.S. Mohamed

Keyword(s):

Principal Component Analysis ◽

Simulated Data ◽

Principal Component ◽

Breakdown Point ◽

Component Analysis ◽

Data Matrix ◽

Robust Pca ◽

Data Set ◽

Number Of Components ◽

Rainfall Patterns

A robust dimension reduction method in Principal Component Analysis (PCA) was used to rectify the issue of unbalanced clusters in rainfall patterns due to the skewed nature of rainfall data. A robust measure in PCA using Tukey’s biweight correlation to downweigh observations was introduced and the optimum breakdown point to extract the number of components in PCA using this approach is proposed. A set of simulated data matrix that mimicked the real data set was used to determine an appropriate breakdown point for robust PCA and compare the performance of the both approaches. The simulated data indicated a breakdown point of 70% cumulative percentage of variance gave a good balance in extracting the number of components .The results showed a more significant and substantial improvement with the robust PCA than the PCA based Pearson correlation in terms of the average number of clusters obtained and its cluster quality.

Download Full-text

Locating Impact on Structural Plate Using Principal Component Analysis and Support Vector Machines

Mathematical Problems in Engineering ◽

10.1155/2013/352149 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 16

Author(s):

Heming Fu ◽

Qingsong Xu

Keyword(s):

Principal Component Analysis ◽

Support Vector Machines ◽

Principal Component ◽

Original Data ◽

Component Analysis ◽

Support Vector ◽

Data Sets ◽

Vector Machines ◽

Pca Algorithm ◽

The Impact

A new method which integrates principal component analysis (PCA) and support vector machines (SVM) is presented to predict the location of impact on a clamped aluminum plate structure. When the plate is knocked using an instrumented hammer, the induced time-varying strain signals are collected by four piezoelectric sensors which are mounted on the plate surface. The PCA algorithm is adopted for the dimension reduction of the large original data sets. Afterwards, a new two-layer SVM regression framework is proposed to improve the impact location accuracy. For a comparison study, the conventional backpropagation neural networks (BPNN) approach is implemented as well. Experimental results show that the proposed strategy achieves much better locating accuracy in comparison with the conventional approach.

Download Full-text

Enhancements to a Geographically Weighted Principal Component Analysis in the Context of an Application to an Environmental Data Set

Geographical Analysis ◽

10.1111/gean.12048 ◽

2014 ◽

Vol 47 (2) ◽

pp. 146-172 ◽

Cited By ~ 25

Author(s):

Paul Harris ◽

Annemarie Clarke ◽

Steve Juggins ◽

Chris Brunsdon ◽

Martin Charlton

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Environmental Data ◽

Data Set ◽

Weighted Principal Component

Download Full-text

Identification of Turbomachinery Noise Sources via Processing Beamforming Data Using Principal Component Analysis

Periodica Polytechnica Mechanical Engineering ◽

10.3311/ppme.18555 ◽

2021 ◽

Author(s):

Bence Fenyvesi ◽

Csaba Horváth

Keyword(s):

Principal Component Analysis ◽

Noise Source ◽

Principal Component ◽

Component Analysis ◽

Design Guidelines ◽

Noise Sources ◽

Data Set ◽

Wide Range ◽

Open Rotor ◽

The Impact

Complex turbomachinery systems produce a wide range of noise components. The goal is to identify noise source categories, determine their characteristic noise patterns and locations. Researchers can then use this information to quantify the impact of these noise sources, based on which new design guidelines can be proposed. Phased array microphone measurements processed with acoustic beamforming technology provide noise source maps for pre-determined frequency bands (i.e., bins) of the investigated spectrum. However, multiple noise generation mechanisms can be active in any given frequency bin. Therefore, the identification of individual noise sources is difficult and time consuming when using conventional methods, such as manual sorting. This study presents a method for combining beamforming with Principal Component Analysis (PCA) methods in order to identify and separate apart turbomachinery noise sources with strong harmonics. The method is presented through the investigation of Counter-Rotating Open Rotor (CROR) noise sources. It has been found that the proposed semi-automatic method was able to extract even weak noise source patterns that repeat throughout the data set of the beamforming maps. The analysis yields results that are easy to comprehend without special prior knowledge and is an effective tool for identifying and localizing noise sources for the acoustic investigation of various turbomachinery applications.

Download Full-text