A Study of Effectiveness of Principal Component Analysis on Different Data Sets

Principal Component Analysis of Hydrological Data

Handbook of Research on Hydroinformatics ◽

10.4018/978-1-61520-907-1.ch018 ◽

2010 ◽

pp. 364-388

Author(s):

Petr Praus

Keyword(s):

Water Quality ◽

Principal Component Analysis ◽

Drinking Water ◽

Ground Water ◽

Principal Component ◽

Component Analysis ◽

Data Sets ◽

Hydrological Data ◽

First Case

This chapter presents the principles of principal component analysis (PCA) and its application to hydrological data. Four case studies show how PCA can yield information about a wastewater treatment process and about drinking water quality in a city network, and how it can reveal similarities in data sets of ground water quality results and water-related images. In the first case study, the composition of raw and cleaned wastewater was characterised and its temporal changes were displayed. In the second case study, drinking water samples were divided into clusters consistent with their sampling localities. In case study III, similar samples of ground water were recognised by calculating the cosine similarity and the Euclidean and Manhattan distances. In case study IV, 32 water-related images were transformed into a large image matrix whose dimensionality was reduced by PCA. The images were clustered using the PCA scatter plots.
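
The similarity measures named for case study III can be sketched as follows. This is a minimal illustration, not the chapter's procedure: it assumes the measures are computed on PCA scores of standardized data, and the sample values, the helper names (pca_scores, cosine_similarity, euclidean, manhattan) and the choice of two components are all hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # shape (n_samples, n_components)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    return float(np.abs(a - b).sum())

# Hypothetical data: 5 water samples x 4 measured parameters (made-up values).
X = np.array([
    [7.1, 250.0, 30.0, 12.0],
    [7.2, 255.0, 31.0, 12.5],
    [6.5, 480.0, 75.0, 40.0],
    [6.6, 470.0, 72.0, 38.0],
    [7.8, 120.0, 10.0, 5.0],
])

# Standardize first, because the parameters have very different scales.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = pca_scores(Z, n_components=2)

# Compare sample 0 with every other sample in the PCA score space.
others = [1, 2, 3, 4]
eucl = {j: euclidean(scores[0], scores[j]) for j in others}
manh = {j: manhattan(scores[0], scores[j]) for j in others}
cosi = {j: cosine_similarity(scores[0], scores[j]) for j in others}

nearest = min(eucl, key=eucl.get)                # index of the most similar sample
print("nearest to sample 0 (Euclidean):", nearest)
```

The two distances rank samples by how far apart their scores are, while cosine similarity compares only the direction of the score vectors, so the three measures need not agree on every pair.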

EVALUATION AND MODELLING OF GROUND WATER QUALITY DATA OF ALLAHABAD CITY BY ENVIRONMETRIC METHODS

Green Chemistry & Technology Letters ◽

10.18510/gctl.2016.248 ◽

2016 ◽

Vol 2 (4) ◽

pp. 211

Author(s):

Girdhari Lal Chaurasia ◽

Mahesh Kumar Gupta ◽

Praveen Kumar Tandon

Keyword(s):

Water Quality ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Factor Analysis ◽

Principal Component ◽

Component Analysis ◽

Sampling Location ◽

Data Sets ◽

Multivariate Statistical ◽

Positive Loading

Water is an essential resource for all organisms, including plants, animals and human beings. It is the backbone of the agricultural and industrial sectors and of all small business units. Growth in human population and economic activity has greatly increased the demand for large-scale suppliers of fresh water for various competing end users. Water quality is evaluated in terms of physical, chemical and biological parameters. A particular problem in water quality monitoring is the complexity of analyzing the large number of measured variables. The data sets contain rich information about the behavior of the water resources. Multivariate statistical approaches make it possible to derive hidden information from the data sets about the possible influences of the environment on water quality. Classification, modeling and interpretation of monitored data are the most important steps in the assessment of water quality. Multivariate statistical techniques such as cluster analysis (CA), principal component analysis (PCA) and factor analysis (FA) help to identify the important components or factors that account for most of the variance of a system. In the present study, water samples were analyzed for various physicochemical parameters by different methods following the standards of APHA, BIS and WHO, and the results were subjected to further statistical analysis, namely cluster analysis, to understand the similarities and differences among the sampling stations. Three clusters were found. Cluster 1 comprised sampling locations 1, 3 and 5; cluster 2 comprised sampling location 2; and cluster 3 comprised sampling location 4. Principal component analysis/factor analysis is a pattern recognition technique used to assess the correlation between observations in terms of underlying factors that are not directly observable.
Observations that are correlated, either positively or negatively, are likely to be affected by the same factors, while observations that are not correlated are influenced by different factors. In our study three factors explained 99.827% of the variance. Factor 1 (F1) accounted for 51.619% of the total variance, with strong positive loadings on TSS, TS, temperature, TDS and phosphate and a moderate loading on electrical conductivity (loading values 0.986, 0.970, 0.792, 0.744, 0.695 and 0.701, respectively). Factor 2 (F2) accounted for 27.236% of the total variance, with moderate positive loadings on total alkalinity and temperature (0.723 and 0.606, respectively) and moderate negative loadings on conductivity, TDS and chloride (-0.698, -0.690 and -0.582, respectively). Factor 3 (F3) accounted for 20.972% of the variance, with a strong positive loading on pH (0.872) and moderate positive loadings on chloride and phosphate (0.721 and 0.569, respectively).
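
As a rough illustration of how variance shares and loadings of this kind are obtained, the sketch below computes unrotated PCA loadings from a correlation matrix. It is an assumption-laden example rather than the study's workflow: the abstract does not say whether a rotation was applied, the data matrix is a random placeholder, and the function name pca_loadings is hypothetical. Only the three reported percentages are taken from the abstract.

```python
import numpy as np

def pca_loadings(X):
    """Return (explained variance in %, loadings) from the correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)             # variables are columns
    eigvals, eigvecs = np.linalg.eigh(R)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative round-off
    explained_pct = 100.0 * eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(eigvals)        # column j holds loadings on component j
    return explained_pct, loadings

# Placeholder data: 20 observations x 5 variables (random, not the study's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
explained_pct, loadings = pca_loadings(X)

# Arithmetic check on the shares reported in the abstract.
reported = [51.619, 27.236, 20.972]
reported_total = sum(reported)                   # compare with the stated 99.827%
print(round(reported_total, 3))
```

With loadings defined this way, each value is the correlation between a variable and a component, which is why it always lies between -1 and 1.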

Identification of Excitation Source Number Using Principal Component Analysis

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.199-200.850 ◽

2011 ◽

Vol 199-200 ◽

pp. 850-857

Author(s):

Jian Chao Dong ◽

Tie Jun Yang ◽

Xin Hui Li ◽

Zhi Jun Shuai ◽

You Hong Xiao

Keyword(s):

Principal Component Analysis ◽

Signal To Noise Ratio ◽

Multiple Input Multiple Output ◽

Principal Component ◽

Relevant Information ◽

Component Analysis ◽

Data Sets ◽

Blind Signal ◽

Input Multiple Output ◽

Source Number

Principal component analysis (PCA), one of the basic blind signal processing techniques, is widely used to extract relevant information from confusing data sets. This paper first explains the principle of PCA, then carries out simulations and experiments on a simply supported beam rig, using PCA in the frequency domain to identify the number of sources in several cases. The contribution coefficient of the principal components (PCs) and the signal-to-noise ratio between neighboring PCs (neighboring SNR) are introduced to cut off minor components quantitatively. The results show that when the number of observations is equal to or larger than the number of sources and the additive noise is weak, PCA can accurately predict the number of uncorrelated excitation sources in a multiple-input multiple-output system.
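
The idea of counting sources from the eigenvalue spectrum can be sketched as below. The abstract does not define its two criteria, so this example assumes that the contribution coefficient is each eigenvalue's share of the total and that the neighboring SNR is 10·log10 of the ratio of adjacent eigenvalues. It also works in the time domain on simulated data, whereas the paper applies PCA in the frequency domain. The function name, mixing matrix and noise level are hypothetical.

```python
import numpy as np

def estimate_source_count(Y):
    """Y has shape (n_channels, n_samples). Returns (k, contribution, neighbor_snr_db)."""
    Yc = Y - Y.mean(axis=1, keepdims=True)       # remove each channel's mean
    C = Yc @ Yc.T / (Yc.shape[1] - 1)            # channel covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]
    eigvals = np.clip(eigvals, 1e-300, None)     # avoid division by zero below
    contribution = eigvals / eigvals.sum()       # assumed "contribution coefficient"
    neighbor_snr_db = 10.0 * np.log10(eigvals[:-1] / eigvals[1:])  # assumed "neighboring SNR"
    k = int(np.argmax(neighbor_snr_db)) + 1      # cut where the drop is largest
    return k, contribution, neighbor_snr_db

# Simulation: 2 uncorrelated sources observed on 4 channels with weak additive noise.
rng = np.random.default_rng(42)
n_samples = 5000
S = rng.normal(size=(2, n_samples))              # uncorrelated sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [0.8, -0.6],
              [-0.4, 0.9]])                      # hypothetical mixing matrix
Y = A @ S + 0.01 * rng.normal(size=(4, n_samples))

k, contribution, neighbor_snr_db = estimate_source_count(Y)
print("estimated number of sources:", k)
```

The estimate relies on the conditions stated in the abstract: at least as many observation channels as sources, and noise weak enough that the eigenvalue drop after the last source stands out.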

Functional Connectivity: The Principal-Component Analysis of Large (PET) Data Sets

Journal of Cerebral Blood Flow & Metabolism ◽

10.1038/jcbfm.1993.4 ◽

1993 ◽

Vol 13 (1) ◽

pp. 5-14 ◽

Cited By ~ 1182

Author(s):

K. J. Friston ◽

C. D. Frith ◽

P. F. Liddle ◽

R. S. J. Frackowiak

Keyword(s):

Principal Component Analysis ◽

Functional Connectivity ◽

Verbal Fluency ◽

Principal Component ◽

Temporal Correlation ◽

Large Data ◽

Component Analysis ◽

Anterior Cingulate ◽

Data Sets ◽

Brain System

The distributed brain systems associated with performance of a verbal fluency task were identified in a nondirected correlational analysis of neurophysiological data obtained with positron tomography. This analysis used a recursive principal-component analysis developed specifically for large data sets. This analysis is interpreted in terms of functional connectivity, defined as the temporal correlation of a neurophysiological index measured in different brain areas. The results suggest that the variance in neurophysiological measurements, introduced experimentally, was accounted for by two independent principal components. The first, and considerably larger, highlighted an intentional brain system seen in previous studies of verbal fluency. The second identified a distributed brain system including the anterior cingulate and Wernicke's area that reflected monotonic time effects. We propose that this system has an attentional bias.

Raman Spectroscopy Combined with Principal Component Analysis for Screening Nasopharyngeal Cancer in Human Blood Sera

Applied Spectroscopy ◽

10.1177/0003702817723928 ◽

2017 ◽

Vol 71 (11) ◽

pp. 2497-2503 ◽

Cited By ~ 11

Author(s):

Saranjam Khan ◽

Rahat Ullah ◽

Samina Javaid ◽

Shaheen Shahzad ◽

Hina Ali ◽

...

Keyword(s):

Principal Component Analysis ◽

Raman Spectroscopy ◽

Human Blood ◽

Nasopharyngeal Cancer ◽

Principal Component ◽

Component Analysis ◽

Data Sets ◽

Analysis Technique ◽

Raman Spectral Data ◽

Raman Spectral

This study demonstrates the analysis of nasopharyngeal cancer (NPC) in human blood sera using Raman spectroscopy combined with a multivariate analysis technique. Blood samples of confirmed NPC patients and healthy individuals were used in this study. The Raman spectra of all these samples were recorded using a 785 nm laser for excitation. Important Raman bands at 760, 800, 815, 834, 855, 1003, 1220–1275, and 1524 cm−1 were observed in both normal and NPC samples. Decreases in lipid content, phenylalanine, and β-carotene and increases in amide III, tyrosine, and tryptophan were observed in the NPC samples. The two data sets were well separated using principal component analysis (PCA) based on Raman spectral data. The spectral variations between the healthy and cancerous samples were further highlighted by plotting the loading vectors PC1 and PC2, which show only those spectral regions where the differences are obvious.

Analysis of 46,046 SARS-CoV-2 whole-genomes leveraging principal component analysis (PCA)

10.1101/2020.12.20.423682 ◽

2020 ◽

Author(s):

Christiane Scherer ◽

James Grover ◽

Darby Kammeraad ◽

Gabe Rudy ◽

Andreas Scherer

Keyword(s):

Principal Component Analysis ◽

Population Genetics ◽

Phylogenetic Analysis ◽

Principal Component ◽

Large Data ◽

Component Analysis ◽

Comprehensive Analysis ◽

Large Data Sets ◽

Data Sets ◽

Whole Genomes

Abstract: Since the beginning of the global SARS-CoV-2 pandemic, there have been a number of efforts to understand the mutations and clusters of genetic lines of the SARS-CoV-2 virus. Until now, phylogenetic analysis methods have been used for this purpose. Here we show that Principal Component Analysis (PCA), which is widely used in population genetics, can not only help us to understand existing findings about the mutation processes of the virus, but can also provide even deeper insights into these processes while being less sensitive to sequencing gaps. Here we describe a comprehensive analysis of a 46,046 SARS-CoV-2 genome sequence dataset downloaded from the GISAID database in June of this year.

Summary: PCA provides deep insights into the analysis of large data sets of SARS-CoV-2 genomes, revealing virus lineages that have thus far been unnoticed.

Robust sparse principal component analysis by DC programming algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-191617 ◽

2020 ◽

Vol 39 (3) ◽

pp. 3183-3193

Author(s):

Jieya Li ◽

Liming Yang

Keyword(s):

Principal Component Analysis ◽

Continuous Optimization ◽

Principal Component ◽

Optimization Method ◽

Component Analysis ◽

Programming Algorithm ◽

Data Sets ◽

Sparse Principal Component Analysis ◽

Data Types ◽

L2 Norm

Classical principal component analysis (PCA) is not sparse, and because it is based on the L2-norm it is also prone to being adversely affected by outliers and noise. To address these problems, a sparse robust PCA framework is proposed that combines minimization of a zero-norm regularization term with maximization of the Lp-norm (0 < p ≤ 2) PCA objective. Furthermore, we develop a continuous optimization method, the DC (difference of convex functions) programming algorithm (DCA), to solve the proposed problem. The resulting algorithm (called DC-LpZSPCA) converges linearly. In addition, the model remains robust for different choices of p and is applicable to different data types. Numerical experiments are carried out on artificial data sets and Yale face data sets. The experimental results show that the proposed method maintains good sparsity and resistance to outliers.

Principal component analysis and singular spectrum analysis of ULF geomagnetic data associated with earthquakes

Natural Hazards and Earth System Science ◽

10.5194/nhess-5-685-2005 ◽

2005 ◽

Vol 5 (5) ◽

pp. 685-689 ◽

Cited By ~ 30

Author(s):

A. Serita ◽

K. Hattori ◽

C. Yoshino ◽

M. Hayakawa ◽

N. Isezaki

Keyword(s):

Principal Component Analysis ◽

Spectral Analysis ◽

Principal Component ◽

Component Analysis ◽

Series Data ◽

Artificial Noise ◽

Data Sets ◽

Geomagnetic Variation ◽

Singular Spectral Analysis ◽

Intense Signal

Abstract. To extract any ULF signature associated with earthquakes, principal component analysis (PCA) and singular spectral analysis (SSA) were performed to investigate whether signals from different sources (geomagnetic variation, artificial noise, and other sources such as earthquake-related ULF emissions) can be discriminated. We apply PCA to the time series data observed at closely spaced stations, Seikoshi (SKS), Mochikoshi (MCK), and Kamo (KAM). To remove the most intense signal, which appears as the first principal component, we form the differential data sets SKS-KAM and MCK-KAM from the NS component filtered in the 0.01 Hz band. The major findings are as follows. (1) It is important to apply PCA and SSA together: SSA gives the structure of the signals, from which the number of sensors for PCA is estimated. This makes the results convincing. (2) There is a significant advantage in using PCA with the differential data sets of filtered (0.01 Hz band) SKS-KAM and MCK-KAM in the NS component for removing the most intense signal, such as global variation (solar-terrestrial interaction). As a result, the anomalous changes in the second principal component appear more sharply. The contribution of the second principal component is 20–40%, which is large enough to prove mathematical accuracy of the signal. Further application is required to accumulate events. These facts demonstrate the possibility of monitoring crustal activity using PCA and SSA.
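
The effect of differencing closely spaced stations can be illustrated with simulated data. The sketch below is not the authors' processing chain: it omits the 0.01 Hz band filtering, and the signal shapes, amplitudes, noise level and helper names (first_pc_scores, abs_corr) are all hypothetical. It assumes a strong global signal common to three stations and a weaker local anomaly present at one station only.

```python
import numpy as np

def first_pc_scores(X):
    """Scores of the first principal component; X has shape (n_samples, n_channels)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

def abs_corr(a, b):
    """Absolute Pearson correlation (the sign of a principal component is arbitrary)."""
    return abs(float(np.corrcoef(a, b)[0, 1]))

rng = np.random.default_rng(7)
n = 2000
t = np.arange(n)
global_signal = 10.0 * np.sin(2 * np.pi * t / 200.0)        # common to all stations
anomaly = np.where((t >= 1000) & (t < 1100), 2.0, 0.0)      # local, first station only
noise = 0.1 * rng.normal(size=(n, 3))

sks = global_signal + anomaly + noise[:, 0]
mck = global_signal + noise[:, 1]
kam = global_signal + noise[:, 2]

raw = np.column_stack([sks, mck, kam])
diff = np.column_stack([sks - kam, mck - kam])              # differential data sets

corr_raw_global = abs_corr(first_pc_scores(raw), global_signal)
corr_raw_anomaly = abs_corr(first_pc_scores(raw), anomaly)
corr_diff_anomaly = abs_corr(first_pc_scores(diff), anomaly)
print(corr_raw_global, corr_raw_anomaly, corr_diff_anomaly)
```

In this toy setting the first component of the raw data tracks the global signal, while the leading component of the differential data tracks the local anomaly, because the common signal cancels when one station is subtracted from another.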

CONDITIONS FOR THE APPLICABILITY OF THE PRINCIPAL COMPONENT ANALYSIS TO THE CHARACTERIZATION OF THE 1/f-NOISE

Fluctuation and Noise Letters ◽

10.1142/s0219477506003100 ◽

2006 ◽

Vol 06 (01) ◽

pp. L17-L28 ◽

Cited By ~ 1

Author(s):

JOSÉ MANUEL LÓPEZ-ALONSO ◽

JAVIER ALDA

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Random Variable ◽

Component Analysis ◽

Optical Systems ◽

Data Sets ◽

Regular Time ◽

Pca Method ◽

Definition Of

Principal Component Analysis (PCA) has been applied to the characterization of 1/f noise. Applying PCA to 1/f noise requires the definition of a stochastic multidimensional variable. The components of this variable describe the temporal evolution of the phenomenon sampled at regular time intervals. In this paper we analyze the conditions on the number of observations and on the dimension of the multidimensional random variable that are necessary to use the PCA method in a sound manner. We have tested the obtained conditions on simulated and experimental data sets obtained from imaging optical systems. The results can be extended to other fields where this kind of noise is relevant.

Separation of the daily quiet variation from the geomagnetic field observations with the principal component analysis

10.5194/egusphere-egu2020-3423 ◽

2020 ◽

Author(s):

Anna Morozova ◽

Rania Rebbah ◽

M. Alexandra Pais

Keyword(s):

Principal Component Analysis ◽

Geomagnetic Field ◽

Daily Variation ◽

Extraction Procedure ◽

Principal Component ◽

Component Analysis ◽

Activity Level ◽

Data Series ◽

Data Sets ◽

Data Set

Geomagnetic field (GMF) variations from external sources are classified as regular diurnal variations or as variations occurring during periods of disturbance. The most significant regular variations are the quiet solar daily variation (Sq) and the disturbance daily variation (SD). These variations have well-recognized daily cycles and need to be accounted for before the disturbed field is analyzed. Preliminary analysis of the GMF variations shows that principal component analysis (PCA) is a useful tool for extracting regular variations of the GMF; however, the requirements on the data set length, geomagnetic activity level, etc. still need to be established.

Here we present preliminary results of the PCA-based Sq extraction procedure, based on the analysis of the Coimbra Geomagnetic Observatory (COI) measurements of the geomagnetic field components H, X, Y and Z between 2007 and 2015. The PCA-based Sq curves are compared with the standard ones obtained using 5 IQD per month. PCA was applied to data sets of different lengths: either a one-month data set from a single year in 2007-2015, or data series for the same month combined across different years (2007-2015). For most of the analyzed years, the first PCA mode (PC1) was identified as the SD variation and the second mode (PC2) as the Sq variation.
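
One way to picture PCA-based extraction of a regular daily curve is sketched below. It is a simplified assumption, not the authors' procedure: the hourly data are arranged as a 24 x n_days matrix with days as variables, each day's mean level is removed, and the first mode is read as the dominant daily curve. The synthetic data contain a single daily mode, so the sketch does not reproduce the separation of SD and Sq into PC1 and PC2 described above. The function name, curve shape and amplitudes are hypothetical.

```python
import numpy as np

def dominant_daily_curve(X):
    """X has shape (24, n_days): hourly values, one column per day.

    Returns the PC1 scores (one value per hour) and PC1's share of the variance.
    """
    Xc = X - X.mean(axis=0)                      # remove each day's mean level
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    curve = U[:, 0] * s[0]                       # PC1 scores across the 24 hours
    share = s[0] ** 2 / np.sum(s ** 2)           # fraction of variance in PC1
    return curve, share

# Synthetic month: a fixed daily shape whose amplitude changes from day to day.
rng = np.random.default_rng(3)
hours = np.arange(24)
shape = np.exp(-((hours - 12) / 3.0) ** 2)       # hypothetical bump around local noon
amplitude = 20.0 + 5.0 * rng.random(30)          # one amplitude per day
X = 100.0 + np.outer(shape, amplitude) + rng.normal(scale=1.0, size=(24, 30))

curve, share = dominant_daily_curve(X)
corr = abs(float(np.corrcoef(curve, shape)[0, 1]))
print(round(share, 3), round(corr, 3))
```

On real observatory data, several modes would have to be inspected and identified, which is where the requirements on data set length and geomagnetic activity level mentioned above come in.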
