Identification of Rainfall Patterns on Hydrological Simulation Using Robust Principal Component Analysis

Author(s):  
S.M. Shaharudin ◽  
N. Ahmad ◽  
N.H. Zainuddin ◽  
N.S. Mohamed

A robust dimension reduction method in Principal Component Analysis (PCA) was used to rectify the issue of unbalanced clusters in rainfall patterns caused by the skewed nature of rainfall data. A robust measure in PCA that uses Tukey’s biweight correlation to downweight observations was introduced, and the optimum breakdown point for extracting the number of components in PCA with this approach is proposed. A simulated data matrix that mimicked the real data set was used to determine an appropriate breakdown point for robust PCA and to compare the performance of the two approaches. The simulated data indicated that a breakdown point of 70% cumulative percentage of variance gave a good balance in extracting the number of components. The results showed a substantial improvement with the robust PCA over the PCA based on Pearson correlation in terms of the average number of clusters obtained and the cluster quality.
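The 70% cumulative-variance cutoff mentioned in this abstract can be illustrated with a short sketch. It is not the authors' procedure: it uses the plain Pearson correlation matrix rather than a robust one, and the function name `n_components_for_variance` and its parameters are hypothetical.

```python
import numpy as np

def n_components_for_variance(data, threshold=0.70):
    """Smallest number of components whose cumulative share of variance
    reaches `threshold` (assumption: Pearson correlation, not a robust one)."""
    corr = np.corrcoef(data, rowvar=False)       # columns are variables
    eigvals = np.linalg.eigvalsh(corr)[::-1]     # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negatives
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum_share, threshold)) + 1
    return min(k, len(eigvals)), cum_share
```

Replacing `corr` with a robust correlation matrix would leave the rest of the selection step unchanged, which is roughly where a robust variant would differ from the classical one.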

2020 ◽  
Vol 32 (10) ◽  
pp. 1901-1935
Author(s):  
Keishi Sando ◽  
Hideitsu Hino

Principal component analysis (PCA) is a widely used method for data processing, such as for dimension reduction and visualization. Standard PCA is known to be sensitive to outliers, and various robust PCA methods have been proposed. It has been shown that the robustness of many statistical methods can be improved using mode estimation instead of mean estimation, because mode estimation is not significantly affected by the presence of outliers. Thus, this study proposes a modal principal component analysis (MPCA), which is a robust PCA method based on mode estimation. The proposed method finds the minor component by estimating the mode of the projected data points. As a theoretical contribution, probabilistic convergence property, influence function, finite-sample breakdown point, and its lower bound for the proposed MPCA are derived. The experimental results show that the proposed method has advantages over conventional methods.


2010 ◽  
Vol 08 (06) ◽  
pp. 995-1011 ◽  
Author(s):  
HAO ZHENG ◽  
HONGWEI WU

Metagenomics is an emerging field in which the power of genomic analysis is applied to an entire microbial community, bypassing the need to isolate and culture individual microbial species. Assembling of metagenomic DNA fragments is very much like the overlap-layout-consensus procedure for assembling isolated genomes, but is augmented by an additional binning step to differentiate scaffolds, contigs and unassembled reads into various taxonomic groups. In this paper, we employed n-mer oligonucleotide frequencies as the features and developed a hierarchical classifier (PCAHIER) for binning short (≤ 1,000 bps) metagenomic fragments. The principal component analysis was used to reduce the high dimensionality of the feature space. The hierarchical classifier consists of four layers of local classifiers that are implemented based on the linear discriminant analysis. These local classifiers are responsible for binning prokaryotic DNA fragments into superkingdoms, of the same superkingdom into phyla, of the same phylum into genera, and of the same genus into species, respectively. We evaluated the performance of the PCAHIER by using our own simulated data sets as well as the widely used simHC synthetic metagenome data set from the IMG/M system. The effectiveness of the PCAHIER was demonstrated through comparisons against a non-hierarchical classifier, and two existing binning algorithms (TETRA and Phylopythia).
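As a rough illustration of the n-mer oligonucleotide frequency features described above, the sketch below counts overlapping k-mers in a DNA fragment. The abstract does not state the value of n, so `k=4` is an assumption, and the function name `kmer_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers in `seq`.
    Windows containing symbols other than A, C, G, T are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}
```

The resulting vector has 4**k entries (256 for k = 4), which suggests why a dimension reduction step such as PCA is applied before classification.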


Author(s):  
Shazlyn Milleana Shaharudin ◽  
Norhaiza Ahmad ◽  
Siti Mariana Che Mat Nor

This paper presents a modified correlation in principal component analysis (PCA) for selecting the number of clusters when identifying rainfall patterns. Clustering guided by PCA is extensively employed for high-dimensional data, especially in identifying the spatial distribution patterns of daily torrential rainfall. Typically, rainfall patterns for climatological investigation are identified using a T-mode Pearson correlation matrix to extract the relative variance retained. However, rainfall data in Peninsular Malaysia consist of positive values that are skewed towards higher values. Therefore, PCA based on Pearson correlation has the potential to influence the cluster partitions and to produce extremely uneven clusters in a high-dimensional space. In the current research, a robust dimension reduction method in PCA was employed to resolve the problem of unbalanced clusters in rainfall patterns caused by the skewed character of the data. A robust measure in PCA using Tukey’s biweight correlation to downweight observations was introduced, along with the optimal breakdown point for obtaining the number of components in PCA. The results showed a highly substantial improvement for the robust PCA over the PCA based on Pearson correlation in terms of the average number of clusters obtained, and indicated 70% cumulative percentage of variance at a breakdown point of 0.4.
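To give a feel for how a biweight-based correlation downweights extreme observations, the sketch below implements the biweight midcorrelation, a related but simpler estimator. It is not the breakdown-point-tuned Tukey’s biweight correlation used in the paper; the constant 9 and the function names are the usual textbook choices and are assumptions here.

```python
import math
from statistics import median

def _biweight_terms(x):
    """Deviations from the median, multiplied by Tukey biweight weights."""
    med = median(x)
    mad = median([abs(v - med) for v in x])
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in x:
        u = (v - med) / (9.0 * mad)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # far points get weight 0
        terms.append((v - med) * w)
    return terms

def biweight_midcorrelation(x, y):
    """Correlation of the weighted deviations of x and y."""
    a, b = _biweight_terms(x), _biweight_terms(y)
    norm = math.sqrt(sum(t * t for t in a) * sum(t * t for t in b))
    return sum(s * t for s, t in zip(a, b)) / norm
```

Observations lying more than nine median absolute deviations from the median receive zero weight, so a single extreme rainfall value cannot dominate the coefficient the way it can with Pearson correlation.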


2016 ◽  
Vol 35 (2) ◽  
pp. 173-190 ◽  
Author(s):  
S. Shahid Shaukat ◽  
Toqeer Ahmed Rao ◽  
Moazzam A. Khan

In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix pertaining to water quality variables (p = 22) of a small data set comprising 55 samples (stations from which water samples were collected). Because data sets in ecology and environmental sciences are invariably small owing to the high cost of collecting and analysing samples, we restricted our study to relatively small sample sizes. We focused on comparing the first 6 eigenvectors and the first 10 eigenvalues. Data sets were compared using agglomerative cluster analysis with Ward’s method, which does not require any stringent distributional assumptions.
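The resampling scheme described above can be sketched as follows. The abstract does not say whether PCA was run on the correlation or the covariance matrix, so the use of the correlation matrix is an assumption, as are the function name `bootstrap_eigenvalues` and the fixed seed.

```python
import numpy as np

def bootstrap_eigenvalues(data, sample_size, n_boot=100, seed=0):
    """Eigenvalues (largest first) of the correlation matrix for `n_boot`
    bootstrap samples of `sample_size` rows drawn with replacement."""
    rng = np.random.default_rng(seed)
    n_rows, n_vars = data.shape
    out = np.empty((n_boot, n_vars))
    for b in range(n_boot):
        idx = rng.integers(0, n_rows, size=sample_size)  # rows, with replacement
        corr = np.corrcoef(data[idx], rowvar=False)
        out[b] = np.linalg.eigvalsh(corr)[::-1]
    return out
```

Calling it once per sample size (20, 30, 40, 50) and comparing the spread of each column would show how eigenvalue stability changes with N. Note that when the sample size is not larger than the number of variables, as with N = 20 and p = 22, the trailing eigenvalues are necessarily zero up to rounding.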


Author(s):  
Shazlyn Milleana Shaharudin ◽  
Shuhaida Ismail ◽  
Siti Mariana Che Mat Nor ◽  
Norhaiza Ahmad

In this study, a hybrid RPCA-spectral biclustering model is proposed for identifying the Peninsular Malaysia rainfall pattern. The model combines Robust Principal Component Analysis (RPCA) and biclustering in order to overcome the skewness problem in the Peninsular Malaysia rainfall data. Robust PCA is more resilient to outliers than classical PCA because it assesses every observation and downweights those that deviate from the data center. Meanwhile, two-way clustering is able to cluster simultaneously along two variables and exhibits a high correlation compared to one-way cluster analysis. The experimental results showed that the best cumulative percentage of variation lies between 65% and 70% for both Robust and classical PCA. The number of clusters improved from six disjoint clusters in Robust PCA-kMeans to eight disjoint clusters for the proposed model. Further analysis shows that the proposed model has a smaller variation, with a value of 0.0034 compared to 0.030 in the Robust PCA-kMeans model. This analysis indicates that the proposed RPCA-spectral biclustering model is well suited to identifying rainfall patterns in Peninsular Malaysia owing to the small variation of the clustering result.


1990 ◽  
Vol 55 (1) ◽  
pp. 55-62 ◽  
Author(s):  
Drahomír Hnyk

The principal component analysis has been applied to a data matrix formed by 7 usual substituent constants for 38 substituents. Three factors are able to explain 99.4% cumulative proportion of total variance. Several rotations have been carried out for the first two factors in order to obtain their physical meaning. The first factor is related to the resonance effect, whereas the second one expresses the inductive effect, and both together describe 97.5% cumulative proportion of total variance. Their mutual orthogonality does not directly follow from the rotations carried out. With the help of these factors the substituents are divided into four main classes, and some of them assume a special position.


2017 ◽  
Vol 727 ◽  
pp. 447-449 ◽  
Author(s):  
Jun Dai ◽  
Hua Yan ◽  
Jian Jian Yang ◽  
Jun Jun Guo

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.

