A Simple Method for Limiting Disclosure in Continuous Microdata Based on Principal Component Analysis

Abstract: In this article, we propose a simple and versatile method for limiting disclosure in continuous microdata based on Principal Component Analysis (PCA). Instead of perturbing the original variables, we propose altering the principal components: they contain the same information but are uncorrelated, so each component can be processed separately, which reduces processing time. The number and weight of the perturbed components determine the level of protection and distortion of the masked data. The method preserves the mean vector and the variance-covariance matrix. Furthermore, depending on the technique chosen to perturb the principal components, the proposed method can provide masked, hybrid or fully synthetic data sets. Examples of application and comparisons with methods previously proposed in the literature (in terms of disclosure risk and data utility) are also included.
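The idea of perturbing principal components while keeping the mean vector and covariance matrix intact can be illustrated with a minimal sketch. It is not the authors' perturbation scheme: the function name `mask_with_pca`, the choice to replace the leading components with rescaled Gaussian noise, and the orthogonalization step are all assumptions made for illustration. It also assumes more records than variables.

```python
import numpy as np

def mask_with_pca(X, n_perturb, seed=0):
    """Replace the first n_perturb principal-component score columns with
    synthetic ones, keeping the sample mean and covariance unchanged.
    Hypothetical helper, not the method from the article."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Principal axes from the sample covariance, largest variance first.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    scores = Xc @ eigvecs  # uncorrelated, zero-mean columns

    masked = scores.copy()
    n = X.shape[0]
    for j in range(n_perturb):
        # Draw noise, then remove its mean and its projection onto every
        # other score column so the new column stays uncorrelated with them.
        others = np.delete(masked, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        noise = rng.standard_normal(n)
        coef, *_ = np.linalg.lstsq(A, noise, rcond=None)
        resid = noise - A @ coef
        # Rescale so the synthetic column has the original component variance.
        resid *= np.linalg.norm(scores[:, j]) / np.linalg.norm(resid)
        masked[:, j] = resid

    # Rotate back to the original variables and restore the mean.
    return masked @ eigvecs.T + mean
```

Because every replaced column is centered, orthogonal to the other columns, and rescaled to the original variance, the masked data reproduce the original mean and covariance up to floating-point error. Setting `n_perturb` to the number of variables gives a fully synthetic variant of this sketch.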

Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2016-0066 ◽

2017 ◽

Vol 16 (3) ◽

Author(s):

Shofiqul Islam ◽

Sonia Anand ◽

Jemila Hamid ◽

Lehana Thabane ◽

Joseph Beyene

Keyword(s):

Principal Component Analysis ◽

Data Integration ◽

Principal Components ◽

miRNA Expression ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Data Sets ◽

Data Set ◽

Multiple Data Sets

Abstract: Linear principal component analysis (PCA) is a widely used approach to reduce the dimension of gene or miRNA expression data sets. This method relies on the linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be optimal. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods for data integration and death classification. We also compare these methods using a real data set with gene and miRNA expression of lung cancer patients. In this case, the first few kernel principal components perform poorly compared with the linear principal components. Reducing dimensions using linear PCA and using a logistic regression model for classification seems to be adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to improved classification accuracy for the outcome.
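To make the contrast between the two methods concrete, the following sketch computes linear and kernel principal component scores with NumPy alone. The RBF kernel, the `gamma` value, and both function names are assumptions; the abstract does not say which kernel or settings the authors used.

```python
import numpy as np

def linear_pca_scores(X, n_components):
    """Scores on the leading linear principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def rbf_kernel_pca_scores(X, n_components, gamma=1.0):
    """Scores on the leading kernel principal components for an RBF kernel
    (illustrative choice of kernel)."""
    # Pairwise squared Euclidean distances, clipped against round-off.
    sq = np.sum(X**2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * dist2)

    # Center the kernel matrix in feature space.
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J

    # Training-point projections are eigenvectors scaled by sqrt(eigenvalue).
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

Either score matrix could then be passed to a classifier such as logistic regression, which is the comparison the abstract describes.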

Stormwater inflow prediction using radar rainfall data compressed by principal component analysis

Water Practice & Technology ◽

10.2166/wpt.2006.017 ◽

2006 ◽

Vol 1 (1) ◽

Author(s):

K. Katayama ◽

K. Kimijima ◽

O. Yamanaka ◽

A. Nagaiwa ◽

Y. Ono

Keyword(s):

Principal Component Analysis ◽

Prediction Model ◽

Principal Components ◽

Prediction Method ◽

Principal Component ◽

Component Analysis ◽

Rainfall Data ◽

Radar Rainfall ◽

Input Variables ◽

Inflow Prediction

This paper proposes a method of stormwater inflow prediction that uses radar rainfall data as the input of a prediction model constructed by system identification. The aim of the proposal is to construct a compact system by reducing the dimension of the input data. Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using radar rainfall data and inflow data acquired in a combined sewer system. The study shows that a few principal components of radar rainfall data are appropriate input variables for a stormwater inflow prediction model. Consequently, we have established a procedure for stormwater inflow prediction using a few principal components of radar rainfall data.
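The compression step can be sketched as principal component regression: project the high-dimensional rainfall inputs onto a few components, then fit a predictor on those scores. This is a simplified static stand-in, assuming an ordinary least-squares model rather than the system-identification model used in the paper; `fit_pcr` and `predict_pcr` are hypothetical names.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Fit a linear model on the first n_components principal components
    of X (rows = time steps, columns = e.g. radar grid cells)."""
    x_mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    components = Vt[:n_components]           # principal directions
    scores = (X - x_mean) @ components.T     # compressed inputs
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x_mean, components, coef

def predict_pcr(model, X_new):
    """Project new inputs onto the stored components and predict."""
    x_mean, components, coef = model
    scores = (X_new - x_mean) @ components.T
    return coef[0] + scores @ coef[1:]
```

The fitted model stores only `n_components` direction vectors and as many regression coefficients plus an intercept, which is the sense in which the predictor becomes more compact.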

Voxelwise Principal Component Analysis of Dynamic [S-Methyl-11C]Methionine PET Data in Glioma Patients

Cancers ◽

10.3390/cancers13102342 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2342

Author(s):

Corentin Martens ◽

Olivier Debeir ◽

Christine Decaestecker ◽

Thierry Metens ◽

Laetitia Lebrun ◽

...

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Added Value ◽

Time Activity ◽

Positron Emission ◽

Activity Curve ◽

The Mean ◽

Mean Time ◽

Met Pet

Recent works have demonstrated the added value of dynamic amino acid positron emission tomography (PET) for glioma grading and genotyping, biopsy targeting, and recurrence diagnosis. However, most of these studies are based on hand-crafted qualitative or semi-quantitative features extracted from the mean time activity curve within predefined volumes. Voxelwise dynamic PET data analysis could instead provide better insight into the intra-tumor heterogeneity of gliomas. In this work, we investigate the ability of principal component analysis (PCA) to extract relevant quantitative features from a large number of motion-corrected [S-methyl-11C]methionine ([11C]MET) PET frames. We first demonstrate the robustness of our methodology to noise by means of numerical simulations. We then build a PCA model from dynamic [11C]MET acquisitions of 20 glioma patients. In a distinct cohort of 13 glioma patients, we compare the parametric maps derived from our PCA model to those provided by the classical one-compartment pharmacokinetic model (1TCM). We show that our PCA model outperforms the 1TCM in distinguishing characteristic dynamic uptake behaviors within the tumor while being less computationally expensive and not requiring arterial sampling. Such a methodology could be valuable for assessing tumor aggressiveness locally, with applications in treatment planning and response evaluation. This work further supports the added value of dynamic over static [11C]MET PET in gliomas.

The Design of Index about Corporate Governance Based on PCA Method

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.4085 ◽

2014 ◽

Vol 926-930 ◽

pp. 4085-4088

Author(s):

Chuan Jun Li

Keyword(s):

Principal Component Analysis ◽

Corporate Governance ◽

Principal Components ◽

Principal Component ◽

Component Analysis ◽

Contribution Rate ◽

Variance Contribution ◽

PCA Method

This article uses principal component analysis (PCA) to evaluate the level of corporate governance. PCA is used to analyze the correlation among 10 original indicators and to extract a few principal components that retain most of the information in those indicators. The corporate governance index is then obtained by weighting the principal components according to their variance contribution rates, which allows corporate governance to be evaluated comprehensively.
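A composite index weighted by variance contribution rates can be sketched as follows. The 85% cumulative-contribution threshold, the sign convention for the components, and the name `pca_composite_index` are assumptions for illustration; the abstract does not state the article's exact formula.

```python
import numpy as np

def pca_composite_index(X, var_threshold=0.85):
    """Weighted sum of principal component scores, with weights proportional
    to each retained component's variance contribution rate."""
    # Standardize the indicators so PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Eigenvector signs are arbitrary; orient each so its loadings sum >= 0
    # (an illustrative convention, so the index direction is reproducible).
    vecs = vecs * np.where(vecs.sum(axis=0) < 0, -1.0, 1.0)

    # Variance contribution rate of every component.
    contrib = vals / vals.sum()

    # Keep the smallest number of components reaching the threshold.
    k = int(np.searchsorted(np.cumsum(contrib), var_threshold)) + 1
    k = min(k, len(vals))

    # Renormalize the retained contribution rates so the weights sum to one.
    weights = contrib[:k] / contrib[:k].sum()
    scores = Z @ vecs[:, :k]
    return scores @ weights, weights
```

Each row of `X` would be one company and each column one governance indicator. A higher index value then reflects higher scores on the components that explain the most variance.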

Identification of Liquor Brands Based on near Infrared Spectroscopy

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.834-836.935 ◽

2013 ◽

Vol 834-836 ◽

pp. 935-938

Author(s):

Lian Shun Zhang ◽

Chao Guo ◽

Bao Quan Wang

Keyword(s):

Principal Component Analysis ◽

Infrared Spectroscopy ◽

Near Infrared Spectroscopy ◽

Principal Components ◽

Near Infrared ◽

Scatter Correction ◽

Principal Component ◽

Component Analysis ◽

Correction Method ◽

Variance Contribution

In this paper, liquor brands were identified using near infrared spectroscopy and principal component analysis. Sixty samples of six different liquor brands were measured with a USB4000 spectrometer. To eliminate the noise caused by external factors, smoothing and multiplicative scatter correction were applied. After this preprocessing, we obtained the corrected spectra of the 60 samples. The differences in spectral shape between brands were not large enough to classify them, so principal component analysis was applied for further analysis. The results showed that the cumulative variance contribution rate of the first two principal components reached 99.06%, so these components effectively represent the information in the preprocessed spectra. In the scatter plot of the two principal components, the six liquor brands could be identified more accurately and more easily than from the spectral curves.
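The cumulative variance contribution rate quoted above is a standard PCA quantity. The sketch below shows one way to compute it with NumPy; the function name is an assumption and the data are whatever matrix of spectra (rows = samples, columns = wavelengths) is supplied.

```python
import numpy as np

def cumulative_contribution(X):
    """Cumulative variance contribution rate of the principal components,
    ordered from the largest component to the smallest."""
    Xc = X - X.mean(axis=0)
    # Squared singular values are proportional to the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2
    return np.cumsum(var) / var.sum()

def first_two_scores(X):
    """Scores on the first two components, e.g. for a 2-D scatter plot."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * s[:2]
```

The second entry of the array returned by `cumulative_contribution` corresponds to the figure reported for the first two components.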

Grouping sunflower genotypes for yield, oil content, and reaction to Alternaria leaf spot using GGE biplot

Pesquisa Agropecuária Brasileira ◽

10.1590/s0100-204x2015000800003 ◽

2015 ◽

Vol 50 (8) ◽

pp. 649-657 ◽

Cited By ~ 2

Author(s):

Regina Maria Villas Bôas de Campos Leite ◽

Maria Cristina Neves de Oliveira

Keyword(s):

Principal Component Analysis ◽

Disease Severity ◽

Principal Components ◽

Leaf Spot ◽

Oil Content ◽

Principal Component ◽

Component Analysis ◽

High Yield ◽

GGE Biplot ◽

Alternaria Leaf Spot

Abstract: The objective of this work was to evaluate the suitability of the multivariate method of principal component analysis (PCA) using the GGE biplot software for grouping sunflower genotypes for their reaction to Alternaria leaf spot disease (Alternariaster helianthi), and for their yield and oil content. Sixty-nine genotypes were evaluated for disease severity in the field, at the R3 growth stage, in seven growing seasons, in Londrina, in the state of Paraná, Brazil, using a diagrammatic scale developed for this disease. Yield and oil content were also evaluated. Data were standardized using the software Statistica, and GGE biplot was used for PCA and graphical display of data. The first two principal components explained 77.9% of the total variation. According to the polygonal biplot using the first two principal components and three response variables, the genotypes were divided into seven sectors. Genotypes located in sectors 1 and 2 showed high yield and high oil content, respectively, and those located in sector 7 showed tolerance to the disease and high yield, despite the high disease severity. The principal component analysis using GGE biplot is an efficient method for grouping sunflower genotypes based on the studied variables.

Principal component analysis for evaluating a ranking method used in the performance testing in sheep of Morada Nova breed

Semina Ciências Agrárias ◽

10.5433/1679-0359.2015v36n6p3909 ◽

2015 ◽

Vol 36 (6) ◽

pp. 3909

Author(s):

Michelle Santos da Silva ◽

Luciana Shiotsuki ◽

Raimundo Nonato Braga Lôbo ◽

Olivardo Facó

Keyword(s):

Principal Component Analysis ◽

Principal Components ◽

Correlation Coefficients ◽

Performance Testing ◽

Principal Component ◽

Component Analysis ◽

Ranking Method ◽

Daily Weight Gain ◽

Meat Production ◽

Body Depth

A multivariate approach was adopted to evaluate the relationship among traits measured in the performance testing of Morada Nova sheep, verify the efficiency of a ranking method used in these tests and identify the most significant traits for use in future analyses. Data from 150 young rams participating in five versions of the performance tests for the Morada Nova breed were used. Twenty traits were measured in each animal: initial weight (IW), final weight (FW), average daily weight gain (ADG), loin eye area (LEA), scrotal circumference (SC), fat thickness (FT), conformation (C), precocity (Pc), muscularity (M), breed features (BF), legs (L), withers height (WH), chest width (CW), rump height (RH), rump width (RW), rump length (RL), body length (BL), body depth (BD), heart girth (HG) and body condition scoring (BCS). Pearson's correlation coefficients ranged from –0.10 to 0.93, with the highest correlations found between body weight variables and morphometric measurements. The first three principal components explained 72.28% of the total variability among all traits. The variables related to animal size defined the first principal component, whereas those related to visual appraisal and suitability for meat production defined the second and third principal components, respectively. The combination of traits from the principal component analysis showed that the ranking method currently used in the performance testing of Morada Nova sheep is efficient for selecting larger rams with better breed features and higher degrees of specialization for meat production.

Hyperspectral data compression based upon the principal component analysis

Computer Optics ◽

10.18287/2412-6179-co-806 ◽

2021 ◽

Vol 45 (2) ◽

pp. 235-244

Author(s):

A.S. Minkin ◽

O.V. Nikolaeva ◽

A.A. Russkov

Keyword(s):

Principal Component Analysis ◽

Data Compression ◽

Principal Component ◽

Component Analysis ◽

Hyperspectral Data ◽

Gas Absorption ◽

Absorption Bands ◽

Spectral Bands ◽

The Mean ◽

Retrieval Errors

The paper aims to develop a hyperspectral data compression algorithm that combines small losses with a high compression rate. The algorithm relies on principal component analysis and a method of exhaustion. The principal components are singular vectors of an initial signal matrix, which are found by the method of exhaustion. A retrieved signal matrix is formed in parallel. The process continues until a required retrieval error is attained. The algorithm is described in detail, and its input and output parameters are specified. Testing is performed using AVIRIS (Airborne Visible-Infrared Imaging Spectrometer) data. Three images with different sky conditions (clear, partly cloudy, and overcast) are analyzed. For each image, testing is performed for all spectral bands and for a set of bands from which high water-vapour absorption bands are excluded. Retrieval errors versus compression rates are presented. The error measures include the root mean square deviation, the noise-to-signal ratio, the mean structural similarity index, and the mean relative deviation. It is shown that the retrieval errors decrease by more than an order of magnitude if spectral bands with high gas absorption are disregarded. The reason is that weak signals in the absorption bands are measured with large errors, leading to a weak dependence between the spectra in different spatial pixels. A mean cosine distance between the spectra in different spatial pixels is suggested as a measure of image compressibility.
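One plausible reading of this procedure is to extract singular vectors one at a time and stop once the retrieval error is small enough. The sketch below does this with power iteration and deflation. Treating the "method of exhaustion" as deflation is an assumption, as are the relative Frobenius-norm stopping rule, the fixed iteration count, and the name `compress_by_deflation`. It is not the algorithm from the paper.

```python
import numpy as np

def compress_by_deflation(A, tol=0.01, max_rank=None, n_iter=200, seed=0):
    """Extract singular triplets one by one until the relative Frobenius
    error of the retrieved matrix drops to tol (or max_rank is reached)."""
    rng = np.random.default_rng(seed)
    residual = np.array(A, dtype=float)
    total = np.linalg.norm(residual)
    m, n = residual.shape
    max_rank = min(m, n) if max_rank is None else max_rank

    lefts, sigmas, rights = [], [], []
    while len(sigmas) < max_rank and np.linalg.norm(residual) > tol * total:
        # Power iteration for the dominant singular pair of the residual.
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            u = residual @ v
            u /= np.linalg.norm(u)
            v = residual.T @ u
            v /= np.linalg.norm(v)
        sigma = u @ residual @ v
        # Deflate: remove the extracted rank-one part from the residual.
        residual -= sigma * np.outer(u, v)
        lefts.append(u)
        sigmas.append(sigma)
        rights.append(v)

    U = np.array(lefts).reshape(-1, m).T   # (m, k) left singular vectors
    Vt = np.array(rights).reshape(-1, n)   # (k, n) right singular vectors
    return U, np.array(sigmas), Vt

def retrieve(U, s, Vt):
    """Rebuild the approximate signal matrix from the stored factors."""
    return (U * s) @ Vt
```

Storing `U`, `s` and `Vt` with `k` components takes `k * (m + n + 1)` numbers instead of `m * n`, so the compression rate in this sketch depends directly on how many components the error tolerance requires.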

Measuring the Systematic Risk of Sectors within the US Market Via Principal Components Analysis: Before and during the COVID-19 Pandemic

10.5772/intechopen.101860 ◽

2022 ◽

Author(s):

Jaime González Maiz Jiménez ◽

Adán Reyes Santiago

Keyword(s):

Principal Component Analysis ◽

Stock Market ◽

Principal Components Analysis ◽

Principal Components ◽

Systematic Risk ◽

Principal Component ◽

Component Analysis ◽

Market Capitalization ◽

The US ◽

Components Analysis

This research measures the systematic risk of 10 sectors in the American stock market, distinguishing the COVID-19 pandemic period from the period before it. The novelty of this study is the use of the Principal Component Analysis (PCA) technique to measure the systematic risk of each sector, selecting the five stocks with the greatest market capitalization in each sector. The results show that the sectors with the greatest increase in exposure to systematic risk during the pandemic are restaurants, clothing, and insurance, whereas the sectors with the greatest decrease in exposure to systematic risk are automakers and tobacco. Given these results, it seems advisable for practitioners to select stocks that belong to either the automakers or tobacco sector to obtain protection from health crises such as COVID-19.

Competitive and Recreational Running Kinematics Examined Using Principal Components Analysis

Healthcare ◽

10.3390/healthcare9101321 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1321

Author(s):

Wenjing Quan ◽

Huiyu Zhou ◽

Datao Xu ◽

Shudong Li ◽

Julien S. Baker ◽

...

Keyword(s):

Principal Component Analysis ◽

Principal Components ◽

Sagittal Plane ◽

Three Dimensional ◽

Frontal Plane ◽

Principal Component ◽

Component Analysis ◽

Ankle Inversion ◽

Motion System ◽

Recreational Runners

Kinematics data are primary biomechanical parameters. A principal component analysis (PCA) of waveforms is a statistical approach used to explore patterns of variability in biomechanical curve datasets. Differences between experienced and recreational runners' kinematic variables are still unclear. The purpose of the present study was to compare the kinematic parameters of competitive and recreational runners using principal component analysis in the sagittal, frontal and transverse planes. Forty male runners were divided into two groups: twenty competitive runners and twenty recreational runners. A Vicon Motion System (Vicon Metrics Ltd., Oxford, UK) captured three-dimensional kinematics data during running at 3.3 m/s. Principal component analysis was used to determine the dominant variation in this model. The first three principal components were retained, and their scores were analyzed using independent t-tests. The recreational runners were found to have a smaller dorsiflexion angle, initial dorsiflexion contact angle, ankle inversion, knee adduction, and range of motion in the knee and hip frontal planes. The running kinematics data were influenced by running experience. The findings from the study provide a better understanding of the kinematic variables for competitive and recreational runners. Thus, these findings might have implications for reducing running injury and improving running performance.
