A Simple Method for Limiting Disclosure in Continuous Microdata Based on Principal Component Analysis

2017 ◽ Vol 33 (1) ◽ pp. 15-41
Author(s): Aida Calviño

Abstract In this article we propose a simple and versatile method for limiting disclosure in continuous microdata based on Principal Component Analysis (PCA). Instead of perturbing the original variables, we propose to alter the principal components, as they contain the same information but are uncorrelated, which permits working on each component separately, reducing processing times. The number and weight of the perturbed components determine the level of protection and distortion of the masked data. The method provides preservation of the mean vector and the variance-covariance matrix. Furthermore, depending on the technique chosen to perturb the principal components, the proposed method can provide masked, hybrid or fully synthetic data sets. Some examples of application and comparison with other methods previously proposed in the literature (in terms of disclosure risk and data utility) are also included.
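The idea of altering principal components instead of the original variables can be illustrated with a minimal sketch. It is not the author's algorithm: the perturbation used here (replacing each selected component with random noise that is orthogonal to the remaining components and rescaled to the same variance) is an assumption chosen because it keeps the mean vector and the variance-covariance matrix unchanged. The function name `mask_components` and the toy data are hypothetical.

```python
import numpy as np

def mask_components(X, components, rng):
    """Replace the chosen principal components with synthetic ones.

    Illustrative assumption, not the published method: each selected
    component is swapped for random noise that is centered, orthogonal
    to all other components, and scaled to the same variance.
    """
    mean = X.mean(axis=0)
    # Centered data factorizes as U @ diag(S) @ Vt; U * S holds the PC scores.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    U_new = U.copy()
    for j in components:
        z = rng.standard_normal(X.shape[0])
        z -= z.mean()                          # keep the component centered
        others = np.delete(U_new, j, axis=1)   # every component except j
        z -= others @ (others.T @ z)           # remove correlation with them
        z /= np.linalg.norm(z)                 # same variance once scaled by S[j]
        U_new[:, j] = z
    # Map the altered scores back to the original variable space.
    return mean + (U_new * S) @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
X_masked = mask_components(X, components=[0, 1], rng=rng)
```

Because the replaced score vectors stay orthonormal and centered, `X_masked` has the same sample mean and covariance as `X` even though the individual records differ. Replacing fewer components, or mixing noise with the original scores, would trade protection against distortion.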

Author(s): Shofiqul Islam ◽ Sonia Anand ◽ Jemila Hamid ◽ Lehana Thabane ◽ Joseph Beyene

Abstract: Linear principal component analysis (PCA) is a widely used approach to reduce the dimension of gene or miRNA expression data sets. This method relies on the linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be optimal. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods for data integration and death classification. We also compare these methods using a real data set with gene and miRNA expression of lung cancer patients. The first few kernel principal components perform poorly compared to the linear principal components in this setting. Reducing dimensions using linear PCA and a logistic regression model for classification seems to be adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to improved classification accuracy for the outcome.
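To make the contrast between the two feature extractors concrete, here is a minimal NumPy sketch of linear PCA scores and kernel PCA scores. The RBF kernel, the value of `gamma`, and the concentric-circle toy data are assumptions for illustration; the abstract does not state which kernel or settings the authors used, and the logistic regression step is omitted.

```python
import numpy as np

def linear_pca_scores(X, k):
    """Scores on the first k linear principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def rbf_kernel_pca_scores(X, k, gamma):
    """Scores on the first k kernel principal components (RBF kernel)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped to guard against rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                         # center the kernel in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)  # ascending order
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Toy data (hypothetical): two concentric circles, a classic nonlinear pattern.
rng = np.random.default_rng(2)
angles = rng.uniform(0.0, 2.0 * np.pi, 100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

linear_scores = linear_pca_scores(X, k=2)
kernel_scores = rbf_kernel_pca_scores(X, k=2, gamma=0.5)
```

Either score matrix could then be passed to a classifier; which one works better depends on the data, which is the comparison the abstract reports.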


2006 ◽ Vol 1 (1)
Author(s): K. Katayama ◽ K. Kimijima ◽ O. Yamanaka ◽ A. Nagaiwa ◽ Y. Ono

This paper proposes a method of stormwater inflow prediction that uses radar rainfall data as the input of a prediction model constructed by system identification. The aim of the proposal is to construct a compact system by reducing the dimension of the input data. In this paper, Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using the radar rainfall data and the inflow data acquired in a combined sewer system. This study reveals that a few principal components of the radar rainfall data are appropriate input variables for the stormwater inflow prediction model. Consequently, we have established a procedure for stormwater inflow prediction using a few principal components of radar rainfall data.
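The pre-processing step described above can be sketched as follows. Everything here is an assumption for illustration: the rainfall grid and inflow series are synthetic stand-ins, and an ordinary least-squares fit replaces the system identification model, whose structure the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_cells, n_latent = 300, 50, 2

# Synthetic stand-in data: a rainfall field over 50 grid cells driven by two
# latent factors, and an inflow series that depends on the same factors.
factors = rng.standard_normal((n_steps, n_latent))
rain = (factors @ rng.standard_normal((n_latent, n_cells))
        + 0.1 * rng.standard_normal((n_steps, n_cells)))
inflow = factors @ np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(n_steps)

def pca_scores(X, k):
    """Project centered data onto its first k principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

# Reduce 50 grid cells to 2 input variables, then fit a linear model.
scores = pca_scores(rain, k=2)
design = np.column_stack([np.ones(n_steps), scores])
coef, *_ = np.linalg.lstsq(design, inflow, rcond=None)
pred = design @ coef
r2 = 1.0 - np.sum((inflow - pred) ** 2) / np.sum((inflow - inflow.mean()) ** 2)
```

The point of the sketch is only that the model receives two inputs instead of fifty; a real application would use lagged inputs and held-out data to judge prediction quality.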


Cancers ◽ 2021 ◽ Vol 13 (10) ◽ pp. 2342
Author(s): Corentin Martens ◽ Olivier Debeir ◽ Christine Decaestecker ◽ Thierry Metens ◽ Laetitia Lebrun ◽ ...

Recent works have demonstrated the added value of dynamic amino acid positron emission tomography (PET) for glioma grading and genotyping, biopsy targeting, and recurrence diagnosis. However, most of these studies are based on hand-crafted qualitative or semi-quantitative features extracted from the mean time activity curve within predefined volumes. Voxelwise dynamic PET data analysis could instead provide better insight into the intra-tumor heterogeneity of gliomas. In this work, we investigate the ability of principal component analysis (PCA) to extract relevant quantitative features from a large number of motion-corrected [S-methyl-11C]methionine ([11C]MET) PET frames. We first demonstrate the robustness of our methodology to noise by means of numerical simulations. We then build a PCA model from dynamic [11C]MET acquisitions of 20 glioma patients. In a distinct cohort of 13 glioma patients, we compare the parametric maps derived from our PCA model to those provided by the classical one-compartment pharmacokinetic model (1TCM). We show that our PCA model outperforms the 1TCM in distinguishing characteristic dynamic uptake behaviors within the tumor while being less computationally expensive and not requiring arterial sampling. Such a methodology could be valuable for assessing tumor aggressiveness locally, with applications in treatment planning and response evaluation. This work further supports the added value of dynamic over static [11C]MET PET in gliomas.


2014 ◽ Vol 926-930 ◽ pp. 4085-4088
Author(s): Chuan Jun Li

This article uses principal component analysis (PCA) to evaluate the level of corporate governance. PCA is used to analyze the correlation among 10 original indicators and to extract a small number of principal components that retain most of the information in the original indicators. A corporate governance index is then obtained by weighting each principal component by its variance contribution rate, which allows corporate governance to be evaluated comprehensively.
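A common way to build such an index is sketched below, assuming the indicators are standardized and the retained components are weighted by their normalized variance contribution rates. The function name `pca_composite_index`, the choice of three components, and the random data are hypothetical; the abstract does not give the exact formula.

```python
import numpy as np

def pca_composite_index(X, k):
    """Composite score from the first k principal components.

    Each retained component is weighted by its share of the variance
    explained by the k retained components (an assumed convention).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize indicators
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()             # variance contribution rates
    weights = contribution[:k] / contribution[:k].sum()
    scores = Z @ eigvecs[:, :k]                        # component scores per firm
    return scores @ weights, weights, contribution

# Hypothetical data: 40 firms measured on 10 governance indicators.
rng = np.random.default_rng(4)
X = rng.standard_normal((40, 10))
index, weights, contribution = pca_composite_index(X, k=3)
```

The sign of each principal component is arbitrary, so in practice the components are usually oriented so that a higher score means better governance before the weighted sum is taken.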


2013 ◽ Vol 834-836 ◽ pp. 935-938
Author(s): Lian Shun Zhang ◽ Chao Guo ◽ Bao Quan Wang

In this paper, liquor brands were identified using near infrared spectroscopy and principal component analysis. Sixty samples from six different liquor brands were measured with a USB4000 spectrometer. To eliminate the noise caused by external factors, smoothing and multiplicative scatter correction were then applied, yielding corrected spectra for the 60 samples. The spectral shapes of the different brands do not differ enough to classify them directly, so principal component analysis was applied for further analysis. The results showed that the cumulative variance contribution rate of the first two principal components reached 99.06%, so they effectively represent the information in the preprocessed spectra. In the scatter plot of the first two principal components, the six liquor brands were identified more accurately and more easily than from the spectral curves.


2015 ◽ Vol 50 (8) ◽ pp. 649-657
Author(s): Regina Maria Villas Bôas de Campos Leite ◽ Maria Cristina Neves de Oliveira

Abstract: The objective of this work was to evaluate the suitability of the multivariate method of principal component analysis (PCA) using the GGE biplot software for grouping sunflower genotypes for their reaction to Alternaria leaf spot disease (Alternariaster helianthi), and for their yield and oil content. Sixty-nine genotypes were evaluated for disease severity in the field, at the R3 growth stage, in seven growing seasons, in Londrina, in the state of Paraná, Brazil, using a diagrammatic scale developed for this disease. Yield and oil content were also evaluated. Data were standardized using the software Statistica, and GGE biplot was used for PCA and graphical display of data. The first two principal components explained 77.9% of the total variation. According to the polygonal biplot using the first two principal components and three response variables, the genotypes were divided into seven sectors. Genotypes located in sectors 1 and 2 showed high yield and high oil content, respectively, and those located in sector 7 showed tolerance to the disease and high yield, despite the high disease severity. The principal component analysis using GGE biplot is an efficient method for grouping sunflower genotypes based on the studied variables.


2015 ◽ Vol 36 (6) ◽ pp. 3909
Author(s): Michelle Santos da Silva ◽ Luciana Shiotsuki ◽ Raimundo Nonato Braga Lôbo ◽ Olivardo Facó

A multivariate approach was adopted to evaluate the relationship among traits measured in the performance testing of Morada Nova sheep, verify the efficiency of a ranking method used in these tests and identify the most significant traits for use in future analyses. Data from 150 young rams participating in five versions of the performance tests for the Morada Nova breed were used. Twenty traits were measured in each animal: initial weight (IW), final weight (FW), average daily weight gain (ADG), loin eye area (LEA), scrotal circumference (SC), fat thickness (FT), conformation (C), precocity (Pc), muscularity (M), breed features (BF), legs (L), withers height (WH), chest width (CW), rump height (RH), rump width (RW), rump length (RL), body length (BL), body depth (BD), heart girth (HG) and body condition scoring (BCS). The Pearson’s correlation coefficients ranged from –0.10 to 0.93, with the highest correlations between body weight variables and morphometric measurements. The first three principal components explained 72.28% of the total variability among all traits. The variables related to animal size defined the first principal component, whereas those related to visual appraisal and suitability for meat production defined the second and third principal components, respectively. The combination of traits from the principal component analysis showed that the ranking method currently used in the performance testing of Morada Nova sheep is efficient for selecting larger rams with better breed features and higher degrees of specialization for meat production.


2021 ◽ Vol 45 (2) ◽ pp. 235-244
Author(s): A.S. Minkin ◽ O.V. Nikolaeva ◽ A.A. Russkov

The paper aims to develop a hyperspectral data compression algorithm that combines small losses with a high compression rate. The algorithm relies on principal component analysis and a method of exhaustion. The principal components are singular vectors of an initial signal matrix, which are found by the method of exhaustion. A retrieved signal matrix is formed in parallel. The process continues until a required retrieval error is attained. The algorithm is described in detail, and its input and output parameters are specified. Testing is performed using AVIRIS (Airborne Visible-Infrared Imaging Spectrometer) data. Three images with different sky conditions (clear, partly cloudy, and overcast) are analyzed. For each image, testing is performed for all spectral bands and for a set of bands from which high water-vapour absorption bands are excluded. Retrieval errors versus compression rates are presented. The error measures include the root mean square deviation, the noise-to-signal ratio, the mean structural similarity index, and the mean relative deviation. It is shown that the retrieval errors decrease by more than an order of magnitude if spectral bands with high gas absorption are disregarded. The reason is that weak signals in the absorption bands are measured with large errors, leading to a weak dependence between the spectra in different spatial pixels. A mean cosine distance between the spectra in different spatial pixels is suggested as a measure of image compressibility.
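The loop described above (find one singular vector, update the retrieved matrix, stop when the error target is met) can be sketched as follows. The sketch assumes that "method of exhaustion" refers to power iteration with deflation, and that the retrieval error is the relative Frobenius norm of the residual; both are assumptions, as are the function names, the iteration count, and the test matrix. The mean cosine distance is likewise one plausible reading of the abstract's compressibility measure.

```python
import numpy as np

def compress_by_deflation(A, tol, n_iter=200, seed=0):
    """Extract singular triplets one at a time until the relative
    Frobenius error of the retrieved matrix drops to tol or below."""
    rng = np.random.default_rng(seed)
    residual = np.array(A, dtype=float)
    norm_A = np.linalg.norm(residual)
    left, sigmas, right = [], [], []
    while len(sigmas) < min(A.shape) and np.linalg.norm(residual) > tol * norm_A:
        v = rng.standard_normal(A.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):                  # power iteration on the residual
            u = residual @ v
            u /= np.linalg.norm(u)
            v = residual.T @ u
            v /= np.linalg.norm(v)
        u = residual @ v
        sigma = np.linalg.norm(u)
        u /= sigma
        residual -= sigma * np.outer(u, v)       # deflate the component just found
        left.append(u)
        sigmas.append(sigma)
        right.append(v)
    return np.column_stack(left), np.array(sigmas), np.vstack(right)

def mean_cosine_distance(spectra):
    """Average of 1 - cos(angle) over all pairs of distinct rows (pixels)."""
    unit = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)
    cos = unit @ unit.T
    n = spectra.shape[0]
    return 1.0 - (cos.sum() - np.trace(cos)) / (n * (n - 1))

# Hypothetical signal matrix: 100 pixels by 40 bands, nearly rank 3.
rng = np.random.default_rng(3)
A = (rng.standard_normal((100, 3)) @ rng.standard_normal((3, 40))
     + 0.01 * rng.standard_normal((100, 40)))
U, s, Vt = compress_by_deflation(A, tol=0.05)
A_hat = (U * s) @ Vt
rel_error = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

Storing `U`, `s`, and `Vt` instead of `A` is what yields the compression; a smaller mean cosine distance between pixel spectra suggests that fewer components are needed.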


2022
Author(s): Jaime González Maiz Jiménez ◽ Adán Reyes Santiago

This research measures the systematic risk of 10 sectors in the American stock market, distinguishing the COVID-19 pandemic period. The novelty of this study is the use of the Principal Component Analysis (PCA) technique to measure the systematic risk of each sector, selecting the five stocks with the greatest market capitalization in each sector. The results show that the sectors with the greatest increase in exposure to systematic risk during the pandemic are restaurants, clothing, and insurance, whereas the sectors with the greatest decrease in exposure to systematic risk are automakers and tobacco. Based on these results, it seems advisable for practitioners to select stocks from the automakers or tobacco sectors for protection against health crises such as COVID-19.


Healthcare ◽ 2021 ◽ Vol 9 (10) ◽ pp. 1321
Author(s): Wenjing Quan ◽ Huiyu Zhou ◽ Datao Xu ◽ Shudong Li ◽ Julien S. Baker ◽ ...

Kinematics data are primary biomechanical parameters. A principal component analysis (PCA) of waveforms is a statistical approach used to explore patterns of variability in biomechanical curve datasets. Differences between the kinematic variables of experienced and recreational runners are still unclear. The purpose of the present study was to compare the kinematics parameters of competitive and recreational runners using principal component analysis in the sagittal, frontal and transverse planes. Forty male runners were divided into two groups: twenty competitive runners and twenty recreational runners. A Vicon Motion System (Vicon Metrics Ltd., Oxford, UK) captured three-dimensional kinematics data during running at 3.3 m/s. Principal component analysis was used to determine the dominant variation in the data. The scores of the first three principal components were retained and analyzed using independent t-tests. The recreational runners were found to have a smaller dorsiflexion angle, initial dorsiflexion contact angle, ankle inversion, knee adduction, and range of motion in the knee and hip frontal planes. The running kinematics data were influenced by running experience. The findings from the study provide a better understanding of the kinematics variables for competitive and recreational runners. Thus, these findings might have implications for reducing running injury and improving running performance.

