A Blockchain-Integrated Divided-Block Sparse Matrix Transformation Differential Privacy Data Publishing Model

With the rapid development of information technology, people benefit more and more from big data. At the same time, it becomes a great concern that how to obtain optimal outputs from big data publishing and sharing management while protecting privacy. Many researchers seek to realize differential privacy protection in massive high-dimensional datasets using the method of principal component analysis. However, these algorithms are inefficient in processing and do not take into account the different privacy protection needs of each attribute in high-dimensional datasets. To address the above problem, we design a Divided-block Sparse Matrix Transformation Differential Privacy Data Publishing Algorithm (DSMT-DP). In this algorithm, different levels of privacy budget parameters are assigned to different attributes according to the required privacy protection level of each attribute, taking into account the privacy protection needs of different levels of attributes. Meanwhile, the use of the divided-block scheme and the sparse matrix transformation scheme can improve the computational efficiency of the principal component analysis method for handling large amounts of high-dimensional sensitive data, and we demonstrate that the proposed algorithm satisfies differential privacy. Our experimental results show that the mean square error of the proposed algorithm is smaller than the traditional differential privacy algorithm with the same privacy parameters, and the computational efficiency can be improved. Further, we combine this algorithm with blockchain and propose an Efficient Privacy Data Publishing and Sharing Model based on the blockchain. Publishing and sharing private data on this model not only resist strong background knowledge attacks from adversaries outside the system but also prevent stealing and tampering of data by not-completely-honest participants inside the system.

Download Full-text

Differential Privacy Principal Component Analysis for Support Vector Machines

Security and Communication Networks ◽

10.1155/2021/5542283 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yuxian Huang ◽

Geng Yang ◽

Yahong Xu ◽

Hao Zhou

Keyword(s):

Principal Component Analysis ◽

Classification Accuracy ◽

Differential Privacy ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

Support Vector ◽

Data Sets ◽

Vector Machines ◽

Fast Classification

In big data era, massive and high-dimensional data is produced at all times, increasing the difficulty of analyzing and protecting data. In this paper, in order to realize dimensionality reduction and privacy protection of data, principal component analysis (PCA) and differential privacy (DP) are combined to handle these data. Moreover, support vector machine (SVM) is used to measure the availability of processed data in our paper. Specifically, we introduced differential privacy mechanisms at different stages of the algorithm PCA-SVM and obtained the algorithms DPPCA-SVM and PCADP-SVM. Both algorithms satisfy ε , 0 -DP while achieving fast classification. In addition, we evaluate the performance of two algorithms in terms of noise expectation and classification accuracy from the perspective of theoretical proof and experimental verification. To verify the performance of DPPCA-SVM, we also compare our DPPCA-SVM with other algorithms. Results show that DPPCA-SVM provides excellent utility for different data sets despite guaranteeing stricter privacy.

Download Full-text

Sparse common component analysis for multiple high-dimensional datasets via noncentered principal component analysis

Statistical Papers ◽

10.1007/s00362-018-1045-6 ◽

2018 ◽

Vol 61 (6) ◽

pp. 2283-2311

Author(s):

Heewon Park ◽

Sadanori Konishi

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

Common Component ◽

High Dimensional Datasets

Download Full-text

Principal Component Analysis in the Local Differential Privacy Model

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/666 ◽

2019 ◽

Cited By ~ 1

Author(s):

Di Wang ◽

Jinhui Xu

Keyword(s):

Principal Component Analysis ◽

Differential Privacy ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

Minimax Risk ◽

Privacy Model ◽

Real World Datasets ◽

Low Dimensional ◽

Subspace Distance

In this paper, we study the Principal Component Analysis (PCA) problem under the (distributed) non-interactive local differential privacy model. For the low dimensional case, we show the optimal rate for the private minimax risk of the k-dimensional PCA using the squared subspace distance as the measurement. For the high dimensional row sparse case, we first give a lower bound on the private minimax risk, . Then we provide an efficient algorithm to achieve a near optimal upper bound. Experiments on both synthetic and real world datasets confirm the theoretical guarantees of our algorithms.

Download Full-text

Criteria for choosing the number of dimensions in a principal component analysis: An empirical assessment

10.5753/sbbd.2020.13632 ◽

2020 ◽

Author(s):

Renata Silva ◽

Daniel Oliveira ◽

Davi Pereira Santos ◽

Lucio F.D. Santos ◽

Rodrigo Erthal Wilson ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Hypothesis Test ◽

Feature Learning ◽

Principal Component ◽

Component Analysis ◽

Scree Plot ◽

Open Issue ◽

Chained Tasks ◽

High Dimensional Datasets

Principal component analysis (PCA) is an efficient model for the optimization problem of finding d' axes of a subspace Rd' ⊆ Rd so that the mean squared distances from a given set R of points to the axes are minimal. Despite being steadily employed since 1901 in different scenarios, e.g., mechanics, PCA has become an important link in machine learning chained tasks, such as feature learning and AutoML designs. A frequent yet open issue that arises from supervised-based problems is how many PCA axes are required for the performance of machine learning constructs to be tuned. Accordingly, we investigate the behavior of six independent and uncoupled criteria for estimating the number of PCA axes, namely Scree-Plot %, Scree Plot Gap, Kaiser-Guttman, Broken-Stick, p-Score, and 2D. In total, we evaluate the performance of those approaches in 20 high dimensional datasets by using (i) four different classifiers, and (ii) a hypothesis test upon the reported F-Measures. Results indicate Broken-Stick and Scree-Plot % criteria consistently outperformed the competitors regarding supervised-based tasks, whereas estimators Kaiser-Guttman and Scree-Plot Gap delivered poor performances in the same scenarios.

Download Full-text

Differentially Private Autocorrelation Time-Series Data Publishing Based on Sliding Window

Security and Communication Networks ◽

10.1155/2021/6665984 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Jing Zhao ◽

Shubo Liu ◽

Xingxing Xiong ◽

Zhaohui Cai

Keyword(s):

Time Series ◽

Privacy Protection ◽

Large Scale ◽

Differential Privacy ◽

Time Series Data ◽

Sliding Window ◽

Data Publishing ◽

Series Data ◽

Data Publication ◽

Autocorrelation Time

Privacy protection is one of the major obstacles for data sharing. Time-series data have the characteristics of autocorrelation, continuity, and large scale. Current research on time-series data publication mainly ignores the correlation of time-series data and the lack of privacy protection. In this paper, we study the problem of correlated time-series data publication and propose a sliding window-based autocorrelation time-series data publication algorithm, called SW-ATS. Instead of using global sensitivity in the traditional differential privacy mechanisms, we proposed periodic sensitivity to provide a stronger degree of privacy guarantee. SW-ATS introduces a sliding window mechanism, with the correlation between the noise-adding sequence and the original time-series data guaranteed by sequence indistinguishability, to protect the privacy of the latest data. We prove that SW-ATS satisfies ε-differential privacy. Compared with the state-of-the-art algorithm, SW-ATS is superior in reducing the error rate of MAE which is about 25%, improving the utility of data, and providing stronger privacy protection.

Download Full-text

Dimensionality and Its Reduction

Statistics, Data Mining, and Machine Learning in Astronomy ◽

10.23943/princeton/9780691151687.003.0007 ◽

2014 ◽

Author(s):

Andrew J. Connolly ◽

Jacob T. VanderPlas ◽

Alexander Gray ◽

Andrew J. Connolly ◽

Jacob T. VanderPlas ◽

...

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Reduction Technique ◽

High Dimensional ◽

Data Sets ◽

Data Set ◽

Gaussian Distributions ◽

Dimensionality Reduction Technique ◽

Alternative Techniques ◽

New Generation

With the dramatic increase in data available from a new generation of astronomical telescopes and instruments, many analyses must address the question of the complexity as well as size of the data set. This chapter deals with how we can learn which measurements, properties, or combinations thereof carry the most information within a data set. It describes techniques that are related to concepts discussed when describing Gaussian distributions, density estimation, and the concepts of information content. The chapter begins with an exploration of the problems posed by high-dimensional data. It then describes the data sets used in this chapter, and introduces perhaps the most important and widely used dimensionality reduction technique, principal component analysis (PCA). The remainder of the chapter discusses several alternative techniques which address some of the weaknesses of PCA.

Download Full-text

Using principal component analysis for neural network high-dimensional potential energy surface

The Journal of Chemical Physics ◽

10.1063/5.0009264 ◽

2020 ◽

Vol 152 (23) ◽

pp. 234103

Author(s):

Bastien Casier ◽

Stéphane Carniato ◽

Tsveta Miteva ◽

Nathalie Capron ◽

Nicolas Sisourat

Keyword(s):

Neural Network ◽

Principal Component Analysis ◽

Potential Energy ◽

Potential Energy Surface ◽

Energy Surface ◽

Principal Component ◽

Component Analysis ◽

High Dimensional

Download Full-text

High-Dimensional Data Dimension Reduction Based on KECA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.303-306.1101 ◽

2013 ◽

Vol 303-306 ◽

pp. 1101-1104 ◽

Cited By ~ 2

Author(s):

Yong De Hu ◽

Jing Chang Pan ◽

Xin Tan

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

High Dimensional Data ◽

Principal Component ◽

Good Method ◽

Component Analysis ◽

Renyi Entropy ◽

Rényi Entropy ◽

Kernel Principal Component Analysis ◽

High Dimensional

Kernel entropy component analysis (KECA) reveals the original data’s structure by kernel matrix. This structure is related to the Renyi entropy of the data. KECA maintains the invariance of the original data’s structure by keeping the data’s Renyi entropy unchanged. This paper described the original data by several components on the purpose of dimension reduction. Then the KECA was applied in celestial spectra reduction and was compared with Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) by experiments. Experimental results show that the KECA is a good method in high-dimensional data reduction.

Download Full-text

Principal Component Analysis (PCA) for high-dimensional data. PCA is dead. Long live PCA

Perspectives on Big Data Analysis - Contemporary Mathematics ◽

10.1090/conm/622/12430 ◽

2014 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Fan Yang ◽

Kjell Doksum ◽

Kam-Wah Tsui

Keyword(s):

Principal Component Analysis ◽

High Dimensional Data ◽

Principal Component ◽

Component Analysis ◽

High Dimensional

Download Full-text

Modified “Rule N” Procedure for Principal Component (EOF) Truncation

Journal of Climate ◽

10.1175/jcli-d-15-0812.1 ◽

2016 ◽

Vol 29 (8) ◽

pp. 3049-3056 ◽

Cited By ~ 3

Author(s):

Daniel S. Wilks

Keyword(s):

Hypothesis Test ◽

Principal Component ◽

Real Data ◽

High Dimensional ◽

Eof Analysis ◽

Artificial Data ◽

Critical Question ◽

Orthogonal Function ◽

High Dimensional Datasets ◽

Statistical Results

Abstract Principal component analysis (PCA), also known as empirical orthogonal function (EOF) analysis, is widely used for compression of high-dimensional datasets in such applications as climate diagnostics and seasonal forecasting. A critical question when using this method is the number of modes, representing meaningful signal, to retain. The resampling-based “Rule N” method attempts to address the question of PCA truncation in a statistically principled manner. However, it is only valid for the leading (largest) eigenvalue, because it fails to condition the hypothesis tests for subsequent (smaller) eigenvalues on the results of previous tests. This paper draws on several relatively recent statistical results to construct a hypothesis-test-based truncation rule that accounts at each stage for the magnitudes of the larger eigenvalues. The performance of the method is demonstrated in an artificial data setting and illustrated with a real-data example.

Download Full-text