PRIVACY-PRESERVING RANDOM PROJECTION-BASED RECOMMENDATIONS BASED ON DISTRIBUTED DATA

Providing recommendations based on distributed data has received an increasing amount of attention because it offers several advantages. Online vendors who face problems caused by a limited amount of available data want to offer predictions based on distributed data collaboratively because they can surmount problems such as cold start, limited coverage, and unsatisfactory accuracy through partnerships. It is relatively easy to produce referrals based on distributed data when privacy is not a concern. However, concerns regarding the protection of private data, financial fears due to revealing valuable assets, and legal regulations imposed by various organizations prevent companies from forming collaborations. In this study, we propose to use random projection to protect online vendors' privacy while still providing accurate predictions from distributed data without sacrificing online performance. We utilize random projection to eliminate the aforementioned issues so vendors can work in partnerships. We suggest privacy-preserving schemes to offer recommendations based on vertically or horizontally partitioned data among multiple companies. The recommended methods are analyzed in terms of confidentiality. We also analyze the superfluous loads caused by privacy concerns. Finally, we perform real data-based trials to evaluate the accuracy of the proposed schemes. The results of our analyses show that our methods preserve privacy, cause insignificant overheads, and offer accurate predictions.
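
The core idea of random projection can be illustrated with a minimal sketch. It is not the authors' scheme: the dimensions, seeds, and rating vectors below are hypothetical, and the helper names are introduced only for illustration. A data owner multiplies its rating vectors by a secret random matrix with i.i.d. N(0, 1/k) entries, so the raw ratings are hidden while inner products, on which many similarity measures rely, are preserved in expectation.

```python
import math
import random

def random_projection_matrix(d, k, seed):
    """Return a d x k matrix with i.i.d. Gaussian entries of variance 1/k."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]

def project(vector, matrix):
    """Map a length-d vector to length k by computing vector @ matrix."""
    k = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(k)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical rating vectors over d items (assumption: ratings in [1, 5]).
d, k = 1000, 400
rng = random.Random(0)
u = [rng.uniform(1, 5) for _ in range(d)]
v = [rng.uniform(1, 5) for _ in range(d)]

# The projection matrix is the data owner's secret.
R = random_projection_matrix(d, k, seed=42)
u_p, v_p = project(u, R), project(v, R)

print("original inner product: ", round(dot(u, v), 1))
print("projected inner product:", round(dot(u_p, v_p), 1))
```

The projected inner product only approximates the original one; a larger k tightens the approximation at the cost of revealing more dimensions.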

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Cybernetics and Information Technologies ◽

10.1515/cait-2017-0015 ◽

2017 ◽

Vol 17 (2) ◽

pp. 44-55 ◽

Cited By ~ 1

Author(s):

M. Antony Sheela ◽

K. Vijayalakshmi

Keyword(s):

Data Mining ◽

Threshold Level ◽

Third Party ◽

Distributed Data Mining ◽

Distributed Data ◽

Data Perturbation ◽

Private Data ◽

Partitioned Data ◽

Vertically Partitioned Data ◽

The Mean

Data mining on vertically or horizontally partitioned datasets carries the overhead of protecting private data. Perturbation is a technique that protects data from being revealed. This paper proposes a perturbation and anonymization technique that is performed on vertically partitioned data. A third-party coordinator is used to partition the data recursively among the various parties. The parties perturb the data by computing the mean once the specified threshold level is reached. The perturbation maintains the statistical relationships among attributes.
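
As a rough illustration of mean-based perturbation, the following sketch works on a single attribute. It assumes a simple halving rule and a hypothetical function name; the paper's actual partitioning protocol and the coordinator's role are not reproduced here. The values are split recursively until a partition holds at most `threshold` values, and each value is then replaced by its partition mean.

```python
def perturb_by_partition_mean(values, threshold):
    """Recursively halve the sorted values; once a partition has at most
    `threshold` values, replace each of them with the partition mean."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # Sort positions by value so that partitions group similar values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)

    def recurse(indices):
        if len(indices) <= threshold:
            mean = sum(values[i] for i in indices) / len(indices)
            for i in indices:
                result[i] = mean
            return
        mid = len(indices) // 2
        recurse(indices[:mid])
        recurse(indices[mid:])

    if order:
        recurse(order)
    return result

# Hypothetical attribute values held by one party.
ages = [23, 45, 31, 52, 38, 27, 61, 49]
perturbed = perturb_by_partition_mean(ages, threshold=2)
print(perturbed)  # [25.0, 47.0, 34.5, 56.5, 34.5, 25.0, 56.5, 47.0]
```

Because every value is replaced by the mean of its own partition, the overall mean of the attribute is unchanged, while individual values are no longer disclosed exactly.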

Privacy-Preserving Restricted Boltzmann Machine

Computational and Mathematical Methods in Medicine ◽

10.1155/2014/138498 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7

Author(s):

Yu Li ◽

Yuan Zhang ◽

Yue Ji

Keyword(s):

Data Privacy ◽

Efficiency Analysis ◽

Privacy Preserving ◽

Restricted Boltzmann Machine ◽

Distributed Data Mining ◽

Distributed Data ◽

Boltzmann Machine ◽

Crucial Issue ◽

Private Data ◽

Technology Revolution

With the arrival of the big data era, it is predicted that distributed data mining will lead to an information technology revolution. To motivate different institutes to collaborate with each other, the crucial issue is to eliminate their concerns regarding data privacy. In this paper, we propose a privacy-preserving method for training a restricted Boltzmann machine (RBM). With our method, the institutes can obtain the RBM without revealing their private data to each other. We provide a correctness and efficiency analysis of our algorithms. The comparative experiment shows that the accuracy is very close to that of the original RBM model.

Random projection-based multiplicative data perturbation for privacy preserving distributed data mining

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2006.14 ◽

2006 ◽

Vol 18 (1) ◽

pp. 92-106 ◽

Cited By ~ 275

Author(s):

Kun Liu ◽

H. Kargupta ◽

J. Ryan

Keyword(s):

Data Mining ◽

Random Projection ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Distributed Data ◽

Data Perturbation

Privacy-preserving Maximum Likelihood Estimation for Distributed Data

Journal of Privacy and Confidentiality ◽

10.29012/jpc.v1i2.574 ◽

2010 ◽

Vol 1 (2) ◽

Cited By ~ 1

Author(s):

Xiaodong Lin ◽

Alan F. Karr

Keyword(s):

Maximum Likelihood ◽

Maximum Likelihood Estimation ◽

Data Privacy ◽

Likelihood Estimation ◽

Privacy Preserving ◽

Distributed Data ◽

Distributed Environment ◽

Technological Advances ◽

Clustering And Classification ◽

Partitioned Data

Recent technological advances enable the collection of huge amounts of data. Commonly, these data are generated, stored, and owned by multiple entities that are unwilling to cede control of their data. This distributed environment requires statistical tools that can produce correct results while preserving data privacy. Privacy-preserving protocols have been proposed for specific statistical analyses such as linear regression, clustering, and classification. In this paper, we present methods and protocols for privacy-preserving maximum likelihood estimation in general settings. We discuss both horizontally and vertically partitioned data, and propose procedures that allow participating parties to withdraw from the joint computation. Logistic regression is used to demonstrate our method.

PRIVACY-PRESERVING OLAP FOR ACCURATE ANSWER

Journal of Circuits System and Computers ◽

10.1142/s0218126612500090 ◽

2012 ◽

Vol 21 (01) ◽

pp. 1250009 ◽

Cited By ~ 1

Author(s):

YOUWEN ZHU ◽

LIUSHENG HUANG ◽

TSUYOSHI TAKAGI ◽

MINGWU ZHANG

Keyword(s):

Linear Complexity ◽

Privacy Preserving ◽

Sensitive Information ◽

Computation Complexity ◽

Privacy Concerns ◽

Partitioned Data ◽

Aggregate Query ◽

Analytical Processing ◽

Vertically Partitioned Data ◽

Security Communication

Privacy concerns have recently received growing attention, and protecting privacy-sensitive information from being violated in distributed cooperative computation has become a significant topic. In this paper, we first propose a novel, general privacy-preserving online analytical processing model based on secure multiparty computation. Then, based on the new model, we propose two schemes for privacy-preserving count aggregate queries over both horizontally partitioned data and vertically partitioned data. Additionally, we propose several efficient subprotocols that serve as the basic secure building blocks. Furthermore, we analyze the correctness, security, communication cost, and computation complexity of our proposed protocols, and show that the new schemes are secure, have linear complexity, and return exactly accurate query results.
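
For horizontally partitioned data, a count aggregate is the sum of the parties' local counts, so a secure-sum primitive is enough to illustrate the idea. The sketch below uses additive secret sharing, a standard secure multiparty computation building block; it is not the paper's protocol, and the modulus, function names, and local counts are assumptions made for illustration.

```python
import secrets

# Hypothetical public modulus; it must exceed any possible true total.
MODULUS = 2**61 - 1

def make_shares(value, n_parties):
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_count(local_counts):
    """Sum the local counts while each party only ever sees random shares."""
    n = len(local_counts)
    # Party i splits its count and sends share j to party j.
    all_shares = [make_shares(c, n) for c in local_counts]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums add up to the true total.
    return sum(partial_sums) % MODULUS

# Hypothetical per-party results of the same count query.
local_counts = [120, 75, 310]
total = secure_count(local_counts)
print(total)  # 505
```

The result is exact rather than approximate, which matches the accuracy claim in the abstract. This sketch assumes honest parties and private channels between them.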

Business Collaboration by Privacy-Preserving Clustering

Social Implications of Data Mining and Information Privacy ◽

10.4018/978-1-60566-196-4.ch007 ◽

2010 ◽

pp. 113-133

Author(s):

Stanley R.M. Oliveira ◽

Osmar R. Zaïane

Keyword(s):

Data Mining ◽

Cluster Analysis ◽

Privacy Preservation ◽

Clustering Algorithms ◽

Random Projection ◽

Privacy Preserving ◽

Mathematical Foundation ◽

Privacy Concerns ◽

Privacy Requirements ◽

Data Owner

The sharing of data is beneficial in data mining applications and widely acknowledged as advantageous in business. However, information sharing can become controversial and be thwarted by privacy regulations and other privacy concerns. Rather than simply hindering data owners from sharing information for data analysis, a solution could be designed to meet privacy requirements and guarantee valid data clustering results. To achieve this dual goal, this chapter introduces a method for privacy-preserving clustering, called Dimensionality Reduction-Based Transformation (DRBT). This method relies on the intuition behind random projection to protect the underlying attribute values subjected to cluster analysis. It is shown analytically and empirically that, by transforming a dataset using DRBT, a data owner can achieve privacy preservation and get accurate clustering with little communication overhead. The advantages of such a method are: it is independent of distance-based clustering algorithms; it has a sound mathematical foundation; and it does not require CPU-intensive operations.

PPDM-TAN: A Privacy-Preserving Multi-Party Classifier

Computation ◽

10.3390/computation9010006 ◽

2021 ◽

Vol 9 (1) ◽

pp. 6

Author(s):

Maria Eleni Skarkala ◽

Manolis Maragoudakis ◽

Stefanos Gritzalis ◽

Lilian Mitrou

Keyword(s):

Data Mining ◽

Private Information ◽

Security Analysis ◽

Privacy Preserving ◽

Data Mining Algorithm ◽

Bayes Classifier ◽

Sensitive Data ◽

Privacy Concerns ◽

Information Privacy Concerns ◽

Private Data

Distributed medical, financial, or social databases are analyzed daily for the discovery of patterns and useful information. Privacy concerns have emerged as some database segments contain sensitive data. Data mining techniques are used to parse, process, and manage enormous amounts of data while ensuring the preservation of private information. Cryptography, as shown by previous research, is the most accurate approach to acquiring knowledge while maintaining privacy. In this paper, we present an extension of a privacy-preserving data mining algorithm, thoroughly designed and developed for both horizontally and vertically partitioned databases, which contain either nominal or numeric attribute values. The proposed algorithm exploits the multi-candidate election schema to construct a privacy-preserving tree-augmented naive Bayesian classifier, a more robust variation of the classical naive Bayes classifier. The security analysis shows that, by exploiting the Paillier cryptosystem and its distinctive homomorphic primitive, privacy is ensured and the proposed algorithm provides strong defences against common attacks. Experiments on real-world databases demonstrate the preservation of private data while mining processes occur and the efficient handling of both database partition types.
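
The additive homomorphism of the Paillier cryptosystem, which lets encrypted counts be aggregated without decrypting any single party's contribution, can be shown with a toy sketch. The primes below are far too small to be secure, the function names are hypothetical, and the code illustrates only the homomorphic primitive, not the PPDM-TAN protocol itself.

```python
import math
import secrets

def keygen(p, q):
    """Build a Paillier key pair from two primes, using the generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut is valid for g = n + 1
    return (n, n + 1), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    # Pick a random r in [1, n - 1] that is coprime to n.
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Toy primes for illustration only (assumption: real keys use large primes).
pub, priv = keygen(1009, 1013)
c_a = encrypt(pub, 17)  # hypothetical count held by party A
c_b = encrypt(pub, 25)  # hypothetical count held by party B
print(decrypt(pub, priv, add_encrypted(pub, c_a, c_b)))  # 42
```

Only the holder of the private key learns the aggregate, and encryption is randomized, so equal counts do not produce equal ciphertexts.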

Using Cryptography For Privacy-Preserving Data Mining

Data Mining and Knowledge Discovery Technologies ◽

10.4018/978-1-60566-218-3.ch014 ◽

2008 ◽

pp. 175-194

Author(s):

Justin Zhan

Keyword(s):

Data Mining ◽

Data Privacy ◽

Nearest Neighbor ◽

Privacy Preserving ◽

K Nearest Neighbor ◽

Privacy Concerns ◽

Private Data ◽

Definition Of ◽

Types Of Information ◽

Neighbor Classification

To conduct data mining, we often need to collect data from various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. How multiple parties can collaboratively conduct data mining without breaching data privacy presents a challenge. The goal of this paper is to provide solutions for privacy-preserving k-nearest neighbor classification, which is one of the common data mining tasks. Our goal is to obtain accurate data mining results without disclosing private data. We propose a formal definition of privacy and show that our solutions preserve data privacy.

PRIVACY-PRESERVING SVD-BASED COLLABORATIVE FILTERING ON PARTITIONED DATA

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622010003919 ◽

2010 ◽

Vol 09 (03) ◽

pp. 473-502 ◽

Cited By ~ 23

Author(s):

IBRAHIM YAKUT ◽

HUSEYIN POLAT

Keyword(s):

Singular Value Decomposition ◽

Collaborative Filtering ◽

Real Time ◽

Real Data ◽

Privacy Preserving ◽

Singular Value ◽

Competitive Edge ◽

Partitioned Data ◽

Overall Performance ◽

Value Decomposition

Collaborative filtering (CF) systems are widely employed by many e-commerce sites for providing recommendations to their customers. To recruit new customers, retain the current ones, and gain a competitive edge over competing companies, online vendors need to offer accurate predictions efficiently. Therefore, providing precise recommendations efficiently to many users in real time is imperative. Singular value decomposition (SVD) is applied to CF to achieve this goal. SVD-based CF systems offer reliable and accurate predictions when they hold enough data. Data collected for CF purposes, however, might be split between different companies, even competing ones. Some vendors, especially newly established ones, might not have enough data. To increase mutual advantages, provide richer CF services, and overcome problems caused by inadequate data, companies want to integrate their data. However, due to privacy, legal, and financial reasons, they do not want to combine their data. In this article, we investigate how to provide SVD-based referrals on partitioned (horizontally or vertically) data without greatly jeopardizing data holders' privacy. We conduct real data-based experiments to assess our schemes' overall performance and analyze them in terms of privacy and supplementary costs. Our results show that it is possible to provide accurate SVD-based referrals on integrated data while preserving e-companies' privacy.

Performance analysis of privacy preserving distributed data mining based on cryptographic techniques

2021 7th International Conference on Electrical Energy Systems (ICEES) ◽

10.1109/icees51510.2021.9383673 ◽

2021 ◽

Author(s):

Venkatesh Kumar Marimuthu ◽

C. Lakshmi

Keyword(s):

Data Mining ◽

Performance Analysis ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Distributed Data ◽

Cryptographic Techniques