PRIVACY-PRESERVING RANDOM PROJECTION-BASED RECOMMENDATIONS BASED ON DISTRIBUTED DATA

2013 ◽  
Vol 12 (02) ◽  
pp. 201-232 ◽  
Author(s):  
CIHAN KALELI ◽  
HUSEYIN POLAT

Providing recommendations based on distributed data has received an increasing amount of attention because it offers several advantages. Online vendors who face problems caused by a limited amount of available data want to offer predictions based on distributed data collaboratively because they can surmount problems such as cold start, limited coverage, and unsatisfactory accuracy through partnerships. It is relatively easy to produce referrals based on distributed data when privacy is not a concern. However, concerns regarding the protection of private data, financial fears due to revealing valuable assets, and legal regulations imposed by various organizations prevent companies from forming collaborations. In this study, we propose to use random projection to protect online vendors' privacy while still providing accurate predictions from distributed data without sacrificing online performance. We utilize random projection to eliminate the aforementioned issues so vendors can work in partnerships. We suggest privacy-preserving schemes to offer recommendations based on vertically or horizontally partitioned data among multiple companies. The recommended methods are analyzed in terms of confidentiality. We also analyze the superfluous loads caused by privacy concerns. Finally, we perform real data-based trials to evaluate the accuracy of the proposed schemes. The results of our analyses show that our methods preserve privacy, cause insignificant overheads, and offer accurate predictions.
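
The idea behind random projection can be illustrated with a minimal sketch. This is not the authors' scheme: the helper names, the Gaussian projection matrix, the dimensions, and the synthetic rating vectors are all assumptions made for illustration. It shows that multiplying rating vectors by a random matrix roughly preserves the distances that similarity computations rely on, while a k-dimensional output with k < d does not uniquely determine the original d-dimensional vector.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a d x k Gaussian matrix scaled by 1/sqrt(k) (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]


def project(vector, matrix):
    """Map a length-d vector to k dimensions by computing vector @ matrix."""
    k = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(k)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two synthetic rating vectors over d items (made-up data, not from the paper).
d, k = 600, 300
data_rng = random.Random(42)
u = [data_rng.randint(1, 5) for _ in range(d)]
v = [data_rng.randint(1, 5) for _ in range(d)]

# Both vectors are masked with the same projection matrix.
R = random_projection_matrix(d, k, seed=7)
u_proj, v_proj = project(u, R), project(v, R)

# Ratio of projected to original squared distance; expected to be close to 1.
ratio = squared_distance(u_proj, v_proj) / squared_distance(u, v)
print(f"distance ratio after projection: {ratio:.3f}")
```

A larger k tightens the approximation at the cost of revealing more linear combinations of the original values, so the choice of k trades accuracy against privacy.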

2017 ◽  
Vol 17 (2) ◽  
pp. 44-55 ◽  
Author(s):  
M. Antony Sheela ◽  
K. Vijayalakshmi

Data mining on vertically or horizontally partitioned datasets carries the overhead of protecting private data. Perturbation is a technique that protects data from being revealed. This paper proposes a perturbation and anonymization technique for vertically partitioned data. A third-party coordinator recursively partitions the data across the various parties. When the specified threshold level is reached, the parties perturb the data by finding the mean. The perturbation maintains the statistical relationships among attributes.
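
One possible reading of mean-based perturbation with a threshold is sketched below. The abstract does not specify the partitioning rule, so the function name, the split-at-the-median rule, and the sample values are assumptions, not the paper's algorithm. The sketch recursively halves the sorted values until a group has at most `threshold` members, then replaces every value in the group with the group mean.

```python
from statistics import mean


def perturb_by_group_mean(values, threshold):
    """Replace each value by the mean of its group (hypothetical helper).

    Groups are formed by recursively halving the values in sorted order
    until a group holds at most `threshold` values.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")

    # Work on indices so the output keeps the original record order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)

    def recurse(indices):
        if len(indices) <= threshold:
            group_mean = mean(values[i] for i in indices)
            for i in indices:
                result[i] = group_mean
            return
        mid = len(indices) // 2
        recurse(indices[:mid])
        recurse(indices[mid:])

    if values:
        recurse(order)
    return result


# Made-up attribute values for illustration.
salaries = [10, 20, 30, 40, 100, 110]
perturbed = perturb_by_group_mean(salaries, threshold=2)
print(perturbed)
```

Because every group is replaced by its own mean, the overall sum and mean of the attribute are unchanged, which is one sense in which such a perturbation can keep aggregate statistics intact.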


2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Yu Li ◽  
Yuan Zhang ◽  
Yue Ji

With the arrival of the big data era, it is predicted that distributed data mining will lead to an information technology revolution. To motivate different institutes to collaborate with each other, the crucial issue is to eliminate their concerns regarding data privacy. In this paper, we propose a privacy-preserving method for training a restricted Boltzmann machine (RBM). With our method, the parties can obtain the RBM without revealing their private data to each other. We provide a correctness and efficiency analysis of our algorithms. The comparative experiment shows that the accuracy is very close to that of the original RBM model.


Author(s):  
Xiaodong Lin ◽  
Alan F. Karr

Recent technological advances enable the collection of huge amounts of data. Commonly, these data are generated, stored, and owned by multiple entities that are unwilling to cede control of their data. This distributed environment requires statistical tools that can produce correct results while preserving data privacy. Privacy-preserving protocols have been proposed for specific statistical analyses such as linear regression, clustering, and classification. In this paper, we present methods and protocols for privacy-preserving maximum likelihood estimation in general settings. We discuss both horizontally and vertically partitioned data, and propose procedures that allow participating parties to withdraw from the joint computation. Logistic regression is used to demonstrate our method.


2012 ◽  
Vol 21 (01) ◽  
pp. 1250009 ◽  
Author(s):  
YOUWEN ZHU ◽  
LIUSHENG HUANG ◽  
TSUYOSHI TAKAGI ◽  
MINGWU ZHANG

Privacy concerns have recently received growing attention, and preserving sensitive private information from being violated in distributed cooperative computation has become a significant topic. In this paper, we first propose a novel, general privacy-preserving online analytical processing model based on secure multiparty computation. Then, based on the new model, we propose two schemes for privacy-preserving count aggregate queries over both horizontally partitioned data and vertically partitioned data. We also propose several efficient subprotocols that serve as basic secure building blocks. Furthermore, we analyze the correctness, security, communication cost, and computation complexity of our proposed protocols, and show that the new schemes are secure, have linear complexity, and return exactly accurate query results.
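
A count aggregate over horizontally partitioned data can be illustrated with additive secret sharing, a standard secure multiparty computation building block. This is a minimal sketch under stated assumptions, not the subprotocols proposed in the paper: the function names, the modulus, and the local counts are hypothetical, and the parties are assumed to be honest but curious.

```python
import secrets

# Public modulus; assumed to be larger than any possible total count.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that sum to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_count(local_counts, modulus=MODULUS):
    """Simulate a secure sum of per-party counts (hypothetical helper).

    share_matrix[i][j] is the share that party i sends to party j.
    Each party publishes only the sum of the shares it received.
    """
    n = len(local_counts)
    share_matrix = [make_shares(count, n, modulus) for count in local_counts]
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return sum(partial_sums) % modulus


# Three parties each hold the number of local rows matching a query (made-up values).
total = secure_count([12, 7, 30])
print(total)
```

Each individual share is uniformly random, so no single message reveals a party's local count, yet the published partial sums add up to the exact total, which matches the abstract's point that query results need not be approximate.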


Author(s):  
Stanley R.M. Oliveira ◽  
Osmar R. Zaïane

The sharing of data is beneficial in data mining applications and widely acknowledged as advantageous in business. However, information sharing can become controversial and thwarted by privacy regulations and other privacy concerns. Rather than simply hindering data owners from sharing information for data analysis, a solution could be designed to meet privacy requirements and guarantee valid data clustering results. To achieve this dual goal, this chapter introduces a method for privacy-preserving clustering, called Dimensionality Reduction-Based Transformation (DRBT). This method relies on the intuition behind random projection to protect the underlying attribute values subjected to cluster analysis. It is shown analytically and empirically that, by transforming a dataset using DRBT, a data owner can achieve privacy preservation and get accurate clustering with little communication overhead. The advantages of such a method are: it is independent of distance-based clustering algorithms; it has a sound mathematical foundation; and it does not require CPU-intensive operations.


Computation ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 6
Author(s):  
Maria Eleni Skarkala ◽  
Manolis Maragoudakis ◽  
Stefanos Gritzalis ◽  
Lilian Mitrou

Distributed medical, financial, or social databases are analyzed daily for the discovery of patterns and useful information. Privacy concerns have emerged as some database segments contain sensitive data. Data mining techniques are used to parse, process, and manage enormous amounts of data while ensuring the preservation of private information. Cryptography, as shown by previous research, is the most accurate approach to acquiring knowledge while maintaining privacy. In this paper, we present an extension of a privacy-preserving data mining algorithm, thoroughly designed and developed for both horizontally and vertically partitioned databases, which contain either nominal or numeric attribute values. The proposed algorithm exploits the multi-candidate election schema to construct a privacy-preserving tree-augmented naive Bayesian classifier, a more robust variation of the classical naive Bayes classifier. The security analysis shows that, by exploiting the Paillier cryptosystem and its homomorphic primitive, privacy is ensured and the proposed algorithm provides strong defences against common attacks. Experiments on real-world databases demonstrate that private data are preserved while mining takes place and that both database partition types are handled efficiently.
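
The additive homomorphic property of the Paillier cryptosystem mentioned above can be sketched as follows. This is a toy illustration, not the proposed algorithm or its election schema: the function names, the tiny primes, and the per-party counts are assumptions, and keys of this size offer no security.

```python
import math
import secrets


def generate_keys(p, q):
    """Build a toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt 0 <= message < n as (1 + n)^message * r^n mod n^2."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^message is congruent to 1 + message * n modulo n^2.
    return ((1 + message * n) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, ciphertext):
    """Recover the plaintext using L(x) = (x - 1) // n."""
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


# Three parties encrypt local frequency counts (made-up values) under one public key.
n, private_key = generate_keys(1789, 1861)
ciphertexts = [encrypt(n, count) for count in (3, 4, 5)]

# An aggregator combines the ciphertexts without seeing any individual count.
combined = ciphertexts[0]
for c in ciphertexts[1:]:
    combined = add_encrypted(n, combined, c)

total = decrypt(n, private_key, combined)
print(total)
```

Frequency counts of this kind are what a naive Bayes style classifier needs to estimate probabilities, which is why an additively homomorphic scheme is a natural fit for aggregating them across partitions.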


Author(s):  
Justin Zhan

To conduct data mining, we often need to collect data from various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. How multiple parties can collaboratively conduct data mining without breaching data privacy presents a challenge. The goal of this paper is to provide solutions for privacy-preserving k-nearest neighbor classification, which is one of the common data mining tasks. We aim to obtain accurate data mining results without disclosing private data. We propose a formal definition of privacy and show that our solutions preserve data privacy.


Author(s):  
IBRAHIM YAKUT ◽  
HUSEYIN POLAT

Collaborative filtering (CF) systems are widely employed by many e-commerce sites for providing recommendations to their customers. To recruit new customers, retain the current ones, and gain a competitive edge over competing companies, online vendors need to offer accurate predictions efficiently. Therefore, providing precise recommendations efficiently to many users in real time is imperative. Singular value decomposition (SVD) is applied to CF to achieve this goal. SVD-based CF systems offer reliable and accurate predictions when they hold enough data. Data collected for CF purposes, however, might be split between different companies, even competing ones. Some vendors, especially newly established ones, might have problems with available data. To increase mutual advantages, provide richer CF services, and overcome problems caused by inadequate data, companies want to integrate their data. However, for privacy, legal, and financial reasons, they are reluctant to combine their data directly. In this article, we investigate how to provide SVD-based referrals on partitioned (horizontally or vertically) data without greatly jeopardizing data holders' privacy. We conduct real data-based experiments to assess our schemes' overall performance and analyze them in terms of privacy and supplementary costs. Our results show that it is possible to provide accurate SVD-based referrals on integrated data while preserving e-companies' privacy.

